Signal processing

mlpack provides several signal processing techniques that allows to extract features from stationary and non-stationary signals:

Decomposition:
- EMD() (Empirical Mode Decomposition): adaptively decomposes a 1D signal into a set of Intrinsic Mode Functions (IMFs) plus a residue.
- EEMD(): (Ensemble EMD): wraps EMD() to output more robust IMFs by using ensemble approach.
Feature Extraction:
- MFE() (Mel-Frequency Energy): log-scaled energies from a bank of triangular filters spaced on the mel scale
- MFCC() (Mel-Frequency Cepstral Coefficients): a compact representation obtained by applying a Discrete Cosine Transform (DCT) to the MFE output.

🔗 EMD

mlpack provides Empirical Mode Decomposition (EMD) to process signals for training and testing. This can be used for signal monitoring pipelines in nonlinear and nonstationary problems.

The EMD() function can be used to extract Intrinsic Mode Functions (IMFs) from a uniformly sampled periodic signal:

EMD(signal, imfs, residue, maxImfs = 10, maxSiftIter = 50, tol = 1e-3)
- signal is a column vector containing the 1D signal data (e.g. arma::vec); the sequence must be uniformly sampled.
- imfs is a matrix that will be modified to contain the extracted IMFs. It will have shape N x K, where N is the length of signal and K is the number of extracted IMFs.
- residue is a column vector (of length N) that will be modified to contain the final residual signal after extracting the IMFs.
- maxImfs (of type size_t) is the maximum number of IMFs to extract.
- maxSiftIter (of type size_t) is the maximum sifting iterations per IMF.
- tol (double) is the stopping tolerance used on sifting iterations.
NOTES:
- The original signal can be reconstructed as the sum of the IMFs and residue.
- The stopping criterion is based on the normalized mean envelope magnitude.
- Sifting will terminate when zero-crossings and extrema are equal or differing by at most one. Specifically, the S-number is set to S=1.
- A smaller tolerance sets a stricter stopping criterion. The algorithm will terminate when either maxSiftIter is reached or the tol is satisfied, whichever occurs first.

The figures below show the signal decomposition process with EMD:

(a) original signal;
(b) original signal with envelopes about the local minima and maxima;
(c) the first IMF extracted from the original signal via sifting.

signal with envelopes

Example using EMD on a time-varying signal S.

const arma::uword N = 400;
const double tMin = 0.0;
const double tMax = arma::datum::pi;
arma::vec time = arma::linspace(tMin, tMax, N);

// signal = sin(20*T*(1 + 0.2*T)) + T**2 + sin(13*T)
// see figure above 
arma::vec signal =
    arma::sin( 20.0 * time % (1.0 + 0.2 * time) ) +
    arma::square(time) + arma::sin(13.0 * time);

arma::mat imfs;
arma::vec residue;

// Use up to 5 IMFs, 50 sifts per IMF, tol = 1e-6
mlpack::EMD(signal, imfs, residue, 5, 50, 5e-6);

// Print dominant frequency for the first 3 IMFs 
// to check if EMD is extracting the correct IMFs
const size_t numToShow = std::min<size_t>(3, imfs.n_cols);
const double dt = time(1) - time(0);
const double fs = 1.0 / dt;
for (size_t k = 0; k < numToShow; ++k)
{
  arma::cx_vec spectrum = arma::fft(imfs.col(k));
  // Use only the first half of the spectrum
  arma::vec mag = arma::abs(spectrum.rows(0, spectrum.n_elem / 2));
  arma::uword idx = mag.index_max();
  const double peakHz = (double) idx * fs / spectrum.n_elem;
  std::cout << "IMF " << k << " peak freq: " << peakHz << " Hz" << std::endl;
}

🔗 EEMD

The EEMD() function wraps EMD() to produce more robust IMFs by running EMD many times on the signal with independent white noise added to each trial, then averaging the results. The added noise reduces mode mixing and cancels out in the average.

EEMD(signal, imfs, residue, ensSize = 100, noiseStrength = 0.2, maxImfs = 10, maxSiftIter = 50, tol = 1e-3)
- ensSize (of type size_t) is the number of members in the ensemble (that is the number of EMD() runs to be averaged).
- noiseStrength (of type double) is the fraction of the signal standard deviation used as the standard deviation of the added white noise in each EMD() run (i.e. 0.1 means 10% of signal standard deviation).
- signal, imfs, residue, maxImfs, maxSiftIter, and tol are defined in the classical EMD() implementation.
NOTES:
- The original signal cannot be reconstructed as the sum of the IMFs and residue, as in EMD().
- Number of extracted IMFs will be the minimum number of IMFs extracted by by EMD() across all ensSize runs (<=maxImfs).
- EEMD may produce low-energy leading IMFs due to injected noise and ensemble averaging. Depending on the application, users may want to discard negligible IMFs in post-processing (e.g., using an energy-fraction threshold).
- The number of returned IMFs is bounded between 0 and maxImfs. The algorithm returns only as many IMFs as can actually be extracted from the input signal.

Example using EEMD on a time-varying signal S (shown in above EMD figure).

const arma::uword N = 3000;
const double tMin = 0.0;
const double tMax = arma::datum::pi;
arma::vec time = arma::linspace(tMin, tMax, N);

// signal = sin(20*T*(1 + 0.2*T)) + T**2 + sin(13*T)
// see figure above in the EMD documentation
arma::vec signal =
    arma::sin( 20.0 * time % (1.0 + 0.2 * time) ) +
    arma::square(time) + arma::sin(13.0 * time);

arma::mat imfs;
arma::vec residue;

// Use 100 ensemble members, 0.15 noise strength, 10 IMFs, 50 sifts per IMF, tol = 1e-2
mlpack::EEMD(signal, imfs, residue, 100, 0.15, 10, 50, 1e-2);

🔗 MFE

mlpack provides the MFE() function to extract standard audio features from raw PCM data loaded with Load(). These features can be used as input to machine learning models for speech recognition, speaker identification, keyword spotting, and other tasks.

MFE(signals, mfe, sampleRate, numMelFilters=40, windowLength=25, windowStep=10, nFFT=0, lowFreq=0.0, highFreq=0.0)
- Extract log-Mel filterbank energies from windows of audio in signals, storing the result in mfe.
- signals should be a matrix with floating-point elements and shape numSamples x numSignals, and mfe will be set to shape numMelFilters x (numWindows * numSignals).
  - To perform MFE() on non-floating point audio signals, first convert the data to a floating-point matrix type (e.g. using arma::conv_to).
- sampleRate can be set using the members of an AudioOptions populated during a call to Load().
NOTES:
- If the audio signal has multiple channels, the channels must be de-interleaved before calling MFE() with each column representing a separate channel. This preserves spatial information (e.g., which sounds come from the left vs. right). If spatial information is not needed, the channels can be mixed down to mono before processing (e.g., mono = (left + right) / 2).

MFE is computed using the following steps. The figure below shows the application of MFE to a 440 Hz sine wave signal.

(a) Each individual input signal should have amplitude in the range [-1.0, 1.0].
(b) Cut the signal into overlapping frames tapered by a Hamming window to prevent spectral leakage.
(c) Apply the Short-Time Fourier Transform (STFT) to each data window to produce a short-term spectrum.
(d) Multiply each STFT frame with the Mel filterbank, which consists of log-spaced triangular filters.
(e) Take the log of the filter bins to produce the Mel filter energies.

MFE implementation

Function Parameters:

name	type	default	description
`signals`	`arma::mat` or other floating-point matrix	(n/a)	Raw PCM audio samples.
`mfe`	`arma::mat` or other floating point matrix	(n/a)	Output matrix of shape `(numMelFilters x numWindows)`.
`sampleRate`	`size_t`	(n/a)	Sample rate of the audio in Hz (e.g. `16000`, `44100`).
`numMelFilters`	`size_t`	`40`	Number of Mel-spaced triangular filters. Typical range (`20` to `100`)
`windowLength`	`float`	`25.0`	Window length in milliseconds. Typical range `20` to `40`.
`windowStep`	`float`	`10.0`	Window hop (step) in milliseconds. Typical range `5` to `20`.
`nFFT`	`size_t`	`0`	FFT size; `0` will use the next power of 2 greater than or equal to the window length. Powers of 2 are more efficient. Typical range `256` to `4096`.
`lowFreq`	`float`	`0.0`	Low frequency bound for the Mel filterbank in Hz. Typical range `0` to `300`.
`highFreq`	`float`	`0.0`	High frequency bound in Hz; `0` means `sampleRate / 2`. Typical range `4000` to `sampleRate / 2`.

Apply MFE filter with default parameters on voice signals:

// See https://datasets.mlpack.org/sine.wav
arma::mat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

arma::mat mfe;
mlpack::MFE(signal, mfe, opts.SampleRate());

std::cout << "MFE shape: " << mfe.n_rows << " x " << mfe.n_cols << std::endl;

Specifying a custom number of Mel filters and frequency range:

// See https://datasets.mlpack.org/sine.wav
arma::fmat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

arma::fmat mfe;
// 80 mel filters, default window size, frequency range 300–8000 Hz.
mlpack::MFE(signal, mfe, opts.SampleRate(), 80, 25.0, 10.0, 0, 300.0,
    8000.0);

std::cout << "MFE shape: " << mfe.n_rows << " x " << mfe.n_cols
        << " (filters x windows)" << std::endl;

A detailed example showing the average extracted energy per frequency bin:

arma::mat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

std::cout << "Loaded: " << signal.n_rows << " samples, "
  << opts.SampleRate() << " Hz, "
  << opts.Channels() << " channel(s)." << std::endl;

arma::mat mfe;
mlpack::MFE(signal, mfe, opts.SampleRate());

std::cout << "MFE shape: " << mfe.n_rows << " x " << mfe.n_cols
        << " (filters x windows)." << std::endl;

arma::vec meanMFE = arma::mean(mfe, 1);

size_t peakFilter = meanMFE.index_max();
float peakVal = meanMFE.max();

size_t nFFT = 512;
float highFreq = opts.SampleRate() / 2.0f;
float melLow = mlpack::HzToMel(0.0f);
float melHigh = mlpack::HzToMel(highFreq);
arma::vec melPoints = arma::linspace<arma::vec>(melLow, melHigh, 42);
arma::vec hzPoints(42);
for (size_t i = 0; i < 42; ++i)
hzPoints(i) = mlpack::MelToHz(melPoints(i));

std::cout << std::endl;
std::cout << "Filter   Freq range (Hz)       Avg energy" << std::endl;
std::cout << "------   -------------------   ----------" << std::endl;

for (size_t i = 0; i < 40; ++i)
{
  std::cout << "  " << std::setw(2) << i << "     "
            << std::setw(5) << std::fixed << std::setprecision(0)
            << hzPoints(i) << " – "
            << std::setw(5) << hzPoints(i + 2) << "       "
            << std::setw(8) << std::setprecision(2) << meanMFE(i);

  if (i == peakFilter)
    std::cout << "   <<< peak";
  else if (i == peakFilter - 1 || i == peakFilter + 1)
    std::cout << "   <<< adjacent";

  std::cout << std::endl;
}

🔗 MFCC

mlpack provides the MFCC() function to extract standard audio features from raw PCM data loaded with Load(). Similar to MFE, MFCC output can be used as input to machine learning models. Note that, since MFCC coefficients are decorrelated, they can be used with distance-based machine learning algorithms (e.g., KNN, KMeans) or probabilistic algorithms (e.g., GMM, HMM).

MFCC(signals, mfcc, sampleRate, numCoeffs, numMelFilters=40, windowLength=25, windowStep=10, nFFT=0, lowFreq=0.0, highFreq=0.0)
- Extract Mel-Frequency Cepstral Coefficients from windows of audio in signals, storing the result in mfcc.
- signals should be a matrix with floating-point elements and shape numSamples x numSignals, and mfcc will be set to shape numCoeffs x (numWindows * numSignals).
  - To perform MFE() on non-floating point audio signals, first convert the data to a floating-point matrix type (e.g. using arma::conv_to).
- sampleRate can be set using the members of an AudioOptions populated during a call to Load().
NOTES:
- If the audio signal has multiple channels, the channels must be de-interleaved before calling MFCC() with each column representing a separate channel. This preserves spatial information (e.g., which sounds come from the left vs. right). If spatial information is not needed, the channels can be mixed down to mono before processing (e.g., mono = (left + right) / 2).
- numCoeffs must be less than or equal to numMelFilters. The DCT compresses numMelFilters log-mel energies down to numCoeffs cepstral coefficients; it cannot produce more coefficients than there are input values.

MFCC is computed using the following steps. The figure below shows the application of MFCC to a 440 Hz sine wave signal.

(a) Each individual input signal should have amplitude in the range [-1.0, 1.0].
(b) Cut the signal into overlapping frames tapered by a Hamming window to prevent spectral leakage.
(c) Apply the Short-Time Fourier Transform (STFT) to each data window to produce a short-term spectrum.
(d) Multiply each STFT frame with the Mel filterbank, which consists of log-spaced triangular filters.
(e) Take the log of the filter bins to produce the Mel filter energies.
(f) Multiplying by a discrete cosine transform (DCT) matrix gives MFCC.

MFCC implementation

Function Parameters:

name	type	default	description
`signals`	`arma::mat` or other floating-point matrix	(n/a)	Raw PCM audio samples.
`mfcc`	`arma::mat` or other floating point matrix	(n/a)	Output matrix of shape `(numMelFilters x numWindows)`.
`sampleRate`	`size_t`	(n/a)	Sample rate of the audio in Hz (e.g. `16000`, `44100`).
`numCoeff`	`size_t`	(n/a)	Number of cepstral coefficients.
`numMelFilters`	`size_t`	`40`	Number of Mel-spaced triangular filters. Typical range `20` to `100`.
`windowLength`	`float`	`25.0`	Window length in milliseconds. Typical range `20` to `40`.
`windowStep`	`float`	`10.0`	Window hop (step) in milliseconds. Typical range `5` to `20`.
`nFFT`	`size_t`	`0`	FFT size; `0` will use the next power of 2 greater than or equal to the window length. Powers of 2 are more efficient. Typical range `256` to `4096`.
`lowFreq`	`float`	`0.0`	Low frequency bound for the Mel filterbank in Hz. Typical range `0` to `300`.
`highFreq`	`float`	`0.0`	High frequency bound in Hz; `0` means `sampleRate / 2`. Typical range `4000` to `sampleRate / 2`.

Extract 13 MFCCs from a WAV file.

// See https://datasets.mlpack.org/sine.wav
arma::mat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

arma::mat mfcc;
mlpack::MFCC(signal, mfcc, opts.SampleRate());

// mfcc has a shape of 13 x numWindows.
std::cout << "MFCC shape: " << mfcc.n_rows << " x " << mfcc.n_cols
    << "." << std::endl;

Extract 20 MFCCs with 80 Mel filters from an MP3 file:

// See https://datasets.mlpack.org/fifths.mp3
arma::fmat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::MP3;
mlpack::Load("fifths.mp3", signal, opts);

arma::fmat mfcc;
mlpack::MFCC(signal, mfcc, opts.SampleRate(), 20, 80);

MFCC with custom parameters:

// See https://datasets.mlpack.org/sine.wav
arma::fmat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

arma::fmat mfcc;
mlpack::MFCC(signal, mfcc, opts.SampleRate(), 13, 26, 25.0, 10.0, 0, 300.0,
    3400.0);

Analyse the 13 MFCC coefficients extracted from loading a sine wav signal and what each one captures:

arma::mat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

std::cout << "Loaded: " << signal.n_rows << " samples, "
  << opts.SampleRate() << " Hz, "
  << opts.Channels() << " channel(s)." << std::endl;

arma::mat mfcc;
mlpack::MFCC(signal, mfcc, opts.SampleRate());

std::cout << "MFCC shape: " << mfcc.n_rows << " x " << mfcc.n_cols
  << " (coefficients x windows)." << std::endl;

arma::vec meanMFCC = arma::mean(mfcc, 1);

std::cout << std::endl;
std::cout << "Coeff   Avg value    What it captures (roughly)" << std::endl;
std::cout << "-----   ---------    ----------------" << std::endl;

const char* descriptions[] = {
    "Overall energy (loudness)",
    "Spectral tilt (low vs high freq balance)",
    "Spectral curvature (middle vs edges)",
    "Two-bump structure (formant separation)",
    "Finer spectral shape",
    "Finer spectral shape",
    "Finer spectral shape",
    "Finer spectral shape",
    "Finer spectral shape",
    "Finer spectral shape",
    "Finer spectral shape",
    "Fine spectral detail",
    "Finest spectral detail"
};

for (size_t i = 0; i < mfcc.n_rows; ++i)
{
  std::cout << "c[" << std::setw(2) << i << "]  "
      << std::setw(10) << std::fixed << std::setprecision(2) << meanMFCC(i)
      << "    " << descriptions[i] << std::endl;
}

Signal processing

🔗 EMD

See also:

🔗 EEMD

See also:

🔗 MFE

Function Parameters:

🔗 MFCC

Function Parameters:

See also: