mlpack

Signal processing

mlpack provides several signal processing techniques that allows to extract features from stationary and non-stationary signals:

🔗 EMD

mlpack provides Empirical Mode Decomposition (EMD) to process signals for training and testing. This can be used for signal monitoring pipelines in nonlinear and nonstationary problems.

The EMD() function can be used to extract Intrinsic Mode Functions (IMFs) from a uniformly sampled periodic signal:

The figures below show the signal decomposition process with EMD:

signal with envelopes

Example using EMD on a time-varying signal S.

const arma::uword N = 400;
const double tMin = 0.0;
const double tMax = arma::datum::pi;
arma::vec time = arma::linspace(tMin, tMax, N);

// signal = sin(20*T*(1 + 0.2*T)) + T**2 + sin(13*T)
// see figure above 
arma::vec signal =
    arma::sin( 20.0 * time % (1.0 + 0.2 * time) ) +
    arma::square(time) + arma::sin(13.0 * time);

arma::mat imfs;
arma::vec residue;

// Use up to 5 IMFs, 50 sifts per IMF, tol = 1e-6
mlpack::EMD(signal, imfs, residue, 5, 50, 5e-6);

// Print dominant frequency for the first 3 IMFs 
// to check if EMD is extracting the correct IMFs
const size_t numToShow = std::min<size_t>(3, imfs.n_cols);
const double dt = time(1) - time(0);
const double fs = 1.0 / dt;
for (size_t k = 0; k < numToShow; ++k)
{
  arma::cx_vec spectrum = arma::fft(imfs.col(k));
  // Use only the first half of the spectrum
  arma::vec mag = arma::abs(spectrum.rows(0, spectrum.n_elem / 2));
  arma::uword idx = mag.index_max();
  const double peakHz = (double) idx * fs / spectrum.n_elem;
  std::cout << "IMF " << k << " peak freq: " << peakHz << " Hz" << std::endl;
}

See also:

🔗 EEMD

The EEMD() function wraps EMD() to produce more robust IMFs by running EMD many times on the signal with independent white noise added to each trial, then averaging the results. The added noise reduces mode mixing and cancels out in the average.

Example using EEMD on a time-varying signal S (shown in above EMD figure).

const arma::uword N = 3000;
const double tMin = 0.0;
const double tMax = arma::datum::pi;
arma::vec time = arma::linspace(tMin, tMax, N);

// signal = sin(20*T*(1 + 0.2*T)) + T**2 + sin(13*T)
// see figure above in the EMD documentation
arma::vec signal =
    arma::sin( 20.0 * time % (1.0 + 0.2 * time) ) +
    arma::square(time) + arma::sin(13.0 * time);

arma::mat imfs;
arma::vec residue;

// Use 100 ensemble members, 0.15 noise strength, 10 IMFs, 50 sifts per IMF, tol = 1e-2
mlpack::EEMD(signal, imfs, residue, 100, 0.15, 10, 50, 1e-2);

See also:

🔗 MFE

mlpack provides the MFE() function to extract standard audio features from raw PCM data loaded with Load(). These features can be used as input to machine learning models for speech recognition, speaker identification, keyword spotting, and other tasks.

MFE is computed using the following steps. The figure below shows the application of MFE to a 440 Hz sine wave signal.

MFE implementation

Function Parameters:

name type default description
signals arma::mat or other floating-point matrix (n/a) Raw PCM audio samples.
mfe arma::mat or other floating point matrix (n/a) Output matrix of shape (numMelFilters x numWindows).
sampleRate size_t (n/a) Sample rate of the audio in Hz (e.g. 16000, 44100).
numMelFilters size_t 40 Number of Mel-spaced triangular filters. Typical range (20 to 100)
windowLength float 25.0 Window length in milliseconds. Typical range 20 to 40.
windowStep float 10.0 Window hop (step) in milliseconds. Typical range 5 to 20.
nFFT size_t 0 FFT size; 0 will use the next power of 2 greater than or equal to the window length. Powers of 2 are more efficient. Typical range 256 to 4096.
lowFreq float 0.0 Low frequency bound for the Mel filterbank in Hz. Typical range 0 to 300.
highFreq float 0.0 High frequency bound in Hz; 0 means sampleRate / 2. Typical range 4000 to sampleRate / 2.

Apply MFE filter with default parameters on voice signals:

// See https://datasets.mlpack.org/sine.wav
arma::mat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

arma::mat mfe;
mlpack::MFE(signal, mfe, opts.SampleRate());

std::cout << "MFE shape: " << mfe.n_rows << " x " << mfe.n_cols << std::endl;

Specifying a custom number of Mel filters and frequency range:

// See https://datasets.mlpack.org/sine.wav
arma::fmat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

arma::fmat mfe;
// 80 mel filters, default window size, frequency range 300–8000 Hz.
mlpack::MFE(signal, mfe, opts.SampleRate(), 80, 25.0, 10.0, 0, 300.0,
    8000.0);

std::cout << "MFE shape: " << mfe.n_rows << " x " << mfe.n_cols
        << " (filters x windows)" << std::endl;

A detailed example showing the average extracted energy per frequency bin:

arma::mat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

std::cout << "Loaded: " << signal.n_rows << " samples, "
  << opts.SampleRate() << " Hz, "
  << opts.Channels() << " channel(s)." << std::endl;

arma::mat mfe;
mlpack::MFE(signal, mfe, opts.SampleRate());

std::cout << "MFE shape: " << mfe.n_rows << " x " << mfe.n_cols
        << " (filters x windows)." << std::endl;

arma::vec meanMFE = arma::mean(mfe, 1);

size_t peakFilter = meanMFE.index_max();
float peakVal = meanMFE.max();

size_t nFFT = 512;
float highFreq = opts.SampleRate() / 2.0f;
float melLow = mlpack::HzToMel(0.0f);
float melHigh = mlpack::HzToMel(highFreq);
arma::vec melPoints = arma::linspace<arma::vec>(melLow, melHigh, 42);
arma::vec hzPoints(42);
for (size_t i = 0; i < 42; ++i)
hzPoints(i) = mlpack::MelToHz(melPoints(i));

std::cout << std::endl;
std::cout << "Filter   Freq range (Hz)       Avg energy" << std::endl;
std::cout << "------   -------------------   ----------" << std::endl;

for (size_t i = 0; i < 40; ++i)
{
  std::cout << "  " << std::setw(2) << i << "     "
            << std::setw(5) << std::fixed << std::setprecision(0)
            << hzPoints(i) << " – "
            << std::setw(5) << hzPoints(i + 2) << "       "
            << std::setw(8) << std::setprecision(2) << meanMFE(i);

  if (i == peakFilter)
    std::cout << "   <<< peak";
  else if (i == peakFilter - 1 || i == peakFilter + 1)
    std::cout << "   <<< adjacent";

  std::cout << std::endl;
}

🔗 MFCC

mlpack provides the MFCC() function to extract standard audio features from raw PCM data loaded with Load(). Similar to MFE, MFCC output can be used as input to machine learning models. Note that, since MFCC coefficients are decorrelated, they can be used with distance-based machine learning algorithms (e.g., KNN, KMeans) or probabilistic algorithms (e.g., GMM, HMM).

MFCC is computed using the following steps. The figure below shows the application of MFCC to a 440 Hz sine wave signal.

MFCC implementation

Function Parameters:

name type default description
signals arma::mat or other floating-point matrix (n/a) Raw PCM audio samples.
mfcc arma::mat or other floating point matrix (n/a) Output matrix of shape (numMelFilters x numWindows).
sampleRate size_t (n/a) Sample rate of the audio in Hz (e.g. 16000, 44100).
numCoeff size_t (n/a) Number of cepstral coefficients.
numMelFilters size_t 40 Number of Mel-spaced triangular filters. Typical range 20 to 100.
windowLength float 25.0 Window length in milliseconds. Typical range 20 to 40.
windowStep float 10.0 Window hop (step) in milliseconds. Typical range 5 to 20.
nFFT size_t 0 FFT size; 0 will use the next power of 2 greater than or equal to the window length. Powers of 2 are more efficient. Typical range 256 to 4096.
lowFreq float 0.0 Low frequency bound for the Mel filterbank in Hz. Typical range 0 to 300.
highFreq float 0.0 High frequency bound in Hz; 0 means sampleRate / 2. Typical range 4000 to sampleRate / 2.

Extract 13 MFCCs from a WAV file.

// See https://datasets.mlpack.org/sine.wav
arma::mat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

arma::mat mfcc;
mlpack::MFCC(signal, mfcc, opts.SampleRate());

// mfcc has a shape of 13 x numWindows.
std::cout << "MFCC shape: " << mfcc.n_rows << " x " << mfcc.n_cols
    << "." << std::endl;

Extract 20 MFCCs with 80 Mel filters from an MP3 file:

// See https://datasets.mlpack.org/fifths.mp3
arma::fmat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::MP3;
mlpack::Load("fifths.mp3", signal, opts);

arma::fmat mfcc;
mlpack::MFCC(signal, mfcc, opts.SampleRate(), 20, 80);

MFCC with custom parameters:

// See https://datasets.mlpack.org/sine.wav
arma::fmat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

arma::fmat mfcc;
mlpack::MFCC(signal, mfcc, opts.SampleRate(), 13, 26, 25.0, 10.0, 0, 300.0,
    3400.0);

Analyse the 13 MFCC coefficients extracted from loading a sine wav signal and what each one captures:

arma::mat signal;
mlpack::AudioOptions opts = mlpack::Fatal + mlpack::WAV;
mlpack::Load("sine.wav", signal, opts);

std::cout << "Loaded: " << signal.n_rows << " samples, "
  << opts.SampleRate() << " Hz, "
  << opts.Channels() << " channel(s)." << std::endl;

arma::mat mfcc;
mlpack::MFCC(signal, mfcc, opts.SampleRate());

std::cout << "MFCC shape: " << mfcc.n_rows << " x " << mfcc.n_cols
  << " (coefficients x windows)." << std::endl;

arma::vec meanMFCC = arma::mean(mfcc, 1);

std::cout << std::endl;
std::cout << "Coeff   Avg value    What it captures (roughly)" << std::endl;
std::cout << "-----   ---------    ----------------" << std::endl;

const char* descriptions[] = {
    "Overall energy (loudness)",
    "Spectral tilt (low vs high freq balance)",
    "Spectral curvature (middle vs edges)",
    "Two-bump structure (formant separation)",
    "Finer spectral shape",
    "Finer spectral shape",
    "Finer spectral shape",
    "Finer spectral shape",
    "Finer spectral shape",
    "Finer spectral shape",
    "Finer spectral shape",
    "Fine spectral detail",
    "Finest spectral detail"
};

for (size_t i = 0; i < mfcc.n_rows; ++i)
{
  std::cout << "c[" << std::setw(2) << i << "]  "
      << std::setw(10) << std::fixed << std::setprecision(2) << meanMFCC(i)
      << "    " << descriptions[i] << std::endl;
}

See also: