mlpack uses Armadillo matrices for matrix support. Armadillo is a fast C++ matrix library which makes use of advanced template techniques to provide the fastest possible matrix operations.
Documentation on Armadillo can be found on the Armadillo website.
Nonetheless, there are a few further caveats for mlpack Armadillo usage.
Armadillo matrices are stored in column-major format; this means that, in memory, the elements of each column are contiguous.
This means that, for the vast majority of machine learning methods, it is faster to store observations as columns and dimensions as rows. This is counter to most standard machine learning texts!
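To make the layout concrete, here is a minimal sketch of column-major indexing written with only the C++ standard library. `ColMajorMatrix` is a hypothetical helper type for illustration, not part of Armadillo or mlpack; its member names only loosely echo Armadillo's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not Armadillo or mlpack): a dense matrix whose
// elements are stored column by column in one contiguous buffer.
struct ColMajorMatrix
{
  std::size_t n_rows;
  std::size_t n_cols;
  std::vector<double> data; // holds n_rows * n_cols elements

  ColMajorMatrix(std::size_t rows, std::size_t cols)
      : n_rows(rows), n_cols(cols), data(rows * cols, 0.0) {}

  // Element (row, col) lives at offset col * n_rows + row.
  double& at(std::size_t row, std::size_t col)
  {
    assert(row < n_rows && col < n_cols);
    return data[col * n_rows + row];
  }

  // Pointer to the first element of a column; the n_rows values starting
  // there are adjacent in memory.
  const double* colptr(std::size_t col) const
  {
    assert(col < n_cols);
    return data.data() + col * n_rows;
  }
};
```

With this layout, reading one observation stored as a column touches `n_rows` adjacent elements, whereas reading one row strides through the buffer in steps of `n_rows`. That access pattern is why storing observations as columns tends to be the faster choice.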
The main implications of this are for linear algebra. For instance, the covariance of a mean-centered data matrix $X$ is typically written, up to a normalization constant, as
$$C = X^T X$$
but for a column-wise matrix, it is
$$C = X X^T$$
and this is very important to keep in mind! If your mlpack code is not working, this may be a factor in why.
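As an illustration of the column-wise case, the sketch below computes the product of a matrix with its own transpose, $X X^T$, for a column-major buffer holding one observation per column. It uses only the standard library rather than Armadillo, and it rests on two assumptions: the data is already mean-centered, and no normalization constant is applied. `ScatterOfColumns` is a hypothetical name chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C = X * X^T for a column-major matrix X with `dims` rows and
// `points` columns (one observation per column). The result is a
// dims x dims matrix, also stored column-major.
// Assumption: X is already mean-centered; no 1/(n - 1) scaling is applied.
std::vector<double> ScatterOfColumns(const std::vector<double>& x,
                                     std::size_t dims,
                                     std::size_t points)
{
  assert(x.size() == dims * points);
  std::vector<double> c(dims * dims, 0.0);
  for (std::size_t p = 0; p < points; ++p)
  {
    // Each observation is a contiguous run of `dims` values.
    const double* obs = x.data() + p * dims;
    for (std::size_t j = 0; j < dims; ++j)
      for (std::size_t i = 0; i < dims; ++i)
        c[j * dims + i] += obs[i] * obs[j]; // accumulate obs * obs^T
  }
  return c;
}
```

For three two-dimensional observations (1, 2), (3, 4), and (5, 6), the buffer is `{1, 2, 3, 4, 5, 6}` and the result is the 2 x 2 matrix with entries 35, 44, 44, 56. The result has one row and one column per dimension, not per observation, which is a quick way to check that the product was formed the right way round.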
Most machine learning data is stored in row-major format; a CSV, for example, will generally have one observation per line and each column will correspond to a dimension.
Because mlpack stores observations as columns, a CSV file with 13 lines of 5 values each is actually loaded with 5 rows and 13 columns, not 13 rows and 5 columns as the CSV is written. More information on mlpack's loading functionality can be found in File formats and loading data in mlpack.
This is important to remember!
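The transposition on load can be sketched in a few lines of standard C++. This is not mlpack's loader; `LoadedMatrix` and `LoadCsvTransposed` are hypothetical names, and the parser assumes well-formed numeric CSV text with no header and no quoting.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: a column-major matrix with one observation per
// column.
struct LoadedMatrix
{
  std::size_t n_rows = 0; // number of dimensions
  std::size_t n_cols = 0; // number of observations
  std::vector<double> data;
};

// Parses CSV text that has one observation per line and returns it with one
// observation per column. Appending each line's values in order produces
// exactly the column-major layout of the transposed matrix.
LoadedMatrix LoadCsvTransposed(const std::string& csv)
{
  LoadedMatrix m;
  std::istringstream lines(csv);
  std::string line;
  while (std::getline(lines, line))
  {
    if (line.empty())
      continue; // skip blank lines
    std::istringstream fields(line);
    std::string field;
    std::size_t count = 0;
    while (std::getline(fields, field, ','))
    {
      m.data.push_back(std::stod(field));
      ++count;
    }
    if (m.n_cols == 0)
      m.n_rows = count; // the first line fixes the dimensionality
    assert(count == m.n_rows); // every line must have the same length
    ++m.n_cols;
  }
  return m;
}
```

Given the text `"1,2\n3,4\n5,6\n"`, which is written as 3 lines of 2 values, the result has 2 rows and 3 columns. The values are never reordered: a row-major file and a column-major matrix of its transpose share the same element order.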