Data loading and I/O
mlpack provides the Load() and Save() functions to load and save
Armadillo matrices (e.g. numeric and categorical datasets) and
any mlpack object via the cereal
serialization toolkit. A number of other utilities related to loading and
saving data and objects are also available. The Load() and
Save() functions have numerous options to configure load/save behavior
and format detection/selection.
Load()
- Load(path, X)
  - Load X from the given local file or remote URL path with default options:
    - the format of the file is auto-detected based on the extension of the file, and
    - an exception is not thrown on an error.
  - Returns a bool indicating whether the load was a success.
  - X can be any supported load type.
- Load(path, X, Option1 + Option2 + ...)
  - Load X from the given local file or remote URL path with the given options.
  - Returns a bool indicating whether the load was a success.
  - X can be any supported load type.
  - The given options must be from the list of standalone operators and be appropriate for the type of X.
- Load(path, X, opts)
  - Load X from the given local file or remote URL path with the given options specified in opts.
  - Returns a bool indicating whether the load was a success.
  - X can be any supported load type.
  - opts is a DataOptions object whose subtype matches the type of X.
For some types of data, it is also possible to load multiple files at once into a single matrix:
- Load(paths, X), Load(paths, X, Option1 + Option2 + ...), Load(paths, X, opts)
  - Load data from paths (a std::vector<std::string>) into the matrix X.
    - For numeric data, data loaded from each file is concatenated into X.
    - For image data, each image is flattened into one column of X.
  - Metadata (e.g. image size, number of columns, etc.) in all files in paths must match or loading will fail.
  - Loading options can be specified by either standalone options or an instantiated DataOptions object.
Simple example:
// See https://datasets.mlpack.org/iris.csv.
arma::mat x;
mlpack::Load("iris.csv", x);
std::cout << "Loaded iris.csv; size " << x.n_rows << " x " << x.n_cols << "."
<< std::endl;
Among other things, the file format can be easily specified:
// See https://datasets.mlpack.org/iris.csv.
arma::mat x;
mlpack::Load("iris.csv", x, mlpack::CSV);
std::cout << "Loaded iris.csv; size " << x.n_rows << " x " << x.n_cols << "."
<< std::endl;
Another simple example, loading the file from a URL:
arma::mat data;
bool success = mlpack::Load("http://datasets.mlpack.org/iris.csv",
    data, mlpack::NoFatal);
if (!success)
  std::cout << "Error loading dataset" << std::endl;
See also the other examples for each supported load type:
- Numeric data
- Loading from remote URLs
- Categorical data
- Image data
- Audio data
- mlpack models and objects
Save()
- Save(filename, X)
  - Save X to the given file filename with default options:
    - the format of the file is auto-detected based on the extension of the file, and
    - an exception is not thrown on an error.
  - Returns a bool indicating whether the save was a success.
  - X can be any supported save type.
- Save(filename, X, Option1 + Option2 + ...)
  - Save X to the given file filename with the given options.
  - Returns a bool indicating whether the save was a success.
  - X can be any supported save type.
  - The given options must be from the list of standalone options and be appropriate for the type of X.
- Save(filename, X, opts)
  - Save X to the given file filename with the given options specified in opts.
  - Returns a bool indicating whether the save was a success.
  - X can be any supported save type.
  - opts is a DataOptions object whose subtype matches the type of X.
Note: when saving images, it is possible to save
multiple images from one matrix X. See image data for more
details.
Simple example:
// Generate a 5-dimensional matrix of random data.
arma::mat dataset(5, 1000, arma::fill::randu);
mlpack::Save("dataset.csv", dataset);
std::cout << "Saved random data to 'dataset.csv'." << std::endl;
Among other things, the file format can be easily specified manually:
// Generate a 5-dimensional matrix of random data.
arma::mat dataset(5, 1000, arma::fill::randu);
mlpack::Save("dataset.csv", dataset, mlpack::CSV);
std::cout << "Saved random data to 'dataset.csv'." << std::endl;
See also the other examples for each supported save type in the sections for each data type.
Types
Support is available for loading and saving several kinds of data. Given an
object X to be loaded or saved:
- For numeric data, X should have type arma::mat or any supported matrix type (e.g. arma::fmat, arma::umat, etc.).
  - Supported formats are CSV, TSV, text, binary, ARFF, and others; see the table of format options.
  - Additional options can be specified with a DataOptions, MatrixOptions, or TextOptions object.
  - See numeric data examples for example usage.
- For mixed categorical data (data where not all columns are numeric), X should have type arma::mat or any supported matrix type (e.g. arma::fmat, arma::umat, etc.).
  - Columns of X that are categorical are represented as integer values starting from 0.
  - Information about categorical dimensions is stored in a DatasetInfo object, which is held inside of a TextOptions object.
  - Supported formats are CSV, TSV, text, and ARFF; see the table of format options.
  - See categorical data examples for example usage.
- For image data, X should have type arma::mat or any supported matrix type (e.g. arma::fmat, arma::umat, etc.).
  - Images are represented in a vectorized form; see image data for details.
  - An ImageOptions object is used for representing metadata specific to image formats.
  - Supported formats are PNG, JPEG, TGA, BMP, PSD, GIF, PIC, and PNM; see the table of format options.
  - See image data examples for example usage.
- For audio data, X should have type arma::mat or any supported matrix type (e.g. arma::fmat, arma::umat, etc.).
  - Audio files are represented in a vectorized form; see audio data for details.
  - An AudioOptions object is used for representing metadata specific to audio formats.
  - Supported formats are MP3 and WAV; see the table of format options.
  - See audio data examples for example usage.
- For mlpack models and objects, X can be of any mlpack class or type (e.g. mlpack::RandomForest, mlpack::KDTree, mlpack::Range, etc.).
  - Supported formats for model/object serialization are binary, JSON, and XML; see the table of format options.
  - See mlpack model and object examples for example usage.
DataOptions types
The Load() and Save() functions allow
specifying options in a standalone manner or with an instantiated DataOptions
object. Standalone options provide convenience:
// Individual standalone options can be combined with the + operator.
mlpack::Load("filename.csv", X, mlpack::CSV + mlpack::Fatal);
The use of an instantiated DataOptions (or a child class relevant to the type
of data being loaded) allows more complex options to be configured and for
metadata resulting from a load or save operation to be stored:
// Different data types will use DataOptions, MatrixOptions, TextOptions,
// ImageOptions, or other types. See the documentation for each class below.
mlpack::ImageOptions opts;
opts.Channels() = 1; // Force loading in grayscale.
mlpack::Load("filename.png", X, opts);
// Now, `opts.Width()` and `opts.Height()` will store the size of the loaded
// image.
The set of allowed standalone options differs depending on the
type of data being loaded or saved; if using an instantiated options
object, so does the type of opts:
- Numeric data: MatrixOptions and its standalone options, or TextOptions and its standalone options for plaintext formats;
- Mixed categorical data: TextOptions and its standalone options;
- Image data: ImageOptions and its standalone options;
- Audio data: AudioOptions and its standalone options;
- mlpack models and objects: DataOptions and its standalone options.
DataOptions
The DataOptions class is the base class from which all options classes
specific to data types are derived. It is default-constructible and
provides the .Fatal() and .Format() members.
Any members or standalone operators available in DataOptions are also
available when using other options types (e.g. TextOptions,
ImageOptions, etc.).
DataOptions standalone operators and members
The options below can be used as standalone operators to the
Load() and Save() functions, or as
calls to set members of an instantiated DataOptions object.
| Standalone operator | Member function | Available for: | Description |
|---|---|---|---|
| Load/save behavior. | | | |
| Fatal | opts.Fatal() = true; | All data types. | A std::runtime_error will be thrown on failure. |
| NoFatal (default) | opts.Fatal() = false; | All data types. | false will be returned on failure. A warning will also be printed if MLPACK_PRINT_WARN is defined. |
| Formats. | | | |
| AutoDetect (default) | opts.Format() = mlpack::FileType::AutoDetect; | All data types. | The format of the file is autodetected using the extension of the filename and (if loading) inspecting the file contents. |
| Formats for mlpack models and objects. | | | |
| BIN | opts.Format() = mlpack::FileType::BIN; | mlpack models and objects. | Load/save the object using an efficient packed binary format (.bin extension). |
| JSON | opts.Format() = mlpack::FileType::JSON; | mlpack models and objects. | Load/save the object using human- and machine-readable JSON (.json extension). |
| XML | opts.Format() = mlpack::FileType::XML; | mlpack models and objects. | Load/save the object using XML (.xml extension; warning: may be very large). |
MatrixOptions
The MatrixOptions class represents options specific to matrix types
(numeric and categorical data).
MatrixOptions is derived from DataOptions and thus any
standalone operators or member functions from DataOptions
(e.g. Fatal, NoFatal, and AutoDetect) can also be used with
MatrixOptions.
Note: closely related is the TextOptions class,
specifically for loading numeric or categorical data from plaintext formats.
MatrixOptions is used for non-plaintext numeric data formats.
MatrixOptions standalone operators and members
The options below can be used as standalone operators to the
Load() and Save() functions, or as
calls to set members of an instantiated MatrixOptions object.
If an option is given that does not match the type of data being loaded or
saved, if Fatal() is set,
then an exception will be thrown; otherwise, a warning will be printed if
MLPACK_PRINT_WARN
is set.
| Standalone operator | Member function | Available for: | Description |
|---|---|---|---|
| Load/save behavior. | | | |
| Transpose (default) | opts.Transpose() = true; | Numeric and categorical data. | The matrix will be transposed to/from column-major form on load/save. |
| NoTranspose | opts.Transpose() = false; | Numeric and categorical data. | The matrix will not be transposed to column-major form on load/save. |
| Formats. | | | |
| PGM | opts.Format() = mlpack::FileType::PGMBinary; | Numeric data. | Load/save in the PGM image format; data should have values in the range [0, 255]. The size of the image will be the same as the size of the matrix (after any transpose is applied). |
| PPM | opts.Format() = mlpack::FileType::PPMBinary; | Numeric data. | Load/save in the PPM image format; data should have values in the range [0, 255]. The size of the image will be the same as the size of the matrix (after any transpose is applied). |
| HDF5 | opts.Format() = mlpack::FileType::HDF5Binary; | Numeric data. | Load/save in the HDF5 binary format; only available if Armadillo is configured with HDF5 support. |
| ArmaBin | opts.Format() = mlpack::FileType::ArmaBinary; | Numeric data. | Load/save in the space-efficient arma_binary format (packed binary data). |
| RawBinary | opts.Format() = mlpack::FileType::RawBinary; | Numeric data. | Load/save as packed binary data with no header and no size information; data will be loaded as a single column vector (not recommended). |
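The effect of the Transpose option can be pictured with plain standard-library containers. The following is a minimal sketch and not mlpack's implementation: it assumes a file layout in which each line holds one data point, and TransposeRows() is a hypothetical helper introduced here only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): transpose a rectangular table
// stored as a vector of rows. All rows are assumed to have the same length.
std::vector<std::vector<double>> TransposeRows(
    const std::vector<std::vector<double>>& rows)
{
  if (rows.empty())
    return {};

  const std::size_t nRows = rows.size();
  const std::size_t nCols = rows[0].size();

  // out[j][i] holds the value that was at rows[i][j].
  std::vector<std::vector<double>> out(nCols, std::vector<double>(nRows));
  for (std::size_t i = 0; i < nRows; ++i)
    for (std::size_t j = 0; j < nCols; ++j)
      out[j][i] = rows[i][j];

  return out;
}
```

Under that assumption, a file with 3 lines of 4 values each becomes a table with 4 rows and 3 columns, which corresponds to 4 dimensions and 3 points in a column-major matrix.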
TextOptions
The TextOptions class represents options specific to matrix types stored in
plaintext formats (numeric and categorical
data). TextOptions is a child class and thus any standalone operators or members from its parent classes are also available:
- DataOptions provides:
  - Fatal, NoFatal, and AutoDetect standalone operators
  - opts.Fatal() and opts.Format() members
  - See the DataOptions operator and member documentation
- MatrixOptions provides:
  - Transpose, NoTranspose, PGM, PPM, HDF5, ArmaBin, and RawBinary standalone operators
  - opts.Transpose() member
  - See the MatrixOptions operator and member documentation
TextOptions standalone operators and members
The options below can be used as standalone operators to the
Load() and Save() functions, or as
calls to set members of an instantiated TextOptions object.
If an option is given that does not match the type of data being loaded or
saved, if Fatal() is set,
then an exception will be thrown; otherwise, a warning will be printed if
MLPACK_PRINT_WARN
is set.
| Standalone operator | Member function | Available for: | Description |
|---|---|---|---|
| Load/save behavior. | | | |
| HasHeaders | opts.HasHeaders() = true; | Numeric and categorical data, only for the CSV format. | If true, the first row of the file contains column names instead of data. See note below. |
| Categorical | opts.Categorical() = true; | Categorical, only for the CSV or ARFF formats. | If true, the data to be loaded or saved is mixed categorical data. See note below. |
| Semicolon | opts.Semicolon() = true; | Numeric and categorical data, only for the CSV, CoordASCII, and RawASCII formats. | If true, the field separator in the file is a semicolon instead of a comma. |
| MissingToNan | opts.MissingToNan() = true; | Numeric and categorical data. | If true, any missing data elements will be represented as NaN instead of 0. |
| Formats. | | | |
| CSV | opts.Format() = mlpack::FileType::CSVASCII; | Numeric and categorical data. | CSV format. If loading a sparse matrix and the CSV has three columns, the data is interpreted as a coordinate list. |
| TSV | opts.Format() = mlpack::FileType::TSVASCII; | Numeric and categorical data. | TSV format. If loading a sparse matrix and the TSV has three columns, the data is interpreted as a coordinate list. |
| ArmaASCII | opts.Format() = mlpack::FileType::ArmaASCII; | Numeric data. | Space-separated values as saved by Armadillo with the arma_ascii format. |
| RawASCII | opts.Format() = mlpack::FileType::RawASCII; | Numeric data. | Space-separated values with no header. If loading a sparse matrix and the file has three columns, the data is interpreted as a coordinate list. |
| CoordASCII | opts.Format() = mlpack::FileType::CoordASCII; | Numeric data where X is a sparse matrix (e.g. arma::sp_mat). | Coordinate list format for sparse data (see coord_ascii). |
| ARFF | opts.Format() = mlpack::FileType::ARFFASCII; | Categorical data. | ARFF filetype. Used specifically to load mixed categorical datasets. See ARFF documentation. Only for loading. |
| Metadata. | | | |
| (n/a) | opts.Headers() | Numeric and categorical data. | Returns a std::vector<std::string> with headers detected after loading a CSV. |
| (n/a) | opts.DatasetInfo() | Categorical data. | Returns a DatasetInfo with dimension information after loading, or that will be used for dimension information during saving. |
Notes:
- When opts.HasHeaders() is true while loading, the parsed headers from the CSV file are stored into the opts.Headers() member, which has type std::vector<std::string>. In order to access the headers after loading, an instantiated TextOptions must be passed to Load(); if HasHeaders is passed as a standalone option, the parsed headers will not be accessible after loading.
- When opts.Categorical() is true while loading with the CSV format, any fields where a value cannot be interpreted as numeric will be automatically converted to a categorical dimension with integer values from 0 to one less than the number of unique values in the field/dimension. See categorical data for more information on this representation.
- When opts.Categorical() is true while loading, a DatasetInfo object is populated with information about each of the dimensions in the dataset and stored in opts.DatasetInfo(). In order to access this after loading, an instantiated TextOptions must be passed to Load(); if Categorical is passed as a standalone option, the DatasetInfo object will not be accessible after loading.
- When opts.Categorical() is true while saving, the values in opts.DatasetInfo() (which has type DatasetInfo) will be used to map any categorical dimensions back to their original values. If Categorical was passed as a standalone option, then no DatasetInfo can be set before saving, and all dimensions of the data will be saved as numeric data.
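The mapping between original values and integer categories described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not mlpack's DatasetInfo: CategoryMap is a hypothetical class, and assigning codes in order of first appearance is an assumption made for the sketch.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the bookkeeping of one categorical dimension;
// this is not mlpack's DatasetInfo class.
class CategoryMap
{
 public:
  // Return the integer code for `value`, assigning the next unused code
  // (starting from 0) the first time a value is seen.
  std::size_t Encode(const std::string& value)
  {
    const auto it = codes.find(value);
    if (it != codes.end())
      return it->second;

    const std::size_t code = values.size();
    codes.emplace(value, code);
    values.push_back(value);
    return code;
  }

  // Map an integer code back to the original string, as needed when saving.
  // Throws std::out_of_range for a code that was never assigned.
  const std::string& Decode(const std::size_t code) const
  {
    return values.at(code);
  }

  // Number of distinct categories seen so far.
  std::size_t NumCategories() const { return values.size(); }

 private:
  std::unordered_map<std::string, std::size_t> codes;
  std::vector<std::string> values;
};
```

Encoding corresponds to what happens on load, and decoding to the reverse mapping applied on save; a dimension with k distinct values ends up with codes 0 through k - 1.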
ImageOptions
The ImageOptions class represents options specific to images.
ImageOptions is a child class of DataOptions and thus any
standalone operators or member functions from DataOptions
(e.g. Fatal, NoFatal, and AutoDetect) can also be used with
ImageOptions.
ImageOptions standalone operators and members
The options below can be used as standalone operators to the
Load() and Save() functions, or as
calls to set members of an instantiated ImageOptions object.
If an option is given that does not match the type of data being loaded or
saved, if Fatal() is set,
then an exception will be thrown; otherwise, a warning will be printed if
MLPACK_PRINT_WARN
is set.
| Standalone operator | Member function | Available for: | Description |
|---|---|---|---|
| Formats. | | | |
| Image | opts.Format() = mlpack::FileType::ImageType; | Image data. | Load in the image format detected by the header of the file; save in the image format specified by the filename's extension. |
| PNG | opts.Format() = mlpack::FileType::PNG; | Image data. | Load/save as a PNG image. |
| JPG | opts.Format() = mlpack::FileType::JPG; | Image data. | Load/save as a JPEG image. |
| TGA | opts.Format() = mlpack::FileType::TGA; | Image data. | Load/save as a TGA image. |
| BMP | opts.Format() = mlpack::FileType::BMP; | Image data. | Load/save as a BMP image. |
| PSD | opts.Format() = mlpack::FileType::PSD; | Image data. | Load a PSD (Photoshop) image. Only for loading. |
| GIF | opts.Format() = mlpack::FileType::GIF; | Image data. | Load a GIF image. Only for loading. |
| PIC | opts.Format() = mlpack::FileType::PIC; | Image data. | Load a PIC (PICtor) image. Only for loading. |
| PNM | opts.Format() = mlpack::FileType::PNM; | Image data. | Load a PNM (Portable Anymap) image. Only for loading. |
| Save behavior. | | | |
| (n/a) | opts.Quality() | Image data with JPEG format. | Desired JPEG quality level for saving (a size_t in the range from 0 to 100). |
| Metadata. | | | |
| (n/a) | opts.Height() | Image data | Returns a size_t representing the height in pixels of the loaded image(s), or the desired height in pixels for saving. |
| (n/a) | opts.Width() | Image data | Returns a size_t representing the width in pixels of the loaded image(s), or the desired width in pixels for saving. |
| (n/a) | opts.Channels() | Image data | Returns a size_t representing the number of channels of the loaded image(s). |
Notes:
- After a call to Load(), if an instantiated ImageOptions was passed, the opts.Height(), opts.Width(), and opts.Channels() members will be set with the values found during loading.
- Before calling Load(), the value of opts.Channels() can be set to the desired number of channels (1/3/4) to force loading with that many color channels.
- The opts.Quality() option is only relevant when calling Save() with the JPG format.
AudioOptions
The AudioOptions class represents options specific to audio files.
AudioOptions is a child class of DataOptions and thus any
standalone operators or member functions from DataOptions
(e.g. Fatal, NoFatal, and AutoDetect) can also be used with
AudioOptions.
AudioOptions standalone operators and members
The options below can be used as standalone operators to the
Load() and Save() functions, or as
calls to set members of an instantiated AudioOptions object.
If an option is given that does not match the type of data being loaded or
saved, if Fatal() is set,
then an exception will be thrown; otherwise, a warning will be printed if
MLPACK_PRINT_WARN
is set.
| Standalone operator | Member function | Available for: | Description |
|---|---|---|---|
| Formats. | | | |
| WAV | opts.Format() = mlpack::FileType::WAV; | Audio data. | Load/save as a WAV file. |
| MP3 | opts.Format() = mlpack::FileType::MP3; | Audio data. | Load/save as an MP3 file. |
| Metadata. | | | |
| (n/a) | opts.AudioDuration() | Audio data | Returns a double representing the duration of the loaded audio, in seconds. Set after loading / saving. |
| (n/a) | opts.BitsPerSample() | Audio data | Returns a size_t representing the bit depth per sample. Set after loading. This must be set to either 16 or 32 when saving. |
| (n/a) | opts.Channels() | Audio data | Returns a size_t representing the number of audio channels (e.g. 1 for mono, 2 for stereo). Set after loading, or before saving. |
| (n/a) | opts.SampleRate() | Audio data | Returns a size_t representing the sample rate in Hz (e.g. 44100, 48000). Set after loading, or before saving. |
| (n/a) | opts.TotalSamples() | Audio data | Returns a size_t representing the total number of samples loaded (totalFrames * Channels()). |
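The relationship between the audio metadata fields can be sketched as plain arithmetic. The first function restates the formula given for opts.TotalSamples(); the second assumes that the duration equals the number of frames divided by the sample rate, which is the usual convention but is an assumption here rather than something stated above. Both helper names are hypothetical.

```cpp
#include <cstddef>

// Total number of samples across all channels: totalFrames * channels.
std::size_t TotalSamples(const std::size_t totalFrames,
                         const std::size_t channels)
{
  return totalFrames * channels;
}

// Duration in seconds, ASSUMING duration = totalFrames / sampleRate.
// sampleRate is expected to be nonzero.
double DurationSeconds(const std::size_t totalFrames,
                       const std::size_t sampleRate)
{
  return static_cast<double>(totalFrames) / static_cast<double>(sampleRate);
}
```

For instance, one second of stereo audio at 44100 Hz has 44100 frames and therefore 88200 samples in total.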
Formats
The Load() and Save() functions
support numerous different formats for loading and saving. Not all formats are
relevant for all types of data. The table below lists standalone options that
can be used to specify the format, as well as member functions for a
DataOptions object.
When AutoDetect (the default) is specified as the format, the actual file
format is auto-detected using the filename's extension and (if loading)
inspecting the file contents. Accepted filename extensions for each type are
given in the table.
| Standalone operator | Member function | Filename extensions | Available for: | Description |
|---|---|---|---|---|
| AutoDetect (default) | opts.Format() = mlpack::FileType::AutoDetect; | (n/a) | All data types. | The format of the file is autodetected as one of the formats below. |
| CSV | opts.Format() = mlpack::FileType::CSVASCII; | .csv | Numeric and categorical data | CSV format. If loading a sparse matrix and the CSV has three columns, the data is interpreted as a coordinate list. |
| TSV | opts.Format() = mlpack::FileType::TSVASCII; | .tsv | Numeric and categorical data. | TSV format. If loading a sparse matrix and the TSV has three columns, the data is interpreted as a coordinate list. |
| ArmaASCII | opts.Format() = mlpack::FileType::ArmaASCII; | .txt, .csv | Numeric data | Space-separated values as saved by Armadillo with the arma_ascii format. |
| RawASCII | opts.Format() = mlpack::FileType::RawASCII; | .txt | Numeric data | Space-separated values with no header. If loading a sparse matrix and the file has three columns, the data is interpreted as a coordinate list. |
| CoordASCII | opts.Format() = mlpack::FileType::CoordASCII; | .txt (if X is sparse) | Numeric data where X is a sparse matrix (e.g. arma::sp_mat). | Coordinate list format for sparse data (see coord_ascii). |
| ARFF | opts.Format() = mlpack::FileType::ARFFASCII; | .arff | Categorical data | ARFF filetype. Used specifically to load mixed categorical datasets. See ARFF documentation. Only for loading. |
| PGM | opts.Format() = mlpack::FileType::PGMBinary; | .pgm | Numeric data | Load/save in the PGM image format; data should have values in the range [0, 255]. The size of the image will be the same as the size of the matrix (after any transpose is applied). |
| PPM | opts.Format() = mlpack::FileType::PPMBinary; | .ppm | Numeric data | Load/save in the PPM image format; data should have values in the range [0, 255]. The size of the image will be the same as the size of the matrix (after any transpose is applied). |
| HDF5 | opts.Format() = mlpack::FileType::HDF5Binary; | .h5, .hdf5, .hdf, .he5 | Numeric data | Load/save in the HDF5 binary format; only available if Armadillo is configured with HDF5 support. |
| ArmaBin | opts.Format() = mlpack::FileType::ArmaBinary; | .bin (if X is an Armadillo type) | Numeric data | Load/save in the space-efficient arma_binary format (packed binary data). |
| RawBinary | opts.Format() = mlpack::FileType::RawBinary; | | Numeric data | Load/save as packed binary data with no header and no size information; data will be loaded as a single column vector (not recommended). |
| Image | opts.Format() = mlpack::FileType::ImageType; | (n/a) | Image data | Load in the image format detected by the header of the file; save in the image format specified by the filename's extension. |
| PNG | opts.Format() = mlpack::FileType::PNG; | .png | Image data | Load/save as a PNG image. |
| JPG | opts.Format() = mlpack::FileType::JPG; | .jpg, .jpeg | Image data | Load/save as a JPEG image. |
| TGA | opts.Format() = mlpack::FileType::TGA; | .tga | Image data | Load/save as a TGA image. |
| BMP | opts.Format() = mlpack::FileType::BMP; | .bmp | Image data | Load/save as a BMP image. |
| PSD | opts.Format() = mlpack::FileType::PSD; | .psd | Image data | Load a PSD (Photoshop) image. Only for loading. |
| GIF | opts.Format() = mlpack::FileType::GIF; | .gif | Image data | Load a GIF image. Only for loading. |
| PIC | opts.Format() = mlpack::FileType::PIC; | .pic | Image data | Load a PIC (PICtor) image. Only for loading. |
| PNM | opts.Format() = mlpack::FileType::PNM; | .pnm | Image data | Load a PNM (Portable Anymap) image. Only for loading. |
| WAV | opts.Format() = mlpack::FileType::WAV; | .wav, .wave | Audio data | Load/save as a WAV file. |
| MP3 | opts.Format() = mlpack::FileType::MP3; | .mp3 | Audio data | Load as an MP3 file. |
| BIN | opts.Format() = mlpack::FileType::BIN; | .bin | mlpack models and objects | Load/save the object using an efficient packed binary format. |
| JSON | opts.Format() = mlpack::FileType::JSON; | .json | mlpack models and objects | Load/save the object using human- and machine-readable JSON. |
| XML | opts.Format() = mlpack::FileType::XML; | .xml | mlpack models and objects | Load/save the object using XML (warning: may be very large). |
Numeric data
Standard numeric data is represented in mlpack as a column-major matrix, and a variety of formats for loading and saving are supported.
- When calling Load() and Save(), X should have type arma::mat or any other supported matrix type (e.g. arma::fmat, arma::umat, and so forth).
- When calling Load() with a vector of filenames, all files must have the same number of dimensions and header names (if using CSVs with headers). All files will be concatenated into the output matrix X.
- When loading and saving with an instantiated DataOptions object, the MatrixOptions and TextOptions subtypes can be used.
- Supported formats are CSV, TSV, text, binary, ARFF, and others; see the table of format options.
Numeric data load/save examples
Load two datasets, print information about them, modify them, and save them back to disk.
// Throw an exception if loading fails with the Fatal option.
// See https://datasets.mlpack.org/satellite.train.csv.
arma::mat dataset;
mlpack::Load("satellite.train.csv", dataset, mlpack::Fatal);
// See https://datasets.mlpack.org/satellite.train.labels.csv.
arma::Row<size_t> labels;
mlpack::Load("satellite.train.labels.csv", labels, mlpack::Fatal);
// Print information about the data.
std::cout << "The data in 'satellite.train.csv' has: " << std::endl;
std::cout << " - " << dataset.n_cols << " points." << std::endl;
std::cout << " - " << dataset.n_rows << " dimensions." << std::endl;
std::cout << "The labels in 'satellite.train.labels.csv' have: " << std::endl;
std::cout << " - " << labels.n_elem << " labels." << std::endl;
std::cout << " - A maximum label of " << labels.max() << "." << std::endl;
std::cout << " - A minimum label of " << labels.min() << "." << std::endl;
// Modify and save the data. Add 2 to the data and drop the last column.
dataset += 2;
dataset.shed_col(dataset.n_cols - 1);
labels.shed_col(labels.n_cols - 1);
// Don't throw an exception if saving fails. Technically there is no need to
// explicitly specify NoFatal, since it is the default.
mlpack::Save("satellite.train.mod.csv", dataset, mlpack::NoFatal);
mlpack::Save("satellite.train.labels.mod.csv", labels, mlpack::NoFatal);
Load a dataset stored in a binary format and save it as a CSV.
// See https://datasets.mlpack.org/iris.bin.
arma::mat dataset;
mlpack::Load("iris.bin", dataset, mlpack::Fatal + mlpack::ArmaBin);
// Save it back to disk as a CSV.
mlpack::Save("iris.converted.csv", dataset, mlpack::CSV);
Load a dataset that has a semicolon as a separator instead of a comma.
// First write the semicolon file to disk.
std::fstream f;
f.open("semicolon.csv", std::fstream::out);
f << "1; 2; 3; 4" << std::endl;
f << "5; 6; 7; 8" << std::endl;
f << "9; 10; 11; 12" << std::endl;
// Now create a TextOptions and specify that the separator is a semicolon.
// Since all of the elements are integers, we load into an `arma::umat` (a
// matrix that holds unsigned integers) instead of an `arma::mat`.
arma::umat dataset;
mlpack::TextOptions opts;
opts.Semicolon() = true;
// Note that instead of `opts` we could just specify `Semicolon` instead!
mlpack::Load("semicolon.csv", dataset, opts);
std::cout << "The data in 'semicolon.csv' has: " << std::endl;
std::cout << " - " << dataset.n_cols << " points." << std::endl;
std::cout << " - " << dataset.n_rows << " dimensions." << std::endl;
Load a dataset with missing elements, and replace the missing elements with NaN
using the MissingToNan option.
// First write a CSV file to disk with some missing values.
std::fstream f;
f.open("missing_to_nan.csv", std::fstream::out);
// The second value in the first row is missing.
f << "1, , 3, 4" << std::endl;
f << "5, 6, 7, 8" << std::endl;
f << "9, 10, 11, 12" << std::endl;
arma::mat dataset;
mlpack::TextOptions opts;
opts.MissingToNan() = true;
// Note that instead of `opts` we could just specify `MissingToNan` instead!
mlpack::Load("missing_to_nan.csv", dataset, opts);
// Print information about the data.
std::cout << "Loaded data:" << std::endl;
std::cout << dataset;
Load a CSV into a 32-bit floating point matrix and print the headers (column names).
// See https://datasets.mlpack.org/Admission_Predict.csv.
arma::fmat dataset;
// We have to make a TextOptions object so that we can recover the headers.
mlpack::TextOptions opts;
opts.Format() = mlpack::FileType::CSVASCII;
opts.HasHeaders() = true;
mlpack::Load("Admission_Predict.csv", dataset, opts);
std::cout << "Found " << opts.Headers().size() << " columns." << std::endl;
for (size_t i = 0; i < opts.Headers().size(); ++i)
{
  std::cout << " - Column " << i << ": '" << opts.Headers()[i] << "'."
      << std::endl;
}
Load a CSV containing a coordinate list into a sparse matrix and print the overall size of the loaded matrix.
// See https://datasets.mlpack.org/movielens-100k.csv.
arma::sp_mat dataset;
// A 3-column CSV into a sparse matrix is interpreted as a coordinate list.
mlpack::Load("movielens-100k.csv", dataset, mlpack::CSV);
std::cout << "Loaded data from movielens-100k.csv; matrix size: "
<< dataset.n_rows << " x " << dataset.n_cols << "." << std::endl;
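The idea of a coordinate list can be sketched without Armadillo: each line names one non-zero entry by its position and value. The snippet below is a minimal illustration and not how mlpack or Armadillo parse files: CoordMatrix and ParseCoordList() are hypothetical names, the field order (row, column, value) and zero-based indices are assumptions, and every Load() option (such as Transpose) is ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical container for this sketch: non-zero entries keyed by
// (row, column), plus the matrix size inferred from the largest indices.
struct CoordMatrix
{
  std::map<std::pair<std::size_t, std::size_t>, double> values;
  std::size_t nRows = 0;
  std::size_t nCols = 0;
};

// Parse text where each line is "row,col,value" (ASSUMED zero-based).
// Lines without three fields are skipped; non-numeric fields make
// std::stoul / std::stod throw.
CoordMatrix ParseCoordList(const std::string& text)
{
  CoordMatrix m;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line))
  {
    std::istringstream fields(line);
    std::string r, c, v;
    if (!std::getline(fields, r, ',') ||
        !std::getline(fields, c, ',') ||
        !std::getline(fields, v))
    {
      continue;
    }

    const std::size_t row = std::stoul(r);
    const std::size_t col = std::stoul(c);
    m.values[std::make_pair(row, col)] = std::stod(v);

    // The matrix must be large enough to hold the largest index seen.
    m.nRows = std::max(m.nRows, row + 1);
    m.nCols = std::max(m.nCols, col + 1);
  }
  return m;
}
```

Only the listed entries are stored, so a file with a handful of lines can describe a very large, mostly empty matrix.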
Loading from remote URLs
mlpack supports loading datasets from URLs. Files will be downloaded using the cpp-httplib library, which is bundled with mlpack for ease of use.
When a remote URL is given to Load():
- The URL must start with either http:// or https:// or loading will fail.
- If the URL starts with https://, support must be enabled with #define MLPACK_USE_HTTPS before including mlpack, and the program must be additionally linked with -lssl -lcrypto.
- The downloaded file will be saved to the system temporary directory (e.g. /tmp/ on Linux systems).
// Throw an exception if loading fails with the Fatal option.
arma::mat dataset;
mlpack::Load("http://datasets.mlpack.org/satellite.train.csv", dataset,
mlpack::Fatal);
arma::Row<size_t> labels;
mlpack::Load("http://datasets.mlpack.org/satellite.train.labels.csv",
labels, mlpack::Fatal);
// Print information about the data.
std::cout << "The data in 'satellite.train.csv' has: " << std::endl;
std::cout << " - " << dataset.n_cols << " points." << std::endl;
std::cout << " - " << dataset.n_rows << " dimensions." << std::endl;
std::cout << "The labels in 'satellite.train.labels.csv' have: " << std::endl;
std::cout << " - " << labels.n_elem << " labels." << std::endl;
std::cout << " - A maximum label of " << labels.max() << "." << std::endl;
std::cout << " - A minimum label of " << labels.min() << "." << std::endl;
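The URL prefix rules for remote loading can be checked before calling Load(). The helper below is a minimal sketch and not part of mlpack: the names PathKind and ClassifyPath are hypothetical, and the sketch only inspects the start of the string, assuming that anything without an http:// or https:// prefix is a local file.

```cpp
#include <string>

// Hypothetical classification of a path passed to Load(); not an mlpack type.
enum class PathKind { Local, Http, Https };

// Classify `path` by its scheme prefix. Anything that does not start with
// "http://" or "https://" is treated as a local file in this sketch.
inline PathKind ClassifyPath(const std::string& path)
{
  if (path.compare(0, 8, "https://") == 0)
    return PathKind::Https;
  if (path.compare(0, 7, "http://") == 0)
    return PathKind::Http;
  return PathKind::Local;
}
```

A result of PathKind::Https is a reminder that MLPACK_USE_HTTPS must be defined before including mlpack and that the program must be linked with -lssl -lcrypto.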
Mixed categorical data
mlpack supports mixed categorical data, i.e., data where some dimensions take
only categorical values (e.g. 0, 1, 2, etc.). When using mlpack, string
data and other non-numerical data must be mapped to categorical values and
represented as part of an arma::mat or other matrix type. Category metadata
is stored in an auxiliary DatasetInfo object.
- When calling Load() and Save(), X should have type arma::mat or any other supported matrix type (e.g. arma::fmat, arma::umat, and so forth).
- To load categorical data, either the Categorical standalone option must be passed, or an instantiated TextOptions opts must be passed with opts.Categorical() = true.
- Supported formats are CSV, TSV, text, and ARFF; see the table of format options.
- When loading, each unique non-numeric value is mapped (sequentially) to a non-negative integer, starting from 0. Any columns with non-numeric values are marked as categorical.
- To access mappings from each categorical value to its original value after load, as well as which dimensions are categorical, an instantiated TextOptions opts must be passed to Load(); then, the associated DatasetInfo is accessible via opts.DatasetInfo().
- When saving, reverse mappings from non-negative integers to the original unique non-numeric values in opts.DatasetInfo() are applied. To set these mappings, as well as which dimensions are categorical, an instantiated TextOptions opts must be passed to Save() with opts.DatasetInfo() set accordingly.
Categorical data is supported by a number of mlpack algorithms, including
DecisionTree,
HoeffdingTree, and
RandomForest.
DatasetInfo
mlpack represents categorical data via the use of the auxiliary
DatasetInfo object, which stores information about which dimensions are
numeric or categorical and allows conversion from the original category values
to the numeric values used to represent those categories.
For loading and saving categorical data, an instantiated
TextOptions must be passed to Load() or
Save(); this object contains a DatasetInfo object,
accessible via the
.DatasetInfo() method; e.g.,
opts.DatasetInfo().
Accessing and setting properties
This documentation uses info as the name of the DatasetInfo object,
but if a categorical dataset has been loaded with Load(),
it is instead suggested to use opts.DatasetInfo() in place of info.
- info = DatasetInfo(dimensionality)
  - Create a DatasetInfo object with the given dimensionality.
  - All dimensions are assumed to be numeric (not categorical).
- info.Type(d)
  - Get the type (categorical or numeric) of dimension d.
  - Returns a Datatype, either Datatype::numeric or Datatype::categorical.
  - Calling info.Type(d) = t will set a dimension to type t, but this should only be done before info is used with Load() or Save().
- info.NumMappings(d)
  - Get the number of categories in dimension d as a size_t.
  - Returns 0 if dimension d is numeric.
- info.Dimensionality()
  - Return the dimensionality of the object as a size_t.
Map to and from numeric values
- info.MapString<double>(value, d)
  - Given value (a std::string), return the double representing the categorical mapping (an integer value) of value in dimension d.
  - If a mapping for value does not exist in dimension d, a new mapping is created, and info.NumMappings(d) is increased by one.
  - If dimension d is numeric and value cannot be parsed as a numeric value, then dimension d is changed to categorical and a new mapping is returned.
- info.UnmapString(mappedValue, d)
  - Given mappedValue (a size_t), return the std::string containing the original category that mapped to the value mappedValue in dimension d.
  - If dimension d is not categorical, a std::invalid_argument is thrown.
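To illustrate how sequential category mapping behaves, here is a minimal standalone sketch for a single dimension. It is not mlpack's DatasetInfo implementation: the class name CategoryMapper is hypothetical, and it only mirrors the documented behavior that categories are numbered from 0 in the order they are first seen.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical single-dimension mapper; this is NOT mlpack's DatasetInfo,
// only an illustration of sequential category assignment.
class CategoryMapper
{
 public:
  // Return the category index for `value`, creating a new one if needed.
  // Indices are handed out in the order values are first seen, from 0.
  std::size_t MapString(const std::string& value)
  {
    const auto it = forward.find(value);
    if (it != forward.end())
      return it->second;

    const std::size_t index = reverse.size();
    forward.emplace(value, index);
    reverse.push_back(value);
    return index;
  }

  // Return the original string for a category index, or throw if the index
  // has never been assigned.
  const std::string& UnmapString(const std::size_t index) const
  {
    if (index >= reverse.size())
      throw std::invalid_argument("CategoryMapper: unknown category index");
    return reverse[index];
  }

  // Number of distinct categories seen so far.
  std::size_t NumMappings() const { return reverse.size(); }

 private:
  std::unordered_map<std::string, std::size_t> forward; // string -> index
  std::vector<std::string> reverse;                     // index -> string
};
```

With this sketch, mapping the strings "4", "3", "4" in that order yields 0, 1, 0, which is why a category that looks numeric is not necessarily mapped to its own numeric value.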
Categorical data load/save examples
Load and manipulate an ARFF file.
// Load a categorical dataset.
arma::mat dataset;
// Define a TextOptions to load categorical data.
mlpack::TextOptions opts;
opts.Fatal() = true;
opts.Categorical() = true;
// See https://datasets.mlpack.org/covertype.train.arff.
mlpack::Load("covertype.train.arff", dataset, opts);
// Print information about the data.
std::cout << "The data in 'covertype.train.arff' has: " << std::endl;
std::cout << " - " << dataset.n_cols << " points." << std::endl;
std::cout << " - " << opts.DatasetInfo().Dimensionality() << " dimensions."
<< std::endl;
arma::Row<size_t> labels;
// We need a second options object, since the labels have a different data
// type and are stored in a file with a different extension.
mlpack::TextOptions labelOpts;
labelOpts.Fatal() = true;
// See https://datasets.mlpack.org/covertype.train.labels.csv.
mlpack::Load("covertype.train.labels.csv", labels, labelOpts);
// Print information about each dimension.
for (size_t d = 0; d < opts.DatasetInfo().Dimensionality(); ++d)
{
if (opts.DatasetInfo().Type(d) == mlpack::Datatype::categorical)
{
std::cout << " - Dimension " << d << " is categorical with "
<< opts.DatasetInfo().NumMappings(d) << " categories." << std::endl;
}
else
{
std::cout << " - Dimension " << d << " is numeric." << std::endl;
}
}
// Modify the 5th point. Increment any numeric values, and set any categorical
// values to the string "hooray!".
for (size_t d = 0; d < opts.DatasetInfo().Dimensionality(); ++d)
{
if (opts.DatasetInfo().Type(d) == mlpack::Datatype::categorical)
{
// This will create a new mapping if the string "hooray!" does not already
// exist as a category for dimension d.
dataset(d, 4) = opts.DatasetInfo().MapString<double>("hooray!", d);
}
else
{
dataset(d, 4) += 1.0;
}
}
Manually create a DatasetInfo object and use it to
save a categorical dataset.
// This will manually create the following data matrix (shown as it would appear
// in a CSV):
//
// 1, TRUE, "good", 7.0, 4
// 2, FALSE, "good", 5.6, 3
// 3, FALSE, "bad", 6.1, 4
// 4, TRUE, "bad", 6.1, 1
// 5, TRUE, "unknown", 6.3, 0
// 6, FALSE, "unknown", 5.1, 2
//
// Although the last dimension is numeric, we will take it as a categorical
// dimension.
arma::mat dataset(5, 6); // 6 data points in 5 dimensions.
mlpack::DatasetInfo info(5);
// Set types of dimensions. By default they are numeric so we only set
// categorical dimensions.
info.Type(1) = mlpack::Datatype::categorical;
info.Type(2) = mlpack::Datatype::categorical;
info.Type(4) = mlpack::Datatype::categorical;
// The first dimension is numeric.
dataset(0, 0) = 1;
dataset(0, 1) = 2;
dataset(0, 2) = 3;
dataset(0, 3) = 4;
dataset(0, 4) = 5;
dataset(0, 5) = 6;
// The second dimension is categorical.
dataset(1, 0) = info.MapString<double>("TRUE", 1);
dataset(1, 1) = info.MapString<double>("FALSE", 1);
dataset(1, 2) = info.MapString<double>("FALSE", 1);
dataset(1, 3) = info.MapString<double>("TRUE", 1);
dataset(1, 4) = info.MapString<double>("TRUE", 1);
dataset(1, 5) = info.MapString<double>("FALSE", 1);
// The third dimension is categorical.
dataset(2, 0) = info.MapString<double>("good", 2);
dataset(2, 1) = info.MapString<double>("good", 2);
dataset(2, 2) = info.MapString<double>("bad", 2);
dataset(2, 3) = info.MapString<double>("bad", 2);
dataset(2, 4) = info.MapString<double>("unknown", 2);
dataset(2, 5) = info.MapString<double>("unknown", 2);
// The fourth dimension is numeric.
dataset(3, 0) = 7.0;
dataset(3, 1) = 5.6;
dataset(3, 2) = 6.1;
dataset(3, 3) = 6.1;
dataset(3, 4) = 6.3;
dataset(3, 5) = 5.1;
// The fifth dimension is categorical. Note that `info` will choose to assign
// category values in the order they are seen, even if the category can be
// parsed as a number. So, here, the value '4' will be assigned category '0',
// since it is seen first.
dataset(4, 0) = info.MapString<double>("4", 4);
dataset(4, 1) = info.MapString<double>("3", 4);
dataset(4, 2) = info.MapString<double>("4", 4);
dataset(4, 3) = info.MapString<double>("1", 4);
dataset(4, 4) = info.MapString<double>("0", 4);
dataset(4, 5) = info.MapString<double>("2", 4);
// Print the dataset with mapped categories.
dataset.print("Dataset with mapped categories");
// Print the mappings for the third dimension.
std::cout << "Mappings for dimension 3: " << std::endl;
for (size_t i = 0; i < info.NumMappings(2); ++i)
{
std::cout << " - \"" << info.UnmapString(i, 2) << "\" maps to " << i << "."
<< std::endl;
}
// Now `dataset` is ready for use with an mlpack algorithm that supports
// categorical data. We will save it to `categorical-data.csv`.
mlpack::TextOptions opts;
opts.Categorical() = true;
opts.DatasetInfo() = std::move(info);
mlpack::Save("categorical-data.csv", dataset, opts);
Image data
mlpack loads, saves, and modifies image data using the STB library. STB is a header-only library that is bundled with mlpack, but it is also possible to use a version of STB available on the system.
When loading images, each image is represented as a flattened single column
vector in a data matrix; each row of the resulting vector will correspond to a
single pixel value (between 0 and 255) in a single channel. If an
ImageOptions was passed to Load(), it
will be populated with the metadata of the image.
Images are flattened along rows, with channel values interleaved, starting from
the top left. Thus, the value of the pixel at position (x, y) in channel c
will be contained in element/row y * (width * channels) + x * (channels) + c
of the flattened vector.
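The row-major, channel-interleaved layout can be written as a small index helper. This is a sketch only: FlatPixelIndex is a hypothetical name, not an mlpack function, and it computes the same offset as the single-image example in this section (y * width * channels + x * channels + c).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of mlpack): offset of pixel (x, y), channel c,
// inside one flattened image column. Rows are stored one after another, and
// the channel values of each pixel are stored next to each other.
inline std::size_t FlatPixelIndex(const std::size_t x,
                                  const std::size_t y,
                                  const std::size_t c,
                                  const std::size_t width,
                                  const std::size_t channels)
{
  assert(x < width && c < channels);
  return y * (width * channels) + x * channels + c;
}
```

For a 10-pixel-wide RGB image, FlatPixelIndex(3, 4, 0, 10, 3) skips four full rows of 30 values each and then three pixels of 3 values each.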
- Supported image loading formats are JPEG, PNG, TGA, BMP, PSD, GIF, PIC, and PNM; see the table of formats for more details.
- Multiple images can be loaded into the columns of a single matrix using the overload of Load() that takes a vector of filenames.
- Supported image saving formats are JPEG, PNG, TGA, and BMP.
- Accessing the metadata of an image after loading can be done with opts.Width(), opts.Height(), and opts.Channels(). See the ImageOptions member documentation for more details.
- mlpack offers several utility functions for image modification and preprocessing, documented in Image preprocessing.
When working with images, the following overload for
Save() is also available:
- Save(filenames, X, opts)
  - Save each column in X (an arma::mat or other matrix type) as a separate image.
  - filenames is a std::vector<std::string> representing all the images that should be saved.
  - opts is an ImageOptions that contains image metadata.
  - opts.Width(), opts.Height(), opts.Channels(), and opts.Quality() should be set to the desired parameters before calling; see ImageOptions members for more details.
  - The ith column of X will be saved to the ith filename in filenames.
  - If all images are saved successfully, true will be returned.
Note: when loading and saving images, if the element type of X is not
unsigned char (e.g. if the image is not an arma::Mat<unsigned char>), then when
loading, the data will be temporarily loaded as unsigned chars and then
converted, and when saving, X will be converted to unsigned chars before
saving.
Image data load/save examples
Load a single image, but don't store the metadata (so, e.g., height, width, and number of channels are unavailable after loading!).
// See https://www.mlpack.org/static/img/numfocus-logo.png.
arma::mat image;
mlpack::Load("numfocus-logo.png", image, mlpack::PNG);
// If we wanted image metadata, we would need to pass an ImageOptions. See the
// next example.
//
// We could also specify `Image` instead of `PNG` if we did not care which image
// format was used, but just that *some* image format was used.
std::cout << "The image in 'numfocus-logo.png' has " << image.n_rows
<< " pixels." << std::endl;
Load and save a single image:
// See https://www.mlpack.org/static/img/numfocus-logo.png.
mlpack::ImageOptions opts;
opts.Fatal() = true;
arma::mat matrix;
mlpack::Load("numfocus-logo.png", matrix, opts /* format autodetected */);
// `matrix` should now contain one column.
// Print information about the image.
std::cout << "Information about the image in 'numfocus-logo.png': "
<< std::endl;
std::cout << " - " << opts.Width() << " pixels in width." << std::endl;
std::cout << " - " << opts.Height() << " pixels in height." << std::endl;
std::cout << " - " << opts.Channels() << " color channels." << std::endl;
std::cout << "Value at pixel (x=3, y=4) in the first channel: ";
const size_t index = (4 * opts.Width() * opts.Channels()) +
(3 * opts.Channels());
std::cout << matrix[index] << "." << std::endl;
// Increment each pixel value, but make sure they are still within the bounds.
matrix += 1;
matrix.clamp(0, 255);
mlpack::Save("numfocus-logo-mod.png", matrix, opts);
Load and save multiple images:
// Load some favicons from websites associated with mlpack.
std::vector<std::string> images;
// See the following files:
// - https://datasets.mlpack.org/images/mlpack-favicon.png
// - https://datasets.mlpack.org/images/ensmallen-favicon.png
// - https://datasets.mlpack.org/images/armadillo-favicon.png
// - https://datasets.mlpack.org/images/bandicoot-favicon.png
images.push_back("mlpack-favicon.png");
images.push_back("ensmallen-favicon.png");
images.push_back("armadillo-favicon.png");
images.push_back("bandicoot-favicon.png");
mlpack::ImageOptions opts;
opts.Channels() = 1; // Force loading in grayscale.
opts.Fatal() = true;
arma::mat matrix;
mlpack::Load(images, matrix, opts);
// Print information about what we loaded.
std::cout << "Loaded " << matrix.n_cols << " images. Images are of size "
<< opts.Width() << " x " << opts.Height() << " with " << opts.Channels()
<< " color channel." << std::endl;
// Invert images.
matrix = (255.0 - matrix);
// Save as compressed JPEGs with low quality.
opts.Quality() = 75;
std::vector<std::string> outImages;
outImages.push_back("mlpack-favicon-inv.jpeg");
outImages.push_back("ensmallen-favicon-inv.jpeg");
outImages.push_back("armadillo-favicon-inv.jpeg");
outImages.push_back("bandicoot-favicon-inv.jpeg");
mlpack::Save(outImages, matrix, opts);
Audio data
mlpack loads WAV and MP3 audio data using the
dr_libs library. dr_libs is a
header-only library that decodes WAV, MP3, and FLAC files. mlpack bundles the
WAV and MP3 decoders, but it is also possible to use a version of dr_libs
available on the system.
dr_libs decodes audio files into Pulse-Code Modulation (PCM) frames.
Each frame represents a single sample for each audio channel. Thus, for mono
(one channel), each frame has only one element; for stereo (2 channels), each
frame has two elements.
When loading audio files, each audio file is flattened into a single column
vector in the loaded data matrix, in order of frames. So, for a stereo audio
file, the rows of the column vector are in the ordering [l0, r0, l1, r1, ..., ln, rn]
where l0 and r0 are the left and right samples in frame 0.
Visual representation for stereo:
Time ──────────────────────────────►
    Frame 0          Frame 1          Frame 2
┌──────┬──────┐  ┌──────┬──────┐  ┌──────┬──────┐
│  L0  │  R0  │  │  L1  │  R1  │  │  L2  │  R2  │
└──────┴──────┘  └──────┴──────┘  └──────┴──────┘
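The frame-interleaved ordering can be expressed as a simple index calculation. The helpers below are a standalone sketch and not part of mlpack: SampleIndex and Deinterleave are hypothetical names, and a std::vector<double> stands in for one loaded column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: position of the sample for `channel` in `frame` inside
// a frame-interleaved column ([l0, r0, l1, r1, ...] for stereo).
inline std::size_t SampleIndex(const std::size_t frame,
                               const std::size_t channel,
                               const std::size_t channels)
{
  assert(channel < channels);
  return frame * channels + channel;
}

// Hypothetical helper: split an interleaved column into one vector per
// channel, so that result[c][f] is the sample of channel c in frame f.
inline std::vector<std::vector<double>> Deinterleave(
    const std::vector<double>& samples,
    const std::size_t channels)
{
  assert(channels > 0 && samples.size() % channels == 0);
  const std::size_t frames = samples.size() / channels;

  std::vector<std::vector<double>> result(channels,
      std::vector<double>(frames));
  for (std::size_t f = 0; f < frames; ++f)
    for (std::size_t c = 0; c < channels; ++c)
      result[c][f] = samples[SampleIndex(f, c, channels)];

  return result;
}
```

For a stereo file, Deinterleave(samples, 2) returns the left channel in result[0] and the right channel in result[1]; the number of channels would normally come from the metadata of the loaded file.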
If an AudioOptions is passed to Load(), it will be
populated with the metadata of the audio file.
- Supported audio loading formats are WAV and MP3; see the table of formats for more details.
- When loading an audio file into a matrix with a floating-point type (e.g. arma::fmat, arma::mat, etc.), regardless of the underlying sample format of the audio file, the loaded values will be in the range [-1.0, 1.0].
- When loading an audio file into a matrix with an integer type (e.g. arma::imat, arma::umat, arma::Mat<short>, etc.), regardless of the underlying sample format of the audio file:
  - If the integer type is unsigned, then the loaded values will be between 0 and the maximum representable value (e.g. [0, 65535] for unsigned short).
  - If the integer type is signed, then the loaded values will be between the most negative and most positive representable values (e.g. [-32768, 32767] for short).
- The only supported audio saving format is WAV.
- When saving to a WAV file, the value of opts.BitsPerSample() must be set to either 8, 16, 32, or 64 to define the format used for each sample in the file:
  - If opts.BitsPerSample() is 8, then regardless of the format of the given matrix to be saved, the data will be stored as 8-bit PCM format unsigned integers.
  - If opts.BitsPerSample() is 16, then regardless of the format of the given matrix to be saved, the data will be stored as 16-bit PCM format signed integers.
  - If opts.BitsPerSample() is 32 and the given matrix has integral elements (e.g. arma::imat, arma::umat, etc.), the data will be stored as 32-bit PCM format signed integers.
  - If opts.BitsPerSample() is 64 and the given matrix has integral elements (e.g. arma::imat, arma::umat, etc.), the data will be stored as 64-bit PCM format signed integers.
  - If opts.BitsPerSample() is either 32 or 64 and the given matrix has floating-point elements (e.g. arma::fmat, arma::mat, etc.), the data will be stored as either 32-bit or 64-bit IEEE floating point numbers respectively.
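The value-range and sample-format rules for audio can be summarized as two small lookup helpers. This is a sketch that only restates those rules in code; LoadedValueRange and DescribeWavFormat are hypothetical names and are not part of mlpack, and eT stands for the element type of the matrix.

```cpp
#include <limits>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: the range of values expected after loading audio into
// a matrix with element type eT. Floating-point types give [-1, 1]; integer
// types span their full representable range. (For 64-bit integers the bounds
// are only approximate once converted to double.)
template<typename eT>
std::pair<double, double> LoadedValueRange()
{
  if (std::is_floating_point<eT>::value)
    return std::make_pair(-1.0, 1.0);

  return std::make_pair(
      static_cast<double>(std::numeric_limits<eT>::min()),
      static_cast<double>(std::numeric_limits<eT>::max()));
}

// Hypothetical helper: describe the WAV sample format implied by the element
// type eT and the chosen bits-per-sample value.
template<typename eT>
std::string DescribeWavFormat(const int bitsPerSample)
{
  switch (bitsPerSample)
  {
    case 8:
      return "8-bit unsigned PCM";
    case 16:
      return "16-bit signed PCM";
    case 32:
    case 64:
      return std::to_string(bitsPerSample) +
          (std::is_floating_point<eT>::value ? "-bit IEEE float"
                                             : "-bit signed PCM");
    default:
      throw std::invalid_argument("bits per sample must be 8, 16, 32, or 64");
  }
}
```

For example, DescribeWavFormat<double>(32) describes 32-bit IEEE floating point storage, while DescribeWavFormat<int>(32) describes 32-bit signed PCM.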
Audio data load/save examples
Load a single audio file, but don't store the metadata (note that this means the number of channels is unavailable after loading!).
// See https://datasets.mlpack.org/sine.wav
arma::mat audio;
mlpack::Load("sine.wav", audio, mlpack::WAV);
// If we wanted audio metadata, we would need to pass an AudioOptions. See the
// next example.
std::cout << "The audio file in 'sine.wav' contains " << audio.n_rows
<< " samples." << std::endl;
Load and save a single audio file:
// See https://datasets.mlpack.org/fifths.mp3
mlpack::AudioOptions opts, opts2;
opts.Fatal() = true;
arma::mat matrix;
mlpack::Load("fifths.mp3", matrix, opts /* format autodetected */);
// `matrix` contains one column.
// Print some information about the audio file.
std::cout << "Information about the audio file in 'fifths.mp3': "
<< std::endl;
std::cout << "Audio Duration: " << opts.AudioDuration() << std::endl;
std::cout << "Audio Channels: " << opts.Channels() << std::endl;
std::cout << "Sampling Rate: " << opts.SampleRate() << std::endl;
// `opts` is now populated with the MP3 file type, so we use a separate options
// object (`opts2`) to save as WAV.
mlpack::Save("myFifths.wav", matrix, opts2);
mlpack models and objects
Machine learning models and any mlpack object (i.e. anything in the mlpack::
namespace) can be saved with Save() and loaded with
Load(). Serialization is performed using the
cereal serialization toolkit.
- When calling Load() and Save(), X should be the desired mlpack model or object type.
- When loading and saving with an instantiated DataOptions object, the base DataOptions type should be used.
- Supported formats are binary, JSON, and XML; see the table of format options.
  - FileType::BIN (.bin) is recommended for the sake of size; objects in binary format may be an order of magnitude or more smaller than JSON!
  - FileType::JSON (.json) and FileType::XML (.xml) produce human-readable files, but they may be quite large.
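As a rough illustration of extension-based format selection for models, the sketch below maps the three documented extensions to a format. It is an assumption-laden stand-in, not mlpack's detection code: ModelFormat and FormatFromExtension are hypothetical names, and the comparison is case-sensitive for simplicity.

```cpp
#include <string>

// Hypothetical enumeration for this sketch; mlpack itself uses FileType.
enum class ModelFormat { Binary, Json, Xml, Unknown };

// Pick a model format from the filename extension (.bin, .json, or .xml).
// Any other extension, or no extension at all, is reported as Unknown.
inline ModelFormat FormatFromExtension(const std::string& filename)
{
  const std::string::size_type dot = filename.rfind('.');
  if (dot == std::string::npos)
    return ModelFormat::Unknown;

  const std::string ext = filename.substr(dot);
  if (ext == ".bin")
    return ModelFormat::Binary;
  if (ext == ".json")
    return ModelFormat::Json;
  if (ext == ".xml")
    return ModelFormat::Xml;
  return ModelFormat::Unknown;
}
```

When the extension does not identify the format, the format can instead be given explicitly, e.g. with opts.Format() or a standalone option such as BIN or JSON.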
Note: when loading an object that was saved in the binary format
(BIN), the C++ type of the
object must be exactly the same (including template parameters) as the
type used to save the object. If not, undefined behavior will occur, most
likely a crash.
mlpack models and objects load/save examples
Simple example: create a math::Range object, then save and load it.
mlpack::math::Range r(3.0, 6.0);
// How we can use DataOptions with loading / saving objects.
mlpack::DataOptions opts;
opts.Fatal() = true;
opts.Format() = mlpack::FileType::BIN;
// Save the Range to 'range.bin', using the name "range".
mlpack::Save("range.bin", r, opts);
// Load the range into a new object.
mlpack::math::Range r2;
mlpack::Load("range.bin", r2, mlpack::BIN + mlpack::Fatal);
std::cout << "Loaded range: [" << r2.Lo() << ", " << r2.Hi() << "]."
<< std::endl;
// Modify and save the range as JSON.
r2.Lo() = 4.0;
mlpack::Save("range.json", r2, mlpack::JSON + mlpack::Fatal);
// Now 'range.json' will contain the following:
//
// {
// "range": {
// "cereal_class_version": 0,
// "hi": 6.0,
// "lo": 4.0
// }
// }
Train a LinearRegression model and save it to
disk, then reload it.
// See https://datasets.mlpack.org/admission_predict.csv.
arma::mat data;
mlpack::Load("admission_predict.csv", data, mlpack::NoFatal);
// See https://datasets.mlpack.org/admission_predict.responses.csv.
arma::rowvec responses;
mlpack::Load("admission_predict.responses.csv", responses, mlpack::Fatal);
// Train a linear regression model, fitting an intercept term and using an L2
// regularization parameter of 0.3.
mlpack::LinearRegression lr(data, responses, 0.3, true);
// Save the model using the binary format as a standalone parameter, throwing an
// exception on failure.
mlpack::Save("lr-model.bin", lr, mlpack::Fatal + mlpack::BIN);
std::cout << "Saved model to lr-model.bin." << std::endl;
// Now load the model back, using format autodetection on the filename
// extension.
mlpack::LinearRegression loadedModel;
if (!mlpack::Load("lr-model.bin", loadedModel))
{
std::cout << "Model not loaded successfully from 'lr-model.bin'!"
<< std::endl;
}
else
{
std::cout << "Model loaded successfully from 'lr-model.bin' with "
<< "intercept value of " << loadedModel.Parameters()[0] << "."
<< std::endl;
}