Functions to load and save matrices and models.
Classes | |
class | BagOfWordsEncodingPolicy |
Definition of the BagOfWordsEncodingPolicy class. More... | |
class | CharExtract |
The class is used to split a string into characters. More... | |
class | CustomImputation |
A simple custom imputation class. More... | |
class | DatasetMapper |
Auxiliary information for a dataset, including mappings to/from strings (or other types) and the datatype of each dimension. More... | |
class | DictionaryEncodingPolicy |
DictionaryEncodingPolicy is used as a helper class for StringEncoding. More... | |
struct | HasSerialize |
struct | HasSerializeFunction |
class | ImageInfo |
Implements meta-data of images required by data::Load and data::Save for loading and saving images into arma::Mat. More... | |
class | Imputer |
Given a dataset of a particular datatype, replace user-specified missing value with a variable dependent on the StrategyType and MapperType. More... | |
class | IncrementPolicy |
IncrementPolicy is used as a helper class for DatasetMapper. More... | |
class | ListwiseDeletion |
A complete-case analysis to remove the values containing mappedValue. More... | |
class | LoadCSV |
Loads a CSV file. This class uses boost::spirit to implement the parser; see http://theboostcpplibraries.com/boost.spirit for a quick review. More... | |
class | MaxAbsScaler |
A simple MaxAbs Scaler class. More... | |
class | MeanImputation |
A simple mean imputation class. More... | |
class | MeanNormalization |
A simple Mean Normalization class. More... | |
class | MedianImputation |
This is a class implementation of simple median imputation. More... | |
class | MinMaxScaler |
A simple MinMax Scaler class. More... | |
class | MissingPolicy |
MissingPolicy is used as a helper class for DatasetMapper. More... | |
class | PCAWhitening |
A simple PCAWhitening class. More... | |
class | ScalingModel |
The model to save to disk. More... | |
class | SplitByAnyOf |
The SplitByAnyOf class tokenizes a string using a set of delimiters. More... | |
class | StandardScaler |
A simple Standard Scaler class. More... | |
class | StringEncoding |
The class translates a set of strings into numbers using various encoding algorithms. More... | |
class | StringEncodingDictionary |
This class provides a dictionary interface for the purpose of string encoding. More... | |
class | StringEncodingDictionary< boost::string_view > |
class | StringEncodingDictionary< int > |
struct | StringEncodingPolicyTraits |
This is a template struct that provides some information about various encoding policies. More... | |
struct | StringEncodingPolicyTraits< DictionaryEncodingPolicy > |
The specialization provides some information about the dictionary encoding policy. More... | |
class | TfIdfEncodingPolicy |
Definition of the TfIdfEncodingPolicy class. More... | |
class | ZCAWhitening |
A simple ZCAWhitening class. More... | |
Typedefs | |
template | |
using | BagOfWordsEncoding = StringEncoding< BagOfWordsEncodingPolicy, StringEncodingDictionary< TokenType > > |
A convenient alias for the StringEncoding class with BagOfWordsEncodingPolicy and the default dictionary for the given token type. More... | |
using | DatasetInfo = DatasetMapper< data::IncrementPolicy > |
template | |
using | DictionaryEncoding = StringEncoding< DictionaryEncodingPolicy, StringEncodingDictionary< TokenType > > |
A convenient alias for the StringEncoding class with DictionaryEncodingPolicy and the default dictionary for the given token type. More... | |
template | |
using | TfIdfEncoding = StringEncoding< TfIdfEncodingPolicy, StringEncodingDictionary< TokenType > > |
A convenient alias for the StringEncoding class with TfIdfEncodingPolicy and the default dictionary for the given token type. More... | |
Enumerations | |
enum | Datatype : bool { numeric = 0, categorical = 1 } |
The Datatype enum specifies the types of data mlpack algorithms can use. More... | |
enum | format { autodetect , text , xml , binary } |
Define the formats we can read through boost::serialization. More... | |
Functions | |
arma::file_type | AutoDetect (std::fstream &stream, const std::string &filename) |
Attempt to auto-detect the type of a file given its extension, and by inspecting the parts of the file to disambiguate between types when necessary. More... | |
template | |
void | Binarize (const arma::Mat< T > &input, arma::Mat< T > &output, const double threshold) |
Given an input dataset and threshold, set values greater than threshold to 1 and values less than or equal to the threshold to 0. More... | |
template | |
void | Binarize (const arma::Mat< T > &input, arma::Mat< T > &output, const double threshold, const size_t dimension) |
Given an input dataset and threshold, set values greater than threshold to 1 and values less than or equal to the threshold to 0. More... | |
template | |
void | ConfusionMatrix (const arma::Row< size_t > predictors, const arma::Row< size_t > responses, arma::Mat< eT > &output, const size_t numClasses) |
A confusion matrix is a summary of prediction results on a classification problem. More... | |
arma::file_type | DetectFromExtension (const std::string &filename) |
Return the type based only on the extension. More... | |
std::string | Extension (const std::string &filename) |
std::string | GetStringType (const arma::file_type &type) |
Given a file type, return a logical name corresponding to that file type. More... | |
arma::file_type | GuessFileType (std::istream &f) |
Given an istream, attempt to guess the file type. More... | |
HAS_EXACT_METHOD_FORM (serialize, HasSerializeCheck) | |
bool | ImageFormatSupported (const std::string &fileName, const bool save=false) |
Checks if the given image filename is supported. More... | |
template | |
bool | IsNaNInf (T &val, const std::string &token) |
See if the token is a NaN or an Inf, and if so, set the value accordingly and return a boolean representing whether or not it is. More... | |
template | |
bool | Load (const std::string &filename, arma::Mat< eT > &matrix, const bool fatal=false, const bool transpose=true, const arma::file_type inputLoadType=arma::auto_detect) |
Loads a matrix from file, guessing the filetype from the extension. More... | |
template | |
bool | Load (const std::string &filename, arma::SpMat< eT > &matrix, const bool fatal=false, const bool transpose=true) |
Loads a sparse matrix from file, using arma::coord_ascii format. More... | |
template | |
bool | Load (const std::string &filename, arma::Col< eT > &vec, const bool fatal=false) |
Load a column vector from a file, guessing the filetype from the extension. More... | |
template | |
bool | Load (const std::string &filename, arma::Row< eT > &rowvec, const bool fatal=false) |
Load a row vector from a file, guessing the filetype from the extension. More... | |
template | |
bool | Load (const std::string &filename, arma::Mat< eT > &matrix, DatasetMapper< PolicyType > &info, const bool fatal=false, const bool transpose=true) |
Loads a matrix from a file, guessing the filetype from the extension and mapping categorical features with a DatasetMapper object. More... | |
template | |
bool | Load (const std::string &filename, const std::string &name, T &t, const bool fatal=false, format f=format::autodetect) |
Load a model from a file, guessing the filetype from the extension, or, optionally, loading the specified format. More... | |
template | |
bool | Load (const std::string &filename, arma::Mat< eT > &matrix, ImageInfo &info, const bool fatal=false) |
Load the image file into the given matrix. More... | |
template | |
bool | Load (const std::vector< std::string > &files, arma::Mat< eT > &matrix, ImageInfo &info, const bool fatal=false) |
Load the image files into the given matrix. More... | |
template | |
void | LoadARFF (const std::string &filename, arma::Mat< eT > &matrix) |
A utility function to load an ARFF dataset as numeric features (that is, as an Armadillo matrix without any modification). More... | |
template | |
void | LoadARFF (const std::string &filename, arma::Mat< eT > &matrix, DatasetMapper< PolicyType > &info) |
A utility function to load an ARFF dataset as numeric and categorical features, using the DatasetInfo structure for mapping. More... | |
bool | LoadImage (const std::string &filename, arma::Mat< unsigned char > &matrix, ImageInfo &info, const bool fatal=false) |
template | |
void | NormalizeLabels (const RowType &labelsIn, arma::Row< size_t > &labels, arma::Col< eT > &mapping) |
Given a set of labels of a particular datatype, convert them to unsigned labels in the range [0, n) where n is the number of different labels. More... | |
template | |
void | OneHotEncoding (const RowType &labelsIn, MatType &output) |
Given a set of labels of a particular datatype, convert them to binary vectors. More... | |
template | |
void | OneHotEncoding (const arma::Mat< eT > &input, const arma::Col< size_t > &indices, arma::Mat< eT > &output) |
Overloaded function for the above function, which takes a matrix as input and also a vector of indices to encode and outputs a matrix. More... | |
template | |
void | OneHotEncoding (const arma::Mat< eT > &input, arma::Mat< eT > &output, const data::DatasetInfo &datasetInfo) |
Overloaded function for the above function, which takes a matrix as input and also a DatasetInfo object and outputs a matrix. More... | |
template | |
void | RevertLabels (const arma::Row< size_t > &labels, const arma::Col< eT > &mapping, arma::Row< eT > &labelsOut) |
Given a set of labels that have been mapped to the range [0, n), map them back to the original labels given by the 'mapping' vector. More... | |
template | |
bool | Save (const std::string &filename, const arma::Mat< eT > &matrix, const bool fatal=false, bool transpose=true, arma::file_type inputSaveType=arma::auto_detect) |
Saves a matrix to file, guessing the filetype from the extension. More... | |
template | |
bool | Save (const std::string &filename, const arma::SpMat< eT > &matrix, const bool fatal=false, bool transpose=true) |
Saves a sparse matrix to file, guessing the filetype from the extension. More... | |
template | |
bool | Save (const std::string &filename, const std::string &name, T &t, const bool fatal=false, format f=format::autodetect) |
Saves a model to file, guessing the filetype from the extension, or, optionally, saving the specified format. More... | |
template | |
bool | Save (const std::string &filename, arma::Mat< eT > &matrix, ImageInfo &info, const bool fatal=false) |
Save the image file from the given matrix. More... | |
template | |
bool | Save (const std::vector< std::string > &files, arma::Mat< eT > &matrix, ImageInfo &info, const bool fatal=false) |
Save the image file from the given matrix. More... | |
bool | SaveImage (const std::string &filename, arma::Mat< unsigned char > &image, ImageInfo &info, const bool fatal=false) |
Helper function to save files. More... | |
template | |
void | Split (const arma::Mat< T > &input, const arma::Row< U > &inputLabel, arma::Mat< T > &trainData, arma::Mat< T > &testData, arma::Row< U > &trainLabel, arma::Row< U > &testLabel, const double testRatio, const bool shuffleData=true) |
Given an input dataset and labels, split into a training set and test set. More... | |
template | |
void | Split (const arma::Mat< T > &input, arma::Mat< T > &trainData, arma::Mat< T > &testData, const double testRatio, const bool shuffleData=true) |
Given an input dataset, split into a training set and test set. More... | |
template | |
std::tuple< arma::Mat< T >, arma::Mat< T >, arma::Row< U >, arma::Row< U > > | Split (const arma::Mat< T > &input, const arma::Row< U > &inputLabel, const double testRatio, const bool shuffleData=true) |
Given an input dataset and labels, split into a training set and test set. More... | |
template | |
std::tuple< arma::Mat< T >, arma::Mat< T > > | Split (const arma::Mat< T > &input, const double testRatio, const bool shuffleData=true) |
Given an input dataset, split into a training set and test set. More... | |
Functions to load and save matrices and models.
A convenient alias for the StringEncoding class with BagOfWordsEncodingPolicy and the default dictionary for the given token type.
TokenType | Type of the tokens. |
Definition at line 167 of file bag_of_words_encoding_policy.hpp.
typedef DatasetMapper< IncrementPolicy, std::string > DatasetInfo |
Definition at line 196 of file dataset_mapper.hpp.
A convenient alias for the StringEncoding class with DictionaryEncodingPolicy and the default dictionary for the given token type.
TokenType | Type of the tokens. |
Definition at line 146 of file dictionary_encoding_policy.hpp.
A convenient alias for the StringEncoding class with TfIdfEncodingPolicy and the default dictionary for the given token type.
TokenType | Type of the tokens. |
Definition at line 345 of file tf_idf_encoding_policy.hpp.
enum Datatype : bool |
The Datatype enum specifies the types of data mlpack algorithms can use.
The vast majority of mlpack algorithms can only use numeric data (i.e. float/double/etc.), but some algorithms can use categorical data, specified via this Datatype enum and the DatasetMapper class.
Enumerator | |
---|---|
numeric | |
categorical |
Definition at line 24 of file datatype.hpp.
enum format |
Define the formats we can read through boost::serialization.
Enumerator | |
---|---|
autodetect | |
text | |
xml | |
binary |
Definition at line 20 of file format.hpp.
arma::file_type mlpack::data::AutoDetect | ( | std::fstream & | stream, |
const std::string & | filename | ||
) |
Attempt to auto-detect the type of a file given its extension, and by inspecting the parts of the file to disambiguate between types when necessary.
(For instance, a .csv file could be delimited by spaces, commas, or tabs.) This is meant to be used during loading.
stream | Opened file stream to look into for autodetection. |
filename | Name of the file. |
void mlpack::data::Binarize | ( | const arma::Mat< T > & | input, |
arma::Mat< T > & | output, | ||
const double | threshold | ||
) |
Given an input dataset and threshold, set values greater than threshold to 1 and values less than or equal to the threshold to 0.
This overload applies the changes to all dimensions.
input | Input matrix to Binarize. |
output | Matrix you want to save binarized data into. |
threshold | Threshold value; can be any number. |
Definition at line 41 of file binarize.hpp.
References omp_size_t.
void mlpack::data::Binarize | ( | const arma::Mat< T > & | input, |
arma::Mat< T > & | output, | ||
const double | threshold, | ||
const size_t | dimension | ||
) |
Given an input dataset and threshold, set values greater than threshold to 1 and values less than or equal to the threshold to 0.
This overload takes a dimension and applies the changes only to that dimension.
input | Input matrix to Binarize. |
output | Matrix you want to save binarized data into. |
threshold | Threshold value; can be any number. |
dimension | Feature to apply the Binarize function. |
Definition at line 77 of file binarize.hpp.
References omp_size_t.
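The thresholding rule shared by both Binarize overloads can be sketched with the standard library alone. This is an illustration and not mlpack's implementation: the Matrix alias and the function names BinarizeAll and BinarizeDimension are hypothetical, and the sketch assumes that the single-dimension variant copies all other dimensions unchanged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense matrix: each inner vector is one dimension (feature),
// and each position within it is one data point.
using Matrix = std::vector<std::vector<double>>;

// Binarize every dimension: values greater than the threshold become 1,
// values less than or equal to the threshold become 0.
Matrix BinarizeAll(const Matrix& input, const double threshold)
{
  Matrix output = input;
  for (auto& row : output)
    for (double& value : row)
      value = (value > threshold) ? 1.0 : 0.0;
  return output;
}

// Binarize only the given dimension; the other dimensions are copied as-is
// (an assumption of this sketch).
Matrix BinarizeDimension(const Matrix& input,
                         const double threshold,
                         const std::size_t dimension)
{
  Matrix output = input;
  for (double& value : output.at(dimension))
    value = (value > threshold) ? 1.0 : 0.0;
  return output;
}
```

Note that a value exactly equal to the threshold maps to 0, matching the "less than or equal" wording above.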
void mlpack::data::ConfusionMatrix | ( | const arma::Row< size_t > | predictors, |
const arma::Row< size_t > | responses, | ||
arma::Mat< eT > & | output, | ||
const size_t | numClasses | ||
) |
A confusion matrix is a summary of prediction results on a classification problem.
The number of correct and incorrect predictions is summarized by count and broken down by class. For example, for 2 classes, the function call will be ConfusionMatrix(predictors, responses, output, 2).
In this case, the output matrix will be of size 2 x 2.
In the confusion matrix for two labels, TP represents the number of true positives, FP represents the number of false positives, FN represents the number of false negatives, and TN represents the number of true negatives.
When generalizing to 2 or more classes, the row index of the confusion matrix represents the predicted classes and column index represents the actual class.
predictors | Vector of predicted labels, one per data point. |
responses | Vector of actual (measured) labels, one per data point. |
output | Matrix which is represented as confusion matrix. |
numClasses | Number of classes. |
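The row and column convention described above (row index is the predicted class, column index is the actual class) can be illustrated with a small standalone sketch. The names CountMatrix and BuildConfusionMatrix are hypothetical, and the code uses std::vector rather than Armadillo types; it is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical count matrix indexed as counts[predicted][actual].
using CountMatrix = std::vector<std::vector<std::size_t>>;

// Tally one count per data point in the cell (predicted class, actual class).
CountMatrix BuildConfusionMatrix(const std::vector<std::size_t>& predicted,
                                 const std::vector<std::size_t>& actual,
                                 const std::size_t numClasses)
{
  assert(predicted.size() == actual.size());
  CountMatrix counts(numClasses, std::vector<std::size_t>(numClasses, 0));
  for (std::size_t i = 0; i < predicted.size(); ++i)
    ++counts.at(predicted[i]).at(actual[i]);  // at() rejects labels >= numClasses.
  return counts;
}
```

Correct predictions accumulate on the diagonal, so the sum of the diagonal divided by the number of points gives the accuracy.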
arma::file_type mlpack::data::DetectFromExtension | ( | const std::string & | filename | ) |
Return the type based only on the extension.
filename | Name of the file whose type we should detect. |
std::string mlpack::data::Extension | ( | const std::string & | filename | ) |
inline
Definition at line 21 of file extension.hpp.
std::string mlpack::data::GetStringType | ( | const arma::file_type & | type | ) |
Given a file type, return a logical name corresponding to that file type.
type | Type to get the logical name of. |
arma::file_type mlpack::data::GuessFileType | ( | std::istream & | f | ) |
Given an istream, attempt to guess the file type.
This is taken originally from Armadillo's function guess_file_type_internal(), but we avoid using internal Armadillo functionality.
f | Opened istream to look into to guess the file type. |
mlpack::data::HAS_EXACT_METHOD_FORM | ( | serialize | , |
HasSerializeCheck | |||
) |
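bool mlpack::data::ImageFormatSupported | ( | const std::string & | fileName, |
const bool | save = false |
||
) |
inline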
Checks if the given image filename is supported.
fileName | Name of the image file. |
save | Set to true to check if the file format can be saved, else loaded. |
bool mlpack::data::IsNaNInf | ( | T & | val, |
const std::string & | token | ||
) |
inline
See if the token is a NaN or an Inf, and if so, set the value accordingly and return a boolean representing whether or not it is.
Definition at line 27 of file is_naninf.hpp.
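The behavior described for IsNaNInf (recognize a NaN or Inf token, set the value, and report whether a match occurred) can be sketched as follows. This is a standalone illustration: the function name ParseNaNInf is hypothetical, and the exact set of spellings accepted here ("nan", "inf", "+inf", "-inf", case-insensitive) is an assumption rather than something stated in this reference.

```cpp
#include <algorithm>
#include <cctype>
#include <limits>
#include <string>

// Returns true and sets `value` if `token` spells NaN or +/-Inf
// (case-insensitive). Otherwise returns false and leaves `value` untouched.
bool ParseNaNInf(double& value, const std::string& token)
{
  std::string lower(token);
  std::transform(lower.begin(), lower.end(), lower.begin(),
      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  if (lower == "nan")
  {
    value = std::numeric_limits<double>::quiet_NaN();
    return true;
  }
  if (lower == "inf" || lower == "+inf")
  {
    value = std::numeric_limits<double>::infinity();
    return true;
  }
  if (lower == "-inf")
  {
    value = -std::numeric_limits<double>::infinity();
    return true;
  }
  return false;
}
```

A parser would typically call such a check only after ordinary numeric conversion of the token has failed.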
bool mlpack::data::Load | ( | const std::string & | filename, |
arma::Mat< eT > & | matrix, | ||
const bool | fatal = false , |
||
const bool | transpose = true , |
||
const arma::file_type | inputLoadType = arma::auto_detect |
||
) |
Loads a matrix from file, guessing the filetype from the extension.
This will transpose the matrix at load time (unless the transpose parameter is set to false).
The supported types of files are the same as found in Armadillo:
By default, this function will try to automatically determine the type of file to load based on its extension and by inspecting the file. If you know the file type and want to specify it manually, override the default inputLoadType parameter with the correct type above (e.g. arma::csv_ascii).
If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the matrix does not load successfully. The parameter 'transpose' controls whether or not the matrix is transposed after loading. In most cases, because data is generally stored in a row-major format and mlpack requires column-major matrices, this should be left at its default value of 'true'.
filename | Name of file to load. |
matrix | Matrix to load contents of file into. |
fatal | If an error should be reported as fatal (default false). |
transpose | If true, transpose the matrix after loading (default true). |
inputLoadType | Used to determine the type of file to load (default arma::auto_detect). |
Referenced by mlpack::bindings::cli::GetParam().
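The effect of the transpose parameter can be illustrated without mlpack. The sketch below parses comma-separated text in which each line is one data point, and optionally transposes it so that each point becomes a column. The names Matrix and ParseCsv are hypothetical, the parser handles only plain numeric fields, and it does not reproduce the file type detection or error handling of Load().

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical matrix type: a vector of rows.
using Matrix = std::vector<std::vector<double>>;

// Parse comma-separated numeric text, one file row per line.
// If `transpose` is true, file rows become matrix columns, so each data point
// (one line of the file) ends up as one column of the result.
Matrix ParseCsv(std::istream& in, const bool transpose = true)
{
  Matrix rows;
  std::string line;
  while (std::getline(in, line))
  {
    if (line.empty())
      continue;
    std::vector<double> row;
    std::stringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ','))
      row.push_back(std::stod(field));
    rows.push_back(row);
  }

  if (!transpose || rows.empty())
    return rows;

  // Transposition assumes every line has the same number of fields.
  Matrix columns(rows[0].size(), std::vector<double>(rows.size()));
  for (std::size_t r = 0; r < rows.size(); ++r)
    for (std::size_t c = 0; c < rows[0].size(); ++c)
      columns[c][r] = rows[r].at(c);
  return columns;
}
```

With two lines of three values each, the transposed result has three rows (dimensions) and two columns (points), which is the layout described above for the default transpose value of 'true'.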
bool mlpack::data::Load | ( | const std::string & | filename, |
arma::SpMat< eT > & | matrix, | ||
const bool | fatal = false , |
||
const bool | transpose = true |
||
) |
Loads a sparse matrix from file, using arma::coord_ascii format.
This will transpose the matrix at load time (unless the transpose parameter is set to false). If the filetype cannot be determined, an error will be given.
The supported types of files are the same as found in Armadillo:
If the file extension is not one of those types, an error will be given. This is preferable to Armadillo's default behavior of loading an unknown filetype as raw_binary, which can have very confusing effects.
If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the matrix does not load successfully. The parameter 'transpose' controls whether or not the matrix is transposed after loading. In most cases, because data is generally stored in a row-major format and mlpack requires column-major matrices, this should be left at its default value of 'true'.
filename | Name of file to load. |
matrix | Sparse matrix to load contents of file into. |
fatal | If an error should be reported as fatal (default false). |
transpose | If true, transpose the matrix after loading (default true). |
bool mlpack::data::Load | ( | const std::string & | filename, |
arma::Col< eT > & | vec, | ||
const bool | fatal = false |
||
) |
Load a column vector from a file, guessing the filetype from the extension.
The supported types of files are the same as found in Armadillo:
If the file extension is not one of those types, an error will be given. This is preferable to Armadillo's default behavior of loading an unknown filetype as raw_binary, which can have very confusing effects.
If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the vector does not load successfully.
filename | Name of file to load. |
vec | Column vector to load contents of file into. |
fatal | If an error should be reported as fatal (default false). |
bool mlpack::data::Load | ( | const std::string & | filename, |
arma::Row< eT > & | rowvec, | ||
const bool | fatal = false |
||
) |
Load a row vector from a file, guessing the filetype from the extension.
The supported types of files are the same as found in Armadillo:
If the file extension is not one of those types, an error will be given. This is preferable to Armadillo's default behavior of loading an unknown filetype as raw_binary, which can have very confusing effects.
If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the vector does not load successfully.
filename | Name of file to load. |
rowvec | Row vector to load contents of file into. |
fatal | If an error should be reported as fatal (default false). |
bool mlpack::data::Load | ( | const std::string & | filename, |
arma::Mat< eT > & | matrix, | ||
DatasetMapper< PolicyType > & | info, | ||
const bool | fatal = false , |
||
const bool | transpose = true |
||
) |
Loads a matrix from a file, guessing the filetype from the extension and mapping categorical features with a DatasetMapper object.
This will transpose the matrix (unless the transpose parameter is set to false). This particular overload of Load() can only load text-based formats, such as those given below:
If the file extension is not one of those types, an error will be given. This is preferable to Armadillo's default behavior of loading an unknown filetype as raw_binary, which can have very confusing effects.
If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the matrix does not load successfully. The parameter 'transpose' controls whether or not the matrix is transposed after loading. In most cases, because data is generally stored in a row-major format and mlpack requires column-major matrices, this should be left at its default value of 'true'.
The DatasetMapper object passed to this function will be re-created, so any mappings from previous loads will be lost.
filename | Name of file to load. |
matrix | Matrix to load contents of file into. |
info | DatasetMapper object to populate with mappings and data types. |
fatal | If an error should be reported as fatal (default false). |
transpose | If true, transpose the matrix after loading. |
bool mlpack::data::Load | ( | const std::string & | filename, |
const std::string & | name, | ||
T & | t, | ||
const bool | fatal = false , |
||
format | f = format::autodetect |
||
) |
Load a model from a file, guessing the filetype from the extension, or, optionally, loading the specified format. If automatic extension detection is used and the filetype cannot be determined, an error will be given.
The supported types of files are the same as what is supported by the boost::serialization library:
The format parameter can take any of the values in the 'format' enum: 'format::autodetect', 'format::text', 'format::xml', and 'format::binary'. The autodetect functionality operates on the file extension (so, "file.txt" would be autodetected as text).
The name parameter should be specified to indicate the name of the structure to be loaded. This should be the same as the name that was used to save the structure (otherwise, the loading procedure will fail).
If the parameter 'fatal' is set to true, then an exception will be thrown in the event of load failure. Otherwise, the method will return false and the relevant error information will be printed to Log::Warn.
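The format selection described above (an explicit format wins, otherwise the file extension decides) might look like the following standalone sketch. The Format enum and ResolveFormat function are hypothetical stand-ins. Only the "file.txt" to text mapping is stated in this reference; the mappings of ".xml" to xml and ".bin" to binary are assumptions of the sketch, as is the choice to throw when detection fails.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Local stand-in for the 'format' enum described above.
enum class Format { autodetect, text, xml, binary };

// Resolve the format to use. An explicitly requested format is returned
// unchanged; otherwise the lowercased file extension is inspected.
Format ResolveFormat(const std::string& filename,
                     const Format requested = Format::autodetect)
{
  if (requested != Format::autodetect)
    return requested;

  const std::size_t dot = filename.rfind('.');
  std::string ext = (dot == std::string::npos) ? "" : filename.substr(dot + 1);
  std::transform(ext.begin(), ext.end(), ext.begin(),
      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  if (ext == "txt") return Format::text;
  if (ext == "xml") return Format::xml;     // Assumed mapping.
  if (ext == "bin") return Format::binary;  // Assumed mapping.
  throw std::runtime_error("cannot detect format of '" + filename + "'");
}
```

In this sketch the failure case always throws; the documented functions instead return false and log a warning unless 'fatal' is true.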
bool mlpack::data::Load | ( | const std::string & | filename, |
arma::Mat< eT > & | matrix, | ||
ImageInfo & | info, | ||
const bool | fatal = false |
||
) |
Load the image file into the given matrix.
filename | Name of the image file. |
matrix | Matrix to load the image into. |
info | An object of ImageInfo class. |
fatal | If an error should be reported as fatal (default false). |
bool mlpack::data::Load | ( | const std::vector< std::string > & | files, |
arma::Mat< eT > & | matrix, | ||
ImageInfo & | info, | ||
const bool | fatal = false |
||
) |
Load the image files into the given matrix.
files | A vector consisting of filenames. |
matrix | Matrix to load the images into. |
info | An object of ImageInfo class. |
fatal | If an error should be reported as fatal (default false). |
void mlpack::data::LoadARFF | ( | const std::string & | filename, |
arma::Mat< eT > & | matrix | ||
) |
A utility function to load an ARFF dataset as numeric features (that is, as an Armadillo matrix without any modification).
An exception will be thrown if any features are non-numeric.
void mlpack::data::LoadARFF | ( | const std::string & | filename, |
arma::Mat< eT > & | matrix, | ||
DatasetMapper< PolicyType > & | info | ||
) |
A utility function to load an ARFF dataset as numeric and categorical features, using the DatasetInfo structure for mapping.
An exception will be thrown upon failure.
A pre-existing DatasetInfo object can be passed in, but if the dimensionality of the given DatasetInfo object (info.Dimensionality()) does not match the dimensionality of the data, a std::invalid_argument exception will be thrown. If an empty DatasetInfo object is given (constructed with the default constructor or otherwise, so that info.Dimensionality() is 0), it will be set to the right dimensionality.
The ability to pass in a pre-existing DatasetInfo object is essential when, e.g., loading a test set after training. If the DatasetInfo produced by loading the training set is not reused, the test set may be loaded with different mappings, so the same category could be assigned different numeric values in the two sets.
filename | Name of ARFF file to load. |
matrix | Matrix to load data into. |
info | DatasetInfo object; can be default-constructed or pre-existing from another call to LoadARFF(). |
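Why the mapping object must be reused between the training and test sets can be seen in a small standalone sketch. CategoryMapper and MapAll are hypothetical stand-ins for a single categorical dimension and do not reflect the DatasetMapper interface; the sketch assumes each unseen category is given the next unused index.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical mapping for one categorical dimension: each new string gets
// the next unused index, and known strings keep the index they already have.
class CategoryMapper
{
 public:
  std::size_t Map(const std::string& category)
  {
    const auto it = indices.find(category);
    if (it != indices.end())
      return it->second;
    const std::size_t next = indices.size();
    indices.emplace(category, next);
    return next;
  }

 private:
  std::unordered_map<std::string, std::size_t> indices;
};

// Map a whole column of categorical values with the given mapper.
std::vector<std::size_t> MapAll(CategoryMapper& mapper,
                                const std::vector<std::string>& values)
{
  std::vector<std::size_t> mapped;
  for (const std::string& value : values)
    mapped.push_back(mapper.Map(value));
  return mapped;
}
```

If the training column {"red", "green", "blue"} and the test column {"blue", "red"} share one mapper, "blue" is 2 in both. With a fresh mapper for the test column, "blue" would become 0, and a model trained on the first mapping would misread it.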
bool mlpack::data::LoadImage | ( | const std::string & | filename, |
arma::Mat< unsigned char > & | matrix, | ||
ImageInfo & | info, | ||
const bool | fatal = false |
||
) |
void mlpack::data::NormalizeLabels | ( | const RowType & | labelsIn, |
arma::Row< size_t > & | labels, | ||
arma::Col< eT > & | mapping | ||
) |
Given a set of labels of a particular datatype, convert them to unsigned labels in the range [0, n) where n is the number of different labels.
Also, a reverse mapping from the new label to the old value is stored in the 'mapping' vector.
labelsIn | Input labels of arbitrary datatype. |
labels | Vector that unsigned labels will be stored in. |
mapping | Reverse mapping to convert new labels back to old labels. |
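The label normalization described above can be sketched with standard containers. The struct NormalizedLabels and the function Normalize are hypothetical names, and the sketch assumes that new indices are assigned in order of first appearance; this reference does not state which order the library uses.

```cpp
#include <cstddef>
#include <vector>

// Result of normalizing labels: labels[i] lies in [0, n), and
// mapping[labels[i]] is the original value of the i-th input label.
template<typename T>
struct NormalizedLabels
{
  std::vector<std::size_t> labels;
  std::vector<T> mapping;
};

template<typename T>
NormalizedLabels<T> Normalize(const std::vector<T>& labelsIn)
{
  NormalizedLabels<T> result;
  for (const T& label : labelsIn)
  {
    // Linear search for a label that has already been seen.
    std::size_t index = 0;
    while (index < result.mapping.size() && !(result.mapping[index] == label))
      ++index;
    if (index == result.mapping.size())
      result.mapping.push_back(label);  // First occurrence: assign a new index.
    result.labels.push_back(index);
  }
  return result;
}
```

The linear search keeps the sketch short and only requires operator== on the label type; it is adequate when the number of distinct labels is small.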
void mlpack::data::OneHotEncoding | ( | const RowType & | labelsIn, |
MatType & | output | ||
) |
Given a set of labels of a particular datatype, convert them to binary vectors.
The categorical values are first mapped to integer values. Then, each integer value is represented as a binary vector that is all zeros except at the index of the integer, which is marked with a 1.
labelsIn | Input labels of arbitrary datatype. |
output | Binary matrix. |
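The second step described above (turning each integer label into a binary vector) can be sketched as follows. The function name OneHot is hypothetical, the input is assumed to be already mapped to integers in [0, numClasses), and the orientation of the output (one row per class, one column per label) is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// One-hot encode integer labels in [0, numClasses).
// The result has numClasses rows and one column per label; column i is all
// zeros except for a 1 in row labels[i].
std::vector<std::vector<int>> OneHot(const std::vector<std::size_t>& labels,
                                     const std::size_t numClasses)
{
  std::vector<std::vector<int>> output(
      numClasses, std::vector<int>(labels.size(), 0));
  for (std::size_t i = 0; i < labels.size(); ++i)
    output.at(labels[i])[i] = 1;  // at() rejects labels >= numClasses.
  return output;
}
```

Every column of the result sums to exactly 1, which is a convenient sanity check on encoded data.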
void mlpack::data::OneHotEncoding | ( | const arma::Mat< eT > & | input, |
const arma::Col< size_t > & | indices, | ||
arma::Mat< eT > & | output | ||
) |
Overloaded function for the above function, which takes a matrix as input and also a vector of indices to encode and outputs a matrix.
Indices represent the IDs of the dimensions to be one-hot encoded.
input | Input dataset to be encoded. |
indices | Indices of the rows (dimensions) to be encoded. |
output | Encoded matrix. |
void mlpack::data::OneHotEncoding | ( | const arma::Mat< eT > & | input, |
arma::Mat< eT > & | output, | ||
const data::DatasetInfo & | datasetInfo | ||
) |
Overloaded function for the above function, which takes a matrix as input and also a DatasetInfo object and outputs a matrix.
This function encodes all the dimensions marked Datatype::categorical in the data::DatasetInfo.
input | Input dataset to be encoded. |
output | Encoded matrix. |
datasetInfo | DatasetInfo object that has information about data. |
void mlpack::data::RevertLabels | ( | const arma::Row< size_t > & | labels, |
const arma::Col< eT > & | mapping, | ||
arma::Row< eT > & | labelsOut | ||
) |
Given a set of labels that have been mapped to the range [0, n), map them back to the original labels given by the 'mapping' vector.
labels | Set of normalized labels to convert. |
mapping | Mapping to use to convert labels. |
labelsOut | Vector to store new labels in. |
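Reverting is a plain lookup into the 'mapping' vector, as the following standalone sketch shows. The function name Revert is hypothetical and std::vector stands in for the Armadillo row and column types.

```cpp
#include <cstddef>
#include <vector>

// Map normalized labels in [0, n) back to their original values, where
// mapping[k] holds the original label that was assigned index k.
template<typename T>
std::vector<T> Revert(const std::vector<std::size_t>& labels,
                      const std::vector<T>& mapping)
{
  std::vector<T> labelsOut;
  labelsOut.reserve(labels.size());
  for (const std::size_t label : labels)
    labelsOut.push_back(mapping.at(label));  // at() throws on an unknown label.
  return labelsOut;
}
```

Applying this to the output of a normalization step with the same mapping reproduces the original label sequence.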
bool mlpack::data::Save | ( | const std::string & | filename, |
const arma::Mat< eT > & | matrix, | ||
const bool | fatal = false , |
||
bool | transpose = true , |
||
arma::file_type | inputSaveType = arma::auto_detect |
||
) |
Saves a matrix to file, guessing the filetype from the extension.
This will transpose the matrix at save time (unless the transpose parameter is set to false). If the filetype cannot be determined, an error will be given.
The supported types of files are the same as found in Armadillo:
By default, this function will try to automatically determine the format to save with based only on the filename's extension. If you would prefer to specify a file type manually, override the default inputSaveType parameter with the correct type above (e.g. arma::csv_ascii).
If the 'fatal' parameter is set to true, a std::runtime_error exception will be thrown upon failure. If the 'transpose' parameter is set to true, the matrix will be transposed before saving. Generally, because mlpack stores matrices in a column-major format and most datasets are stored on disk as row-major, this parameter should be left at its default value of 'true'.
filename | Name of file to save to. |
matrix | Matrix to save into file. |
fatal | If an error should be reported as fatal (default false). |
transpose | If true, transpose the matrix before saving (default true). |
inputSaveType | File type to save to (defaults to arma::auto_detect). |
bool mlpack::data::Save | ( | const std::string & | filename, |
const arma::SpMat< eT > & | matrix, | ||
const bool | fatal = false , |
||
bool | transpose = true |
||
) |
Saves a sparse matrix to file, guessing the filetype from the extension.
This will transpose the matrix at save time (unless the transpose parameter is set to false). If the filetype cannot be determined, an error will be given.
The supported types of files are the same as found in Armadillo:
If the file extension is not one of those types, an error will be given. If the 'fatal' parameter is set to true, a std::runtime_error exception will be thrown upon failure. If the 'transpose' parameter is set to true, the matrix will be transposed before saving. Generally, because mlpack stores matrices in a column-major format and most datasets are stored on disk as row-major, this parameter should be left at its default value of 'true'.
filename | Name of file to save to. |
matrix | Sparse matrix to save into file. |
fatal | If an error should be reported as fatal (default false). |
transpose | If true, transpose the matrix before saving (default true). |
bool mlpack::data::Save(const std::string& filename, const std::string& name, T& t, const bool fatal = false, format f = format::autodetect)
Saves a model to file, guessing the filetype from the extension, or, optionally, saving the specified format.
If automatic extension detection is used and the filetype cannot be determined, an error will be given.
The supported file types are the same as those supported by the boost::serialization library.
The format parameter can take any of the values in the 'format' enum: 'format::autodetect', 'format::text', 'format::xml', and 'format::binary'. The autodetect functionality operates on the file extension (so, "file.txt" would be autodetected as text).
The name parameter should be specified to indicate the name of the structure to be saved. If Load() is later called on the generated file, the name used to load should be the same as the name used for this call to Save().
If the parameter 'fatal' is set to true, then an exception will be thrown in the event of a save failure. Otherwise, the method will return false and the relevant error information will be printed to Log::Warn.
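Extension-based autodetection of the serialization format might look roughly like the sketch below. The local Format enum only mirrors the four values named above, and DetectFromExtension is a hypothetical helper, not an mlpack function. The ".txt" mapping comes from the text above; the ".xml" and ".bin" mappings are assumptions.

```cpp
#include <string>

// Mirrors the values of the 'format' enum listed above; this local enum is a
// stand-in for illustration, not mlpack's own type.
enum class Format { autodetect, text, xml, binary };

// Hypothetical helper: pick a format from the filename's extension.
// Returns Format::autodetect when the extension is not recognized.
Format DetectFromExtension(const std::string& filename)
{
  const std::string::size_type dot = filename.rfind('.');
  if (dot == std::string::npos)
    return Format::autodetect;

  const std::string ext = filename.substr(dot + 1);
  if (ext == "txt")
    return Format::text;    // "file.txt" is detected as text.
  if (ext == "xml")
    return Format::xml;     // Assumed extension for XML.
  if (ext == "bin")
    return Format::binary;  // Assumed extension for binary.
  return Format::autodetect;
}
```

In this sketch an unrecognized extension leaves the format undetermined, which corresponds to the case where Save() reports an error.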
bool mlpack::data::Save(const std::string& filename, arma::Mat<eT>& matrix, ImageInfo& info, const bool fatal = false)
Save the image file from the given matrix.
filename | Name of the image file. |
matrix | Matrix to save the image from. |
info | An object of ImageInfo class. |
fatal | If an error should be reported as fatal (default false). |
bool mlpack::data::Save(const std::vector<std::string>& files, arma::Mat<eT>& matrix, ImageInfo& info, const bool fatal = false)
Save the image files from the given matrix.
files | A vector consisting of filenames. |
matrix | Matrix to save the image from. |
info | An object of ImageInfo class. |
fatal | If an error should be reported as fatal (default false). |
bool mlpack::data::SaveImage(const std::string& filename, arma::Mat<unsigned char>& image, ImageInfo& info, const bool fatal = false)
Helper function to save files.
Implementation in save_image.cpp.
void mlpack::data::Split(const arma::Mat<T>& input, const arma::Row<U>& inputLabel, arma::Mat<T>& trainData, arma::Mat<T>& testData, arma::Row<U>& trainLabel, arma::Row<U>& testLabel, const double testRatio, const bool shuffleData = true)
Given an input dataset and labels, split into a training set and test set.
Example usage below. This overload places the split dataset into the four output parameters given (trainData, testData, trainLabel, and testLabel).
input | Input dataset to split. |
inputLabel | Input labels to split. |
trainData | Matrix to store training data into. |
testData | Matrix to store test data into. |
trainLabel | Vector to store training labels into. |
testLabel | Vector to store test labels into. |
testRatio | Fraction of the dataset to use for the test set (between 0 and 1). |
shuffleData | If true, the sample order is shuffled; otherwise, each sample is visited in linear order. (Default true.) |
Definition at line 51 of file split_data.hpp.
Referenced by Split().
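The semantics of this overload can be illustrated with a standard-library sketch in which each element of a std::vector plays the role of one column (one point) of the dataset. This is not mlpack's implementation: SplitSketch is a hypothetical name, the rounding rule for the test set size is an assumption, and std::mt19937 stands in for whatever random source the library uses. The point of the sketch is that a single permutation drives both data and labels, so each point keeps its label.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for the output-parameter overload: each element of
// 'input' plays the role of one column (one point) of the dataset.
template<typename T, typename U>
void SplitSketch(const std::vector<T>& input,
                 const std::vector<U>& inputLabel,
                 std::vector<T>& trainData,
                 std::vector<T>& testData,
                 std::vector<U>& trainLabel,
                 std::vector<U>& testLabel,
                 const double testRatio,
                 const bool shuffleData = true)
{
  assert(input.size() == inputLabel.size());
  assert(testRatio >= 0.0 && testRatio <= 1.0);

  // Assumption: the test set size is the ratio times the point count,
  // rounded down; everything else goes to the training set.
  const std::size_t testSize =
      static_cast<std::size_t>(input.size() * testRatio);
  const std::size_t trainSize = input.size() - testSize;

  // One permutation drives both data and labels so pairs stay aligned.
  std::vector<std::size_t> order(input.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  if (shuffleData)
  {
    std::mt19937 rng(std::random_device{}());
    std::shuffle(order.begin(), order.end(), rng);
  }

  trainData.clear();
  testData.clear();
  trainLabel.clear();
  testLabel.clear();
  for (std::size_t i = 0; i < order.size(); ++i)
  {
    const std::size_t j = order[i];
    if (i < trainSize)
    {
      trainData.push_back(input[j]);
      trainLabel.push_back(inputLabel[j]);
    }
    else
    {
      testData.push_back(input[j]);
      testLabel.push_back(inputLabel[j]);
    }
  }
}
```

With shuffleData set to false, the sketch visits points in linear order, so the training set is the leading block of points and the test set is the trailing block.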
void mlpack::data::Split(const arma::Mat<T>& input, arma::Mat<T>& trainData, arma::Mat<T>& testData, const double testRatio, const bool shuffleData = true)
Given an input dataset, split into a training set and test set.
Example usage below. This overload places the split dataset into the two output parameters given (trainData, testData).
input | Input dataset to split. |
trainData | Matrix to store training data into. |
testData | Matrix to store test data into. |
testRatio | Fraction of the dataset to use for the test set (between 0 and 1). |
shuffleData | If true, the sample order is shuffled; otherwise, each sample is visited in linear order. (Default true). |
Definition at line 121 of file split_data.hpp.
std::tuple<arma::Mat<T>, arma::Mat<T>, arma::Row<U>, arma::Row<U>> mlpack::data::Split(const arma::Mat<T>& input, const arma::Row<U>& inputLabel, const double testRatio, const bool shuffleData = true)
Given an input dataset and labels, split into a training set and test set.
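Example usage below. This overload returns the split dataset as a std::tuple with four elements: an arma::Mat<T> containing the training data, an arma::Mat<T> containing the test data, an arma::Row<U> containing the training labels, and an arma::Row<U> containing the test labels.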
input | Input dataset to split. |
inputLabel | Input labels to split. |
testRatio | Fraction of the dataset to use for the test set (between 0 and 1). |
shuffleData | If true, the sample order is shuffled; otherwise, each sample is visited in linear order. (Default true). |
Definition at line 176 of file split_data.hpp.
References Split().
std::tuple<arma::Mat<T>, arma::Mat<T>> mlpack::data::Split(const arma::Mat<T>& input, const double testRatio, const bool shuffleData = true)
Given an input dataset, split into a training set and test set.
Example usage below. This overload returns the split dataset as a std::tuple with two elements: an arma::Mat<T> containing the training data and an arma::Mat<T> containing the test data.
input | Input dataset to split. |
testRatio | Fraction of the dataset to use for the test set (between 0 and 1). |
shuffleData | If true, the sample order is shuffled; otherwise, each sample is visited in linear order. (Default true). |
Definition at line 215 of file split_data.hpp.
References Split().
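Because the tuple-returning overloads reference Split(), they can be read as thin wrappers around the output-parameter overloads. The sketch below shows that pattern with std::vector in place of arma::Mat. The names SplitInto and SplitTuple are hypothetical, shuffling is omitted for brevity, and the rounding rule for the test set size is an assumption rather than a statement about mlpack's code.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical output-parameter split (unshuffled for brevity): the trailing
// floor(n * testRatio) points become the test set.
template<typename T>
void SplitInto(const std::vector<T>& input,
               std::vector<T>& trainData,
               std::vector<T>& testData,
               const double testRatio)
{
  assert(testRatio >= 0.0 && testRatio <= 1.0);
  const std::size_t testSize =
      static_cast<std::size_t>(input.size() * testRatio);
  const std::size_t trainSize = input.size() - testSize;
  trainData.assign(input.begin(), input.begin() + trainSize);
  testData.assign(input.begin() + trainSize, input.end());
}

// Tuple-returning wrapper: delegates to the function above and moves the
// two results into a std::tuple (training data first, then test data).
template<typename T>
std::tuple<std::vector<T>, std::vector<T>> SplitTuple(
    const std::vector<T>& input,
    const double testRatio)
{
  std::vector<T> trainData, testData;
  SplitInto(input, trainData, testData, testRatio);
  return std::make_tuple(std::move(trainData), std::move(testData));
}
```

A caller can unpack the result with std::get, with std::tie(train, test) = SplitTuple(input, 0.25), or with structured bindings when compiling as C++17 or later.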