mlpack: a scalable c++ machine learning library
mlpack  2.0.2
mlpack::data::DatasetInfo Class Reference

Auxiliary information for a dataset, including mappings to/from strings and the datatype of each dimension.

Public Member Functions

 DatasetInfo (const size_t dimensionality=0)
 Create the DatasetInfo object with the given dimensionality.

 
size_t Dimensionality () const
 Get the dimensionality of the DatasetInfo object (that is, how many dimensions it has information for).

 
size_t MapString (const std::string &string, const size_t dimension)
 Given the string and the dimension to which it belongs, return its numeric mapping.

 
size_t NumMappings (const size_t dimension) const
 Get the number of mappings for a particular dimension.

 
template<typename Archive>
void Serialize (Archive &ar, const unsigned int)
 Serialize the dataset information.

 
Datatype Type (const size_t dimension) const
 Return the type of a given dimension (numeric or categorical).

 
Datatype & Type (const size_t dimension)
 Modify the type of a given dimension (be careful!).

 
const std::string & UnmapString (const size_t value, const size_t dimension)
 Return the string that corresponds to a given value in a given dimension.

 

Private Attributes

std::unordered_map< size_t, std::pair< boost::bimap< std::string, size_t >, size_t > > maps
 Mappings from strings to integers.

 
std::vector< Datatype > types
 Types of each dimension.

 

Detailed Description

Auxiliary information for a dataset, including mappings to/from strings and the datatype of each dimension.

DatasetInfo objects are optionally produced by data::Load(), and store the type of each dimension (Datatype::numeric or Datatype::categorical) as well as mappings from strings to unsigned integers and vice versa.

Definition at line 45 of file dataset_info.hpp.

Constructor & Destructor Documentation

◆ DatasetInfo()

mlpack::data::DatasetInfo::DatasetInfo ( const size_t  dimensionality = 0)

Create the DatasetInfo object with the given dimensionality.

Note that the dimensionality cannot be changed later; you will have to create a new DatasetInfo object.

Member Function Documentation

◆ Dimensionality()

size_t mlpack::data::DatasetInfo::Dimensionality ( ) const

Get the dimensionality of the DatasetInfo object (that is, how many dimensions it has information for).

If this object was created by a call to mlpack::data::Load(), then the dimensionality will be the same as the number of rows (dimensions) in the dataset.

◆ MapString()

size_t mlpack::data::DatasetInfo::MapString ( const std::string &  string,
const size_t  dimension 
)

Given the string and the dimension to which it belongs, return its numeric mapping.

If no mapping yet exists, the string is added to the list of mappings for the given dimension. The dimension parameter refers to the index of the dimension of the string (i.e. the row in the dataset).

Parameters
string      String to find/create mapping for.
dimension   Index of the dimension of the string.
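
The find-or-create behaviour described above can be sketched with standard containers. This is a minimal illustration, not mlpack's implementation: ToyStringMapper is a hypothetical class, it uses std::unordered_map where mlpack uses boost::bimap, and it assumes that new mappings are numbered in insertion order starting from 0 within each dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the find-or-create behaviour of MapString().
// Not mlpack code: one string->index table is kept per dimension.
class ToyStringMapper
{
 public:
  explicit ToyStringMapper(const std::size_t dimensionality) :
      forward(dimensionality) { }

  // Return the existing mapping for `s` in `dimension`, or create a new one.
  std::size_t MapString(const std::string& s, const std::size_t dimension)
  {
    assert(dimension < forward.size());
    auto& dimMap = forward[dimension];

    const auto it = dimMap.find(s);
    if (it != dimMap.end())
      return it->second; // Mapping already exists.

    // Assumption: the next index is the number of mappings seen so far.
    const std::size_t next = dimMap.size();
    dimMap.emplace(s, next);
    return next;
  }

 private:
  // forward[dimension] maps each string to its numeric value.
  std::vector<std::unordered_map<std::string, std::size_t>> forward;
};
```

Under these assumptions, mapping the same string twice in one dimension returns the same value, while the same string in a different dimension is numbered independently.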

◆ NumMappings()

size_t mlpack::data::DatasetInfo::NumMappings ( const size_t  dimension) const

Get the number of mappings for a particular dimension.

If the dimension is numeric, then this will return 0.
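
One way to see why a numeric dimension reports 0 is a sparse store that only holds entries for categorical dimensions. The sketch below is an illustration under that assumption; ToyMappingCounts and its Add() method are hypothetical names, not part of mlpack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical sparse store: only dimensions that received a string have an
// entry. Not mlpack code, which pairs a boost::bimap with a counter.
class ToyMappingCounts
{
 public:
  // Record a string for a (categorical) dimension; duplicates are ignored.
  void Add(const std::string& s, const std::size_t dimension)
  {
    auto& dimMap = maps[dimension];
    const std::size_t next = dimMap.size();
    dimMap.emplace(s, next); // No effect if `s` is already present.
  }

  // Number of mappings for a dimension; 0 when no entry exists.
  std::size_t NumMappings(const std::size_t dimension) const
  {
    const auto it = maps.find(dimension);
    return (it == maps.end()) ? 0 : it->second.size();
  }

 private:
  std::unordered_map<std::size_t,
                     std::unordered_map<std::string, std::size_t>> maps;
};

// Usage sketch: dimension 0 is never touched, so it reports 0 mappings.
inline void DemoNumMappings()
{
  ToyMappingCounts counts;
  counts.Add("cat", 1);
  counts.Add("dog", 1);
  counts.Add("cat", 1);
  assert(counts.NumMappings(0) == 0);
  assert(counts.NumMappings(1) == 2);
}
```

The lookup uses find() rather than operator[] so that querying a numeric dimension does not create an empty entry for it.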

◆ Serialize()

template<typename Archive>
void mlpack::data::DatasetInfo::Serialize ( Archive &  ar,
const unsigned int 
)
inline

Serialize the dataset information.

Definition at line 99 of file dataset_info.hpp.

References mlpack::data::CreateNVP(), maps, and types.

◆ Type() [1/2]

Datatype mlpack::data::DatasetInfo::Type ( const size_t  dimension) const

Return the type of a given dimension (numeric or categorical).

◆ Type() [2/2]

Datatype& mlpack::data::DatasetInfo::Type ( const size_t  dimension)

Modify the type of a given dimension (be careful!).
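
The two Type() overloads follow the usual const/non-const accessor pattern: the const overload returns a copy, and the non-const overload returns a reference that can be assigned through. A minimal sketch of that pattern follows; ToyDatatype and ToyTypes are hypothetical names, and the assumption that every dimension starts out numeric is not stated in this reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical enum mirroring the two values named in this reference.
enum class ToyDatatype { numeric, categorical };

class ToyTypes
{
 public:
  // Assumption: every dimension starts out numeric.
  explicit ToyTypes(const std::size_t dimensionality) :
      types(dimensionality, ToyDatatype::numeric) { }

  // Read-only overload: returns a copy of the stored type.
  ToyDatatype Type(const std::size_t dimension) const
  {
    assert(dimension < types.size());
    return types[dimension];
  }

  // Mutable overload: returns a reference that can be assigned through.
  ToyDatatype& Type(const std::size_t dimension)
  {
    assert(dimension < types.size());
    return types[dimension];
  }

 private:
  std::vector<ToyDatatype> types;
};
```

With this shape, a statement such as t.Type(1) = ToyDatatype::categorical changes the stored type in place, which is why the mutable overload deserves care: nothing in the sketch keeps the type consistent with any string mappings held elsewhere.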

◆ UnmapString()

const std::string& mlpack::data::DatasetInfo::UnmapString ( const size_t  value,
const size_t  dimension 
)

Return the string that corresponds to a given value in a given dimension.

If the value has no mapping in the given dimension, a std::invalid_argument is thrown.

Parameters
value       Mapped value for string.
dimension   Dimension to unmap string from.
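
The reverse lookup and its failure mode can be sketched as follows. This is an illustration only: ToyReverseMapper is a hypothetical class that stores one vector of strings per dimension, whereas mlpack looks the value up in the right-hand side of a boost::bimap.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reverse table: reverse[dimension][value] is the original
// string. Not mlpack code.
class ToyReverseMapper
{
 public:
  explicit ToyReverseMapper(std::vector<std::vector<std::string>> table) :
      reverse(std::move(table)) { }

  // Return the string for `value` in `dimension`, or throw if none exists.
  const std::string& UnmapString(const std::size_t value,
                                 const std::size_t dimension) const
  {
    if (dimension >= reverse.size() || value >= reverse[dimension].size())
      throw std::invalid_argument("no mapping for value in this dimension");
    return reverse[dimension][value];
  }

 private:
  std::vector<std::vector<std::string>> reverse;
};

// Usage sketch: a known value is unmapped, an unknown value throws.
inline void DemoUnmap()
{
  const std::vector<std::vector<std::string>> table = {{"red", "blue"},
                                                       {"yes"}};
  const ToyReverseMapper mapper(table);
  assert(mapper.UnmapString(1, 0) == "blue");

  bool threw = false;
  try
  {
    mapper.UnmapString(5, 0);
  }
  catch (const std::invalid_argument&)
  {
    threw = true;
  }
  assert(threw);
}
```

Callers that cannot guarantee a value was produced by an earlier mapping call should be prepared to catch std::invalid_argument.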

Member Data Documentation

◆ maps

std::unordered_map<size_t, std::pair<boost::bimap<std::string, size_t>, size_t> > mlpack::data::DatasetInfo::maps
private

Mappings from strings to integers.

Map entries will only exist for dimensions that are categorical.

Definition at line 112 of file dataset_info.hpp.

Referenced by Serialize().

◆ types

std::vector<Datatype> mlpack::data::DatasetInfo::types
private

Types of each dimension.

Definition at line 107 of file dataset_info.hpp.

Referenced by Serialize().


The documentation for this class was generated from the following file: