mlpack_preprocess_split

NAME

mlpack_preprocess_split - split data

SYNOPSIS

mlpack_preprocess_split [-h] [-v]

DESCRIPTION

This utility takes a dataset and, optionally, labels, and splits them into a training set and a test set. Before the split, the points in the dataset are randomly reordered. The fraction of the dataset to be used as the test set can be specified with the ’--test_ratio (-r)’ parameter; the default is 0.2 (20%).
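The shuffle-then-split behaviour described above can be sketched in a few lines of Python. This is an illustration only, not mlpack's implementation: the function name split and the choice to round the test-set size down are assumptions made for the example.

```python
import random


def split(points, test_ratio=0.2, seed=None):
    """Shuffle the points, then hold out a test_ratio fraction as the test set.

    Hypothetical helper for illustration; not part of mlpack.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # random reordering happens before the split

    # Assumption: the test-set size is rounded down to a whole number of points.
    test_size = int(len(points) * test_ratio)
    train_size = len(points) - test_size

    train = [points[i] for i in order[:train_size]]
    test = [points[i] for i in order[train_size:]]
    return train, test


points = list(range(10))
train, test = split(points, test_ratio=0.2, seed=1)
print(len(train), len(test))  # 8 2
```

With ten points and the default ratio of 0.2, eight points land in the training set and two in the test set; every point appears in exactly one of the two.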

The output training and test matrices may be saved with the ’--training_file (-t)’ and ’--test_file (-T)’ output parameters.

Optionally, labels can also be split along with the data by specifying the ’--input_labels_file (-I)’ parameter. Splitting labels works the same way as splitting the data. The output training and test labels may be saved with the ’--training_labels_file (-l)’ and ’--test_labels_file (-L)’ output parameters, respectively.
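For the labels to stay useful, each label has to end up on the same side of the split as its point. One way to picture this is a single shuffled index order applied to both the data and the labels, as in the sketch below. The helper split_with_labels is hypothetical, and rounding the test-set size down is an assumption; this is not mlpack's code.

```python
import random


def split_with_labels(points, labels, test_ratio=0.2, seed=None):
    """Split points and labels together so each point keeps its label.

    Hypothetical helper for illustration; not part of mlpack.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # one permutation, shared by data and labels

    # Assumption: the test-set size is rounded down.
    cut = len(points) - int(len(points) * test_ratio)
    train_idx, test_idx = order[:cut], order[cut:]

    x_train = [points[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [points[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, y_train, x_test, y_test


# Each point's first coordinate equals its label, so pairing is easy to check.
points = [[float(i), float(i) * 2.0] for i in range(8)]
labels = list(range(8))
x_train, y_train, x_test, y_test = split_with_labels(
    points, labels, test_ratio=0.25, seed=7
)
print(len(x_train), len(x_test))  # 6 2
```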

As a simple example, to split the dataset ’X.csv’ into ’X_train.csv’ and ’X_test.csv’ with 60% of the data in the training set and 40% of the data in the test set, we could run

$ mlpack_preprocess_split --input_file X.csv --training_file X_train.csv --test_file X_test.csv --test_ratio 0.4

If we had a dataset ’X.csv’ and associated labels ’y.csv’, and we wanted to split these into ’X_train.csv’, ’y_train.csv’, ’X_test.csv’, and ’y_test.csv’, with 30% of the data in the test set, we could run

$ mlpack_preprocess_split --input_file X.csv --input_labels_file y.csv --test_ratio 0.3 --training_file X_train.csv --training_labels_file y_train.csv --test_file X_test.csv --test_labels_file y_test.csv

REQUIRED INPUT OPTIONS

--input_file (-i) [string]

Matrix containing data.

OPTIONAL INPUT OPTIONS

--help (-h) [bool]

Default help info.

--info [string]

Get help on a specific module or option. Default value ’’.

--input_labels_file (-I) [string]

Matrix containing labels. Default value ’’.

--seed (-s) [int]

Random seed (0 for std::time(NULL)). Default value 0.

--test_ratio (-r) [double]

Fraction of the dataset to use as the test set. Default value 0.2.

--verbose (-v) [bool]

Display informational messages and the full list of parameters and timers at the end of execution.

--version (-V) [bool]

Display the version of mlpack.
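The ’--seed (-s)’ convention above (0 means "seed from the current time") determines whether two runs produce the same split. The Python sketch below imitates that convention to show the effect. The names make_rng and shuffled_indices are hypothetical, and Python's generator is unrelated to the one mlpack uses, so the actual orderings will differ from mlpack's.

```python
import random
import time


def make_rng(seed=0):
    """Return a generator; seed 0 falls back to the clock, as documented.

    Hypothetical helper for illustration; not part of mlpack.
    """
    effective_seed = int(time.time()) if seed == 0 else seed
    return random.Random(effective_seed)


def shuffled_indices(n, seed=0):
    """Return the indices 0..n-1 in the order given by the seeded shuffle."""
    order = list(range(n))
    make_rng(seed).shuffle(order)
    return order


# A fixed non-zero seed gives the same reordering on every run.
first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)
print(first == second)  # True
```

Passing a fixed non-zero seed is therefore the way to get a reproducible split; leaving the seed at 0 generally yields a different split on each run.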

OPTIONAL OUTPUT OPTIONS

--test_file (-T) [string]

Matrix to save test data to. Default value ’’.

--test_labels_file (-L) [string]

Matrix to save test labels to. Default value ’’.

--training_file (-t) [string]

Matrix to save training data to. Default value ’’.

--training_labels_file (-l) [string]

Matrix to save training labels to. Default value ’’.

ADDITIONAL INFORMATION

For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.