mlpack.preprocess_split

preprocess_split(...)
Split Data

>>> from mlpack import preprocess_split

This utility takes a dataset and optionally labels and splits them into a training set and a test set. Before the split, the points in the dataset are randomly reordered. The percentage of the dataset to be used as the test set can be specified with the 'test_ratio' parameter; the default is 0.2 (20%).

The output training and test matrices may be saved with the 'training' and 'test' output parameters.

Optionally, labels can also be split along with the data by specifying the 'input_labels' parameter. Splitting labels works the same way as splitting the data. The output training and test labels may be saved with the 'training_labels' and 'test_labels' output parameters, respectively.
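
To make the shuffle-then-split behavior described above concrete, here is a minimal pure-Python sketch. It is not mlpack's implementation: the helper name 'shuffle_split' and its 'seed' argument are hypothetical, points are stored as a list of rows rather than a matrix, and rounding the test-set size down is an assumption. The point it illustrates is that a single random permutation is applied to both the points and the labels, so each label stays attached to its point.

```python
import random


def shuffle_split(points, labels=None, test_ratio=0.2, seed=None):
    """Hypothetical helper: shuffle points (and labels) together, then split."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    if labels is not None and len(labels) != len(points):
        raise ValueError("points and labels must have the same length")

    # Draw one permutation and reuse it for both points and labels.
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)

    # Assumption: the test-set size is rounded down.
    n_test = int(len(points) * test_ratio)
    n_train = len(points) - n_test
    train_idx, test_idx = order[:n_train], order[n_train:]

    result = {
        "training": [points[i] for i in train_idx],
        "test": [points[i] for i in test_idx],
    }
    if labels is not None:
        result["training_labels"] = [labels[i] for i in train_idx]
        result["test_labels"] = [labels[i] for i in test_idx]
    return result


# Ten two-dimensional points; each label is the parity of the first coordinate.
X = [[float(i), float(i) * 2.0] for i in range(10)]
y = [i % 2 for i in range(10)]
split = shuffle_split(X, labels=y, test_ratio=0.3, seed=42)
```

The returned dict deliberately reuses the key names 'training', 'test', 'training_labels', and 'test_labels' so that it reads like the binding's output, but it holds plain lists, not the matrices that preprocess_split returns.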

As a simple example, to split the dataset 'X' into 'X_train' and 'X_test' with 60% of the data in the training set and 40% of the data in the test set, we could run

>>> output = preprocess_split(input=X, test_ratio=0.4)
>>> X_train = output['training']
>>> X_test = output['test']

If we had a dataset 'X' and associated labels 'y', and we wanted to split these into 'X_train', 'y_train', 'X_test', and 'y_test', with 30% of the data in the test set, we could run

>>> output = preprocess_split(input=X, input_labels=y, test_ratio=0.3)
>>> X_train = output['training']
>>> y_train = output['training_labels']
>>> X_test = output['test']
>>> y_test = output['test_labels']

input options

output options

The return value from the binding is a dict containing the following elements: