Binarize Data

>>> from mlpack import preprocess_binarize

This utility takes a dataset and binarizes its variables to either 0 or 1 according to a given threshold. Binarization can be applied to a single dimension or to the whole dataset. The dimension to binarize can be specified with the 'dimension' parameter; if it is left unspecified, every dimension is binarized. The threshold can be specified with the 'threshold' parameter; the default threshold is 0.0.

The binarized matrix may be saved with the 'output' output parameter.

For example, if we want to set all values greater than 5 in the dataset 'X' to 1 and all values less than or equal to 5 to 0, and save the result to 'Y', we could run

>>> output = preprocess_binarize(input=X, threshold=5)
>>> Y = output['output']

But if we want to apply this to only the first (0th) dimension of 'X', we could instead run

>>> output = preprocess_binarize(input=X, threshold=5, dimension=0)
>>> Y = output['output']
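
To make the thresholding rule concrete, the following is a minimal pure-Python sketch of the behavior described above. It is an illustration only, not mlpack's implementation: 'binarize_sketch' is a hypothetical helper, and it assumes that each inner list is one point and each position within it is one dimension.

```python
# Illustrative sketch of the thresholding rule; not part of mlpack.
# Assumption: each inner list is one point, each position is one dimension.
def binarize_sketch(rows, threshold=0.0, dimension=None):
    """Return a copy of rows with values > threshold set to 1.0, others to 0.0.

    If dimension is None, every dimension is binarized; otherwise only the
    given dimension is binarized and the rest are copied unchanged.
    """
    result = []
    for row in rows:
        new_row = []
        for d, value in enumerate(row):
            if dimension is None or d == dimension:
                # Strictly greater than the threshold maps to 1, else 0.
                new_row.append(1.0 if value > threshold else 0.0)
            else:
                new_row.append(value)
        result.append(new_row)
    return result


X = [[2.0, 7.0], [5.0, 5.5], [9.0, 1.0]]

# Binarize every dimension with threshold 5.
Y_all = binarize_sketch(X, threshold=5)

# Binarize only the first (0th) dimension with threshold 5.
Y_first = binarize_sketch(X, threshold=5, dimension=0)
```

Note that a value exactly equal to the threshold (5.0 here) maps to 0, matching the "less than or equal to" wording above.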

input options

output options

The return value from the binding is a dict containing the following elements: