>>> from mlpack import decision_stump
This program implements a decision stump, which is a single-level decision tree. The decision stump splits on one dimension of the input data and divides that dimension into multiple buckets (also called bins). The dimension and bins are selected by maximizing the information gain of the split. Optionally, the minimum number of training points in each bin can be specified with the 'bucket_size' parameter.
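As an illustration of the selection criterion, the following is a minimal sketch of the textbook information gain of a multi-bucket split: the entropy of all labels minus the size-weighted entropy of each bucket. The function names and the toy data are assumptions made for this example; this is not mlpack's internal code and may differ from its exact computation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, bins):
    """Entropy of all labels minus the size-weighted entropy of each bin.

    `bins` is a list of label lists, one per bucket of the candidate split.
    """
    total = len(labels)
    remainder = sum(len(b) / total * entropy(b) for b in bins)
    return entropy(labels) - remainder


labels = [0, 0, 0, 1, 1, 1]
perfect_split = [[0, 0, 0], [1, 1, 1]]    # every bin contains a single class
useless_split = [[0, 1], [0, 1], [0, 1]]  # every bin mirrors the full label mix

gain_perfect = information_gain(labels, perfect_split)
gain_useless = information_gain(labels, useless_split)
```

A split whose bins are pure recovers the full entropy of the labels as gain, while a split whose bins all have the same class mix as the whole set gains nothing.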
The decision stump is parameterized by a splitting dimension and a vector of values that denote the splitting values of each bin.
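To make this parameterization concrete, here is a small sketch of how a stump of this shape could classify a point. The class name `StumpSketch`, the per-bin label list, and the convention that each split value is the lower bound of its bin are assumptions of this example, not a description of mlpack's model type.

```python
from bisect import bisect_right


class StumpSketch:
    """Toy stump: one splitting dimension, sorted split values, one label per bin."""

    def __init__(self, split_dimension, split_values, bin_labels):
        # Assumption of this sketch: split_values[i] is the lower bound of bin i,
        # and the values are sorted in increasing order.
        if len(split_values) != len(bin_labels):
            raise ValueError("need exactly one label per bin")
        self.split_dimension = split_dimension
        self.split_values = list(split_values)
        self.bin_labels = list(bin_labels)

    def bin_of(self, point):
        """Index of the bin that the point's splitting-dimension value falls into."""
        value = point[self.split_dimension]
        # bisect_right counts the lower bounds <= value; subtracting one gives the bin.
        # Values below the first bound are clamped into bin 0.
        return max(0, bisect_right(self.split_values, value) - 1)

    def classify(self, point):
        """Label assigned to the bin that contains the point."""
        return self.bin_labels[self.bin_of(point)]


# Split on dimension 1 into three bins: [0.0, 2.5), [2.5, 5.0), [5.0, inf).
stump = StumpSketch(split_dimension=1,
                    split_values=[0.0, 2.5, 5.0],
                    bin_labels=[0, 1, 0])

points = [[9.0, 1.0], [0.0, 3.0], [0.0, 7.5], [0.0, -4.0], [0.0, 2.5]]
predicted = [stump.classify(p) for p in points]
```

Only the value in the splitting dimension matters; every other coordinate of the point is ignored.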
This program enables several applications: a decision stump may be trained or loaded, and then that decision stump may be used to classify a given set of test points. The decision stump may also be saved to a file for later usage.
To train a decision stump, training data should be passed with the 'training' parameter, and the corresponding labels should be passed with the 'labels' parameter. If 'labels' is not specified, the labels are assumed to be the last dimension of the training dataset. The 'bucket_size' parameter controls the minimum number of training points in each decision stump bucket.
To classify a test set, a decision stump may be loaded with the 'input_model' parameter (useful when a stump has already been trained), and a test set may be specified with the 'test' parameter. The predicted labels can be saved with the 'predictions' output parameter.
Because decision stumps are trained in batch, an existing stump cannot be retrained, so it is not possible to pass both 'training' and 'input_model'; instead, simply build a new decision stump with the training data.
After training, a decision stump can be saved with the 'output_model' output parameter. That stump may later be re-used in subsequent calls to this program (or others).
- bucket_size (int): The minimum number of training points in each decision stump bucket. Default value 6.
- copy_all_inputs (bool): If specified, all input parameters will be deep copied before the method is run. This is useful for debugging problems where the input parameters are being modified by the algorithm, but can slow down the code.
- input_model (mlpack.DSModelType): Decision stump model to load.
- labels (numpy vector or array, int/long dtype): Labels for the training set. If not specified, the labels are assumed to be the last dimension of the training data.
- test (numpy matrix or arraylike, float dtype): A dataset to calculate predictions for.
- training (numpy matrix or arraylike, float dtype): The dataset to train on.
- verbose (bool): Display informational messages and the full list of parameters and timers at the end of execution.
The return value from the binding is a dict containing the following elements:
- output_model (mlpack.DSModelType): Output decision stump model to save.
- predictions (numpy vector, int dtype): The predicted labels for the test set.
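The workflow described above (train, keep the model, then classify with it in a later call) might look like the following sketch. It uses only the parameter and output names listed above. The toy data, the helper name `run_decision_stump_example`, and the layout of one point per row are assumptions of this example. The imports are guarded so the helper returns None when numpy or this binding is not installed.

```python
def run_decision_stump_example():
    """Train a stump, then reuse the trained model on a test set.

    Returns the predicted labels, or None when numpy or the mlpack
    decision_stump binding cannot be imported.
    """
    try:
        import numpy as np
        from mlpack import decision_stump
    except ImportError:
        return None

    # Made-up data: 12 points (rows), 2 features; the class depends on feature 0 only.
    training = np.array([[float(i), float(i % 3)] for i in range(12)])
    labels = np.array([0] * 6 + [1] * 6, dtype=np.int64)

    # Train with 'training' and 'labels'; 'bucket_size' sets the minimum bucket size.
    trained = decision_stump(training=training, labels=labels, bucket_size=3)
    model = trained['output_model']

    # Classify new points with the trained model. 'training' is not passed here,
    # since 'training' and 'input_model' cannot be combined.
    test = np.array([[1.0, 0.0], [2.5, 1.0], [8.0, 2.0], [10.5, 0.0]])
    classified = decision_stump(input_model=model, test=test)
    return classified['predictions']


predictions = run_decision_stump_example()
```

If the call succeeds, 'predictions' should hold one label per test point.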