# mlpack.det

`det(...)`: Density Estimation With Density Estimation Trees

```python
>>> from mlpack import det
```

This program performs a number of functions related to Density Estimation Trees. A Density Estimation Tree (DET) can be trained on a set of data (specified by 'training'), with cross-validation used to select the optimally pruned tree (the number of folds is set with the 'folds' parameter). The trained density estimation tree may then be saved with the 'output_model' output parameter.
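As a rough illustration of what the 'folds' parameter controls, the sketch below partitions point indices into cross-validation folds, treating 0 as leave-one-out cross-validation as the option list documents. The helper `fold_assignments` is hypothetical, and the contiguous, unshuffled assignment is an assumption; mlpack's own fold construction may order points differently.

```python
def fold_assignments(n_points, folds):
    """Split indices 0..n_points-1 into (train, held_out) pairs, one per fold.

    folds == 0 follows the documented LOOCV convention: every point
    becomes its own held-out fold. Contiguous folds are an assumption.
    """
    if folds == 0:
        folds = n_points
    if not 2 <= folds <= n_points:
        raise ValueError("need 0 or 2..n_points folds")
    base, extra = divmod(n_points, folds)  # fold sizes differ by at most one
    splits = []
    start = 0
    for k in range(folds):
        size = base + (1 if k < extra else 0)
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_points))
        splits.append((train, held_out))
        start += size
    return splits


for train, held_out in fold_assignments(7, 3):
    print(held_out, "held out; train on", train)
# [0, 1, 2] held out; train on [3, 4, 5, 6]
# [3, 4] held out; train on [0, 1, 2, 5, 6]
# [5, 6] held out; train on [0, 1, 2, 3, 4]
```

Each fold's tree is grown on the training indices and evaluated on the held-out ones; averaging over folds is what lets cross-validation pick a pruning level without a separate validation set.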

The variable importances (that is, the feature importance values for each dimension) may be saved with the 'vi' output parameter, and the density estimates for each training point may be saved with the 'training_set_estimates' output parameter.
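A density estimation tree is a piecewise-constant estimator: in the usual DET formulation, every point falling in leaf `t` receives the estimate `|t| / (N * V_t)`, where `|t|` is the number of training points in the leaf, `N` the total number of training points, and `V_t` the volume of the leaf's bounding box. A minimal sketch of that computation, assuming this formulation; `leaf_density` and the three-leaf partition are hypothetical, not mlpack internals:

```python
def leaf_density(points_in_leaf, total_points, bounds):
    """Piecewise-constant DET estimate shared by every point inside a leaf.

    bounds is a list of (low, high) pairs, one per dimension, describing the
    leaf's axis-aligned bounding box; its volume is the product of widths.
    """
    volume = 1.0
    for low, high in bounds:
        volume *= high - low
    return points_in_leaf / (total_points * volume)


# Hypothetical 2-D partition of 100 training points into three leaves
# covering [0, 2] x [0, 3].
leaves = [
    (50, [(0.0, 1.0), (0.0, 1.0)]),
    (30, [(1.0, 2.0), (0.0, 1.0)]),
    (20, [(0.0, 2.0), (1.0, 3.0)]),
]
densities = [leaf_density(n, 100, b) for n, b in leaves]
print(densities)  # [0.5, 0.3, 0.05]
```

Because each leaf's estimate times its volume equals its share of the training points, the estimates over a partition integrate to 1 (here 0.5 + 0.3 + 0.05 * 4 = 1.0).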

When path printing is enabled, the path from the root node to a leaf is output for each point in the test set, or in the training set if no test set is provided. Strings like 'LRLRLR' (indicating that traversal went to the left child, then the right child, then the left child, and so forth) are output. If 'lr-id' or 'id-lr' is given as the 'path_format' parameter, the ID (tag) of every node along the path is printed after or before, respectively, the L or R character indicating the direction of traversal.
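The three 'path_format' options differ only in where the node ID sits relative to each direction character. A sketch of that formatting, assuming a path is stored as (direction, node ID) pairs; `format_path` is hypothetical, and the IDs are made up rather than real mlpack tags, so mlpack's exact output may differ:

```python
def format_path(path, path_format="lr"):
    """Render a root-to-leaf path as a string.

    path is a list of (direction, node_id) pairs, where direction is "L" or
    "R" and node_id is a (hypothetical) tag for the node reached by that step.
    """
    parts = []
    for direction, node_id in path:
        if path_format == "lr":
            parts.append(direction)  # directions only
        elif path_format == "lr-id":
            parts.append(f"{direction}{node_id}")  # ID printed after L/R
        elif path_format == "id-lr":
            parts.append(f"{node_id}{direction}")  # ID printed before L/R
        else:
            raise ValueError(f"unknown path_format: {path_format!r}")
    return "".join(parts)


path = [("L", 1), ("R", 4), ("L", 9)]
print(format_path(path, "lr"))     # LRL
print(format_path(path, "lr-id"))  # L1R4L9
print(format_path(path, "id-lr"))  # 1L4R9L
```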

This program can also provide density estimates for a set of test points, specified with the 'test' parameter. The density estimation tree used for this task is either the tree trained on the given training points or a tree passed in as the 'input_model' parameter. The density estimates for the test points may be saved using the 'test_set_estimates' output parameter.

## input options

- copy_all_inputs (bool): If specified, all input parameters will be deep copied before the method is run. This is useful for debugging problems where the input parameters are being modified by the algorithm, but can slow down the code.
- folds (int): The number of cross-validation folds to use for the estimation (0 means leave-one-out cross-validation, LOOCV). Default value 10.
- input_model (mlpack.DTreeType): Trained density estimation tree to load.
- max_leaf_size (int): The maximum size of a leaf in the unpruned, fully grown DET. Default value 10.
- min_leaf_size (int): The minimum size of a leaf in the unpruned, fully grown DET. Default value 5.
- path_format (string): The format of path printing: 'lr', 'id-lr', or 'lr-id'. Default value 'lr'.
- skip_pruning (bool): Whether to bypass the pruning process and output the unpruned tree only.
- test (numpy matrix or arraylike, float dtype): A set of test points at which to estimate the density.
- training (numpy matrix or arraylike, float dtype): The data set on which to build a density estimation tree.
- verbose (bool): Display informational messages and the full list of parameters and timers at the end of execution.
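The 'max_leaf_size' and 'min_leaf_size' options bound the fully grown tree before pruning. The sketch below grows a one-dimensional tree under stated assumptions: a node is split only while it holds more than `max_leaf_size` points, each child must keep at least `min_leaf_size` points, splits minimize the summed DET error `-|t|^2 / (N^2 * V_t)` of the two children, candidate thresholds are midpoints between neighbouring points, and pruning is omitted. `grow` and `node_error` are hypothetical helpers, not mlpack internals.

```python
def node_error(count, total, width):
    """DET error of a 1-D node: -count^2 / (total^2 * width); lower is better."""
    return -(count ** 2) / (total ** 2 * width)


def grow(points, low, high, total, max_leaf_size, min_leaf_size):
    """Recursively grow a 1-D DET over the sorted points lying in [low, high].

    Returns the leaves, left to right, as (low, high, count) tuples.
    """
    n = len(points)
    if n <= max_leaf_size:
        return [(low, high, n)]  # small enough: stop splitting
    smallest = max(min_leaf_size, 1)
    best = None
    # Try every split that leaves at least `smallest` points on each side.
    for i in range(smallest, n - smallest + 1):
        if points[i - 1] == points[i]:
            continue  # equal values cannot be separated
        split = (points[i - 1] + points[i]) / 2  # midpoint threshold
        err = (node_error(i, total, split - low)
               + node_error(n - i, total, high - split))
        if best is None or err < best[0]:
            best = (err, i, split)
    if best is None:
        return [(low, high, n)]  # no valid split: keep an oversized leaf
    _, i, split = best
    return (grow(points[:i], low, split, total, max_leaf_size, min_leaf_size)
            + grow(points[i:], split, high, total, max_leaf_size, min_leaf_size))


data = sorted([0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
               0.9, 1.5, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5])
leaves = grow(data, data[0], data[-1], len(data),
              max_leaf_size=10, min_leaf_size=5)
for low, high, count in leaves:
    print(f"[{low:.3f}, {high:.3f}] holds {count} points")
```

Splitting never increases the DET error (the children's summed error is at most the parent's), so a fully grown tree like this is the starting point that cross-validated pruning later cuts back, unless 'skip_pruning' is set.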

## output options

The return value from the binding is a dict containing the following elements:

- output_model (mlpack.DTreeType): The trained density estimation tree.
- tag_counters_file (string): The file to output the number of points that went to each leaf.
- tag_file (string): The file to output the tags (and possibly paths) for each sample in the test set.
- test_set_estimates (numpy matrix, float dtype): The output density estimates on the test set from the final optimally pruned tree.
- training_set_estimates (numpy matrix, float dtype): The output density estimates on the training set from the final optimally pruned tree.
- vi (numpy matrix, float dtype): The output variable importance values for each feature.
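The 'vi' output holds one value per feature. One common definition from the DET literature, assumed here rather than taken from mlpack's source, scores a feature by the total error reduction of all splits made on it. A sketch under that assumption, where `variable_importance` and the split errors are hypothetical:

```python
def variable_importance(splits, n_dims):
    """Total error reduction per dimension over a tree's internal nodes.

    Each entry of `splits` is (dim, parent_error, left_error, right_error)
    for one internal node; lower (more negative) error is better.
    """
    importance = [0.0] * n_dims
    for dim, parent_err, left_err, right_err in splits:
        # Gain of a split: how far the children's total error falls
        # below the parent's error.
        importance[dim] += parent_err - (left_err + right_err)
    return importance


# Hypothetical three-split tree over two features.
splits = [
    (0, -1.0, -0.75, -0.5),    # root splits on feature 0
    (1, -0.75, -0.5, -0.5),    # left child splits on feature 1
    (0, -0.5, -0.375, -0.25),  # right child splits on feature 0
]
print(variable_importance(splits, 2))  # [0.375, 0.25]
```

Under this definition a feature that is never split on scores 0, and features used for large, early splits dominate the ranking.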