mlpack.knn

knn(...)

k-Nearest-Neighbors Search

>>> from mlpack import knn

This program will calculate the k-nearest-neighbors of a set of points using kd-trees or cover trees (cover tree support is experimental and may be slow). You may specify a separate set of reference points and query points, or just a reference set which will be used as both the reference and query set.

For example, the following command will calculate the 5 nearest neighbors of each point in 'input' and store the distances in 'distances' and the neighbors in 'neighbors':

>>> output = knn(k=5, reference=input)

>>> distances = output['distances']

>>> neighbors = output['neighbors']

The output matrices are organized such that the entry at row i and column j of the neighbors matrix is the index of the point in the reference set that is the j'th nearest neighbor of the query point with index i. The entry at row i and column j of the distances matrix is the distance between those two points.
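To make the indexing convention concrete, here is a minimal pure-Python sketch of a naive search that produces matrices with the same layout. It is not mlpack's implementation: `brute_force_knn` is a hypothetical helper, and its rule that a point is never reported as its own neighbor when no query set is given is an assumption of this sketch.

```python
import math

def brute_force_knn(reference, k, query=None):
    """Naive k-NN returning (neighbors, distances), one row per query point."""
    # With no query set, the reference set is searched against itself and a
    # point is not reported as its own neighbor (assumption of this sketch).
    same_set = query is None
    points = reference if same_set else query
    neighbors, distances = [], []
    for i, q in enumerate(points):
        candidates = [
            (math.dist(q, r), idx)
            for idx, r in enumerate(reference)
            if not (same_set and idx == i)
        ]
        candidates.sort()  # ascending distance, ties broken by index
        nearest = candidates[:k]
        neighbors.append([idx for _, idx in nearest])
        distances.append([d for d, _ in nearest])
    return neighbors, distances

reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbors, distances = brute_force_knn(reference, k=2)
# neighbors[i][j] is the reference index of the j'th nearest neighbor of
# point i; distances[i][j] is the distance to that same point.
```

Here `neighbors[0]` is `[1, 2]` and `distances[0]` is `[1.0, 2.0]`: the closest point to point 0 is point 1 at distance 1, followed by point 2 at distance 2.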

## input options

- algorithm (string): Type of neighbor search: 'naive', 'single_tree', 'dual_tree', 'greedy'. Default value dual_tree.
- copy_all_inputs (bool): If specified, all input parameters will be deep copied before the method is run. This is useful for debugging problems where the input parameters are being modified by the algorithm, but can slow down the code.
- epsilon (float): If specified, will do approximate nearest neighbor search with given relative error. Default value 0.
- input_model (mlpack.KNNModelType): Pre-trained kNN model.
- k (int): Number of nearest neighbors to find. Default value 0.
- leaf_size (int): Leaf size for tree building (used for kd-trees, vp trees, random projection trees, UB trees, R trees, R* trees, X trees, Hilbert R trees, R+ trees, R++ trees, spill trees, and octrees). Default value 20.
- query (numpy matrix or arraylike, float dtype): Matrix containing query points (optional).
- random_basis (bool): Before tree-building, project the data onto a random orthogonal basis.
- reference (numpy matrix or arraylike, float dtype): Matrix containing the reference dataset.
- rho (float): Balance threshold (only valid for spill trees). Default value 0.7.
- seed (int): Random seed (if 0, std::time(NULL) is used). Default value 0.
- tau (float): Overlapping size (only valid for spill trees). Default value 0.
- tree_type (string): Type of tree to use: 'kd', 'vp', 'rp', 'max-rp', 'ub', 'cover', 'r', 'r-star', 'x', 'ball', 'hilbert-r', 'r-plus', 'r-plus-plus', 'spill', 'oct'. Default value kd.
- true_distances (numpy matrix or arraylike, float dtype): Matrix of true distances to compute the effective error (average relative error) (it is printed when verbose is specified).
- true_neighbors (numpy matrix or arraylike, int/long dtype): Matrix of true neighbors to compute the recall (it is printed when verbose is specified).
- verbose (bool): Display informational messages and the full list of parameters and timers at the end of execution.
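
The true_distances and true_neighbors options refer to an average relative error and a recall. The sketch below shows one common way to define these two quantities for approximate search results. The helper names `recall` and `average_relative_error` are hypothetical, and the exact computation mlpack performs is not stated here, so treat this as an illustration of the idea only.

```python
def recall(found_neighbors, true_neighbors):
    """Fraction of true neighbors that also appear in the found rows."""
    hits = total = 0
    for found_row, true_row in zip(found_neighbors, true_neighbors):
        hits += len(set(found_row) & set(true_row))
        total += len(true_row)
    return hits / total

def average_relative_error(found_distances, true_distances):
    """Mean of |found - true| / true, skipping zero true distances."""
    errors = [
        abs(f - t) / t
        for found_row, true_row in zip(found_distances, true_distances)
        for f, t in zip(found_row, true_row)
        if t != 0  # a zero true distance has no defined relative error
    ]
    return sum(errors) / len(errors) if errors else 0.0

# Two query points, k = 2. The approximate search missed one true neighbor.
true_neighbors = [[1, 2], [0, 2]]
found_neighbors = [[1, 3], [0, 2]]
true_distances = [[1.0, 2.0], [1.0, 4.0]]
found_distances = [[1.0, 2.5], [1.0, 4.0]]

r = recall(found_neighbors, true_neighbors)
err = average_relative_error(found_distances, true_distances)
```

In this toy data three of the four true neighbors were recovered, and one of the four distances is off by 25 percent.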

## output options

The return value from the binding is a dict containing the following elements:

- distances (numpy matrix, float dtype): Matrix to output distances into.
- neighbors (numpy matrix, int dtype): Matrix to output neighbors into.
- output_model (mlpack.KNNModelType): If specified, the kNN model will be output here.
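
The input_model and output_model options suggest a build-once, query-many workflow. The following is a sketch of that pattern using only the parameters documented above. `build_then_query` is a hypothetical wrapper, and the sketch assumes that the binding accepts a build-only call without k and that reference and query are NumPy arrays with one point per row.

```python
def build_then_query(reference, query, k):
    """Build a kNN model once, then search it with a separate query set."""
    # Imported here so the sketch can be defined without mlpack installed.
    from mlpack import knn

    # First call: build the tree on the reference set and keep the model.
    # Assumption: a call without k only builds the model.
    built = knn(reference=reference, tree_type='kd', leaf_size=20)
    model = built['output_model']

    # Second call: reuse the model; only the query set and k are passed.
    result = knn(input_model=model, query=query, k=k)
    return result['neighbors'], result['distances']
```

The returned model can be passed as input_model to any number of later calls, which avoids rebuilding the tree for each new query set.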