mlpack.nmf

nmf(...): Non-negative Matrix Factorization

>>> from mlpack import nmf

This program performs non-negative matrix factorization on the given dataset and returns the resulting decomposed matrices. For an input dataset V, NMF decomposes V into two matrices W and H such that

V = W * H

where all elements in W and H are non-negative. If V is of size (n x m), then W will be of size (n x r) and H will be of size (r x m), where r is the rank of the factorization (specified by the 'rank' parameter).
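The shape relationship above can be checked with a small NumPy sketch. It does not call mlpack; the sizes `n`, `m`, `r` and the random matrices are illustrative assumptions only.

```python
import numpy as np

# Illustrative sizes (assumptions, not mlpack defaults).
n, m, r = 6, 4, 2

rng = np.random.default_rng(0)
W = rng.random((n, r))  # n x r, all entries non-negative
H = rng.random((r, m))  # r x m, all entries non-negative

# The product of an (n x r) and an (r x m) matrix is (n x m).
V = W @ H

print(V.shape)
print(np.linalg.matrix_rank(V))
```

Because V is built from non-negative factors of rank r, it is itself non-negative and has rank at most r.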

Optionally, the desired update rules for each NMF iteration can be chosen from the following list:

- multdist: multiplicative distance-based update rules (Lee and Seung 1999)

- multdiv: multiplicative divergence-based update rules (Lee and Seung 1999)

- als: alternating least squares update rules (Paatero and Tapper 1994)

The maximum number of iterations is specified with 'max_iterations', and the minimum residue required for algorithm termination is specified with the 'min_residue' parameter.
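To make the iteration and termination parameters concrete, here is a minimal NumPy sketch of the Lee and Seung multiplicative distance-based updates with both stopping criteria. It is not mlpack's implementation: the function name `nmf_multdist_sketch`, the small `eps` guard, the random initialization, and the use of plain root mean square reconstruction error as the residue are all assumptions made for illustration, and mlpack may measure the residue differently.

```python
import numpy as np

def nmf_multdist_sketch(V, rank, max_iterations=10000, min_residue=1e-5, seed=0):
    """Hypothetical helper: multiplicative distance-based NMF updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start so the multiplicative updates can move.
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    eps = 1e-12  # guards against division by zero (an assumption of this sketch)

    iteration = 0
    # max_iterations == 0 means "run until the residue criterion is met";
    # that can loop for a very long time if min_residue is never reached.
    while max_iterations == 0 or iteration < max_iterations:
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        iteration += 1

        # Root mean square reconstruction error, used here as the residue.
        residue = np.sqrt(np.mean((V - W @ H) ** 2))
        if residue < min_residue:
            break
    return W, H, iteration

# Usage on a small synthetic matrix with an exact rank-2 factorization.
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 6))
W, H, iterations = nmf_multdist_sketch(V, rank=2, max_iterations=500)
```

The updates only ever multiply by non-negative ratios, so W and H stay non-negative throughout, which is why no explicit clipping step is needed.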

For example, to run NMF on the input matrix 'V' using the 'multdist' update rules with a rank-10 decomposition and store the decomposed matrices in 'W' and 'H', the following commands could be used:

>>> output = nmf(input=V, rank=10, update_rules='multdist')

>>> W = output['w']

>>> H = output['h']

## input options

- input (numpy matrix or arraylike, float dtype): [required] Input dataset to perform NMF on.
- rank (int): [required] Rank of the factorization. Default value 0.
- copy_all_inputs (bool): If specified, all input parameters will be deep copied before the method is run. This is useful for debugging problems where the input parameters are being modified by the algorithm, but can slow down the code.
- initial_h (numpy matrix or arraylike, float dtype): Initial H matrix.
- initial_w (numpy matrix or arraylike, float dtype): Initial W matrix.
- max_iterations (int): Number of iterations before NMF terminates (0 runs until convergence). Default value 10000.
- min_residue (float): The minimum root mean square residue allowed for each iteration, below which the program terminates. Default value 1e-05.
- seed (int): Random seed. If 0, 'std::time(NULL)' is used. Default value 0.
- update_rules (string): Update rules for each iteration; ( multdist | multdiv | als ). Default value multdist.
- verbose (bool): Display informational messages and the full list of parameters and timers at the end of execution.

## output options

The return value from the binding is a dict containing the following elements:

- h (numpy matrix, float dtype): The calculated H matrix.
- w (numpy matrix, float dtype): The calculated W matrix.
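
Once the two factors are in hand, a quick sanity check is often useful. The sketch below is a hypothetical helper (`check_factorization` is not part of mlpack) that verifies the shapes are compatible, that both factors are non-negative, and reports the relative reconstruction error. It assumes W is (n x r) and H is (r x m) as described above; with the real binding, the orientation of the returned arrays should be confirmed before relying on it.

```python
import numpy as np

def check_factorization(V, W, H, tol=1e-12):
    """Hypothetical helper: validate an NMF result given as array-likes."""
    V, W, H = (np.asarray(a, dtype=float) for a in (V, W, H))

    # W must be (n x r) and H must be (r x m) for V of size (n x m).
    if W.shape[1] != H.shape[0] or (W.shape[0], H.shape[1]) != V.shape:
        raise ValueError(
            f"incompatible shapes: V{V.shape}, W{W.shape}, H{H.shape}"
        )

    non_negative = bool((W >= -tol).all() and (H >= -tol).all())
    # Relative Frobenius-norm error of the reconstruction W * H.
    relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    return non_negative, relative_error

# Usage with synthetic factors standing in for output['w'] and output['h'].
rng = np.random.default_rng(0)
W = rng.random((5, 3))
H = rng.random((3, 4))
V = W @ H
non_negative, relative_error = check_factorization(V, W, H)
```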