# mlpack_logistic_regression

## NAME

mlpack_logistic_regression - L2-regularized logistic regression and prediction

## SYNOPSIS

mlpack_logistic_regression [-h] [-v]

## DESCRIPTION

An implementation of L2-regularized logistic regression using the L-BFGS optimizer, SGD (stochastic gradient descent), or mini-batch SGD. This solves the regression problem

$$y = \frac{1}{1 + e^{-(X b)}}$$

where y is the predicted probability that a point belongs to class 1; the class labels themselves take values 0 or 1.
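As a rough illustration of the formula above and of how the --decision_boundary (-d) option turns its value into a class, the following Python sketch evaluates the logistic function for one point and thresholds it. It is not mlpack code: the function names `logistic` and `classify` are hypothetical, and the model is assumed to be a plain parameter vector b with no separate intercept term, matching the formula as written.

```python
import math

def logistic(x, b):
    """Return 1 / (1 + e^-(x . b)) for one point x and parameter vector b."""
    z = sum(xi * bi for xi, bi in zip(x, b))
    # Branch on the sign of z so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def classify(x, b, decision_boundary=0.5):
    """Class 0 if the logistic value is below the boundary, otherwise class 1."""
    return 0 if logistic(x, b) < decision_boundary else 1

b = [0.5, -1.0]
print(logistic([2.0, 1.0], b))  # x . b = 0, so this prints 0.5
print(classify([2.0, 1.0], b))  # 0.5 is not below 0.5, so class 1
print(classify([0.0, 1.0], b))  # x . b = -1, logistic is about 0.269, so class 0
```

Raising the boundary above 0.5 makes the sketch more reluctant to predict class 1; lowering it does the opposite.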

This program allows loading a logistic regression model from a file (--input_model_file (-m)) or training a logistic regression model given training data (--training_file (-t)), or both at once. In addition, this program allows classification on a test dataset (--test_file (-T)) and will save the classification results to the given output file (--output_file (-o)). The logistic regression model itself may be saved to a file specified with the --output_model_file (-M) option.

The training data given with the --training_file (-t) option should have class labels as its last dimension (so, if the training data is in CSV format, labels should be the last column). Alternatively, the --labels_file (-l) option may be used to specify a separate file of labels.
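The layout described above (one point per CSV row, label in the last column) can be sketched in plain Python as follows. This only illustrates the file convention and the two-class restriction; it is not how mlpack parses files, and `split_features_and_labels` is a hypothetical helper.

```python
import csv
import io

def split_features_and_labels(rows):
    """Split rows whose last column is the class label into (X, y)."""
    X, y = [], []
    for row in rows:
        *features, label = row
        value = float(label)
        # Only the two-class case is supported, so labels must be 0 or 1.
        if value not in (0.0, 1.0):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        X.append([float(v) for v in features])
        y.append(int(value))
    return X, y

data = "1.0,2.0,0\n3.5,0.5,1\n"
X, y = split_features_and_labels(csv.reader(io.StringIO(data)))
print(X)  # [[1.0, 2.0], [3.5, 0.5]]
print(y)  # [0, 1]
```

With --labels_file, the same two pieces (X and y) simply come from separate files instead of being split apart.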

When a model is being trained, there are many options. L2 regularization (to prevent overfitting) can be specified with the --lambda (-L) option, and the optimizer used to train the model can be specified with the --optimizer (-O) option. Available optimizers are 'sgd' (stochastic gradient descent), 'lbfgs' (the L-BFGS optimizer), and 'minibatch-sgd' (mini-batch stochastic gradient descent).

There are also various parameters for the optimizer: the --max_iterations (-n) parameter specifies the maximum number of allowed iterations, and the --tolerance (-e) parameter specifies the tolerance for convergence. For the SGD and mini-batch SGD optimizers, the --step_size (-s) parameter controls the step size taken at each iteration. The batch size for mini-batch SGD is controlled with the --batch_size (-b) parameter. If the objective function for your data oscillates between Inf and 0, the step size is probably too large. The optimizers have further parameters, but these are accessible only through the C++ interface.
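To make the role of --lambda (-L) concrete, the sketch below computes an L2-regularized negative log-likelihood and its gradient, which is the kind of function L-BFGS or SGD minimizes during training. The exact form is an assumption: it uses a (lambda / 2) ||b||^2 penalty on every parameter and no intercept, and mlpack's internal objective may scale the penalty differently or treat an intercept separately. `objective_and_gradient` is a hypothetical name.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def _softplus(z):
    # log(1 + e^z), computed without overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def objective_and_gradient(X, y, b, lam):
    """Assumed objective: (lam / 2) * ||b||^2 minus the log-likelihood.

    Per point, -log-likelihood = log(1 + e^z) - y * z with z = x . b,
    and its derivative with respect to z is sigmoid(z) - y.
    """
    f = 0.5 * lam * sum(bj * bj for bj in b)
    grad = [lam * bj for bj in b]
    for xi, yi in zip(X, y):
        z = sum(xij * bj for xij, bj in zip(xi, b))
        f += _softplus(z) - yi * z
        r = _sigmoid(z) - yi
        for j, xij in enumerate(xi):
            grad[j] += r * xij
    return f, grad

f, g = objective_and_gradient([[1.0, 0.0], [0.0, 1.0]], [1, 0], [0.0, 0.0], lam=0.0)
print(f, g)  # f = 2 * log(2), about 1.386; g = [-0.5, 0.5]
```

Increasing lam adds lam * b to the gradient, which pulls every parameter toward zero and is what limits overfitting.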

For SGD, an iteration refers to a single point, and for mini-batch SGD, an iteration refers to a single batch. So to take a single pass over the dataset with SGD, --max_iterations should be set to the number of points in the dataset.

Optionally, the model can be used to predict the responses for another matrix of data points, if --test_file (-T) is specified. The --test_file option can be specified without --training_file, so long as an existing logistic regression model is given with --input_model_file (-m). The output predictions from the logistic regression model are stored in the file given with --output_file (-o).

This implementation of logistic regression does not support the general multi-class case but instead only the two-class case. Any responses must be either 0 or 1.

## OPTIONAL INPUT OPTIONS

--batch_size (-b) [int]

Batch size for mini-batch SGD. Default value 50.

--decision_boundary (-d) [double]

Decision boundary for prediction; if the logistic function for a point is less than the boundary, the class is taken to be 0; otherwise, the class is 1. Default value 0.5.

--help (-h) [bool]

Default help info. Default value 0.

--info [string]

Get help on a specific module or option. Default value ''.

--input_model_file (-m) [string]

Existing model (parameters). Default value ''.

--labels_file (-l) [string]

A matrix containing labels (0 or 1) for the points in the training set (y). Default value ''.

--lambda (-L) [double]

L2-regularization parameter for training. Default value 0.

--max_iterations (-n) [int]

Maximum iterations for optimizer (0 indicates no limit). Default value 10000.

--optimizer (-O) [string]

Optimizer to use for training ('lbfgs', 'sgd', or 'minibatch-sgd'). Default value 'lbfgs'.

--step_size (-s) [double]

Step size for SGD and mini-batch SGD optimizers. Default value 0.01.

--test_file (-T) [string]

Matrix containing test dataset. Default value ''.

--tolerance (-e) [double]

Convergence tolerance for optimizer. Default value 1e-10.

--training_file (-t) [string]

A matrix containing the training set (the matrix of predictors, X). Default value ''.

--verbose (-v) [bool]

Display informational messages and the full list of parameters and timers at the end of execution. Default value 0.

--version (-V) [bool]

Display the version of mlpack. Default value 0.

## OPTIONAL OUTPUT OPTIONS

--output_file (-o) [string]

If --test_file is specified, this matrix is where the predictions for the test set will be saved. Default value ''.

--output_model_file (-M) [string]

Output for trained logistic regression model. Default value ''.

--output_probabilities_file (-p) [string]

If --test_file is specified, this matrix is where the class probabilities for the test set will be saved. Default value ''.

## ADDITIONAL INFORMATION

For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.