mlpack implements a generic hyperparameter tuner that is able to tune both continuous and discrete parameters of many different algorithms. This is an important task—the performance of many machine learning algorithms can be highly dependent on the hyperparameters that are chosen for that algorithm. (One example: the choice of k for a k-nearest-neighbors classifier.)
This hyper-parameter tuner is built on the same general concept as the cross-validation classes (see the cross-validation tutorial): given some machine learning algorithm, some data, some performance measure, and a set of hyperparameters, attempt to find the hyperparameter set that best optimizes the performance measure on the given data with the given algorithm.
mlpack's implementation of hyperparameter tuning is flexible, and is built in a way that supports many algorithms and many optimizers. At the time of this writing, complex hyperparameter optimization techniques are not available, but the hyperparameter tuner is designed so that they can be supported, should they be implemented in the future.
In this tutorial we will see usage examples of the hyper-parameter tuning module, as well as more details about the HyperParameterTuner class itself.
The interface of the hyper-parameter tuning module is quite similar to the interface of the cross-validation module. To construct a HyperParameterTuner object, you specify as template parameters the machine learning algorithm, cross-validation strategy, performance measure, and optimization strategy (GridSearch is used by default). Then, you pass the same arguments as for the cross-validation classes: the data and labels (or responses) to use are given to the constructor, and the possible hyperparameter values are given to the HyperParameterTuner::Optimize() method, which returns the best hyperparameter values as a std::tuple.
Let's see some examples.
Suppose we have the following data to train and validate on.
In this example we have used GridSearch (the default optimizer) to find a good value for the
lambda hyper-parameter. For that we have specified what values should be tried.
When some hyper-parameters should not be optimized, you can specify values for them with the
Fixed() method as in the following example of trying to find good
lambda2 values for LARS (least-angle regression).
Note that for the call to
hpt2.Optimize(), we have used the same order of arguments as they appear in the corresponding LARS constructor:
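That constructor has roughly the following shape (the argument order is the point here; the exact defaults are from memory, so check the LARS reference documentation for your mlpack version):

```cpp
LARS(const arma::mat& data,
     const arma::rowvec& responses,
     const bool transposeData = true,
     const bool useCholesky = false,
     const double lambda1 = 0.0,
     const double lambda2 = 0.0);
```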
In some cases we may wish to optimize a hyperparameter over the space of all possible real values, instead of providing a grid in which to search. Alternately, we may know approximately optimal values from a grid search for real-valued hyperparameters, but wish to further tune those values.
In this case, we can use a gradient-based optimizer for hyperparameter search. In the following example, we try to optimize the
lambda2 hyper-parameters for LARS with the GradientDescent optimizer.
The HyperParameterTuner class is very similar to the KFoldCV and SimpleCV classes (see the cross-validation tutorial for more information on those two classes), but there are a few important differences. HyperParameterTuner accepts five template parameters; only the first three of these are required:
MLAlgorithm: This is the algorithm to be used.
Metric: This is the performance measure to be used; see Performance measures for more information.
CVType: This is the type of cross-validation to be used for evaluating the performance measure; this should be KFoldCV or SimpleCV.
OptimizerType: This is the type of optimizer to use; it can be GridSearch or a gradient-based optimizer.
MatType: This is the type of data matrix to use. The default is arma::mat. This only needs to be changed if you are specifically using sparse data, or if you want to use a numeric type other than double.
The last two template parameters are automatically inferred by the
HyperParameterTuner and should not need to be manually specified, unless an unconventional data type like
arma::fmat is being used for data points.
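Spelled out in full, an instantiation with all five template parameters might look like the following sketch (the last two shown here match the usual defaults; GridSearch is assumed to come from the bundled ensmallen library):

```cpp
using Tuner = HyperParameterTuner<LinearRegression, MSE, SimpleCV,
                                  ens::GridSearch, arma::mat>;
```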
Typically, SimpleCV is a good choice for
CVType, because it takes much less time to compute than full KFoldCV; the disadvantage is that SimpleCV may give a noisier estimate of the performance measure on unseen test data.
The constructor for the
HyperParameterTuner is called with exactly the same arguments as the corresponding
CVType that has been chosen. For more information, see the cross-validation tutorial. As an example, if we are using SimpleCV and wish to hold out 20% of the dataset as a validation set, we might construct a
HyperParameterTuner like this:
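Assuming data and responses have already been loaded, that might look like this sketch:

```cpp
// 0.2 means 20% of the data is held out as the validation set.
HyperParameterTuner<LinearRegression, MSE, SimpleCV> hpt(0.2, data, responses);
```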
Next, we must set up the hyperparameters to be optimized. If we are doing a grid search with the GridSearch optimizer (the default), then we only need to pass a
std::vector (for non-numeric hyperparameters) or an
arma::vec (for numeric hyperparameters) containing all of the possible choices that we wish to search over.
For instance, a set of numeric values might be chosen like this for the lambda parameter (of type double):
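For example (the particular values are illustrative):

```cpp
arma::vec lambdas{ 0.0, 0.001, 0.01, 0.1, 1.0 };
```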
Similarly, a set of non-numeric values might be chosen like this, using a std::vector:
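For example, for a hypothetical boolean hyper-parameter (the name here is illustrative, not part of any particular algorithm's API):

```cpp
// Hypothetical boolean hyper-parameter choices.
std::vector<bool> useIntercepts{ true, false };
```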
Once all of these are set up, the
HyperParameterTuner::Optimize() method may be called to find the best set of hyperparameters:
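Continuing the lambda example, with hpt constructed as described earlier:

```cpp
double bestLambda;
std::tie(bestLambda) = hpt.Optimize(lambdas);
```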
For continuous optimizers like GradientDescent, a set of values to try does not need to be specified; instead, a single initial value is given for each hyperparameter to be optimized. See the Gradient-Based Optimization section for more details.