MLPACK  1.0.10
mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy > Class Template Reference

This class implements K-Means clustering.

Public Member Functions

 KMeans (const size_t maxIterations=1000, const double overclusteringFactor=1.0, const MetricType metric=MetricType(), const InitialPartitionPolicy partitioner=InitialPartitionPolicy(), const EmptyClusterPolicy emptyClusterAction=EmptyClusterPolicy())
 Create a K-Means object and (optionally) set the parameters which K-Means will be run with.

 
template<typename MatType>
void Cluster (const MatType &data, const size_t clusters, arma::Col< size_t > &assignments, const bool initialGuess=false) const
 Perform k-means clustering on the data, returning a list of cluster assignments.

 
template<typename MatType>
void Cluster (const MatType &data, const size_t clusters, arma::Col< size_t > &assignments, MatType &centroids, const bool initialAssignmentGuess=false, const bool initialCentroidGuess=false) const
 Perform k-means clustering on the data, returning a list of cluster assignments and also the centroids of each cluster.

 
const EmptyClusterPolicy & EmptyClusterAction () const
 Get the empty cluster policy.

EmptyClusterPolicy & EmptyClusterAction ()
 Modify the empty cluster policy.

size_t MaxIterations () const
 Get the maximum number of iterations.

size_t & MaxIterations ()
 Set the maximum number of iterations.

const MetricType & Metric () const
 Get the distance metric.

MetricType & Metric ()
 Modify the distance metric.

double OverclusteringFactor () const
 Return the overclustering factor.

 
double & OverclusteringFactor ()
 Set the overclustering factor. Must be at least 1; the default of 1.0 finds no extra clusters.

 
const InitialPartitionPolicy & Partitioner () const
 Get the initial partitioning policy.

InitialPartitionPolicy & Partitioner ()
 Modify the initial partitioning policy.

 
std::string ToString () const
 

Private Attributes

EmptyClusterPolicy emptyClusterAction
 Instantiated empty cluster policy.

size_t maxIterations
 Maximum number of iterations before giving up.

MetricType metric
 Instantiated distance metric.

double overclusteringFactor
 Factor controlling how many clusters are actually found.

InitialPartitionPolicy partitioner
 Instantiated initial partitioning policy.

 

Detailed Description


template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
class mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >

This class implements K-Means clustering.

This implementation supports overclustering, which means that more clusters than are requested will be found; then, those clusters will be merged together to produce the desired number of clusters.

Three template parameters can (optionally) be supplied: the distance metric to be used, the policy for how to find the initial partition of the data, and the action to be taken when an empty cluster is encountered.

A simple example of how to run K-Means clustering is shown below.

#include <mlpack/methods/kmeans/kmeans.hpp>
using namespace mlpack;
using namespace mlpack::kmeans;

extern arma::mat data; // Dataset we want to run K-Means on.
arma::Col<size_t> assignments; // Cluster assignments.
KMeans<> k; // Default options.
k.Cluster(data, 3, assignments); // 3 clusters.

// Cluster using the Manhattan distance, 100 iterations maximum, and an
// overclustering factor of 4.0. A distinct name avoids redeclaring k.
KMeans<metric::ManhattanDistance> k2(100, 4.0);
k2.Cluster(data, 6, assignments); // 6 clusters; overwrites assignments.

Template Parameters
MetricType: The distance metric to use for this KMeans; see metric::LMetric for an example.
InitialPartitionPolicy: Initial partitioning policy; must implement a default constructor and 'void Cluster(const arma::mat&, const size_t, arma::Col<size_t>&)'.
EmptyClusterPolicy: Policy for what to do on an empty cluster; must implement a default constructor and 'void EmptyCluster(const arma::mat&, arma::Col<size_t>&)'.
See also
RandomPartition, RefinedStart, AllowEmptyClusters, MaxVarianceNewCluster
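
The way the three template parameters combine can be illustrated with a small standalone sketch of policy-based design. Every type below (ToyKMeans, ToySquaredEuclidean, ToyManhattan, ToyRandomPartition, ToyMaxVariance, ToyAllowEmpty) is a hypothetical stand-in invented for this illustration and is not part of mlpack; the sketch only shows how default template arguments behave, not how KMeans is implemented.

```cpp
#include <string>

// Toy policy types for illustration only; none of these are mlpack classes.
struct ToySquaredEuclidean
{
  static std::string Name() { return "squared-euclidean"; }
};
struct ToyManhattan
{
  static std::string Name() { return "manhattan"; }
};
struct ToyRandomPartition
{
  static std::string Name() { return "random"; }
};
struct ToyMaxVariance
{
  static std::string Name() { return "max-variance"; }
};
struct ToyAllowEmpty
{
  static std::string Name() { return "allow-empty"; }
};

// Same parameter order as the class documented here: metric, initial
// partition policy, empty cluster policy, each with a default.
template<typename MetricType = ToySquaredEuclidean,
         typename InitialPartitionPolicy = ToyRandomPartition,
         typename EmptyClusterPolicy = ToyMaxVariance>
class ToyKMeans
{
 public:
  // Report which policies this instantiation was built with.
  std::string Describe() const
  {
    return MetricType::Name() + "/" + InitialPartitionPolicy::Name() + "/" +
        EmptyClusterPolicy::Name();
  }
};

// Overriding only the last policy still requires spelling out the first two,
// because template arguments are positional.
typedef ToyKMeans<ToySquaredEuclidean, ToyRandomPartition, ToyAllowEmpty>
    ToyAllowEmptyKMeans;
```

In this sketch, ToyKMeans<> picks up all three defaults, ToyKMeans<ToyManhattan> replaces only the metric, and ToyAllowEmptyKMeans replaces only the empty cluster policy.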

Definition at line 75 of file kmeans.hpp.

Constructor & Destructor Documentation

◆ KMeans()

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::KMeans ( const size_t  maxIterations = 1000,
const double  overclusteringFactor = 1.0,
const MetricType  metric = MetricType(),
const InitialPartitionPolicy  partitioner = InitialPartitionPolicy(),
const EmptyClusterPolicy  emptyClusterAction = EmptyClusterPolicy() 
)

Create a K-Means object and (optionally) set the parameters which K-Means will be run with.

This implementation allows a few strategies to improve the performance of K-Means, including "overclustering" and disallowing empty clusters.

The overclustering factor controls how many clusters are actually found; for instance, with an overclustering factor of 4, if K-Means is run to find 3 clusters, it will actually find 12, then merge the nearest clusters until only 3 are left.

Parameters
maxIterations: Maximum number of iterations allowed before giving up (0 is valid, but the algorithm may never terminate).
overclusteringFactor: Factor controlling how many extra clusters are found and then merged to get the desired number of clusters.
metric: Optional MetricType object; for when the metric has state it needs to store.
partitioner: Optional InitialPartitionPolicy object; for when a specially initialized partitioning policy is required.
emptyClusterAction: Optional EmptyClusterPolicy object; for when a specially initialized empty cluster policy is required.
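
To make the overclustering arithmetic concrete, the standalone sketch below computes how many clusters would be searched for before merging. The helper name ActualClusterCount is hypothetical and not an mlpack function, and truncating the product toward zero is an assumption; the library may round differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of mlpack): number of clusters found before
// merging, given the requested count and the overclustering factor.
// Assumption: the product is truncated toward zero.
std::size_t ActualClusterCount(const std::size_t clusters,
                               const double overclusteringFactor)
{
  // A factor below 1 would mean finding fewer clusters than requested.
  assert(overclusteringFactor >= 1.0);
  return static_cast<std::size_t>(overclusteringFactor * clusters);
}
```

With the values used in the description, ActualClusterCount(3, 4.0) is 12; if each merge combines two clusters, 9 merges bring that back down to the 3 requested clusters.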

Member Function Documentation

◆ Cluster() [1/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
template<typename MatType>
void mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::Cluster ( const MatType &  data,
const size_t  clusters,
arma::Col< size_t > &  assignments,
const bool  initialGuess = false 
) const

Perform k-means clustering on the data, returning a list of cluster assignments.

Optionally, the vector of assignments can be set to an initial guess of the cluster assignments; to do this, set initialGuess to true.

Template Parameters
MatType: Type of matrix (arma::mat or arma::sp_mat).

Parameters
data: Dataset to cluster.
clusters: Number of clusters to compute.
assignments: Vector to store cluster assignments in.
initialGuess: If true, then it is assumed that assignments has a list of initial cluster assignments.
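
As a rough illustration of what an initial assignment guess provides, the standalone sketch below derives starting centroids by averaging the points assigned to each cluster. It uses std::vector instead of Armadillo matrices, the name CentroidsFromAssignments is hypothetical, and an empty cluster is simply left at the origin here, whereas the class documented above would consult its EmptyClusterPolicy. It is not mlpack's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Point;

// Hypothetical helper (not part of mlpack): compute one centroid per cluster
// as the mean of the points that the guess assigns to that cluster.
std::vector<Point> CentroidsFromAssignments(
    const std::vector<Point>& data,
    const std::vector<std::size_t>& assignments,
    const std::size_t clusters)
{
  // One assignment is expected per data point.
  assert(data.size() == assignments.size());

  const std::size_t dims = data.empty() ? 0 : data[0].size();
  std::vector<Point> centroids(clusters, Point(dims, 0.0));
  std::vector<std::size_t> counts(clusters, 0);

  // Accumulate per-cluster sums and point counts.
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    const std::size_t c = assignments[i];
    assert(c < clusters);
    for (std::size_t d = 0; d < dims; ++d)
      centroids[c][d] += data[i][d];
    ++counts[c];
  }

  // Turn sums into means; clusters with no points stay at the origin.
  for (std::size_t c = 0; c < clusters; ++c)
    if (counts[c] > 0)
      for (std::size_t d = 0; d < dims; ++d)
        centroids[c][d] /= static_cast<double>(counts[c]);

  return centroids;
}
```

A guess that is close to the final partition gives the iteration a better starting point than a random one, which is the usual reason for passing an initial guess.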

◆ Cluster() [2/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
template<typename MatType>
void mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::Cluster ( const MatType &  data,
const size_t  clusters,
arma::Col< size_t > &  assignments,
MatType &  centroids,
const bool  initialAssignmentGuess = false,
const bool  initialCentroidGuess = false 
) const

Perform k-means clustering on the data, returning a list of cluster assignments and also the centroids of each cluster.

Optionally, the vector of assignments can be set to an initial guess of the cluster assignments; to do this, set initialAssignmentGuess to true. Another way to set initial cluster guesses is to fill the centroids matrix with the centroid guesses, and then set initialCentroidGuess to true. initialAssignmentGuess supersedes initialCentroidGuess, so if both are set to true, the assignments vector is used.

Note that if the overclustering factor is greater than 1, the centroids matrix will be resized in the method. Regardless of the overclustering factor, the centroid guess matrix (if initialCentroidGuess is set to true) should have the same number of rows as the data matrix, and number of columns equal to 'clusters'.

Template Parameters
MatType: Type of matrix (arma::mat or arma::sp_mat).

Parameters
data: Dataset to cluster.
clusters: Number of clusters to compute.
assignments: Vector to store cluster assignments in.
centroids: Matrix in which centroids are stored.
initialAssignmentGuess: If true, then it is assumed that assignments has a list of initial cluster assignments.
initialCentroidGuess: If true, then it is assumed that centroids contains the initial centroids of each cluster.
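
The precedence between the two guess flags and the expected shape of a centroid guess can be summarized in a short standalone sketch. The names InitSource, ChooseInitSource and CentroidGuessShapeOk are hypothetical and not part of mlpack, and falling back to the initial partitioning policy when neither flag is set is an assumption based on the class description.

```cpp
#include <cstddef>

// Hypothetical labels for where the starting clusters come from.
enum InitSource { FromPartitioner, FromAssignments, FromCentroids };

// Mirrors the documented rule: an assignment guess supersedes a centroid
// guess. Assumption: with neither flag set, the partitioning policy is used.
InitSource ChooseInitSource(const bool initialAssignmentGuess,
                            const bool initialCentroidGuess)
{
  if (initialAssignmentGuess)
    return FromAssignments;
  if (initialCentroidGuess)
    return FromCentroids;
  return FromPartitioner;
}

// A centroid guess should have one row per data dimension (the same number
// of rows as the data matrix) and one column per requested cluster.
bool CentroidGuessShapeOk(const std::size_t centroidRows,
                          const std::size_t centroidCols,
                          const std::size_t dataRows,
                          const std::size_t clusters)
{
  return centroidRows == dataRows && centroidCols == clusters;
}
```

For example, a guess for 3 clusters over 2-dimensional data would be a 2 x 3 matrix, regardless of the overclustering factor.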

◆ EmptyClusterAction() [1/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
const EmptyClusterPolicy& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::EmptyClusterAction ( ) const
inline

◆ EmptyClusterAction() [2/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
EmptyClusterPolicy& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::EmptyClusterAction ( )
inline

◆ MaxIterations() [1/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
size_t mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::MaxIterations ( ) const
inline

Get the maximum number of iterations.

Definition at line 166 of file kmeans.hpp.

References mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::maxIterations.

◆ MaxIterations() [2/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
size_t& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::MaxIterations ( )
inline

Set the maximum number of iterations.

Definition at line 168 of file kmeans.hpp.

References mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::maxIterations.
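
The paired accessors on this class follow a common idiom: the const overload returns a copy for reading, and the non-const overload returns a reference that can be assigned through. The sketch below reproduces that idiom on a hypothetical ToySettings class, which is not part of mlpack and exists only to show the calling convention.

```cpp
#include <cstddef>

// Hypothetical class (not part of mlpack) using the same accessor idiom.
class ToySettings
{
 public:
  ToySettings(const std::size_t maxIterations = 1000) :
      maxIterations(maxIterations) { }

  // Read-only access: selected when the object is const.
  std::size_t MaxIterations() const { return maxIterations; }

  // Read-write access: returns a reference, so the result can be assigned to.
  std::size_t& MaxIterations() { return maxIterations; }

 private:
  std::size_t maxIterations;
};
```

With this pattern a value is set by assigning to the call expression, for example settings.MaxIterations() = 250; and the same syntax applies to the other reference-returning accessors listed on this page.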

◆ Metric() [1/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
const MetricType& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::Metric ( ) const
inline

Get the distance metric.

Definition at line 171 of file kmeans.hpp.

References mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::metric.

◆ Metric() [2/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
MetricType& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::Metric ( )
inline

Modify the distance metric.

Definition at line 173 of file kmeans.hpp.

References mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::metric.

◆ OverclusteringFactor() [1/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
double mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::OverclusteringFactor ( ) const
inline

Return the overclustering factor.

Definition at line 161 of file kmeans.hpp.

References mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::overclusteringFactor.

◆ OverclusteringFactor() [2/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
double& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::OverclusteringFactor ( )
inline

Set the overclustering factor. Must be at least 1; the default of 1.0 finds no extra clusters.

Definition at line 163 of file kmeans.hpp.

References mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::overclusteringFactor.

◆ Partitioner() [1/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
const InitialPartitionPolicy& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::Partitioner ( ) const
inline

Get the initial partitioning policy.

Definition at line 176 of file kmeans.hpp.

References mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::partitioner.

◆ Partitioner() [2/2]

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
InitialPartitionPolicy& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::Partitioner ( )
inline

Modify the initial partitioning policy.

Definition at line 178 of file kmeans.hpp.

References mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::partitioner.

◆ ToString()

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
std::string mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::ToString ( ) const

Member Data Documentation

◆ emptyClusterAction

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
EmptyClusterPolicy mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::emptyClusterAction
private

Instantiated empty cluster policy.

Definition at line 199 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::EmptyClusterAction().

◆ maxIterations

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
size_t mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::maxIterations
private

Maximum number of iterations before giving up.

Definition at line 193 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::MaxIterations().

◆ metric

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
MetricType mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::metric
private

Instantiated distance metric.

Definition at line 195 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::Metric().

◆ overclusteringFactor

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
double mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::overclusteringFactor
private

Factor controlling how many clusters are actually found.

Definition at line 191 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::OverclusteringFactor().

◆ partitioner

template<typename MetricType = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
InitialPartitionPolicy mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::partitioner
private

Instantiated initial partitioning policy.

Definition at line 197 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy >::Partitioner().


The documentation for this class was generated from the following file: