A refined approach for choosing initial points for k-means clustering.
Public Member Functions  
RefinedStart (const size_t samplings=100, const double percentage=0.02)  
Create the RefinedStart object, optionally specifying parameters for the number of samplings to perform and the percentage of the dataset to use in each sampling.
template < typename MatType >  
void  Cluster (const MatType &data, const size_t clusters, arma::mat &centroids) const 
Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return centroids.
template < typename MatType >  
void  Cluster (const MatType &data, const size_t clusters, arma::Row< size_t > &assignments) const 
Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return point assignments.
double  Percentage () const 
Get the percentage of the data used by each subsampling.
double &  Percentage () 
Modify the percentage of the data used by each subsampling.
size_t  Samplings () const 
Get the number of samplings that will be performed.
size_t &  Samplings () 
Modify the number of samplings that will be performed.
template < typename Archive >  
void  serialize (Archive &ar, const unsigned int) 
Serialize the object.
A refined approach for choosing initial points for k-means clustering.
This approach runs k-means several times on random subsets of the data, then clusters the resulting solutions to select refined initial cluster assignments. It is an implementation of the following paper:
@inproceedings{bradley1998refining,
  title={Refining initial points for k-means clustering},
  author={Bradley, Paul S and Fayyad, Usama M},
  booktitle={Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998)},
  volume={66},
  year={1998}
}
Definition at line 37 of file refined_start.hpp.
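The refinement loop described above can be sketched as follows. This is a minimal standalone illustration in plain C++ and is not mlpack's code. It follows the outline of Bradley and Fayyad's scheme: run k-means on `samplings` random subsamples, pool all resulting centroids, re-cluster the pool once from each subsample's solution, and keep the run with the lowest distortion over the pool. Several details are assumptions made for brevity: subsamples are drawn with replacement, each subsample's k-means starts from its first k sampled points, Lloyd iterations run a fixed number of times, and empty clusters simply keep their previous centroid. The paper's modified k-means instead reseeds empty clusters. All helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double SqDist(const Point& a, const Point& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
  return s;
}

// Index of the centroid nearest to p.
std::size_t Nearest(const Point& p, const std::vector<Point>& c) {
  std::size_t best = 0;
  for (std::size_t j = 1; j < c.size(); ++j)
    if (SqDist(p, c[j]) < SqDist(p, c[best])) best = j;
  return best;
}

// A fixed number of Lloyd iterations from the given starting centroids.
// Empty clusters keep their previous centroid (a simplification).
std::vector<Point> Lloyd(const std::vector<Point>& data, std::vector<Point> c,
                         int iters = 50) {
  const std::size_t d = data[0].size();
  for (int it = 0; it < iters; ++it) {
    std::vector<Point> sum(c.size(), Point(d, 0.0));
    std::vector<std::size_t> count(c.size(), 0);
    for (const Point& p : data) {
      const std::size_t j = Nearest(p, c);
      for (std::size_t i = 0; i < d; ++i) sum[j][i] += p[i];
      ++count[j];
    }
    for (std::size_t j = 0; j < c.size(); ++j)
      if (count[j] > 0)
        for (std::size_t i = 0; i < d; ++i) c[j][i] = sum[j][i] / count[j];
  }
  return c;
}

// Sum of squared distances from each point to its nearest centroid.
double Distortion(const std::vector<Point>& data, const std::vector<Point>& c) {
  double total = 0.0;
  for (const Point& p : data) total += SqDist(p, c[Nearest(p, c)]);
  return total;
}

// Hypothetical refined-start routine in the spirit of Bradley and Fayyad (1998).
std::vector<Point> RefinedCentroids(const std::vector<Point>& data, std::size_t k,
                                    std::size_t samplings, double percentage,
                                    unsigned seed = 42) {
  assert(k > 0 && k <= data.size());
  assert(percentage > 0.0 && percentage <= 1.0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
  std::size_t m = static_cast<std::size_t>(percentage * data.size());
  if (m < k) m = k;  // each subsample must hold at least k points

  std::vector<std::vector<Point>> solutions;  // one k-centroid solution per sampling
  std::vector<Point> pool;                    // all solutions' centroids pooled
  for (std::size_t s = 0; s < samplings; ++s) {
    std::vector<Point> sample;
    for (std::size_t i = 0; i < m; ++i) sample.push_back(data[pick(rng)]);
    std::vector<Point> init(sample.begin(), sample.begin() + k);
    solutions.push_back(Lloyd(sample, init));
    pool.insert(pool.end(), solutions.back().begin(), solutions.back().end());
  }

  // Re-cluster the pool from each solution and keep the lowest-distortion result.
  std::vector<Point> best;
  double bestDist = std::numeric_limits<double>::max();
  for (const auto& start : solutions) {
    std::vector<Point> refined = Lloyd(pool, start);
    const double dist = Distortion(pool, refined);
    if (dist < bestDist) { bestDist = dist; best = refined; }
  }
  return best;
}
```

Clustering the small pool of candidate centroids, instead of the full dataset, is what keeps the refinement cheap. Each of the final k-means runs only touches `samplings * k` points.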

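RefinedStart ( const size_t samplings = 100, const double percentage = 0.02 )  inline
Create the RefinedStart object, optionally specifying parameters for the number of samplings to perform and the percentage of the dataset to use in each sampling. The percentage is given as a fraction, so the default of 0.02 uses 2% of the points in each sampling.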
Definition at line 45 of file refined_start.hpp.
References RefinedStart::Cluster().
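As a rough worked example of the two constructor parameters: with the defaults (100 samplings, percentage 0.02) on a dataset of 10,000 points, each sampling clusters about 200 points. With k clusters, the samplings together yield 100 * k candidate centroids. The two helpers below are hypothetical and not part of mlpack. They assume the subsample size is `percentage * n` truncated toward zero; the exact rounding used by the library is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: points drawn per subsample, assuming truncation toward zero.
std::size_t SubsampleSize(std::size_t numPoints, double percentage) {
  assert(percentage > 0.0 && percentage <= 1.0);
  return static_cast<std::size_t>(percentage * numPoints);
}

// Hypothetical helper: candidate centroids pooled after all samplings,
// one set of `clusters` centroids per sampling.
std::size_t PooledCentroids(std::size_t samplings, std::size_t clusters) {
  return samplings * clusters;
}
```

For example, `SubsampleSize(10000, 0.02)` gives 200 and `PooledCentroids(100, 5)` gives 500.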
template<typename MatType>
void Cluster ( const MatType & data, const size_t clusters, arma::mat & centroids ) const
Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return centroids.
Template Parameters
  MatType  Type of data (arma::mat or arma::sp_mat).
Parameters
  data  Dataset to partition.
  clusters  Number of clusters to split dataset into.
  centroids  Matrix to store centroids into.
Referenced by RefinedStart::RefinedStart().
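Armadillo matrices are stored column-major, and mlpack conventionally stores one point per column. Under that convention, the centroids matrix filled here has one row per dimension and one column per cluster, and centroid j is column j. The sketch below uses a hypothetical minimal column-major matrix (not Armadillo) to illustrate that layout. It assumes element (r, c) lives at offset r + c * rows, as in Armadillo's storage order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix mirroring Armadillo's storage order.
struct ColMajor {
  std::size_t rows, cols;
  std::vector<double> mem;
  ColMajor(std::size_t r, std::size_t c) : rows(r), cols(c), mem(r * c, 0.0) {}

  // Offset of element (r, c) in the flat buffer.
  std::size_t Index(std::size_t r, std::size_t c) const {
    assert(r < rows && c < cols);
    return r + c * rows;
  }
  double& operator()(std::size_t r, std::size_t c) { return mem[Index(r, c)]; }
  double operator()(std::size_t r, std::size_t c) const { return mem[Index(r, c)]; }
};

// Copy centroid j (one column) out of a dimensions x clusters centroid matrix.
std::vector<double> CentroidColumn(const ColMajor& centroids, std::size_t j) {
  std::vector<double> v(centroids.rows);
  for (std::size_t r = 0; r < centroids.rows; ++r) v[r] = centroids(r, j);
  return v;
}
```

Because each centroid is contiguous in memory, reading a whole centroid walks one contiguous block of the buffer.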
template<typename MatType>
void Cluster ( const MatType & data, const size_t clusters, arma::Row< size_t > & assignments ) const
Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return point assignments.
Template Parameters
  MatType  Type of data (arma::mat or arma::sp_mat).
Parameters
  data  Dataset to partition.
  clusters  Number of clusters to split dataset into.
  assignments  Vector to store cluster assignments into. Values will be between 0 and (clusters - 1), inclusive.
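One natural way to turn refined centroids into point assignments is to label each point with its nearest centroid. This gives labels in the documented range 0 to (clusters - 1). The function below is a hypothetical standalone sketch of that step using squared Euclidean distance; this page does not confirm that the library computes assignments exactly this way.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Hypothetical helper: label each point with the index of its nearest centroid,
// so every label lies in [0, centroids.size() - 1].
std::vector<std::size_t> AssignToNearest(const std::vector<Point>& data,
                                         const std::vector<Point>& centroids) {
  assert(!centroids.empty());
  std::vector<std::size_t> assignments(data.size(), 0);
  for (std::size_t i = 0; i < data.size(); ++i) {
    double best = std::numeric_limits<double>::max();
    for (std::size_t j = 0; j < centroids.size(); ++j) {
      double d = 0.0;  // squared distance from point i to centroid j
      for (std::size_t r = 0; r < data[i].size(); ++r) {
        const double diff = data[i][r] - centroids[j][r];
        d += diff * diff;
      }
      if (d < best) { best = d; assignments[i] = j; }
    }
  }
  return assignments;
}
```

For example, points (0, 0), (9, 9) and (1, 0) with centroids (0, 0) and (10, 10) are labelled 0, 1, 0.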

double Percentage ( ) const  inline
Get the percentage of the data used by each subsampling.
Definition at line 86 of file refined_start.hpp.

double & Percentage ( )  inline
Modify the percentage of the data used by each subsampling.
Definition at line 88 of file refined_start.hpp.

size_t Samplings ( ) const  inline
Get the number of samplings that will be performed.
Definition at line 81 of file refined_start.hpp.

size_t & Samplings ( )  inline
Modify the number of samplings that will be performed.
Definition at line 83 of file refined_start.hpp.
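The accessor pairs above follow a common C++ idiom. A const overload returns a copy for reading, and a non-const overload returns a reference, so the caller can assign through it. For example, given the `size_t& Samplings()` overload listed above, `refinedStart.Samplings() = 50;` changes the number of samplings. The class below is a hypothetical stand-in, not mlpack's, that reproduces only this idiom so it can be compiled on its own.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in showing the getter / mutable-reference accessor idiom.
class RefinedStartParams {
 public:
  explicit RefinedStartParams(std::size_t samplings = 100, double percentage = 0.02)
      : samplings(samplings), percentage(percentage) {}

  std::size_t Samplings() const { return samplings; }  // read-only access
  std::size_t& Samplings() { return samplings; }       // assignable access

  double Percentage() const { return percentage; }
  double& Percentage() { return percentage; }

 private:
  std::size_t samplings;
  double percentage;
};
```

The const overload is selected automatically whenever the object is accessed through a const reference. Read-only code therefore cannot modify the parameters by accident.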

template<typename Archive>
void serialize ( Archive & ar, const unsigned int )  inline
Serialize the object.
Definition at line 92 of file refined_start.hpp.
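The `serialize(Archive& ar, const unsigned int)` signature follows the Boost.Serialization convention. A single member function visits every field, and the same code serves both saving and loading because the archive type decides the direction. The sketch below illustrates that pattern with two hypothetical text archives and a stand-in `Params` struct holding the two RefinedStart parameters. It does not reproduce mlpack's actual serialization code or any real archive class.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical saving archive: operator& writes each field, space-separated.
// Default stream precision (6 significant digits) is a simplification.
struct TextOutArchive {
  std::ostringstream out;
  template <typename T>
  TextOutArchive& operator&(const T& value) {
    out << value << ' ';
    return *this;
  }
};

// Hypothetical loading archive: operator& reads fields back in the same order.
struct TextInArchive {
  std::istringstream in;
  explicit TextInArchive(const std::string& s) : in(s) {}
  template <typename T>
  TextInArchive& operator&(T& value) {
    in >> value;
    return *this;
  }
};

// Stand-in object with the same serialize() shape as the documented member.
struct Params {
  std::size_t samplings = 100;
  double percentage = 0.02;

  template <typename Archive>
  void serialize(Archive& ar, const unsigned int /* version */) {
    ar & samplings;   // same field order for saving and loading
    ar & percentage;
  }
};
```

Because field order is fixed inside `serialize()`, a save followed by a load round-trips both parameters without separate save and load functions.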