Class XMeans<V extends NumberVector,​M extends MeanModel>

  • Type Parameters:
    V - Vector type
    M - Model type
    All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<M>>, KMeans<V,​M>
    Direct Known Subclasses:
    GMeans

    @Title("X-means")
    @Reference(authors="D. Pelleg, A. Moore",
               title="X-means: Extending K-means with Efficient Estimation on the Number of Clusters",
               booktitle="Proc. 17th Int. Conf. on Machine Learning (ICML 2000)",
               url="http://www.pelleg.org/shared/hp/download/xmeans.ps",
               bibkey="DBLP:conf/icml/PellegM00")
    public class XMeans<V extends NumberVector,​M extends MeanModel>
    extends AbstractKMeans<V,​M>
    X-means: Extending K-means with Efficient Estimation on the Number of Clusters.

    Note: this implementation does currently not use a k-d-tree for acceleration. Also note that kmax is not a hard threshold - the algorithm can return up to 2*kmax clusters!

    Reference:

    D. Pelleg, A. Moore
    X-means: Extending K-means with Efficient Estimation on the Number of Clusters
    Proc. 17th Int. Conf. on Machine Learning (ICML 2000)

    Since:
    0.7.0
    Author:
    Tibor Goldschwendt, Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • k_min

        private int k_min
        Effective number of clusters, minimum and maximum.
      • k_max

        private int k_max
        Effective number of clusters, minimum and maximum.
      • splitInitializer

        Predefined splitInitializer
        Initializer for k-means.
    • Constructor Detail

      • XMeans

        public XMeans​(NumberVectorDistance<? super V> distance,
                      int k_min,
                      int k_max,
                      int maxiter,
                      KMeans<V,​M> innerKMeans,
                      KMeansInitialization initializer,
                      KMeansQualityMeasure<V> informationCriterion,
                      RandomFactory random)
        Constructor.
        Parameters:
        distance - Distance function
        k_min - k_min parameter - minimum number of result clusters
        k_max - k_max parameter - maximum number of result clusters
        maxiter - Maximum number of iterations each.
        innerKMeans - K-Means variant to use inside.
        informationCriterion - The information criterion used for the splitting step
        random - Random factory
    • Method Detail

      • run

        public Clustering<M> run​(Relation<V> relation)
        Run the algorithm on a database and relation.
        Parameters:
        relation - Data relation
        Returns:
        Clustering result.
      • splitCluster

        protected java.util.List<Cluster<M>> splitCluster​(Cluster<M> parentCluster,
                                                          Relation<V> relation)
        Conditionally splits the clusters based on the information criterion.
        Parameters:
        parentCluster - Cluster to split
        relation - Data relation
        Returns:
        Parent cluster when split decreases clustering quality or child clusters when split improves clustering.
      • splitCentroid

        protected double[][] splitCentroid​(Cluster<? extends MeanModel> parentCluster,
                                           Relation<V> relation)
        Split an existing centroid into two initial centers.
        Parameters:
        parentCluster - Existing cluster
        relation - Data relation
        Returns:
        List of new centroids