Class AbstractKMeans.Instance

    • Field Detail

      • means

        protected double[][] means
        Cluster means.
      • clusters

        protected java.util.List<ModifiableDBIDs> clusters
        Store the elements per cluster.
      • varsum

        protected double[] varsum
        Sum of squared deviations in each cluster.
      • diststat

        protected long diststat
        Number of distance computations.
      • k

        protected final int k
        Number of clusters.
      • isSquared

        protected final boolean isSquared
        Indicates whether the distance function is squared.
      • key

        protected java.lang.String key
        Key for statistics logging.
    • Constructor Detail

    • Method Detail

      • distance

        protected double distance(NumberVector x,
                                  NumberVector y)
        Compute the squared distance (and count the distance computations).
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • distance

        protected double distance(NumberVector x,
                                  double[] y)
        Compute the squared distance (and count the distance computations).
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • distance

        protected double distance(double[] x,
                                  double[] y)
        Compute the squared distance (and count the distance computations).
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
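A minimal sketch of the counting pattern behind the distance methods and the diststat field. It assumes squared Euclidean distance on plain double[] arrays; the class name CountingDistance is hypothetical, and the documented methods delegate to whatever distance function the instance was configured with.

```java
// Hypothetical sketch: a distance helper that counts its invocations,
// mirroring the role of the diststat field. Assumes squared Euclidean distance.
class CountingDistance {
  // Number of distance computations performed so far.
  long diststat = 0;

  // Squared Euclidean distance; increments the counter on every call.
  double distance(double[] x, double[] y) {
    ++diststat;
    double sum = 0.;
    for (int d = 0; d < x.length; d++) {
      double delta = x[d] - y[d];
      sum += delta * delta;
    }
    return sum;
  }
}
```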
      • sqrtdistance

        protected double sqrtdistance(NumberVector x,
                                      NumberVector y)
        Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • sqrtdistance

        protected double sqrtdistance(NumberVector x,
                                      double[] y)
        Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • sqrtdistance

        protected double sqrtdistance(double[] x,
                                      double[] y)
        Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
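A sketch of how sqrtdistance can relate to distance and isSquared. It assumes the underlying distance is squared Euclidean when isSquared is true and plain Euclidean otherwise; SqrtDistanceSketch is a hypothetical name, not part of the documented class.

```java
// Hypothetical sketch: sqrtdistance returns an unsquared distance,
// taking the square root only when the underlying distance is squared.
class SqrtDistanceSketch {
  // Whether the underlying distance returns squared values.
  final boolean isSquared;
  // Number of distance computations (cf. diststat).
  long diststat = 0;

  SqrtDistanceSketch(boolean isSquared) {
    this.isSquared = isSquared;
  }

  // Assumed underlying distance: squared Euclidean if isSquared, Euclidean otherwise.
  double distance(double[] x, double[] y) {
    ++diststat;
    double sum = 0.;
    for (int d = 0; d < x.length; d++) {
      double delta = x[d] - y[d];
      sum += delta * delta;
    }
    return isSquared ? sum : Math.sqrt(sum);
  }

  // Counts one computation (via distance) and undoes the squaring if needed.
  double sqrtdistance(double[] x, double[] y) {
    double d = distance(x, y);
    return isSquared ? Math.sqrt(d) : d;
  }
}
```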
      • run

        public void run(int maxiter)
        Run the clustering.
        Parameters:
        maxiter - Maximum number of iterations
      • iterate

        protected abstract int iterate(int iteration)
        Main loop function.
        Parameters:
        iteration - Iteration number (beginning at 1)
        Returns:
        Number of reassigned points
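The contract between run and iterate can be sketched as a template loop: call iterate with 1-based iteration numbers until it reports that no points were reassigned, or until maxiter iterations have run. This is an assumption-based sketch (the class name and the iterations field are hypothetical); the documented run presumably also handles logging and statistics, which are omitted here.

```java
// Hypothetical sketch of a run/iterate template loop.
abstract class IterativeClustering {
  // Number of iterations executed by the most recent run (hypothetical field).
  int iterations = 0;

  // Stop when an iteration reassigns no points, or after maxiter iterations.
  void run(int maxiter) {
    int iteration = 0;
    while (iteration < maxiter) {
      int changed = iterate(++iteration); // iteration numbers start at 1
      if (changed == 0) {
        break; // converged
      }
    }
    iterations = iteration;
  }

  // One pass of the algorithm; returns the number of reassigned points.
  protected abstract int iterate(int iteration);
}
```

A concrete variant such as Lloyd's algorithm would implement iterate by reassigning points and then updating the means.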
      • meansFromSums

        protected void meansFromSums(double[][] dst,
                                     double[][] sums,
                                     double[][] prev)
        Compute means from cluster sums by averaging.
        Parameters:
        dst - Output means
        sums - Input sums
        prev - Previous means, to handle empty clusters
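A minimal sketch of averaging per-cluster sums into means. The sizes array is a hypothetical extra parameter: the documented method takes no sizes argument, so cluster sizes presumably come from the instance's own state (for example, the clusters field).

```java
// Hypothetical sketch: means = sums / cluster size, keeping the previous
// mean for empty clusters. The sizes parameter is an assumption.
class MeansFromSumsSketch {
  static void meansFromSums(double[][] dst, double[][] sums, int[] sizes, double[][] prev) {
    for (int i = 0; i < dst.length; i++) {
      if (sizes[i] == 0) {
        // Empty cluster: reuse the previous mean instead of dividing by zero.
        System.arraycopy(prev[i], 0, dst[i], 0, prev[i].length);
        continue;
      }
      final double inv = 1. / sizes[i];
      for (int d = 0; d < sums[i].length; d++) {
        dst[i][d] = sums[i][d] * inv;
      }
    }
  }
}
```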
      • copyMeans

        protected void copyMeans(double[][] src,
                                 double[][] dst)
        Copy means.
        Parameters:
        src - Source values
        dst - Destination values
      • assignToNearestCluster

        protected int assignToNearestCluster()
        Assign each object to the nearest cluster.
        Returns:
        Number of objects reassigned
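A standalone sketch of the nearest-mean assignment step. It assumes squared Euclidean distance and uses a hypothetical int[] assignment array (with -1 meaning "unassigned") in place of the clusters lists kept by the instance.

```java
// Hypothetical sketch: assign each point to its nearest mean and
// count how many assignments changed.
class NearestAssignmentSketch {
  static int assignToNearestCluster(double[][] points, double[][] means, int[] assignment) {
    int changed = 0;
    for (int p = 0; p < points.length; p++) {
      int best = -1;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < means.length; c++) {
        // Squared Euclidean distance from point p to mean c.
        double dist = 0.;
        for (int d = 0; d < points[p].length; d++) {
          double delta = points[p][d] - means[c][d];
          dist += delta * delta;
        }
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      if (assignment[p] != best) {
        assignment[p] = best;
        changed++;
      }
    }
    return changed;
  }
}
```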
      • recomputeSeperation

        protected void recomputeSeperation(double[] sep,
                                           double[][] cdist)
        Recompute the separation of cluster means.

        Used by Elkan's variant and Exponion.

        Parameters:
        sep - Output array of separation
        cdist - Center-to-center distances (half-sqrt scaled)
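A sketch of the separation computation under two assumptions: "half-sqrt scaled" means cdist[i][j] = 0.5 * d(m_i, m_j) with Euclidean d, and sep[i] is the smallest such value over all other means j. The class and method names are hypothetical.

```java
// Hypothetical sketch: half of the Euclidean center-to-center distances,
// and for each center the half distance to its nearest other center.
class SeparationSketch {
  static void recompute(double[][] means, double[] sep, double[][] cdist) {
    final int k = means.length;
    java.util.Arrays.fill(sep, Double.POSITIVE_INFINITY);
    for (int i = 1; i < k; i++) {
      for (int j = 0; j < i; j++) {
        double sum = 0.;
        for (int d = 0; d < means[i].length; d++) {
          double delta = means[i][d] - means[j][d];
          sum += delta * delta;
        }
        double half = 0.5 * Math.sqrt(sum);
        cdist[i][j] = cdist[j][i] = half; // symmetric matrix
        sep[i] = Math.min(sep[i], half);
        sep[j] = Math.min(sep[j], half);
      }
    }
  }
}
```

With these conventions, a point x assigned to mean i with d(x, m_i) <= sep[i] cannot be closer to any other mean, by the triangle inequality, which is why the bound-based variants compute this value.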
      • initialSeperation

        protected void initialSeperation(double[][] cdist)
        Initial separation of means. Used by Elkan and SimplifiedElkan.
        Parameters:
        cdist - Pairwise separation output (as sqrt/2)
      • computeSquaredSeparation

        protected void computeSquaredSeparation(double[][] cost)
        Initial separation of means. Used by Hamerly, Exponion, and Annulus.
        Parameters:
        cost - Pairwise separation output (as squared/4)
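The squared variant can be sketched the same way: storing d²/4 = (d/2)² preserves the ordering of the half distances while avoiding square roots. This assumes squared Euclidean distance; the class and method names are hypothetical.

```java
// Hypothetical sketch: pairwise squared distances scaled by 1/4.
class SquaredSeparationSketch {
  static void compute(double[][] means, double[][] cost) {
    final int k = means.length;
    for (int i = 0; i < k; i++) {
      cost[i][i] = 0.;
      for (int j = 0; j < i; j++) {
        double sum = 0.;
        for (int d = 0; d < means[i].length; d++) {
          double delta = means[i][d] - means[j][d];
          sum += delta * delta;
        }
        cost[i][j] = cost[j][i] = 0.25 * sum; // (d/2)^2
      }
    }
  }
}
```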
      • movedDistance

        protected void movedDistance(double[][] means,
                                     double[][] newmeans,
                                     double[] dists)
        Compute the distance each mean moved.

        Used by Hamerly, Elkan, and derived classes.

        Parameters:
        means - Old means
        newmeans - New means
        dists - Distances moved (output)
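A sketch of the moved-distance computation, assuming Euclidean distance between each old mean and its replacement (the class name is hypothetical).

```java
// Hypothetical sketch: dists[i] = Euclidean distance between old and new mean i.
class MovedDistanceSketch {
  static void movedDistance(double[][] means, double[][] newmeans, double[] dists) {
    for (int i = 0; i < means.length; i++) {
      double sum = 0.;
      for (int d = 0; d < means[i].length; d++) {
        double delta = means[i][d] - newmeans[i][d];
        sum += delta * delta;
      }
      dists[i] = Math.sqrt(sum);
    }
  }
}
```

In Hamerly-style bounds, a point's upper bound typically grows by the movement of its own mean, while its lower bound shrinks by the largest movement among the other means.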
      • buildResult

        public Clustering<KMeansModel> buildResult()
        Build a standard k-means result, with known cluster variance sums.

        Note: this expects the varsum field to be correct!

        Returns:
        Clustering result
      • buildResult

        public Clustering<KMeansModel> buildResult(boolean varstat,
                                                   Relation<? extends NumberVector> relation)
        Build the result, recomputing the cluster variance if varstat is set to true.
        Parameters:
        varstat - Recompute cluster variance
        relation - Data relation (only needed if varstat is set)
        Returns:
        Clustering result
      • recomputeVariance

        protected void recomputeVariance(Relation<? extends NumberVector> relation)
        Recompute the cluster variances.
        Parameters:
        relation - Data relation
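The per-cluster sums of squared deviations held in varsum can be recomputed as in this sketch. It assumes squared Euclidean deviations and uses a hypothetical assignment array mapping each point to its cluster index instead of a Relation and the clusters lists.

```java
// Hypothetical sketch: varsum[c] = sum of squared distances from the
// points of cluster c to means[c].
class VarianceSketch {
  static double[] recomputeVariance(double[][] points, int[] assignment, double[][] means) {
    double[] varsum = new double[means.length];
    for (int p = 0; p < points.length; p++) {
      int c = assignment[p];
      for (int d = 0; d < points[p].length; d++) {
        double delta = points[p][d] - means[c][d];
        varsum[c] += delta * delta;
      }
    }
    return varsum;
  }
}
```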
      • getLogger

        protected abstract Logging getLogger()
        Get the class logger.
        Returns:
        Logger