Package elki.clustering.kmeans
Class AbstractKMeans.Instance
- java.lang.Object
-
- elki.clustering.kmeans.AbstractKMeans.Instance
-
- Direct Known Subclasses:
CompareMeans.Instance, HamerlyKMeans.Instance, HartiganWongKMeans.Instance, KDTreePruningKMeans.Instance, KMeansMinusMinus.Instance, KMediansLloyd.Instance, LloydKMeans.Instance, MacQueenKMeans.Instance, SimplifiedElkanKMeans.Instance, SingleAssignmentKMeans.Instance, SphericalKMeans.Instance, YinYangKMeans.Instance
- Enclosing class:
- AbstractKMeans<V extends NumberVector,M extends Model>
public abstract static class AbstractKMeans.Instance extends java.lang.Object
Inner instance for a single run, which encapsulates the standard flow of most (but not all) k-means variations.
- Author:
- Erich Schubert
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected WritableIntegerDataStore | assignment | A mapping of elements to cluster ids.
protected java.util.List<ModifiableDBIDs> | clusters | Store the elements per cluster.
private NumberVectorDistance<?> | df | Distance function.
protected long | diststat | Number of distance computations.
protected boolean | isSquared | Indicates whether the distance function is squared.
protected int | k | Number of clusters.
protected java.lang.String | key | Key for statistics logging.
protected double[][] | means | Cluster means.
protected Relation<? extends NumberVector> | relation | Data relation.
protected double[] | varsum | Sum of squared deviations in each cluster.
-
Constructor Summary
Constructors
Constructor | Description
Instance(Relation<? extends NumberVector> relation, NumberVectorDistance<?> df, double[][] means) | Constructor.
-
Method Summary
Methods
Modifier and Type | Method | Description
protected int | assignToNearestCluster() | Assign each object to the nearest cluster.
Clustering<KMeansModel> | buildResult() | Build a standard k-means result, with known cluster variance sums.
Clustering<KMeansModel> | buildResult(boolean varstat, Relation<? extends NumberVector> relation) | Build the result, recomputing the cluster variance if varstat is set to true.
protected void | computeSquaredSeparation(double[][] cost) | Initial separation of means.
protected void | copyMeans(double[][] src, double[][] dst) | Copy means.
protected double | distance(double[] x, double[] y) | Compute the squared distance (and count the distance computations).
protected double | distance(NumberVector x, double[] y) | Compute the squared distance (and count the distance computations).
protected double | distance(NumberVector x, NumberVector y) | Compute the squared distance (and count the distance computations).
protected abstract Logging | getLogger() | Get the class logger.
protected void | initialSeperation(double[][] cdist) | Initial separation of means.
protected abstract int | iterate(int iteration) | Main loop function.
protected void | meansFromSums(double[][] dst, double[][] sums, double[][] prev) | Compute means from cluster sums by averaging.
protected void | movedDistance(double[][] means, double[][] newmeans, double[] dists) | Maximum distance moved.
protected void | recomputeSeperation(double[] sep, double[][] cdist) | Recompute the separation of cluster means.
protected void | recomputeVariance(Relation<? extends NumberVector> relation) | Recompute the cluster variances.
void | run(int maxiter) | Run the clustering.
protected double | sqrtdistance(double[] x, double[] y) | Compute the distance (and count the distance computations).
protected double | sqrtdistance(NumberVector x, double[] y) | Compute the distance (and count the distance computations).
protected double | sqrtdistance(NumberVector x, NumberVector y) | Compute the distance (and count the distance computations).
-
-
-
Field Detail
-
means
protected double[][] means
Cluster means.
-
clusters
protected java.util.List<ModifiableDBIDs> clusters
Store the elements per cluster.
-
assignment
protected WritableIntegerDataStore assignment
A mapping of elements to cluster ids.
-
varsum
protected double[] varsum
Sum of squared deviations in each cluster.
-
relation
protected Relation<? extends NumberVector> relation
Data relation.
-
diststat
protected long diststat
Number of distance computations
-
df
private final NumberVectorDistance<?> df
Distance function.
-
k
protected final int k
Number of clusters.
-
isSquared
protected final boolean isSquared
Indicates whether the distance function is squared.
-
key
protected java.lang.String key
Key for statistics logging.
-
-
Constructor Detail
-
Instance
public Instance(Relation<? extends NumberVector> relation, NumberVectorDistance<?> df, double[][] means)
Constructor.
- Parameters:
relation - Relation to process
df - Distance function
means - Initial means
-
-
Method Detail
-
distance
protected double distance(NumberVector x, NumberVector y)
Compute the squared distance (and count the distance computations).
- Parameters:
x - First object
y - Second object
- Returns:
- Distance
-
distance
protected double distance(NumberVector x, double[] y)
Compute the squared distance (and count the distance computations).
- Parameters:
x - First object
y - Second object
- Returns:
- Distance
-
distance
protected double distance(double[] x, double[] y)
Compute the squared distance (and count the distance computations).
- Parameters:
x - First object
y - Second object
- Returns:
- Distance
-
sqrtdistance
protected double sqrtdistance(NumberVector x, NumberVector y)
Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
- Parameters:
x - First object
y - Second object
- Returns:
- Distance
-
sqrtdistance
protected double sqrtdistance(NumberVector x, double[] y)
Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
- Parameters:
x - First object
y - Second object
- Returns:
- Distance
-
sqrtdistance
protected double sqrtdistance(double[] x, double[] y)
Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
- Parameters:
x - First object
y - Second object
- Returns:
- Distance
-
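The distance and sqrtdistance helpers differ only in the scale of the returned value, and both increment diststat. The following is a minimal standalone sketch of that bookkeeping, not ELKI's code: it assumes a Euclidean distance on plain double[] arrays that is squared when isSquared is true, and the class name CountingDistanceSketch is hypothetical.

```java
/** Standalone sketch; not ELKI code. The class name is illustrative only. */
final class CountingDistanceSketch {
  /** Number of distance computations so far (plays the role of diststat). */
  long diststat = 0;

  /** True if the native distance is squared Euclidean (plays the role of isSquared). */
  final boolean isSquared;

  CountingDistanceSketch(boolean isSquared) {
    this.isSquared = isSquared;
  }

  /** Distance in the native scale of the distance function; counts one computation. */
  double distance(double[] x, double[] y) {
    ++diststat;
    double sum = 0;
    for (int i = 0; i < x.length; i++) {
      double d = x[i] - y[i];
      sum += d * d;
    }
    return isSquared ? sum : Math.sqrt(sum);
  }

  /** Non-squared distance: takes the root only when the native scale is squared. */
  double sqrtdistance(double[] x, double[] y) {
    double d = distance(x, y);
    return isSquared ? Math.sqrt(d) : d;
  }

  public static void main(String[] args) {
    CountingDistanceSketch c = new CountingDistanceSketch(true);
    double[] a = { 0, 0 }, b = { 3, 4 };
    System.out.println(c.distance(a, b));
    System.out.println(c.sqrtdistance(a, b));
    System.out.println(c.diststat);
  }
}
```

Keeping the counter inside the helpers means every variant that calls them reports comparable distance-computation statistics without extra bookkeeping.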
run
public void run(int maxiter)
Run the clustering.
- Parameters:
maxiter - Maximum number of iterations
-
iterate
protected abstract int iterate(int iteration)
Main loop function.
- Parameters:
iteration - Iteration number (beginning at 1)
- Returns:
- Number of reassigned points
-
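The relationship between run and iterate can be illustrated with a small driver loop. This is a hedged sketch rather than ELKI's implementation: it assumes the loop stops once iterate reports zero reassigned points or maxiter iterations have run, and the names RunLoopSketch and the IntUnaryOperator-based step are hypothetical stand-ins for a subclass's iterate method.

```java
import java.util.function.IntUnaryOperator;

/** Standalone sketch of a run/iterate driver loop; not ELKI code. */
final class RunLoopSketch {
  /**
   * Calls the step function with iteration numbers 1, 2, ... until it reports
   * zero reassignments or maxiter iterations have been executed.
   *
   * @param maxiter maximum number of iterations
   * @param iterate step function: iteration number in, reassigned points out
   * @return number of iterations executed
   */
  static int run(int maxiter, IntUnaryOperator iterate) {
    int iteration = 0;
    while (iteration < maxiter) {
      ++iteration;
      int changed = iterate.applyAsInt(iteration);
      if (changed == 0) {
        break; // converged: no point changed its cluster
      }
    }
    return iteration;
  }

  public static void main(String[] args) {
    // Hypothetical step that reassigns fewer points each round.
    System.out.println(run(10, it -> Math.max(0, 3 - it)));
  }
}
```

A concrete variant only has to supply the step; the stopping logic stays in one place.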
meansFromSums
protected void meansFromSums(double[][] dst, double[][] sums, double[][] prev)
Compute means from cluster sums by averaging.
- Parameters:
dst - Output means
sums - Input sums
prev - Previous means, to handle empty clusters
-
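As an illustration of the averaging step, the sketch below divides each cluster sum by the cluster size and falls back to the previous mean when a cluster is empty. It is a standalone approximation, not ELKI's code: the explicit sizes parameter is an assumption (the documented method has no such argument and would presumably take sizes from the clusters field), and the class name is hypothetical.

```java
import java.util.Arrays;

/** Standalone sketch of computing means from sums; not ELKI code. */
final class MeansFromSumsSketch {
  /**
   * @param dst   output means
   * @param sums  per-cluster coordinate sums
   * @param sizes per-cluster point counts (assumed parameter for this sketch)
   * @param prev  previous means, reused for empty clusters
   */
  static void meansFromSums(double[][] dst, double[][] sums, int[] sizes, double[][] prev) {
    for (int i = 0; i < dst.length; i++) {
      if (sizes[i] == 0) {
        // Empty cluster: keep the previous mean instead of dividing by zero.
        System.arraycopy(prev[i], 0, dst[i], 0, prev[i].length);
        continue;
      }
      for (int d = 0; d < dst[i].length; d++) {
        dst[i][d] = sums[i][d] / sizes[i];
      }
    }
  }

  public static void main(String[] args) {
    double[][] dst = new double[2][2];
    meansFromSums(dst, new double[][] { { 2, 4 }, { 0, 0 } }, new int[] { 2, 0 },
        new double[][] { { 9, 9 }, { 7, 8 } });
    System.out.println(Arrays.deepToString(dst));
  }
}
```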
copyMeans
protected void copyMeans(double[][] src, double[][] dst)
Copy means.
- Parameters:
src - Source values
dst - Destination values
-
assignToNearestCluster
protected int assignToNearestCluster()
Assign each object to the nearest cluster.
- Returns:
- number of objects reassigned
-
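The assignment step can be sketched on plain arrays as follows. This is an illustration under stated assumptions, not ELKI's implementation: data points are double[] rows instead of a Relation, the assignment is an int[] instead of a WritableIntegerDataStore, squared Euclidean distance is used, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Standalone sketch of nearest-cluster assignment; not ELKI code. */
final class AssignNearestSketch {
  /**
   * Assigns every point to its nearest mean.
   *
   * @return number of points whose assignment changed
   */
  static int assignToNearestCluster(double[][] data, double[][] means, int[] assignment) {
    int changed = 0;
    for (int p = 0; p < data.length; p++) {
      int best = 0;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < means.length; c++) {
        double dist = 0;
        for (int d = 0; d < means[c].length; d++) {
          double diff = data[p][d] - means[c][d];
          dist += diff * diff;
        }
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      if (assignment[p] != best) {
        assignment[p] = best;
        ++changed;
      }
    }
    return changed;
  }

  public static void main(String[] args) {
    int[] assignment = { -1, -1, -1 };
    double[][] data = { { 0 }, { 1 }, { 10 } };
    double[][] means = { { 0 }, { 10 } };
    System.out.println(assignToNearestCluster(data, means, assignment));
    System.out.println(Arrays.toString(assignment));
  }
}
```

Returning the number of changed assignments lets the caller detect convergence: a second pass over unchanged means reports zero.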
recomputeSeperation
protected void recomputeSeperation(double[] sep, double[][] cdist)
Recompute the separation of cluster means. Used by Elkan's variant and Exponion.
- Parameters:
sep - Output array of separation
cdist - Center-to-center distances (half-sqrt scaled)
-
initialSeperation
protected void initialSeperation(double[][] cdist)
Initial separation of means. Used by Elkan and SimplifiedElkan.
- Parameters:
cdist - Pairwise separation output (as sqrt/2)
-
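One plausible reading of the "half-sqrt scaled" separation is sketched below: each entry of cdist holds half the (non-squared) Euclidean distance between two means, and sep holds, for each mean, the smallest such value to any other mean. This is a standalone sketch under that assumption, not ELKI's code; the class name and the explicit means parameter are hypothetical.

```java
import java.util.Arrays;

/** Standalone sketch of half-distance separation between means; not ELKI code. */
final class SeparationSketch {
  /**
   * @param means cluster means (explicit parameter assumed for this sketch)
   * @param sep   output: half the distance to the nearest other mean
   * @param cdist output: symmetric matrix of half center-to-center distances
   */
  static void recomputeSeparation(double[][] means, double[] sep, double[][] cdist) {
    Arrays.fill(sep, Double.POSITIVE_INFINITY);
    for (int i = 1; i < means.length; i++) {
      for (int j = 0; j < i; j++) {
        double sum = 0;
        for (int d = 0; d < means[i].length; d++) {
          double diff = means[i][d] - means[j][d];
          sum += diff * diff;
        }
        double half = 0.5 * Math.sqrt(sum);
        cdist[i][j] = cdist[j][i] = half;
        sep[i] = Math.min(sep[i], half);
        sep[j] = Math.min(sep[j], half);
      }
    }
  }

  public static void main(String[] args) {
    double[][] means = { { 0, 0 }, { 3, 4 }, { 0, 10 } };
    double[] sep = new double[3];
    double[][] cdist = new double[3][3];
    recomputeSeparation(means, sep, cdist);
    System.out.println(Arrays.toString(sep));
  }
}
```

The factor of one half matters for triangle-inequality pruning: a point closer to its mean than half the distance to any other mean cannot be nearer to that other mean.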
computeSquaredSeparation
protected void computeSquaredSeparation(double[][] cost)
Initial separation of means. Used by Hamerly, Exponion, and Annulus.
- Parameters:
cost - Pairwise separation output (as squared/4)
-
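The "squared/4" scaling is the square of the half-distance, since (d/2)^2 = d^2/4, so the bound can be compared against squared distances without taking roots. A minimal standalone sketch, assuming squared Euclidean distance on double[] means; the class name and the explicit means parameter are hypothetical, and this is not ELKI's code:

```java
/** Standalone sketch of quarter-squared separation between means; not ELKI code. */
final class SquaredSeparationSketch {
  /**
   * Fills cost[i][j] with one quarter of the squared Euclidean distance between
   * means i and j; the diagonal is left untouched.
   */
  static void computeSquaredSeparation(double[][] means, double[][] cost) {
    for (int i = 0; i < means.length; i++) {
      for (int j = 0; j < i; j++) {
        double sum = 0;
        for (int d = 0; d < means[i].length; d++) {
          double diff = means[i][d] - means[j][d];
          sum += diff * diff;
        }
        cost[i][j] = cost[j][i] = 0.25 * sum;
      }
    }
  }

  public static void main(String[] args) {
    double[][] cost = new double[2][2];
    computeSquaredSeparation(new double[][] { { 0, 0 }, { 3, 4 } }, cost);
    System.out.println(cost[0][1]);
  }
}
```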
movedDistance
protected void movedDistance(double[][] means, double[][] newmeans, double[] dists)
Maximum distance moved. Used by Hamerly, Elkan, and derived classes.
- Parameters:
means - Old means
newmeans - New means
dists - Distances moved (output)
-
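The per-cluster movement can be sketched as the distance between each old mean and its replacement, written into the output array. The sketch assumes the non-squared Euclidean distance and uses a hypothetical class name; it illustrates the idea and is not ELKI's code.

```java
import java.util.Arrays;

/** Standalone sketch of per-cluster mean movement; not ELKI code. */
final class MovedDistanceSketch {
  /** Writes the Euclidean distance between means[i] and newmeans[i] into dists[i]. */
  static void movedDistance(double[][] means, double[][] newmeans, double[] dists) {
    for (int i = 0; i < means.length; i++) {
      double sum = 0;
      for (int d = 0; d < means[i].length; d++) {
        double diff = means[i][d] - newmeans[i][d];
        sum += diff * diff;
      }
      dists[i] = Math.sqrt(sum);
    }
  }

  public static void main(String[] args) {
    double[] dists = new double[2];
    movedDistance(new double[][] { { 0, 0 }, { 1, 1 } }, new double[][] { { 3, 4 }, { 1, 1 } }, dists);
    System.out.println(Arrays.toString(dists));
  }
}
```

Bound-based variants can then loosen their per-point bounds by these amounts instead of recomputing every point-to-mean distance.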
buildResult
public Clustering<KMeansModel> buildResult()
Build a standard k-means result, with known cluster variance sums.
Note: this expects the varsum field to be correct!
- Returns:
- Clustering result
-
buildResult
public Clustering<KMeansModel> buildResult(boolean varstat, Relation<? extends NumberVector> relation)
Build the result, recomputing the cluster variance if varstat is set to true.
- Parameters:
varstat - Recompute cluster variance
relation - Data relation (only needed if varstat is set)
- Returns:
- Clustering result
-
recomputeVariance
protected void recomputeVariance(Relation<? extends NumberVector> relation)
Recompute the cluster variances.
- Parameters:
relation - Data relation
-
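Since varsum is documented as the sum of squared deviations in each cluster, recomputing it amounts to accumulating each point's squared distance to its assigned mean. The following standalone sketch shows that accumulation on plain arrays; the array-based data and assignment parameters and the class name are assumptions for illustration, not ELKI's API.

```java
import java.util.Arrays;

/** Standalone sketch of recomputing per-cluster variance sums; not ELKI code. */
final class VarianceSumSketch {
  /**
   * @param data       points, one row each
   * @param means      cluster means
   * @param assignment cluster index of each point
   * @param varsum     output: sum of squared deviations per cluster
   */
  static void recomputeVariance(double[][] data, double[][] means, int[] assignment, double[] varsum) {
    Arrays.fill(varsum, 0.0);
    for (int p = 0; p < data.length; p++) {
      int c = assignment[p];
      for (int d = 0; d < data[p].length; d++) {
        double diff = data[p][d] - means[c][d];
        varsum[c] += diff * diff;
      }
    }
  }

  public static void main(String[] args) {
    double[] varsum = new double[2];
    recomputeVariance(new double[][] { { 0 }, { 2 }, { 10 } }, new double[][] { { 1 }, { 10 } },
        new int[] { 0, 0, 1 }, varsum);
    System.out.println(Arrays.toString(varsum));
  }
}
```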
getLogger
protected abstract Logging getLogger()
Get the class logger.
- Returns:
- Logger
-
-