Package elki.clustering.kmeans
Class AbstractKMeans.Instance
- java.lang.Object
  - elki.clustering.kmeans.AbstractKMeans.Instance
- Direct Known Subclasses:
  CompareMeans.Instance, HamerlyKMeans.Instance, HartiganWongKMeans.Instance, KDTreePruningKMeans.Instance, KMeansMinusMinus.Instance, KMediansLloyd.Instance, LloydKMeans.Instance, MacQueenKMeans.Instance, SimplifiedElkanKMeans.Instance, SingleAssignmentKMeans.Instance, SphericalKMeans.Instance, YinYangKMeans.Instance
- Enclosing class:
- AbstractKMeans<V extends NumberVector,M extends Model>
public abstract static class AbstractKMeans.Instance extends java.lang.Object
Inner instance for a single run. It encapsulates the standard flow of most (but not all) k-means variations.
- Author:
  Erich Schubert
Field Summary
Fields (Modifier and Type, Field: Description)
- protected WritableIntegerDataStore assignment: A mapping of elements to cluster ids.
- protected java.util.List<ModifiableDBIDs> clusters: Store the elements per cluster.
- private NumberVectorDistance<?> df: Distance function.
- protected long diststat: Number of distance computations.
- protected boolean isSquared: Indicates whether the distance function is squared.
- protected int k: Number of clusters.
- protected java.lang.String key: Key for statistics logging.
- protected double[][] means: Cluster means.
- protected Relation<? extends NumberVector> relation: Data relation.
- protected double[] varsum: Sum of squared deviations in each cluster.
-
Constructor Summary
Constructors (Constructor: Description)
- Instance(Relation<? extends NumberVector> relation, NumberVectorDistance<?> df, double[][] means): Constructor.
-
Method Summary
Methods (Modifier and Type, Method: Description)
- protected int assignToNearestCluster(): Assign each object to the nearest cluster.
- Clustering<KMeansModel> buildResult(): Build a standard k-means result, with known cluster variance sums.
- Clustering<KMeansModel> buildResult(boolean varstat, Relation<? extends NumberVector> relation): Build the result, recomputing the cluster variance if varstat is set to true.
- protected void computeSquaredSeparation(double[][] cost): Initial separation of means.
- protected void copyMeans(double[][] src, double[][] dst): Copy means.
- protected double distance(double[] x, double[] y): Compute the squared distance (and count the distance computations).
- protected double distance(NumberVector x, double[] y): Compute the squared distance (and count the distance computations).
- protected double distance(NumberVector x, NumberVector y): Compute the squared distance (and count the distance computations).
- protected abstract Logging getLogger(): Get the class logger.
- protected void initialSeperation(double[][] cdist): Initial separation of means.
- protected abstract int iterate(int iteration): Main loop function.
- protected void meansFromSums(double[][] dst, double[][] sums, double[][] prev): Compute means from cluster sums by averaging.
- protected void movedDistance(double[][] means, double[][] newmeans, double[] dists): Maximum distance moved.
- protected void recomputeSeperation(double[] sep, double[][] cdist): Recompute the separation of cluster means.
- protected void recomputeVariance(Relation<? extends NumberVector> relation): Recompute the cluster variances.
- void run(int maxiter): Run the clustering.
- protected double sqrtdistance(double[] x, double[] y): Compute the distance (and count the distance computations).
- protected double sqrtdistance(NumberVector x, double[] y): Compute the distance (and count the distance computations).
- protected double sqrtdistance(NumberVector x, NumberVector y): Compute the distance (and count the distance computations).
-
-
-
Field Detail
-
means
protected double[][] means
Cluster means.
-
clusters
protected java.util.List<ModifiableDBIDs> clusters
Store the elements per cluster.
-
assignment
protected WritableIntegerDataStore assignment
A mapping of elements to cluster ids.
-
varsum
protected double[] varsum
Sum of squared deviations in each cluster.
-
relation
protected Relation<? extends NumberVector> relation
Data relation.
-
diststat
protected long diststat
Number of distance computations.
-
df
private final NumberVectorDistance<?> df
Distance function.
-
k
protected final int k
Number of clusters.
-
isSquared
protected final boolean isSquared
Indicates whether the distance function is squared.
-
key
protected java.lang.String key
Key for statistics logging.
-
-
Constructor Detail
-
Instance
public Instance(Relation<? extends NumberVector> relation, NumberVectorDistance<?> df, double[][] means)
Constructor.
- Parameters:
  relation - Relation to process
  df - Distance function
  means - Initial means
-
-
Method Detail
-
distance
protected double distance(NumberVector x, NumberVector y)
Compute the squared distance (and count the distance computations).
- Parameters:
  x - First object
  y - Second object
- Returns:
  Distance
-
distance
protected double distance(NumberVector x, double[] y)
Compute the squared distance (and count the distance computations).
- Parameters:
  x - First object
  y - Second object
- Returns:
  Distance
-
distance
protected double distance(double[] x, double[] y)
Compute the squared distance (and count the distance computations).
- Parameters:
  x - First object
  y - Second object
- Returns:
  Distance
-
sqrtdistance
protected double sqrtdistance(NumberVector x, NumberVector y)
Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
- Parameters:
  x - First object
  y - Second object
- Returns:
  Distance
-
sqrtdistance
protected double sqrtdistance(NumberVector x, double[] y)
Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
- Parameters:
  x - First object
  y - Second object
- Returns:
  Distance
-
sqrtdistance
protected double sqrtdistance(double[] x, double[] y)
Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
- Parameters:
  x - First object
  y - Second object
- Returns:
  Distance
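
The relationship between distance, sqrtdistance, diststat and isSquared described above can be illustrated with a minimal standalone sketch. It does not use the ELKI types: the class name CountingDistance and the helper raw are hypothetical, and plain double[] vectors with a (squared) Euclidean distance stand in for NumberVector and the configured df.

```java
/** Sketch only: counts distance computations and takes the root on demand. */
final class CountingDistance {
  /** Number of distance computations so far (plays the role of diststat). */
  long diststat = 0;

  /** Whether the underlying distance is squared (plays the role of isSquared). */
  final boolean isSquared;

  CountingDistance(boolean isSquared) {
    this.isSquared = isSquared;
  }

  /** Hypothetical stand-in for df: squared Euclidean if isSquared, else Euclidean. */
  private double raw(double[] x, double[] y) {
    double sum = 0;
    for (int i = 0; i < x.length; i++) {
      double d = x[i] - y[i];
      sum += d * d;
    }
    return isSquared ? sum : Math.sqrt(sum);
  }

  /** Distance as produced by the underlying function; counts one computation. */
  double distance(double[] x, double[] y) {
    ++diststat;
    return raw(x, y);
  }

  /** Distance with the square root applied if the underlying function is squared. */
  double sqrtdistance(double[] x, double[] y) {
    ++diststat;
    double d = raw(x, y);
    return isSquared ? Math.sqrt(d) : d;
  }
}
```

Keeping the counter inside the distance helpers means every variant that goes through them reports comparable distance-computation statistics.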
-
run
public void run(int maxiter)
Run the clustering.
- Parameters:
  maxiter - Maximum number of iterations
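
How run and iterate might fit together can be sketched as follows. This is an assumption based only on the documented contracts (iterate returns the number of reassigned points, maxiter bounds the number of iterations); the actual stopping rule, logging, and handling of special maxiter values are not shown on this page. The class RunLoopSketch and its iterations field are hypothetical.

```java
/** Sketch only: a run loop driven by an abstract iterate step. */
abstract class RunLoopSketch {
  /** Number of iterations actually performed (hypothetical, for illustration). */
  int iterations = 0;

  /** One pass over the data; returns the number of reassigned points. */
  protected abstract int iterate(int iteration);

  /** Assumed rule: stop after maxiter passes, or once no point changes cluster. */
  public void run(int maxiter) {
    for (int iteration = 1; iteration <= maxiter; iteration++) {
      int changed = iterate(iteration);
      iterations = iteration;
      if (changed == 0) {
        break; // converged: the assignment is stable
      }
    }
  }
}
```

Under this reading, each subclass only has to supply its own iterate step and can reuse the loop unchanged.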
-
iterate
protected abstract int iterate(int iteration)
Main loop function.
- Parameters:
  iteration - Iteration number (beginning at 1)
- Returns:
  Number of reassigned points
-
meansFromSums
protected void meansFromSums(double[][] dst, double[][] sums, double[][] prev)
Compute means from cluster sums by averaging.
- Parameters:
  dst - Output means
  sums - Input sums
  prev - Previous means, to handle empty clusters
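
The averaging step, including the fallback to the previous mean for an empty cluster, can be sketched on plain arrays. The sizes parameter is an assumption made for this standalone example; the documented method has no such parameter and presumably takes the cluster sizes from the clusters field.

```java
/** Sketch only: turn per-cluster coordinate sums into means. */
final class MeansFromSumsSketch {
  /**
   * @param dst output means, one row per cluster
   * @param sums per-cluster coordinate sums
   * @param sizes number of points per cluster (hypothetical parameter)
   * @param prev previous means, reused for empty clusters
   */
  static void meansFromSums(double[][] dst, double[][] sums, int[] sizes, double[][] prev) {
    for (int i = 0; i < dst.length; i++) {
      if (sizes[i] == 0) {
        // Empty cluster: keep the previous mean instead of dividing by zero.
        System.arraycopy(prev[i], 0, dst[i], 0, prev[i].length);
        continue;
      }
      for (int d = 0; d < dst[i].length; d++) {
        dst[i][d] = sums[i][d] / sizes[i];
      }
    }
  }
}
```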
-
copyMeans
protected void copyMeans(double[][] src, double[][] dst)
Copy means.
- Parameters:
  src - Source values
  dst - Destination values
-
assignToNearestCluster
protected int assignToNearestCluster()
Assign each object to the nearest cluster.
- Returns:
  Number of objects reassigned
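
A nearest-cluster assignment pass that reports the number of reassigned objects can be sketched as below. This standalone version uses plain arrays and squared Euclidean distance as assumptions; the documented method takes no arguments and works on the relation, means, clusters and assignment fields instead.

```java
/** Sketch only: assign each point to its nearest mean and count changes. */
final class NearestAssignmentSketch {
  /**
   * @param points data points, one row per object
   * @param means current cluster means
   * @param assignment cluster id per point, updated in place
   * @return number of points whose cluster id changed
   */
  static int assignToNearestCluster(double[][] points, double[][] means, int[] assignment) {
    int changed = 0;
    for (int p = 0; p < points.length; p++) {
      int best = 0;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < means.length; c++) {
        // Squared Euclidean distance preserves the ordering of Euclidean distances.
        double dist = 0;
        for (int d = 0; d < means[c].length; d++) {
          double diff = points[p][d] - means[c][d];
          dist += diff * diff;
        }
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      if (assignment[p] != best) {
        assignment[p] = best;
        changed++;
      }
    }
    return changed;
  }
}
```

A return value of zero indicates that the assignment is stable, which is the natural convergence signal for the main loop.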
-
recomputeSeperation
protected void recomputeSeperation(double[] sep, double[][] cdist)
Recompute the separation of cluster means. Used by Elkan's variant and Exponion.
- Parameters:
  sep - Output array of separation
  cdist - Center-to-center distances (half-sqrt scaled)
-
initialSeperation
protected void initialSeperation(double[][] cdist)
Initial separation of means. Used by Elkan, SimplifiedElkan.
- Parameters:
  cdist - Pairwise separation output (as sqrt/2)
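
The "sqrt/2" scaling of cdist and the per-cluster separation array sep can be illustrated on plain arrays. This is a sketch under assumptions: Euclidean distance stands in for the configured distance function, the means are passed explicitly rather than read from a field, and sep is taken to be the smallest half-distance from each mean to any other mean, which the page does not state explicitly.

```java
/** Sketch only: half-distances between means and the minimum per mean. */
final class SeparationSketch {
  /** Plain Euclidean distance between two vectors. */
  static double euclidean(double[] x, double[] y) {
    double sum = 0;
    for (int i = 0; i < x.length; i++) {
      double d = x[i] - y[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Fill cdist[i][j] with half the distance between means i and j. */
  static void initialSeparation(double[][] means, double[][] cdist) {
    for (int i = 0; i < means.length; i++) {
      cdist[i][i] = 0;
      for (int j = 0; j < i; j++) {
        cdist[i][j] = cdist[j][i] = 0.5 * euclidean(means[i], means[j]);
      }
    }
  }

  /** Refresh cdist, then store in sep[i] the smallest half-distance to another mean. */
  static void recomputeSeparation(double[][] means, double[] sep, double[][] cdist) {
    initialSeparation(means, cdist);
    for (int i = 0; i < means.length; i++) {
      double min = Double.POSITIVE_INFINITY;
      for (int j = 0; j < means.length; j++) {
        if (j != i) {
          min = Math.min(min, cdist[i][j]);
        }
      }
      sep[i] = min;
    }
  }
}
```

Storing half-distances matches the triangle-inequality test used by this family of algorithms: if a point lies within half the distance between its center c and another center c', then c' cannot be closer than c.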
-
computeSquaredSeparation
protected void computeSquaredSeparation(double[][] cost)
Initial separation of means. Used by Hamerly, Exponion, and Annulus.
- Parameters:
  cost - Pairwise separation output (as squared/4)
-
movedDistance
protected void movedDistance(double[][] means, double[][] newmeans, double[] dists)
Maximum distance moved. Used by Hamerly, Elkan, and derived classes.
- Parameters:
  means - Old means
  newmeans - New means
  dists - Distances moved (output)
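
Computing how far each mean moved between two iterations can be sketched as follows. The standalone class and the use of plain Euclidean distance are assumptions for illustration; the documented method would go through the instance's own distance helpers.

```java
/** Sketch only: per-cluster distance between old and new means. */
final class MovedDistanceSketch {
  /**
   * @param means old means
   * @param newmeans new means
   * @param dists output: Euclidean distance each mean moved
   */
  static void movedDistance(double[][] means, double[][] newmeans, double[] dists) {
    for (int i = 0; i < means.length; i++) {
      double sum = 0;
      for (int d = 0; d < means[i].length; d++) {
        double diff = means[i][d] - newmeans[i][d];
        sum += diff * diff;
      }
      dists[i] = Math.sqrt(sum);
    }
  }
}
```

Bound-based variants typically use these per-cluster movements to update their distance bounds instead of recomputing every point-to-center distance.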
-
buildResult
public Clustering<KMeansModel> buildResult()
Build a standard k-means result, with known cluster variance sums. Note: this expects the varsum field to be correct!
- Returns:
  Clustering result
-
buildResult
public Clustering<KMeansModel> buildResult(boolean varstat, Relation<? extends NumberVector> relation)
Build the result, recomputing the cluster variance if varstat is set to true.
- Parameters:
  varstat - Recompute cluster variance
  relation - Data relation (only needed if varstat is set)
- Returns:
  Clustering result
-
recomputeVariance
protected void recomputeVariance(Relation<? extends NumberVector> relation)
Recompute the cluster variances.
- Parameters:
  relation - Data relation
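
The varsum field is documented as the sum of squared deviations in each cluster. A minimal sketch of recomputing it from scratch is shown below; the standalone signature with explicit points, assignment and means arrays is an assumption, since the documented method reads these from the relation and the instance fields.

```java
/** Sketch only: per-cluster sum of squared deviations from the cluster mean. */
final class VarianceSketch {
  /**
   * @param points data points, one row per object
   * @param assignment cluster id per point
   * @param means cluster means
   * @return sum of squared deviations per cluster
   */
  static double[] recomputeVariance(double[][] points, int[] assignment, double[][] means) {
    double[] varsum = new double[means.length];
    for (int p = 0; p < points.length; p++) {
      double[] mean = means[assignment[p]];
      double sum = 0;
      for (int d = 0; d < mean.length; d++) {
        double diff = points[p][d] - mean[d];
        sum += diff * diff;
      }
      varsum[assignment[p]] += sum;
    }
    return varsum;
  }
}
```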
-
getLogger
protected abstract Logging getLogger()
Get the class logger.
- Returns:
  Logger
-
-