Package elki.clustering.uncertain
Class UKMeans
- java.lang.Object
-
- elki.clustering.uncertain.UKMeans
-
- All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<KMeansModel>>
@Title("UK-means")
@Reference(authors="M. Chau, R. Cheng, B. Kao, J. Ng", title="Uncertain data mining: An example in clustering location data", booktitle="Proc. 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006)", url="https://doi.org/10.1007/11731139_24", bibkey="DBLP:conf/pakdd/ChauCKN06")
public class UKMeans extends java.lang.Object implements ClusteringAlgorithm<Clustering<KMeansModel>>
Uncertain K-Means clustering, using the average deviation from the center.
Note: this method is essentially superfluous. It was shown to be equivalent to running regular k-means on the object centroids instead (see CKMeans for the reference and an implementation). It is included only for completeness.
Reference:
M. Chau, R. Cheng, B. Kao, J. Ng
Uncertain data mining: An example in clustering location data
Proc. 10th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD 2006)
- Since:
- 0.7.0
- Author:
- Klaus Arthur Schmidt
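The equivalence mentioned in the class description can be illustrated with the identity E[||X - c||^2] = ||E[X] - c||^2 + E[||X - E[X]||^2]: the last term does not depend on the center c, so the nearest center by expected distance is also the nearest center to the object's centroid. The following is a minimal, self-contained sketch of that argument, not the ELKI implementation. It assumes squared Euclidean distance and equally weighted samples; the class name and all helper methods are hypothetical, and uncertain objects are represented as plain double[][] sample arrays.

```java
final class CentroidEquivalenceSketch {
  /** Squared Euclidean distance between two points of equal dimension. */
  static double sqDist(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }

  /** Average squared distance from a center to all samples of one object. */
  static double expectedSqDist(double[] center, double[][] samples) {
    double sum = 0.0;
    for (double[] s : samples) {
      sum += sqDist(center, s);
    }
    return sum / samples.length;
  }

  /** Component-wise mean of the samples of one object. */
  static double[] centroid(double[][] samples) {
    double[] c = new double[samples[0].length];
    for (double[] s : samples) {
      for (int d = 0; d < c.length; d++) {
        c[d] += s[d];
      }
    }
    for (int d = 0; d < c.length; d++) {
      c[d] /= samples.length;
    }
    return c;
  }

  /** Index of the center with the smallest expected squared distance. */
  static int nearestByExpected(double[][] centers, double[][] samples) {
    int best = 0;
    for (int i = 1; i < centers.length; i++) {
      if (expectedSqDist(centers[i], samples) < expectedSqDist(centers[best], samples)) {
        best = i;
      }
    }
    return best;
  }

  /** Index of the center closest to the centroid of the samples. */
  static int nearestByCentroid(double[][] centers, double[][] samples) {
    double[] c = centroid(samples);
    int best = 0;
    for (int i = 1; i < centers.length; i++) {
      if (sqDist(centers[i], c) < sqDist(centers[best], c)) {
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] samples = {{0, 0}, {2, 0}, {1, 3}};
    double[][] centers = {{0, 0}, {5, 5}, {1, 1}};
    double[] c = centroid(samples);
    for (double[] center : centers) {
      // The gap is the same for every center (up to rounding): it is the
      // spread of the samples around their own centroid.
      System.out.println(expectedSqDist(center, samples) - sqDist(center, c));
    }
    System.out.println(nearestByExpected(centers, samples) == nearestByCentroid(centers, samples));
  }
}
```

Because the gap is constant, both selection rules pick the same center for every object, which is why clustering the centroids gives the same assignments.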
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class - Description
static class UKMeans.Par - Parameterization class.
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields
Modifier and Type, Field - Description
protected int k - Number of cluster centers to initialize.
private static java.lang.String KEY - Key for statistics logging.
private static Logging LOG - Class logger.
protected int maxiter - Maximum number of iterations
protected RandomFactory rnd - Our Random factory
-
Constructor Summary
Constructors
Constructor - Description
UKMeans(int k, int maxiter, RandomFactory rnd) - Constructor.
-
Method Summary
Methods
Modifier and Type, Method - Description
protected boolean assignToNearestCluster(Relation<DiscreteUncertainObject> relation, java.util.List<double[]> means, java.util.List<? extends ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, double[] varsum) - Assigns each object to the cluster with the nearest mean.
protected double getExpectedRepDistance(NumberVector rep, DiscreteUncertainObject uo) - Get the expected distance between a vector and an uncertain object.
TypeInformation[] getInputTypeRestriction() - Get the input type restriction used for negotiating the data query.
protected void logVarstat(DoubleStatistic varstat, double[] varsum) - Log statistics on the variance sum.
protected java.util.List<double[]> means(java.util.List<? extends ModifiableDBIDs> clusters, java.util.List<double[]> means, Relation<DiscreteUncertainObject> database) - Returns the mean vectors of the given clusters in the given database.
Clustering<KMeansModel> run(Relation<DiscreteUncertainObject> relation) - Run the clustering.
protected boolean updateAssignment(DBIDIter iditer, java.util.List<? extends ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, int newA) - Update the cluster assignment.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
KEY
private static final java.lang.String KEY
Key for statistics logging.
-
k
protected int k
Number of cluster centers to initialize.
-
maxiter
protected int maxiter
Maximum number of iterations
-
rnd
protected RandomFactory rnd
Our Random factory
-
-
Constructor Detail
-
UKMeans
public UKMeans(int k, int maxiter, RandomFactory rnd)
Constructor.
- Parameters:
k - Number of clusters
maxiter - Maximum number of iterations
rnd - Random initialization
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
- Specified by:
getInputTypeRestriction in interface Algorithm
- Returns:
- Type restriction
-
run
public Clustering<KMeansModel> run(Relation<DiscreteUncertainObject> relation)
Run the clustering.
- Parameters:
relation - the Relation
- Returns:
- Clustering result
-
assignToNearestCluster
protected boolean assignToNearestCluster(Relation<DiscreteUncertainObject> relation, java.util.List<double[]> means, java.util.List<? extends ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, double[] varsum)
Assigns each object to the cluster with the nearest mean, so that the kth cluster contains the ids of those objects that are nearest to the kth mean.
- Parameters:
relation - the database to cluster
means - a list of k means
clusters - cluster assignment
assignment - Current cluster assignment
varsum - Variance sum output
- Returns:
- true when at least one object was reassigned
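The assignment step described above can be sketched as follows. This is a simplified, self-contained illustration rather than the ELKI code: uncertain objects are plain double[][] sample arrays, the assignment store is an int[] (with -1 meaning "unassigned"), and the class and method names are hypothetical. It assumes squared Euclidean distance, equally weighted samples, and at least one mean.

```java
import java.util.Arrays;
import java.util.List;

final class AssignmentStepSketch {
  /** Average squared Euclidean distance from a mean to the samples of one object. */
  static double expectedSqDist(double[] mean, double[][] samples) {
    double sum = 0.0;
    for (double[] s : samples) {
      for (int d = 0; d < mean.length; d++) {
        double diff = mean[d] - s[d];
        sum += diff * diff;
      }
    }
    return sum / samples.length;
  }

  /**
   * Assign every object to its nearest mean and accumulate the per-cluster
   * distance sums in varsum. Returns true if any assignment changed.
   */
  static boolean assignToNearest(double[][][] objects, List<double[]> means,
      int[] assignment, double[] varsum) {
    Arrays.fill(varsum, 0.0);
    boolean changed = false;
    for (int i = 0; i < objects.length; i++) {
      int best = -1;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < means.size(); c++) {
        double dist = expectedSqDist(means.get(c), objects[i]);
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      varsum[best] += bestDist;
      if (assignment[i] != best) {
        assignment[i] = best;
        changed = true;
      }
    }
    return changed;
  }

  public static void main(String[] args) {
    double[][][] objects = {{{0, 0}, {0, 2}}, {{10, 10}, {12, 10}}};
    List<double[]> means = Arrays.asList(new double[] {0, 1}, new double[] {11, 10});
    int[] assignment = {-1, -1};
    double[] varsum = new double[means.size()];
    System.out.println(assignToNearest(objects, means, assignment, varsum)); // first pass changes assignments
    System.out.println(assignToNearest(objects, means, assignment, varsum)); // second pass is stable
  }
}
```

A k-means style loop would typically stop once this step reports no change, or after the maximum number of iterations.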
-
updateAssignment
protected boolean updateAssignment(DBIDIter iditer, java.util.List<? extends ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, int newA)
Update the cluster assignment.
- Parameters:
iditer - Object id
clusters - Cluster list
assignment - Assignment storage
newA - New assignment.
- Returns:
- true if the assignment has changed.
-
getExpectedRepDistance
protected double getExpectedRepDistance(NumberVector rep, DiscreteUncertainObject uo)
Get the expected distance between a vector and an uncertain object.
- Parameters:
rep - A vector, e.g., a cluster representative
uo - A discrete uncertain object
- Returns:
- The distance
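For a discrete uncertain object, an expected distance can be computed as a (possibly weighted) average of the distances from the representative to each sample. The sketch below shows one way to do this; it is an assumption-laden illustration, not the method's actual body. This page does not state which distance function or weighting is used, so the squared Euclidean distance, the optional weights array, and all names are hypothetical.

```java
final class ExpectedDistanceSketch {
  /**
   * Weighted average of squared Euclidean distances from rep to each sample.
   * Passing null for weights treats all samples as equally likely.
   */
  static double expectedSqDist(double[] rep, double[][] samples, double[] weights) {
    double sum = 0.0;
    double weightSum = 0.0;
    for (int i = 0; i < samples.length; i++) {
      double dist = 0.0;
      for (int d = 0; d < rep.length; d++) {
        double diff = rep[d] - samples[i][d];
        dist += diff * diff;
      }
      double w = (weights == null) ? 1.0 : weights[i];
      sum += w * dist;
      weightSum += w;
    }
    // Normalize so that the weights do not need to sum to one.
    return sum / weightSum;
  }

  public static void main(String[] args) {
    double[] rep = {0, 0};
    double[][] samples = {{1, 0}, {0, 2}};
    System.out.println(expectedSqDist(rep, samples, null)); // prints 2.5
    System.out.println(expectedSqDist(rep, samples, new double[] {3, 1})); // prints 1.75
  }
}
```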
-
means
protected java.util.List<double[]> means(java.util.List<? extends ModifiableDBIDs> clusters, java.util.List<double[]> means, Relation<DiscreteUncertainObject> database)
Returns the mean vectors of the given clusters in the given database.
- Parameters:
clusters - the clusters to compute the means
means - the recent means
database - the database containing the vectors
- Returns:
- the mean vectors of the given clusters in the given database
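A mean update of this kind can be sketched as averaging the centroids of the objects in each cluster. The code below is a self-contained illustration under stated assumptions, not the ELKI implementation: objects are double[][] sample arrays indexed by an integer id, clusters are lists of such ids, and an empty cluster is assumed to keep its previous mean (this is a guess at why the recent means are passed in). All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MeansUpdateSketch {
  /** Recompute one mean per cluster from the centroids of its member objects. */
  static List<double[]> means(List<List<Integer>> clusters, List<double[]> oldMeans,
      double[][][] objects) {
    List<double[]> newMeans = new ArrayList<>(clusters.size());
    for (int c = 0; c < clusters.size(); c++) {
      List<Integer> members = clusters.get(c);
      if (members.isEmpty()) {
        // Assumption: an empty cluster keeps its previous mean.
        newMeans.add(oldMeans.get(c));
        continue;
      }
      double[] mean = new double[oldMeans.get(c).length];
      for (int id : members) {
        double[][] samples = objects[id];
        // Add this object's centroid, one sample at a time.
        for (double[] s : samples) {
          for (int d = 0; d < mean.length; d++) {
            mean[d] += s[d] / samples.length;
          }
        }
      }
      for (int d = 0; d < mean.length; d++) {
        mean[d] /= members.size();
      }
      newMeans.add(mean);
    }
    return newMeans;
  }

  public static void main(String[] args) {
    double[][][] objects = {{{0, 0}, {2, 2}}, {{3, 1}}, {{10, 10}}};
    List<List<Integer>> clusters = Arrays.asList(
        Arrays.asList(0, 1), new ArrayList<Integer>(), Arrays.asList(2));
    List<double[]> oldMeans = Arrays.asList(
        new double[] {0, 0}, new double[] {7, 7}, new double[] {9, 9});
    for (double[] m : means(clusters, oldMeans, objects)) {
      System.out.println(Arrays.toString(m));
    }
  }
}
```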
-
logVarstat
protected void logVarstat(DoubleStatistic varstat, double[] varsum)
Log statistics on the variance sum.
- Parameters:
varstat - Statistics log instance
varsum - Variance sum per cluster
-
-