Package elki.clustering.uncertain
Class UKMeans
java.lang.Object
    elki.clustering.uncertain.UKMeans
All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<KMeansModel>>
@Title("UK-means") @Reference(authors="M. Chau, R. Cheng, B. Kao, J. Ng", title="Uncertain data mining: An example in clustering location data", booktitle="Proc. 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006)", url="https://doi.org/10.1007/11731139_24", bibkey="DBLP:conf/pakdd/ChauCKN06") public class UKMeans extends java.lang.Object implements ClusteringAlgorithm<Clustering<KMeansModel>>
Uncertain K-means clustering, using the average (expected) deviation from the center.
Note: this method is essentially superficial. It was shown to be equivalent to running regular K-means on the object centroids instead (see CKMeans for the reference and an implementation). It is included only for completeness.
Reference:
M. Chau, R. Cheng, B. Kao, J. Ng
Uncertain data mining: An example in clustering location data
Proc. 10th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD 2006)
Since:
0.7.0
Author:
Klaus Arthur Schmidt
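The equivalence stated in the note follows from a standard bias-variance decomposition, assuming the squared Euclidean distance and equally weighted samples (both are assumptions of this sketch). For an uncertain object X with centroid mu and any center c, E||X - c||^2 = ||mu - c||^2 + E||X - mu||^2. The last term does not depend on c, so the expected distance and the centroid distance rank all candidate centers identically. The standalone Java class below checks the identity on one small object; its names are hypothetical and not part of ELKI.

```java
/** Hypothetical demo (not ELKI API): E||X - c||^2 = ||mu - c||^2 + E||X - mu||^2. */
public class ExpectedDistanceDemo {
  // Squared Euclidean distance between two vectors of equal length.
  static double sqDist(double[] a, double[] b) {
    double s = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      s += diff * diff;
    }
    return s;
  }

  // Centroid (mean) of equally weighted samples.
  static double[] centroid(double[][] samples) {
    double[] mu = new double[samples[0].length];
    for (double[] x : samples) {
      for (int d = 0; d < mu.length; d++) {
        mu[d] += x[d];
      }
    }
    for (int d = 0; d < mu.length; d++) {
      mu[d] /= samples.length;
    }
    return mu;
  }

  // Expected squared distance from c to an object given by equally weighted samples.
  static double expectedSqDist(double[][] samples, double[] c) {
    double sum = 0;
    for (double[] x : samples) {
      sum += sqDist(x, c);
    }
    return sum / samples.length;
  }

  public static void main(String[] args) {
    double[][] samples = { {0, 0}, {2, 0}, {1, 3} }; // one uncertain object, three samples
    double[] c = {4, 1};                             // a candidate cluster center
    double[] mu = centroid(samples);                 // (1, 1)
    double lhs = expectedSqDist(samples, c);
    double rhs = sqDist(mu, c) + expectedSqDist(samples, mu); // 9 + 8/3
    System.out.println(lhs + " vs " + rhs); // both are about 11.667 (= 35/3)
  }
}
```

Because the spread term E||X - mu||^2 is a constant per object, it can be dropped from every comparison, which is why clustering the centroids with regular K-means gives the same assignments.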
Nested Class Summary
Nested Classes
static class UKMeans.Par
    Parameterization class.
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
Field Summary
Fields
protected int k
    Number of cluster centers to initialize.
private static java.lang.String KEY
    Key for statistics logging.
private static Logging LOG
    Class logger.
protected int maxiter
    Maximum number of iterations.
protected RandomFactory rnd
    Random factory used for initialization.
Constructor Summary
Constructors
UKMeans(int k, int maxiter, RandomFactory rnd)
    Constructor.
Method Summary
All Methods, Instance Methods, Concrete Methods
protected boolean assignToNearestCluster(Relation<DiscreteUncertainObject> relation, java.util.List<double[]> means, java.util.List<? extends ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, double[] varsum)
    Assign each object to the cluster with the nearest mean.
protected double getExpectedRepDistance(NumberVector rep, DiscreteUncertainObject uo)
    Get the expected distance between a vector and an uncertain object.
TypeInformation[] getInputTypeRestriction()
    Get the input type restriction used for negotiating the data query.
protected void logVarstat(DoubleStatistic varstat, double[] varsum)
    Log statistics on the variance sum.
protected java.util.List<double[]> means(java.util.List<? extends ModifiableDBIDs> clusters, java.util.List<double[]> means, Relation<DiscreteUncertainObject> database)
    Returns the mean vectors of the given clusters in the given database.
Clustering<KMeansModel> run(Relation<DiscreteUncertainObject> relation)
    Run the clustering.
protected boolean updateAssignment(DBIDIter iditer, java.util.List<? extends ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, int newA)
    Update the cluster assignment.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
Field Detail
LOG
private static final Logging LOG
Class logger.
KEY
private static final java.lang.String KEY
Key for statistics logging.
k
protected int k
Number of cluster centers to initialize.
maxiter
protected int maxiter
Maximum number of iterations.
rnd
protected RandomFactory rnd
Random factory used for initialization.
Constructor Detail
UKMeans
public UKMeans(int k, int maxiter, RandomFactory rnd)
Constructor.
Parameters:
k - Number of clusters
maxiter - Maximum number of iterations
rnd - Random initialization
Method Detail
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
Specified by:
getInputTypeRestriction in interface Algorithm
Returns:
Type restriction
run
public Clustering<KMeansModel> run(Relation<DiscreteUncertainObject> relation)
Run the clustering.
Parameters:
relation - the relation to cluster
Returns:
Clustering result
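Presumably, as in Lloyd-style K-means, run alternates between assigning objects to their nearest mean and recomputing the means, for at most maxiter iterations. The standalone sketch below illustrates such a loop; it is not ELKI's code, and several details are assumptions: uncertain objects are plain double[][] arrays of equally weighted samples, the expected squared Euclidean distance is used, java.util.Random stands in for RandomFactory, and the means are initialized from the centroids of k distinct random objects. With the squared Euclidean distance, the center minimizing the members' expected distances is simply the average of their centroids, which is the update used here.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical UK-means style sketch, not ELKI's implementation. */
public class UKMeansSketch {
  static double sqDist(double[] a, double[] b) {
    double s = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      s += diff * diff;
    }
    return s;
  }

  // Centroid of one object's equally weighted samples.
  static double[] centroid(double[][] obj) {
    double[] mu = new double[obj[0].length];
    for (double[] x : obj) {
      for (int d = 0; d < mu.length; d++) {
        mu[d] += x[d] / obj.length;
      }
    }
    return mu;
  }

  // Expected squared distance between center c and one uncertain object.
  static double expectedSqDist(double[][] obj, double[] c) {
    double sum = 0;
    for (double[] x : obj) {
      sum += sqDist(x, c);
    }
    return sum / obj.length;
  }

  // Returns the cluster index of every object; requires 1 <= k <= data.length.
  static int[] cluster(double[][][] data, int k, int maxiter, long seed) {
    int n = data.length, dim = data[0][0].length;
    // Assumption: seed the means with the centroids of k distinct random objects.
    int[] perm = new int[n];
    for (int i = 0; i < n; i++) {
      perm[i] = i;
    }
    Random rnd = new Random(seed); // stand-in for ELKI's RandomFactory
    for (int i = n - 1; i > 0; i--) { // Fisher-Yates shuffle
      int j = rnd.nextInt(i + 1), t = perm[i];
      perm[i] = perm[j];
      perm[j] = t;
    }
    double[][] means = new double[k][];
    for (int j = 0; j < k; j++) {
      means[j] = centroid(data[perm[j]]);
    }
    int[] assignment = new int[n];
    Arrays.fill(assignment, -1); // -1 = not yet assigned
    for (int iter = 0; iter < maxiter; iter++) {
      // Assignment step: nearest mean by expected squared distance.
      boolean changed = false;
      for (int i = 0; i < n; i++) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < k; j++) {
          double dist = expectedSqDist(data[i], means[j]);
          if (dist < bestDist) {
            bestDist = dist;
            best = j;
          }
        }
        if (assignment[i] != best) {
          assignment[i] = best;
          changed = true;
        }
      }
      if (!changed) {
        break; // converged: no object changed its cluster
      }
      // Update step: average of the members' centroids; an empty cluster keeps its mean.
      double[][] sums = new double[k][dim];
      int[] counts = new int[k];
      for (int i = 0; i < n; i++) {
        double[] mu = centroid(data[i]);
        for (int d = 0; d < dim; d++) {
          sums[assignment[i]][d] += mu[d];
        }
        counts[assignment[i]]++;
      }
      for (int j = 0; j < k; j++) {
        if (counts[j] > 0) {
          for (int d = 0; d < dim; d++) {
            sums[j][d] /= counts[j];
          }
          means[j] = sums[j];
        }
      }
    }
    return assignment;
  }

  public static void main(String[] args) {
    double[][][] data = {
      { {0.0, 0.1}, {0.2, -0.1} }, { {0.1, 0.0}, {-0.1, 0.2} },
      { {5.0, 5.1}, {5.2, 4.9} }, { {4.9, 5.0}, {5.1, 5.2} }
    };
    System.out.println(Arrays.toString(cluster(data, 2, 100, 42L)));
  }
}
```

As with regular K-means, the result depends on the seed: if both initial means come from the same group of objects, this sketch can settle in a poor local optimum.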
assignToNearestCluster
protected boolean assignToNearestCluster(Relation<DiscreteUncertainObject> relation, java.util.List<double[]> means, java.util.List<? extends ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, double[] varsum)
Assign each object to the cluster with the nearest mean. Afterwards, the kth cluster contains the IDs of those uncertain objects that are nearest to the kth mean.
Parameters:
relation - the database to cluster
means - a list of k means
clusters - cluster assignment
assignment - current cluster assignment
varsum - variance sum output
Returns:
true if any object was reassigned
updateAssignment
protected boolean updateAssignment(DBIDIter iditer, java.util.List<? extends ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, int newA)
Update the cluster assignment.
Parameters:
iditer - Object ID
clusters - Cluster list
assignment - Assignment storage
newA - New assignment
Returns:
true if the assignment has changed.
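updateAssignment presumably keeps the assignment storage and the per-cluster ID lists consistent when an object moves. The plain-Java analogue below sketches that bookkeeping under stated assumptions: int IDs replace DBIDs, an int[] replaces the WritableIntegerDataStore, HashSets replace ModifiableDBIDs, and -1 marks an object that has not been assigned yet. It is a hypothetical illustration, not ELKI's source.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical plain-Java analogue of updateAssignment (int IDs stand in for DBIDs). */
public class AssignmentUpdate {
  /**
   * Moves object id into cluster newA, keeping the assignment array and the
   * per-cluster ID sets consistent. assignment[id] == -1 means "not yet assigned".
   * Returns true if the assignment has changed.
   */
  static boolean updateAssignment(int id, List<Set<Integer>> clusters, int[] assignment, int newA) {
    int oldA = assignment[id];
    if (oldA == newA) {
      return false; // already in the target cluster
    }
    clusters.get(newA).add(id);
    assignment[id] = newA;
    if (oldA >= 0) {
      clusters.get(oldA).remove(id); // Set.remove(Object): id is boxed, not an index
    }
    return true;
  }

  public static void main(String[] args) {
    List<Set<Integer>> clusters = new ArrayList<>();
    clusters.add(new HashSet<>());
    clusters.add(new HashSet<>());
    int[] assignment = {-1};
    System.out.println(updateAssignment(0, clusters, assignment, 1)); // true (first assignment)
    System.out.println(updateAssignment(0, clusters, assignment, 1)); // false (unchanged)
    System.out.println(updateAssignment(0, clusters, assignment, 0)); // true (moved)
    System.out.println(clusters); // [[0], []]
  }
}
```

Returning early when the cluster does not change lets a caller such as the assignment step OR together the results to detect convergence.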
getExpectedRepDistance
protected double getExpectedRepDistance(NumberVector rep, DiscreteUncertainObject uo)
Get the expected distance between a vector and an uncertain object.
Parameters:
rep - A vector, e.g., a cluster representative
uo - A discrete uncertain object
Returns:
The expected distance
means
protected java.util.List<double[]> means(java.util.List<? extends ModifiableDBIDs> clusters, java.util.List<double[]> means, Relation<DiscreteUncertainObject> database)
Returns the mean vectors of the given clusters in the given database.
Parameters:
clusters - the clusters to compute the means for
means - the previous means
database - the database containing the vectors
Returns:
the mean vectors of the given clusters in the given database
logVarstat
protected void logVarstat(DoubleStatistic varstat, double[] varsum)
Log statistics on the variance sum.
Parameters:
varstat - Statistics log instance
varsum - Variance sum per cluster