Class UKMeans

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<KMeansModel>>

    @Reference(authors="M. Chau, R. Cheng, B. Kao, J. Ng",
               title="Uncertain data mining: An example in clustering location data",
               booktitle="Proc. 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006)",
    public class UKMeans
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<KMeansModel>>
    Uncertain K-Means clustering, using the average deviation from the center.

    Note: this method is essentially superficial. It was shown to be equivalent to running regular K-means on the object centroids instead (see CKMeans for the reference and an implementation). It is provided only for completeness.
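    Why the equivalence holds can be seen in one line for squared Euclidean distance (assumed here, since that is the objective of regular K-means). For an uncertain object X with m equally likely samples x_1, ..., x_m and centroid \bar{x}, the expected distance to a center c splits into the distance from the centroid plus a term that does not depend on c:

```latex
\begin{aligned}
\mathbb{E}\,\lVert X - c \rVert^2
  &= \frac{1}{m}\sum_{i=1}^{m} \lVert (x_i - \bar{x}) + (\bar{x} - c) \rVert^2 \\
  &= \lVert \bar{x} - c \rVert^2 + \frac{1}{m}\sum_{i=1}^{m} \lVert x_i - \bar{x} \rVert^2,
  \qquad \bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i,
\end{aligned}
```

    since the cross term \frac{2}{m}\sum_i (x_i - \bar{x})^\top(\bar{x} - c) vanishes. The second term is a per-object constant, so the nearest center under expected distance is the nearest center to \bar{x}, and the center minimizing the summed expected distance of a cluster is the mean of its members' centroids, exactly as in regular K-means on the centroids.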


    M. Chau, R. Cheng, B. Kao, J. Ng
    Uncertain data mining: An example in clustering location data
    Proc. 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006)

    Author:
    Klaus Arthur Schmidt
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • KEY

        private static final java.lang.String KEY
        Key for statistics logging.
      • k

        protected int k
        Number of cluster centers to initialize.
      • maxiter

        protected int maxiter
        Maximum number of iterations.
    • Constructor Detail

      • UKMeans

        public UKMeans(int k,
                       int maxiter,
                       RandomFactory rnd)
        Parameters:
        k - Number of clusters
        maxiter - Maximum number of iterations
        rnd - Random generator used for initialization
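        The rnd parameter drives the random initialization. The following is a minimal standalone sketch of one plausible scheme, assuming (this page does not confirm it) that the initial means are the sample centroids of k distinct, randomly chosen objects. Uncertain objects are modeled as plain double[][] arrays of equally weighted samples, java.util.Random stands in for RandomFactory, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of random initialization for uncertain k-means. */
final class UKMeansInitSketch {
  /** Center of mass of equally weighted samples (one row per sample). */
  static double[] centroid(double[][] samples) {
    double[] c = new double[samples[0].length];
    for (double[] s : samples) {
      for (int d = 0; d < c.length; d++) {
        c[d] += s[d];
      }
    }
    for (int d = 0; d < c.length; d++) {
      c[d] /= samples.length;
    }
    return c;
  }

  /** Picks k distinct objects via a partial Fisher-Yates shuffle and returns their centroids. */
  static List<double[]> initialMeans(double[][][] objects, int k, Random rnd) {
    if (k < 1 || k > objects.length) {
      throw new IllegalArgumentException("need 1 <= k <= number of objects");
    }
    int[] idx = new int[objects.length];
    for (int i = 0; i < idx.length; i++) {
      idx[i] = i;
    }
    List<double[]> means = new ArrayList<>(k);
    for (int i = 0; i < k; i++) {
      // Swap a random, not-yet-chosen index into position i.
      int j = i + rnd.nextInt(idx.length - i);
      int tmp = idx[i];
      idx[i] = idx[j];
      idx[j] = tmp;
      means.add(centroid(objects[idx[i]]));
    }
    return means;
  }
}
```

        Passing a seeded Random (e.g. new Random(42)) makes the initialization reproducible, which is presumably the purpose of taking a RandomFactory rather than creating a generator internally.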
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • assignToNearestCluster

        protected boolean assignToNearestCluster(Relation<DiscreteUncertainObject> relation,
                                                 java.util.List<double[]> means,
                                                 java.util.List<? extends ModifiableDBIDs> clusters,
                                                 WritableIntegerDataStore assignment,
                                                 double[] varsum)
        Assigns each object to the cluster whose mean is nearest in expected distance. Afterwards, the kth cluster contains the ids of those objects that are nearest to the kth mean.
        Parameters:
        relation - the database to cluster
        means - a list of k means
        clusters - the member ids of each cluster (updated in place)
        assignment - Current cluster assignment
        varsum - Variance sum output
        Returns:
        true when any object was reassigned
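        A minimal standalone sketch of the assignment step, assuming squared Euclidean distance and equally weighted samples (neither is stated on this page). Plain arrays replace ELKI's Relation, ModifiableDBIDs, and WritableIntegerDataStore; -1 marks an unassigned object, and all names are hypothetical. Here varsum[c] accumulates the expected distances of the objects assigned to cluster c, which is one reading of the "variance sum output" parameter.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the uncertain k-means assignment step. */
final class UKAssignSketch {
  /** Expected squared Euclidean distance from rep to equally weighted samples. */
  static double expectedSqDist(double[] rep, double[][] samples) {
    double sum = 0.0;
    for (double[] s : samples) {
      for (int d = 0; d < rep.length; d++) {
        double diff = s[d] - rep[d];
        sum += diff * diff;
      }
    }
    return sum / samples.length;
  }

  /**
   * Assigns every object to the mean with the smallest expected distance.
   * assignment[i] == -1 means "not yet assigned"; varsum[c] receives the
   * total expected distance of the objects assigned to cluster c.
   * Returns true if at least one assignment changed.
   */
  static boolean assignToNearest(double[][][] objects, List<double[]> means,
                                 int[] assignment, double[] varsum) {
    Arrays.fill(varsum, 0.0);
    boolean changed = false;
    for (int i = 0; i < objects.length; i++) {
      int best = 0;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < means.size(); c++) {
        double dist = expectedSqDist(means.get(c), objects[i]);
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      varsum[best] += bestDist;
      if (assignment[i] != best) {
        assignment[i] = best;
        changed = true;
      }
    }
    return changed;
  }
}
```

        A driver loop would call this until it returns false or maxiter is reached, recomputing the means in between.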
      • updateAssignment

        protected boolean updateAssignment(DBIDIter iditer,
                                           java.util.List<? extends ModifiableDBIDs> clusters,
                                           WritableIntegerDataStore assignment,
                                           int newA)
        Update the cluster assignment.
        Parameters:
        iditer - Object id
        clusters - Cluster list
        assignment - Assignment storage
        newA - New assignment.
        Returns:
        true if the assignment has changed.
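        The bookkeeping can be sketched with plain Java collections: the object is added to its new cluster, the assignment store is updated, and the object is removed from its old cluster if it had one. This is a hypothetical stand-in (int ids, List<List<Integer>> for the cluster lists, a negative value for "unassigned"), not ELKI's DBID-based code.

```java
import java.util.List;

/** Hypothetical sketch of keeping cluster member lists and an assignment array in sync. */
final class UKUpdateSketch {
  /**
   * Moves object id to cluster newA. assignment[id] < 0 means "unassigned".
   * Returns true if the assignment changed.
   */
  static boolean updateAssignment(int id, List<List<Integer>> clusters, int[] assignment, int newA) {
    int oldA = assignment[id];
    if (oldA == newA) {
      return false; // already in the right cluster: nothing to do
    }
    clusters.get(newA).add(id);
    assignment[id] = newA;
    if (oldA >= 0) {
      // Remove by value; remove(int) would remove by position instead.
      clusters.get(oldA).remove(Integer.valueOf(id));
    }
    return true;
  }
}
```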
      • getExpectedRepDistance

        protected double getExpectedRepDistance(NumberVector rep,
                                                DiscreteUncertainObject uo)
        Get the expected distance between a vector and an uncertain object.
        Parameters:
        rep - A vector, e.g., a cluster representative
        uo - A discrete uncertain object
        Returns:
        The expected distance
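        For a discrete uncertain object, the expected distance is the probability-weighted average of the distances from the representative to each sample. The sketch below assumes squared Euclidean distance and non-negative per-sample weights (normalized by their sum); the unweighted overload treats all samples as equally likely. The class name and the weighting are assumptions for illustration, not ELKI's API.

```java
import java.util.Arrays;

/** Hypothetical sketch: expected squared Euclidean distance to a discrete uncertain object. */
final class ExpectedDistanceSketch {
  /** E[||X - rep||^2] where X takes value samples[i] with weight weights[i]. */
  static double expectedSqDistance(double[] rep, double[][] samples, double[] weights) {
    double total = 0.0, wsum = 0.0;
    for (int i = 0; i < samples.length; i++) {
      double sq = 0.0;
      for (int d = 0; d < rep.length; d++) {
        double diff = samples[i][d] - rep[d];
        sq += diff * diff;
      }
      total += weights[i] * sq;
      wsum += weights[i];
    }
    return total / wsum; // normalize so weights need not sum to 1
  }

  /** Unweighted case: every sample is equally likely. */
  static double expectedSqDistance(double[] rep, double[][] samples) {
    double[] w = new double[samples.length];
    Arrays.fill(w, 1.0);
    return expectedSqDistance(rep, samples, w);
  }
}
```

        For rep = (0, 0) and samples (0, 0) and (2, 0), the unweighted result is 2.0; it equals the squared distance 1 from rep to the centroid (1, 0) plus the samples' spread 1 around that centroid.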
      • means

        protected java.util.List<double[]> means(java.util.List<? extends ModifiableDBIDs> clusters,
                                                 java.util.List<double[]> means,
                                                 Relation<DiscreteUncertainObject> database)
        Returns the mean vectors of the given clusters in the given database.
        Parameters:
        clusters - the clusters to compute the means
        means - the current means (from the previous iteration)
        database - the database containing the vectors
        Returns:
        the mean vectors of the given clusters in the given database
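        A minimal sketch of the mean update, assuming each cluster's new mean is the average of its members' sample centroids (consistent with the CKMeans equivalence noted in the class description) and that an empty cluster keeps its previous mean, which is one plausible use of the means argument. Plain arrays and lists stand in for ELKI's types; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the mean-update step for uncertain k-means. */
final class UKMeansUpdateSketch {
  /** Center of mass of equally weighted samples (one row per sample). */
  static double[] centroid(double[][] samples) {
    double[] c = new double[samples[0].length];
    for (double[] s : samples) {
      for (int d = 0; d < c.length; d++) {
        c[d] += s[d];
      }
    }
    for (int d = 0; d < c.length; d++) {
      c[d] /= samples.length;
    }
    return c;
  }

  /** New mean per cluster: average of member centroids; empty clusters keep the old mean. */
  static List<double[]> means(List<List<Integer>> clusters, List<double[]> oldMeans, double[][][] objects) {
    List<double[]> newMeans = new ArrayList<>(clusters.size());
    for (int c = 0; c < clusters.size(); c++) {
      List<Integer> members = clusters.get(c);
      if (members.isEmpty()) {
        newMeans.add(oldMeans.get(c)); // assumption: degenerate cluster keeps its mean
        continue;
      }
      double[] mean = new double[oldMeans.get(c).length];
      for (int id : members) {
        double[] cen = centroid(objects[id]);
        for (int d = 0; d < mean.length; d++) {
          mean[d] += cen[d];
        }
      }
      for (int d = 0; d < mean.length; d++) {
        mean[d] /= members.size();
      }
      newMeans.add(mean);
    }
    return newMeans;
  }
}
```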
      • logVarstat

        protected void logVarstat(DoubleStatistic varstat,
                                  double[] varsum)
        Log statistics on the variance sum.
        Parameters:
        varstat - Statistics log instance
        varsum - Variance sum per cluster
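        A small sketch of what logging the variance sum might look like, assuming the reported value is the total over all clusters and that logging is skipped when no statistic handle is given. java.util.logging stands in for ELKI's Logging and DoubleStatistic classes; the class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: report the total of the per-cluster variance sums. */
final class VarstatSketch {
  private static final Logger LOG = Logger.getLogger(VarstatSketch.class.getName());

  /** Sum of the per-cluster variance sums. */
  static double totalVarianceSum(double[] varsum) {
    double total = 0.0;
    for (double v : varsum) {
      total += v;
    }
    return total;
  }

  /** Logs the total under statName; a null statName means statistics are disabled. */
  static void logVarstat(String statName, double[] varsum) {
    if (statName == null) {
      return;
    }
    LOG.log(Level.INFO, "{0} = {1}", new Object[] { statName, totalVarianceSum(varsum) });
  }
}
```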