Class RepresentativeUncertainClustering

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<?>>

    @Reference(authors="Andreas Z\u00fcfle, Tobias Emrich, Klaus Arthur Schmid, Nikos Mamoulis, Arthur Zimek, Mathias Renz",
               title="Representative clustering of uncertain data",
               booktitle="Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
               url="https://doi.org/10.1145/2623330.2623725",
               bibkey="DBLP:conf/kdd/ZufleESMZR14")
    public class RepresentativeUncertainClustering
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<?>>
    Representative clustering of uncertain data.

    This algorithm clusters uncertain data by repeatedly sampling a possible world, then running a traditional clustering algorithm on this sample.

    The resulting "possible" clusterings are then clustered themselves, using a clustering similarity measure. This yields a number of representatives for the set of all possible worlds.
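
    The two stages above can be illustrated with a small, library-free sketch. It is not the ELKI implementation: the class PossibleWorldSketch, its methods, the one-dimensional data, the fixed-threshold "clustering" and the pair-counting distance are all hypothetical stand-ins. Picking a single medoid also simplifies the meta-clustering step, which in the actual algorithm may yield several representatives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; none of these names are part of the ELKI API. */
class PossibleWorldSketch {
  /** Draw one possible world: pick one candidate value per uncertain object. */
  static double[] sampleWorld(double[][] objects, Random rnd) {
    double[] world = new double[objects.length];
    for (int i = 0; i < objects.length; i++) {
      world[i] = objects[i][rnd.nextInt(objects[i].length)];
    }
    return world;
  }

  /** Stand-in for the wrapped clustering algorithm: split at a threshold. */
  static int[] cluster(double[] world, double threshold) {
    int[] labels = new int[world.length];
    for (int i = 0; i < world.length; i++) {
      labels[i] = world[i] < threshold ? 0 : 1;
    }
    return labels;
  }

  /** Fraction of object pairs on which two clusterings disagree. */
  static double distance(int[] a, int[] b) {
    int disagree = 0, pairs = 0;
    for (int i = 0; i < a.length; i++) {
      for (int j = i + 1; j < a.length; j++) {
        boolean sameA = a[i] == a[j], sameB = b[i] == b[j];
        if (sameA != sameB) {
          disagree++;
        }
        pairs++;
      }
    }
    return pairs == 0 ? 0. : disagree / (double) pairs;
  }

  /** Stand-in for meta-clustering: index of the medoid clustering. */
  static int medoid(List<int[]> clusterings) {
    int best = -1;
    double bestSum = Double.POSITIVE_INFINITY;
    for (int i = 0; i < clusterings.size(); i++) {
      double sum = 0.;
      for (int[] other : clusterings) {
        sum += distance(clusterings.get(i), other);
      }
      if (sum < bestSum) {
        bestSum = sum;
        best = i;
      }
    }
    return best;
  }

  /** Sample numsamples worlds, cluster each, return the medoid clustering. */
  static int[] representative(double[][] objects, int numsamples, long seed) {
    Random rnd = new Random(seed);
    List<int[]> clusterings = new ArrayList<>();
    for (int s = 0; s < numsamples; s++) {
      clusterings.add(cluster(sampleWorld(objects, rnd), 5.0));
    }
    return clusterings.get(medoid(clusterings));
  }
}
```

    Each row of the objects array lists the candidate values of one uncertain object. The medoid is the sampled clustering with the smallest total distance to all other sampled clusterings.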

    Reference:

    Andreas Züfle, Tobias Emrich, Klaus Arthur Schmid, Nikos Mamoulis, Arthur Zimek, Mathias Renz
    Representative clustering of uncertain data
    In Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    Since:
    0.7.0
    Author:
    Alexander Koos, Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The class logger.
      • metaAlgorithm

        protected ClusteringAlgorithm<?> metaAlgorithm
        The algorithm for meta-clustering.
      • samplesAlgorithm

        protected ClusteringAlgorithm<?> samplesAlgorithm
        The wrapped clustering algorithm, run on each sampled possible world.
      • numsamples

        protected int numsamples
        Number of sample clusterings to generate for aggregation.
      • random

        protected RandomFactory random
        Random factory for sampling.
      • alpha

        protected double alpha
        Alpha parameter for confidence.
      • keep

        protected boolean keep
        Keep all samples (not only the representative results).
    • Constructor Detail

      • RepresentativeUncertainClustering

        public RepresentativeUncertainClustering(ClusteringDistanceSimilarity distance,
                                                 ClusteringAlgorithm<?> metaAlgorithm,
                                                 ClusteringAlgorithm<?> samplesAlgorithm,
                                                 int numsamples,
                                                 RandomFactory random,
                                                 double alpha,
                                                 boolean keep)
        Constructor.
        Parameters:
        distance - Distance function for meta clustering
        metaAlgorithm - Meta clustering algorithm
        samplesAlgorithm - Primary clustering algorithm
        numsamples - Number of samples
        random - Random factory for sampling
        alpha - Alpha confidence
        keep - Keep all samples (not only the representative results).
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • run

        public Clustering<?> run(Database database,
                                 Relation<? extends UncertainObject> relation)
        Runs the wrapped clustering algorithm on sampled possible worlds, then clusters the resulting clusterings with the meta-algorithm.
        Parameters:
        database - Database
        relation - Data relation of uncertain objects
        Returns:
        Clustering result
      • computeConfidence

        private double computeConfidence(int support,
                                         int samples)
        Estimate the confidence probability of a clustering.
        Parameters:
        support - Number of supporting samples
        samples - Total samples
        Returns:
        Probability
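
        This page does not state how the estimate is computed or how alpha enters it. As an illustration only, one plausible approach is a normal-approximation lower bound on the supporting fraction. The class ConfidenceSketch, the method lowerBound and the explicit z parameter (a standard normal quantile that would be derived from alpha) are assumptions for this sketch, not the documented behavior of computeConfidence.

```java
/** Hypothetical sketch; not the ELKI implementation of computeConfidence. */
class ConfidenceSketch {
  /**
   * Lower confidence bound on the probability of a clustering.
   *
   * @param support number of supporting samples
   * @param samples total number of samples
   * @param z standard normal quantile (assumed to be derived from alpha)
   * @return estimated probability, clamped at 0
   */
  static double lowerBound(int support, int samples, double z) {
    if (samples <= 0 || support < 0 || support > samples) {
      throw new IllegalArgumentException("need 0 <= support <= samples, samples > 0");
    }
    // Observed fraction of samples that support the clustering.
    double p = support / (double) samples;
    // Normal-approximation margin of a binomial proportion.
    double margin = z * Math.sqrt(p * (1 - p) / samples);
    return Math.max(0., p - margin);
  }
}
```

        With full support or zero support the margin vanishes, so the bound equals the observed fraction in both cases.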
      • runClusteringAlgorithm

        protected Clustering<?> runClusteringAlgorithm(java.lang.Object parent,
                                                       DBIDs ids,
                                                       DataStore<DoubleVector> store,
                                                       int dim,
                                                       java.lang.String title)
        Run a clustering algorithm on a single instance.
        Parameters:
        parent - Parent result to attach to
        ids - Object IDs to process
        store - Input data
        dim - Dimensionality
        title - Title of relation
        Returns:
        Clustering result