Class RepresentativeUncertainClustering

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<?>>

    @Reference(authors="Andreas Z\u00fcfle, Tobias Emrich, Klaus Arthur Schmid, Nikos Mamoulis, Arthur Zimek, Matthias Renz",
               title="Representative clustering of uncertain data",
               booktitle="Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
               url="https://doi.org/10.1145/2623330.2623725",
               bibkey="DBLP:conf/kdd/ZufleESMZR14")
    public class RepresentativeUncertainClustering
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<?>>
    Representative clustering of uncertain data.

    This algorithm clusters uncertain data by repeatedly sampling a possible world, then running a traditional clustering algorithm on this sample.

    The resulting "possible" clusterings are then clustered themselves, using a clustering similarity measure. This yields a number of representatives for the set of all possible worlds.
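The sample-then-cluster loop described above can be illustrated with a toy, self-contained sketch. It does not use the ELKI API and is not this class's implementation. Everything in it is an assumption made for illustration: the class name `PossibleWorldsSketch`, the representation of an uncertain object as a small array of candidate 1-D values, the threshold "clustering", and the use of identical label strings as a stand-in for a real clustering similarity measure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Toy illustration only; hypothetical names, not part of ELKI. */
class PossibleWorldsSketch {
  /** Draw one possible world: pick one candidate value per uncertain object. */
  static double[] sampleWorld(double[][] candidates, Random rnd) {
    double[] world = new double[candidates.length];
    for (int i = 0; i < candidates.length; i++) {
      world[i] = candidates[i][rnd.nextInt(candidates[i].length)];
    }
    return world;
  }

  /** Stand-in for a traditional clustering algorithm: split values at a cut point. */
  static String clusterWorld(double[] world, double cut) {
    StringBuilder labels = new StringBuilder();
    for (double v : world) {
      labels.append(v < cut ? 'A' : 'B');
    }
    return labels.toString();
  }

  /**
   * Sample numsamples possible worlds, cluster each one, and count how many
   * samples support each distinct clustering. Grouping identical label strings
   * is a crude substitute for meta-clustering with a similarity measure.
   */
  static Map<String, Integer> countClusterings(double[][] candidates, double cut,
      int numsamples, long seed) {
    Random rnd = new Random(seed);
    Map<String, Integer> support = new HashMap<>();
    for (int s = 0; s < numsamples; s++) {
      String labels = clusterWorld(sampleWorld(candidates, rnd), cut);
      support.merge(labels, 1, Integer::sum);
    }
    return support;
  }
}
```

In this sketch, each key of the returned map plays the role of a candidate representative, and its count is the number of sampled worlds that support it.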

    Reference:

    Andreas Züfle, Tobias Emrich, Klaus Arthur Schmid, Nikos Mamoulis, Arthur Zimek, Matthias Renz
    Representative clustering of uncertain data
    In Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    Since:
    0.7.0
    Author:
    Alexander Koos, Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The class logger.
      • metaAlgorithm

        protected ClusteringAlgorithm<?> metaAlgorithm
        The algorithm for meta-clustering.
      • samplesAlgorithm

        protected ClusteringAlgorithm<?> samplesAlgorithm
        The clustering algorithm that is wrapped and run on each sampled possible world.
      • numsamples

        protected int numsamples
        Number of sampled clusterings to generate for aggregation.
      • random

        protected RandomFactory random
        Random factory for sampling.
      • alpha

        protected double alpha
        Alpha parameter used when estimating the confidence of a clustering.
      • keep

        protected boolean keep
        Keep all samples (not only the representative results).
    • Constructor Detail

      • RepresentativeUncertainClustering

        public RepresentativeUncertainClustering​(ClusteringDistanceSimilarity distance,
                                                 ClusteringAlgorithm<?> metaAlgorithm,
                                                 ClusteringAlgorithm<?> samplesAlgorithm,
                                                 int numsamples,
                                                 RandomFactory random,
                                                 double alpha,
                                                 boolean keep)
        Constructor.
        Parameters:
        distance - Distance function for meta clustering
        metaAlgorithm - Meta clustering algorithm
        samplesAlgorithm - Primary clustering algorithm
        numsamples - Number of samples
        random - Random factory for sampling
        alpha - Alpha confidence
        keep - Keep all samples (not only the representative results).
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • run

        public Clustering<?> run​(Database database,
                                 Relation<? extends UncertainObject> relation)
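        Runs the wrapped clustering algorithm on repeatedly sampled possible worlds, then clusters the resulting clusterings to obtain representatives.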
        Parameters:
        database - Database
        relation - Data relation of uncertain objects
        Returns:
        Clustering result
      • computeConfidence

        private double computeConfidence​(int support,
                                         int samples)
        Estimate the confidence probability of a clustering.
        Parameters:
        support - Number of supporting samples
        samples - Total samples
        Returns:
        Probability
      • runClusteringAlgorithm

        protected Clustering<?> runClusteringAlgorithm​(java.lang.Object parent,
                                                       DBIDs ids,
                                                       DataStore<DoubleVector> store,
                                                       int dim,
                                                       java.lang.String title)
        Run a clustering algorithm on a single instance.
        Parameters:
        parent - Parent result to attach to
        ids - Object IDs to process
        store - Input data
        dim - Dimensionality
        title - Title of relation
        Returns:
        Clustering result
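
The documentation for computeConfidence does not say how the probability is estimated from `support` and `samples`. One plausible approach, shown here purely as an assumption, is a normal-approximation lower bound on a binomial proportion. The class name `ConfidenceSketch`, the method name `lowerBound`, and the explicit `z` parameter (a standard normal quantile that the caller would derive from alpha) are hypothetical and not part of this class.

```java
/** Hypothetical sketch; not the method documented above. */
class ConfidenceSketch {
  /**
   * Lower confidence bound on the fraction of samples supporting a clustering.
   *
   * @param support number of supporting samples
   * @param samples total number of samples (must be positive)
   * @param z standard normal quantile chosen by the caller (assumption)
   * @return lower bound on the probability, clamped at 0
   */
  static double lowerBound(int support, int samples, double z) {
    double p = support / (double) samples; // empirical proportion
    double halfWidth = z * Math.sqrt(p * (1 - p) / samples); // normal approximation
    return Math.max(0., p - halfWidth);
  }
}
```

With `z = 0` the bound reduces to the plain empirical proportion `support / samples`; larger values of `z` give a more conservative estimate.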