Class CanopyPreClustering<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<PrototypeModel<O>>>

    @Reference(authors="A. McCallum, K. Nigam, L. H. Ungar",
               title="Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching",
               booktitle="Proc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining",
               url="https://doi.org/10.1145/347090.347123",
               bibkey="DBLP:conf/kdd/McCallumNU00")
    public class CanopyPreClustering<O>
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<PrototypeModel<O>>>
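    Canopy pre-clustering is a simple preprocessing step for clustering. It cheaply groups the data into possibly overlapping subsets (canopies), so that a more expensive clustering algorithm only needs to compare objects that share a canopy. Objects within distance t1 of a canopy center are added to that canopy; objects within distance t2 of the center are removed from the set of candidate centers.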
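
The procedure can be illustrated with a minimal, self-contained sketch. This is not the ELKI implementation: the class name CanopySketch, the method canopies, the use of list indices in place of database IDs, and the inclusive comparisons (<= t1, <= t2) are assumptions made for illustration only. The check that t1 is at least t2 follows the convention of the referenced paper, where T1 is the looser threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Illustrative canopy sketch (hypothetical helper, not part of ELKI). */
final class CanopySketch {
  /**
   * Builds canopies over the given points.
   *
   * @return one list of point indices per canopy; canopies may overlap
   */
  static <O> List<List<Integer>> canopies(List<O> points,
      ToDoubleBiFunction<? super O, ? super O> distance, double t1, double t2) {
    // Assumption: t1 is the looser (larger) threshold, as in the paper.
    if (!(t1 >= t2)) {
      throw new IllegalArgumentException("t1 must be at least as large as t2");
    }
    // Indices that may still become a canopy center.
    List<Integer> candidates = new ArrayList<>();
    for (int i = 0; i < points.size(); i++) {
      candidates.add(i);
    }
    List<List<Integer>> result = new ArrayList<>();
    while (!candidates.isEmpty()) {
      // Take the first remaining candidate as the next canopy center.
      int center = candidates.remove(0);
      List<Integer> canopy = new ArrayList<>();
      canopy.add(center);
      List<Integer> remaining = new ArrayList<>();
      for (int idx : candidates) {
        double d = distance.applyAsDouble(points.get(center), points.get(idx));
        if (d <= t1) {
          canopy.add(idx); // close enough to join this canopy
        }
        if (d > t2) {
          remaining.add(idx); // not removed: may join or start later canopies
        }
      }
      candidates = remaining;
      result.add(canopy);
    }
    return result;
  }
}
```

For the one-dimensional points 0.0, 1.0, 1.5, 10.0 with absolute difference as distance, t1 = 2.0 and t2 = 1.2, this sketch returns the index lists [0, 1, 2], [2] and [3]: point 1.5 lies within t1 but outside t2 of the first center, so it joins the first canopy and also starts its own.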

    Reference:

    A. McCallum, K. Nigam, L. H. Ungar
    Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching
    Proc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining

    Since:
    0.6.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • distance

        private Distance<? super O> distance
        Distance function used.
      • t1

        private double t1
        Threshold for inclusion: objects within this distance of a canopy center are added to its canopy.
      • t2

        private double t2
        Threshold for removal: objects within this distance of a canopy center are removed from the set of candidate centers.
    • Constructor Detail

      • CanopyPreClustering

        public CanopyPreClustering(Distance<? super O> distance,
                                   double t1,
                                   double t2)
        Constructor.
        Parameters:
        distance - Distance function
        t1 - Inclusion threshold
        t2 - Removal threshold (objects this close to a canopy center are excluded from the candidate set)