Class PairCounting


  • public class PairCounting
    extends java.lang.Object
    Pair-counting measures, with support for "noise" clusters and self-pairing.

    Implementation note: this implementation counts ordered pairs, using either n² pairs (with self-pairing) or n(n-1) pairs (without) for each cluster intersection. In the literature, you will often find (n choose 2) unordered pairs instead, which differs by a factor of 2, but this factor cancels out in the derived ratios. The raw pair counts are not exposed as an API, only the derived measures. The Mirkin index removes this factor of 2.
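    To make the ordered-pair convention concrete, here is a minimal brute-force sketch. It is not the class's actual implementation: the class name OrderedPairCountSketch, the int[] label encoding, and the omission of noise handling and self-pairing are all assumptions made for illustration.

```java
/** Hypothetical helper for illustration; not part of the documented class. */
final class OrderedPairCountSketch {
  /**
   * Counts ordered pairs (i, j) with i != j.
   * Returns {inBoth, inFirst, inSecond, inNone}.
   */
  static long[] count(int[] first, int[] second) {
    if (first.length != second.length) {
      throw new IllegalArgumentException("label arrays must have equal length");
    }
    long inBoth = 0, inFirst = 0, inSecond = 0, inNone = 0;
    for (int i = 0; i < first.length; i++) {
      for (int j = 0; j < first.length; j++) {
        if (i == j) {
          continue; // this sketch does not use self-pairing
        }
        boolean sameInFirst = first[i] == first[j];
        boolean sameInSecond = second[i] == second[j];
        if (sameInFirst && sameInSecond) {
          inBoth++;
        } else if (sameInFirst) {
          inFirst++;
        } else if (sameInSecond) {
          inSecond++;
        } else {
          inNone++;
        }
      }
    }
    return new long[] { inBoth, inFirst, inSecond, inNone };
  }

  public static void main(String[] args) {
    long[] c = count(new int[] { 0, 0, 0, 1, 1 }, new int[] { 0, 0, 1, 1, 1 });
    System.out.println(java.util.Arrays.toString(c));
  }
}
```

    For five objects the loop visits 5·4 = 20 ordered pairs. Counting unordered pairs instead would halve every count, which leaves ratios such as precision or the Rand index unchanged.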

    Since:
    0.5.0
    Author:
    Erich Schubert
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected long inBoth
      Pairs in both clusterings.
      protected long inFirst
      Pairs in first clustering only.
      protected long inNone
      Pairs in neither clustering.
      protected long inSecond
      Pairs in second clustering only.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      double adjustedRandIndex()
      Computes the adjusted Rand index (ARI).
      double f1Measure()
      Get the pair-counting F1-Measure.
      double fMeasure(double beta)
      Get the pair-counting F-Measure.
      double fowlkesMallows()
      Computes the pair-counting Fowlkes-Mallows index (flat only, non-hierarchical!)
      double jaccard()
      Computes the Jaccard index.
      long mirkin()
      Computes the Mirkin index, aka Equivalence Mismatch Distance.
      double precision()
      Computes the pair-counting precision.
      double randIndex()
      Computes the Rand index (RI).
      double recall()
      Computes the pair-counting recall.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • inBoth

        protected long inBoth
        Pairs in both clusterings.
      • inFirst

        protected long inFirst
        Pairs in first clustering only.
      • inSecond

        protected long inSecond
        Pairs in second clustering only.
      • inNone

        protected long inNone
        Pairs in neither clustering.
    • Method Detail

      • fMeasure

        public double fMeasure(double beta)
        Get the pair-counting F-Measure.
        Parameters:
        beta - Beta value.
        Returns:
        F-Measure
      • f1Measure

        public double f1Measure()
        Get the pair-counting F1-Measure.
        Returns:
        F1-Measure
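        As a hedged illustration of how beta weights recall against precision, the sketch below uses the standard F-beta definition. The class FMeasureSketch and its method signatures are hypothetical, and returning 0 for a zero denominator is an assumption, not documented behavior of this class.

```java
/** Hypothetical helper for illustration; not part of the documented class. */
final class FMeasureSketch {
  /** Standard F-beta: (1 + beta^2) * P * R / (beta^2 * P + R). */
  static double fMeasure(double precision, double recall, double beta) {
    double beta2 = beta * beta;
    double denominator = beta2 * precision + recall;
    // Assumption: define the measure as 0 when precision and recall are both 0.
    return denominator > 0 ? (1 + beta2) * precision * recall / denominator : 0.;
  }

  /** F1 is the special case beta = 1, the harmonic mean of precision and recall. */
  static double f1Measure(double precision, double recall) {
    return fMeasure(precision, recall, 1.);
  }
}
```

        With beta greater than 1, recall has more influence on the result; with beta less than 1, precision does.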
      • precision

        public double precision()
        Computes the pair-counting precision.
        Returns:
        pair-counting precision
      • recall

        public double recall()
        Computes the pair-counting recall.
        Returns:
        pair-counting recall
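        The sketch below shows one common way to derive both values from the pair counts. It assumes the first clustering plays the role of the reference, so that inSecond counts false positives and inFirst counts false negatives; the page does not state which clustering is the reference, and the class PairPrecisionRecallSketch is hypothetical.

```java
/** Hypothetical helper for illustration; not part of the documented class. */
final class PairPrecisionRecallSketch {
  /** Assumption: pairs found only in the second clustering are false positives. */
  static double precision(long inBoth, long inSecond) {
    return inBoth / (double) (inBoth + inSecond);
  }

  /** Assumption: pairs found only in the first clustering are false negatives. */
  static double recall(long inBoth, long inFirst) {
    return inBoth / (double) (inBoth + inFirst);
  }
}
```

        Both methods in this sketch yield NaN when their denominator is zero, for example when the relevant clustering contains only singletons.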
      • fowlkesMallows

        @Reference(authors="E. B. Fowlkes, C. L. Mallows",
                   title="A method for comparing two hierarchical clusterings",
                   booktitle="Journal of the American Statistical Association, Vol. 78 Issue 383",
                   url="https://doi.org/10.2307/2288117",
                   bibkey="doi:10.2307/2288117")
        public double fowlkesMallows()
        Computes the pair-counting Fowlkes-Mallows index (flat only, non-hierarchical!)

        E. B. Fowlkes, C. L. Mallows
        A method for comparing two hierarchical clusterings
        Journal of the American Statistical Association, Vol. 78 Issue 383

        Returns:
        pair-counting Fowlkes-Mallows index
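        For flat clusterings, the Fowlkes-Mallows index is commonly defined as the geometric mean of pair-counting precision and recall. The following sketch expresses that definition in terms of the pair counts; FowlkesMallowsSketch is a hypothetical name, and the actual implementation may differ.

```java
/** Hypothetical helper for illustration; not part of the documented class. */
final class FowlkesMallowsSketch {
  /** inBoth / sqrt((inBoth + inFirst) * (inBoth + inSecond)). */
  static double fowlkesMallows(long inBoth, long inFirst, long inSecond) {
    double pairsInFirst = inBoth + inFirst;   // co-clustered in the first clustering
    double pairsInSecond = inBoth + inSecond; // co-clustered in the second clustering
    return inBoth / Math.sqrt(pairsInFirst * pairsInSecond);
  }
}
```

        Because the expression is symmetric in inFirst and inSecond, swapping the two clusterings does not change the result.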
      • randIndex

        @Reference(authors="W. M. Rand",
                   title="Objective Criteria for the Evaluation of Clustering Methods",
                   booktitle="Journal of the American Statistical Association, Vol. 66 Issue 336",
                   url="https://doi.org/10.2307/2284239",
                   bibkey="doi:10.2307/2284239")
        public double randIndex()
        Computes the Rand index (RI).

        W. M. Rand
        Objective Criteria for the Evaluation of Clustering Methods
        Journal of the American Statistical Association, Vol. 66 Issue 336

        Returns:
        The Rand index (RI).
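        The Rand index is the fraction of pairs on which both clusterings agree. A minimal sketch of that definition, using the hypothetical class RandIndexSketch and assuming the four pair counts are given:

```java
/** Hypothetical helper for illustration; not part of the documented class. */
final class RandIndexSketch {
  /** Agreeing pairs (together in both, or apart in both) divided by all pairs. */
  static double randIndex(long inBoth, long inFirst, long inSecond, long inNone) {
    double total = (double) inBoth + inFirst + inSecond + inNone;
    return (inBoth + inNone) / total;
  }
}
```

        Since numerator and denominator scale together, the value is the same whether ordered or unordered pairs are counted.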
      • adjustedRandIndex

        @Reference(authors="L. Hubert, P. Arabie",
                   title="Comparing partitions",
                   booktitle="Journal of Classification 2(193)",
                   url="https://doi.org/10.1007/BF01908075",
                   bibkey="doi:10.1007/BF01908075")
        public double adjustedRandIndex()
        Computes the adjusted Rand index (ARI).

        L. Hubert, P. Arabie
        Comparing partitions.
        Journal of Classification 2(193)

        Returns:
        The adjusted Rand index (ARI).
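        The adjustment subtracts the agreement expected by chance and rescales so that identical clusterings score 1. The sketch below follows the usual pair-count formulation of the Hubert and Arabie correction; AdjustedRandSketch is a hypothetical name, and the real implementation may be arranged differently.

```java
/** Hypothetical helper for illustration; not part of the documented class. */
final class AdjustedRandSketch {
  /** (RI - E[RI]) / (1 - E[RI]), with E[RI] derived from the marginal pair counts. */
  static double adjustedRandIndex(long inBoth, long inFirst, long inSecond, long inNone) {
    double a = inBoth, b = inFirst, c = inSecond, d = inNone;
    double total = a + b + c + d;
    double rand = (a + d) / total;
    // Chance agreement: both say "together" or both say "apart", assuming independence.
    double expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (total * total);
    return (rand - expected) / (1 - expected);
  }
}
```

        In this sketch the result is NaN in the degenerate case where the expected agreement is exactly 1, for example when both clusterings place all objects in a single cluster.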
      • jaccard

        @Reference(authors="P. Jaccard",
                   title="Distribution de la flore alpine dans le Bassin des Dranses et dans quelques r\u00e9gions voisines",
                   booktitle="Bulletin de la Soci\u00e9t\u00e9 Vaudoise des Sciences Naturelles",
                   url="http://data.rero.ch/01-R241574160",
                   bibkey="journals/misc/Jaccard1902")
        public double jaccard()
        Computes the Jaccard index.

        P. Jaccard
        Distribution de la flore alpine dans le Bassin des Dranses et dans quelques régions voisines
        Bulletin de la Société Vaudoise des Sciences Naturelles

        Returns:
        The Jaccard index
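        Applied to pair counts, the Jaccard index ignores pairs that are separated in both clusterings. A minimal sketch of the standard definition, with the hypothetical class name JaccardSketch:

```java
/** Hypothetical helper for illustration; not part of the documented class. */
final class JaccardSketch {
  /** Pairs together in both clusterings, divided by pairs together in at least one. */
  static double jaccard(long inBoth, long inFirst, long inSecond) {
    return inBoth / (double) (inBoth + inFirst + inSecond);
  }
}
```

        Unlike the Rand index, this value does not grow just because most pairs are separated in both clusterings, which is typical when there are many small clusters.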
      • mirkin

        @Reference(authors="B. Mirkin",
                   title="Mathematical Classification and Clustering",
                   booktitle="Nonconvex Optimization and Its Applications",
                   url="https://doi.org/10.1007/978-1-4613-0457-9",
                   bibkey="doi:10.1007/978-1-4613-0457-9")
        public long mirkin()
        Computes the Mirkin index, aka Equivalence Mismatch Distance.

        This is a multiple of the Rand distance, i.e., of one minus the Rand index.

        B. Mirkin
        Mathematical Classification and Clustering
        Nonconvex Optimization and Its Applications

        Returns:
        The Mirkin index
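        The relationship to the Rand index can be checked numerically. In the literature the Mirkin index is usually written as twice the number of disagreeing unordered pairs. The sketch below assumes ordered pair counts, in which case the sum of the disagreeing counts already includes that factor of 2; MirkinSketch and both method names are hypothetical, and this is not confirmed to be how the class computes its result.

```java
/** Hypothetical helper for illustration; not part of the documented class. */
final class MirkinSketch {
  /** Assumption: counts are over ordered pairs, so no extra factor of 2 is applied. */
  static long mirkin(long inFirst, long inSecond) {
    return inFirst + inSecond;
  }

  /** The same quantity expressed via the Rand index: totalPairs * (1 - RI). */
  static double mirkinFromRand(long totalPairs, double randIndex) {
    return totalPairs * (1 - randIndex);
  }
}
```

        For example, with 20 ordered pairs of which 4 are only in the first clustering and 4 only in the second, the Rand index is 0.6 and both expressions give 8, up to floating-point rounding.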