Class Entropy
- java.lang.Object
  - elki.evaluation.clustering.Entropy
@Reference(authors="M. Meil\u0103",title="Comparing clusterings by the variation of information",booktitle="Learning theory and kernel machines",url="https://doi.org/10.1007/978-3-540-45167-9_14",bibkey="DBLP:conf/colt/Meila03") @Reference(authors="X. V. Nguyen, J. Epps, J. Bailey",title="Information theoretic measures for clusterings comparison: is a correction for chance necessary?",booktitle="Proc. 26th Ann. Int. Conf. on Machine Learning (ICML \'09)",url="https://doi.org/10.1145/1553374.1553511",bibkey="DBLP:conf/icml/NguyenEB09") @Reference(authors="S. Romano, J. Bailey, X. V. Nguyen, K. Verspoor",title="Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance",booktitle="Proc. 31th Int. Conf. on Machine Learning (ICML 2014)",url="http://proceedings.mlr.press/v32/romano14.html",bibkey="DBLP:conf/icml/RomanoBNV14") public class Entropy extends java.lang.Object
Entropy-based measures, implemented using natural logarithms. Computing the expected mutual information is well optimized, but nevertheless takes O(n) logarithm computations; for n up to 10000 we will perform this more expensive evaluation, for larger values it will (currently) not be computed. In theory, as n grows, the expected value approaches 0 anyway.
FIXME: this measure does not yet support noise clusters.
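To make these quantities concrete, the following is a minimal, self-contained sketch (an illustration, not the ELKI implementation) of the core measures computed from a contingency table, using natural logarithms as this class does:

// Hypothetical standalone illustration; class and variable names are not part of ELKI.
public class EntropySketch {
  public static void main(String[] args) {
    int[][] contingency = { { 5, 1 }, { 2, 4 } }; // toy 2x2 contingency table
    int n = 12; // total number of points
    double[] rows = new double[contingency.length];
    double[] cols = new double[contingency[0].length];
    double hJoint = 0;
    for (int i = 0; i < contingency.length; i++) {
      for (int j = 0; j < contingency[i].length; j++) {
        rows[i] += contingency[i][j];
        cols[j] += contingency[i][j];
        if (contingency[i][j] > 0) {
          double p = contingency[i][j] / (double) n;
          hJoint -= p * Math.log(p); // natural logarithm, as in this class
        }
      }
    }
    double hFirst = 0, hSecond = 0;
    for (double r : rows) { if (r > 0) { hFirst -= (r / n) * Math.log(r / n); } }
    for (double c : cols) { if (c > 0) { hSecond -= (c / n) * Math.log(c / n); } }
    double mi = hFirst + hSecond - hJoint; // cf. mutualInformation()
    double vi = hJoint - mi;               // cf. variationOfInformation()
    System.out.println("H1=" + hFirst + " H2=" + hSecond + " MI=" + mi + " VI=" + vi);
  }
}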
Key References:
M. Meilă
Comparing clusterings by the variation of information
Learning theory and kernel machines

X. V. Nguyen, J. Epps, J. Bailey
Information theoretic measures for clusterings comparison: is a correction for chance necessary?
Proc. 26th Ann. Int. Conf. on Machine Learning (ICML '09)

S. Romano, J. Bailey, X. V. Nguyen, K. Verspoor
Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance
Proc. 31st Int. Conf. on Machine Learning (ICML 2014)
- Since:
- 0.5.0
- Author:
- Sascha Goldhofer, Erich Schubert
-
Field Summary
Fields (Modifier and Type; Field; Description):
- protected double entropyFirst: Entropy in first
- protected double entropyJoint: Joint entropy
- protected double entropySecond: Entropy in second
- protected double expectedMutualInformation: Expected mutual information
- protected double mutualInformation: Mutual information (computed directly)
- protected double variationOfInformation: Variation of information (computed directly)
- protected double vibound: log(n), for bounding VI
-
Constructor Summary
Constructors:
- protected Entropy(ClusterContingencyTable table): Constructor.
-
Method Summary
Methods (Modifier and Type; Method; Description):
- double adjustedArithmeticMI(): Get the adjusted mutual information using the arithmetic version.
- double adjustedGeometricMI(): Get the adjusted mutual information using the geometric version.
- double adjustedJointMI(): Get the adjusted mutual information using the joint version.
- double adjustedMaxMI(): Get the adjusted mutual information using the max version.
- double adjustedMinMI(): Get the adjusted mutual information using the min version.
- double arithmeticNMI(): Get the arithmetic averaged normalized mutual information.
- private static double computeEntropyFirst(int[][] contingency, int r, int c, double byn, double mlogn, double[] logs): Compute entropy of first clustering.
- private static double computeEntropySecond(int[][] contingency, int r, int c, double byn, double mlogn, double[] logs): Compute entropy of second clustering.
- private void computeMIFull(int[][] contingency, int r, int c, int n, int m, double byn, double mlogn, double[] logs): Full computation of mutual information measures, including AMI/EMI.
- private void computeMILarge(int[][] contingency, int r, int c, double byn, double mlogn): Compute mutual information measures, but skip the expensive computation of AMI/EMI for large data sets, where they do not differ much.
- double conditionalEntropyFirst(): Get the conditional entropy of the first clustering (not normalized, 0 = equal).
- double conditionalEntropySecond(): Get the conditional entropy of the second clustering (not normalized, 0 = equal).
- double entropyFirst(): Get the entropy of the first clustering (not normalized, 0 = equal).
- double entropyJoint(): Get the joint entropy of both clusterings (not normalized, 0 = equal).
- double entropyPowers(): Get Powers entropy (normalized, 0 = equal): Powers = 1 - NMI_Sum.
- double entropySecond(): Get the entropy of the second clustering (not normalized, 0 = equal).
- double expectedMutualInformation(): Get the expected mutual information.
- double geometricNMI(): Get the geometric mean normalized mutual information (using the square root).
- double jointNMI(): Get the joint-normalized mutual information.
- private static double lfac(int i, double[] lfac): Get log(fac(i)) from the cache.
- private static double log(int i, double[] logs): Cached access to log values.
- private static int maxClusterSize(int[][] contingency, int r, int c): Get the maximum cluster size of a contingency table.
- double maxNMI(): Get the max-normalized mutual information.
- double minNMI(): Get the min-normalized mutual information.
- double mutualInformation(): Get the mutual information (not normalized, larger values indicate better agreement).
- double normalizedInformationDistance(): Get the normalized information distance (normalized, small values are good).
- double normalizedVariationOfInformation(): Get the normalized variation of information (normalized, small values are good).
- double upperBoundMI(): Get an upper bound for the mutual information (for scaling).
- double upperBoundVI(): Get an upper bound for the VI (for scaling).
- double variationOfInformation(): Get the variation of information (not normalized, small values are good).
-
Field Detail
-
entropyFirst
protected double entropyFirst
Entropy in first
-
entropySecond
protected double entropySecond
Entropy in second
-
entropyJoint
protected double entropyJoint
Joint entropy
-
mutualInformation
protected double mutualInformation
Mutual information (computed directly)
-
variationOfInformation
protected double variationOfInformation
Variation of information (computed directly)
-
expectedMutualInformation
protected double expectedMutualInformation
Expected mutual information
-
vibound
protected double vibound
log(n), for bounding VI
-
Constructor Detail
-
Entropy
protected Entropy(ClusterContingencyTable table)
Constructor.
- Parameters:
  table - Contingency table
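Because the constructor is protected, client code is expected to go through the contingency table. The following is a hedged usage sketch; the ClusterContingencyTable constructor arguments and the getEntropy() accessor are assumptions based on typical ELKI usage, so verify them against your ELKI version:

import elki.data.Clustering;
import elki.evaluation.clustering.ClusterContingencyTable;
import elki.evaluation.clustering.Entropy;

class EntropyUsageSketch {
  // Hedged sketch: obtain an Entropy instance via the contingency table.
  static double adjustedMI(Clustering<?> c1, Clustering<?> c2) {
    // assumed constructor arguments: (selfPairing, breakNoiseClusters)
    ClusterContingencyTable table = new ClusterContingencyTable(true, false);
    table.process(c1, c2);          // fill counts from the two clusterings
    Entropy e = table.getEntropy(); // assumed accessor
    return e.adjustedArithmeticMI();
  }
}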
-
Method Detail
-
computeMILarge
private void computeMILarge(int[][] contingency, int r, int c, double byn, double mlogn)
Compute mutual information measures, but skip the expensive computation of AMI/EMI for large data sets, where they do not differ much.
- Parameters:
  contingency - Contingency table
  r - Rows
  c - Columns
  byn - 1/N factor
  mlogn - -log(N)
-
computeMIFull
private void computeMIFull(int[][] contingency, int r, int c, int n, int m, double byn, double mlogn, double[] logs)
Full computation of mutual information measures, including AMI/EMI.
- Parameters:
  contingency - Contingency table
  r - Rows
  c - Columns
  n - Total size
  m - Maximum cluster size
  byn - 1/N factor
  mlogn - -log(N)
  logs - Logarithm cache
-
log
private static double log(int i, double[] logs)
Cached access to log values.
- Parameters:
  i - Parameter
  logs - Logarithm cache
- Returns:
- Logarithm
-
lfac
private static double lfac(int i, double[] lfac)
Get log(fac(i)) from the cache.
- Parameters:
  i - Factorial
  lfac - Cache
- Returns:
- Cached or computed value.
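For illustration, here is a minimal sketch of such a log-factorial table (an assumption about the general technique, not the actual ELKI code); these values are needed for the hypergeometric terms of the expected mutual information:

// Sketch of a log-factorial cache in the spirit of lfac(i, cache).
// lfac[i] holds log(i!); log(0!) = log(1!) = 0 by construction.
static double[] logFactorialTable(int max) {
  double[] lfac = new double[max + 1];
  for (int i = 2; i <= max; i++) {
    lfac[i] = lfac[i - 1] + Math.log(i); // log(i!) = log((i-1)!) + log(i)
  }
  return lfac;
}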
-
maxClusterSize
private static int maxClusterSize(int[][] contingency, int r, int c)
Get the maximum cluster size of a contingency table.
- Parameters:
  contingency - Contingency table
  r - Rows
  c - Columns
- Returns:
- Maximum
-
computeEntropyFirst
private static double computeEntropyFirst(int[][] contingency, int r, int c, double byn, double mlogn, double[] logs)
Compute entropy of first clustering.
- Parameters:
  contingency - Contingency table
  r - Rows
  c - Columns
  byn - 1 / N
  mlogn - -log(N)
  logs - log value cache
- Returns:
- entropy of first clustering
-
computeEntropySecond
private static double computeEntropySecond(int[][] contingency, int r, int c, double byn, double mlogn, double[] logs)
Compute entropy of second clustering.
- Parameters:
  contingency - Contingency table
  r - Rows
  c - Columns
  byn - 1 / N
  mlogn - -log(N)
  logs - log value cache
- Returns:
- entropy of second clustering
-
entropyFirst
public double entropyFirst()
Get the entropy of the first clustering (not normalized, 0 = equal).
- Returns:
- Entropy of first clustering
-
entropySecond
public double entropySecond()
Get the entropy of the second clustering (not normalized, 0 = equal).
- Returns:
- Entropy of second clustering
-
entropyJoint
public double entropyJoint()
Get the joint entropy of both clusterings (not normalized, 0 = equal).
- Returns:
- Joint entropy of both clusterings
-
conditionalEntropyFirst
public double conditionalEntropyFirst()
Get the conditional entropy of the first clustering (not normalized, 0 = equal).
- Returns:
- Conditional entropy of first clustering
-
conditionalEntropySecond
public double conditionalEntropySecond()
Get the conditional entropy of the second clustering (not normalized, 0 = equal).
- Returns:
- Conditional entropy of second clustering
-
entropyPowers
public double entropyPowers()
Get Powers entropy (normalized, 0 = equal): Powers = 1 - NMI_Sum.
- Returns:
- Powers
-
mutualInformation
public double mutualInformation()
Get the mutual information (not normalized, larger values indicate better agreement). Beware of the interpretability issues; use an adjusted or at least normalized version instead.
- Returns:
- Mutual information
-
upperBoundMI
public double upperBoundMI()
Get an upper bound for the mutual information (for scaling).
- Returns:
- Upper bound, using the individual entropies
-
jointNMI
@Reference(authors="Y. Y. Yao", title="Information-Theoretic Measures for Knowledge Discovery and Data Mining", booktitle="Entropy Measures, Maximum Entropy Principle and Emerging Applications", url="https://doi.org/10.1007/978-3-540-36212-8_6", bibkey="doi:10.1007/978-3-540-36212-8_6") public double jointNMI()
Get the joint-normalized mutual information. Values close to 1 are good.
Reference:
Y. Y. Yao
Information-Theoretic Measures for Knowledge Discovery and Data Mining
Entropy Measures, Maximum Entropy Principle and Emerging Applications
- Returns:
- Joint Normalized Mutual information
-
minNMI
@Reference(authors="Tarald O. Kv\u00e5lseth", title="Entropy and Correlation: Some Comments", booktitle="IEEE Trans. Systems, Man, and Cybernetics 17(3)", url="https://doi.org/10.1109/TSMC.1987.4309069", bibkey="DBLP:journals/tsmc/Kvalseth87") public double minNMI()
Get the min-normalized mutual information. Values close to 1 are good.
Reference:
T. O. Kvålseth
Entropy and Correlation: Some Comments
IEEE Trans. Systems, Man, and Cybernetics 17(3)
- Returns:
- Min Normalized Mutual information
-
maxNMI
@Reference(authors="Tarald O. Kv\u00e5lseth", title="Entropy and Correlation: Some Comments", booktitle="IEEE Trans. Systems, Man, and Cybernetics 17(3)", url="https://doi.org/10.1109/TSMC.1987.4309069", bibkey="DBLP:journals/tsmc/Kvalseth87") public double maxNMI()
Get the max-normalized mutual information. Values close to 1 are good.
Reference:
T. O. Kvålseth
Entropy and Correlation: Some Comments
IEEE Trans. Systems, Man, and Cybernetics 17(3)
- Returns:
- Max Normalized Mutual information
-
arithmeticNMI
@Reference(authors="Tarald O. Kv\u00e5lseth",title="Entropy and Correlation: Some Comments",booktitle="IEEE Trans. Systems, Man, and Cybernetics 17(3)",url="https://doi.org/10.1109/TSMC.1987.4309069",bibkey="DBLP:journals/tsmc/Kvalseth87") @Reference(authors="A. Rosenberg, J. Hirschberg",title="V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure",booktitle="EMNLP-CoNLL 2007",url="https://www.aclweb.org/anthology/D07-1043/",bibkey="DBLP:conf/emnlp/RosenbergH07") public double arithmeticNMI()
Get the arithmetic averaged normalized mutual information. Values close to 1 are good. This is equivalent to the V-Measure for beta = 1.
Reference:
T. O. Kvålseth
Entropy and Correlation: Some Comments
IEEE Trans. Systems, Man, and Cybernetics 17(3)

A. Rosenberg, J. Hirschberg
V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure
EMNLP-CoNLL 2007
- Returns:
- Sum Normalized Mutual information
-
geometricNMI
@Reference(authors="A. Strehl, J. Ghosh", title="Cluster Ensembles -A Knowledge Reuse Framework for Combining Multiple Partitions", booktitle="J. Mach. Learn. Res. 3", url="http://jmlr.org/papers/v3/strehl02a.html", bibkey="DBLP:journals/jmlr/StrehlG02") public double geometricNMI()
Get the geometric mean normalized mutual information (using the square root). Values close to 1 are good. This was proposed, for example, by:
A. Strehl, J. Ghosh
Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions
J. Mach. Learn. Res. 3
- Returns:
- Sqrt Normalized Mutual information
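To summarize the five normalizations above in one place, here is a hedged sketch in terms of the entropies h1 and h2, the joint entropy hj, and mi = h1 + h2 - hj (variable names are illustrative, not fields of this class):

// Illustrative summary of the NMI variants; not the class's internal code.
static void printNMIVariants(double h1, double h2, double hj, double mi) {
  System.out.println("jointNMI      = " + (mi / hj));
  System.out.println("minNMI        = " + (mi / Math.min(h1, h2)));
  System.out.println("maxNMI        = " + (mi / Math.max(h1, h2)));
  System.out.println("arithmeticNMI = " + (2 * mi / (h1 + h2))); // V-Measure with beta = 1
  System.out.println("geometricNMI  = " + (mi / Math.sqrt(h1 * h2)));
  System.out.println("entropyPowers = " + (1 - 2 * mi / (h1 + h2))); // 1 - NMI_Sum
}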
-
variationOfInformation
public double variationOfInformation()
Get the variation of information (not normalized, small values are good).
- Returns:
- Variation of information
-
upperBoundVI
public double upperBoundVI()
Get an upper bound for the VI (for scaling).
- Returns:
- Upper bound, based on the number of points and clusters.
-
normalizedVariationOfInformation
public double normalizedVariationOfInformation()
Get the normalized variation of information (normalized, small values are good). This is 1 - jointNMI().
- Returns:
- Normalized Variation of information
-
normalizedInformationDistance
public double normalizedInformationDistance()
Get the normalized information distance (normalized, small values are good). This is 1 - maxNMI().
- Returns:
- Normalized information distance
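In the same illustrative notation as the NMI sketch above (h1, h2, hj, mi), the two distance-style measures are simply complements of NMI variants, as the descriptions state:

// Illustrative only: distance forms as complements of NMI variants.
static void printNormalizedDistances(double h1, double h2, double hj, double mi) {
  double nvi = 1.0 - mi / hj;               // 1 - jointNMI()
  double nid = 1.0 - mi / Math.max(h1, h2); // 1 - maxNMI()
  System.out.println("NVI=" + nvi + " NID=" + nid);
}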
-
expectedMutualInformation
public double expectedMutualInformation()
Get the expected mutual information.
- Returns:
- Expected mutual information
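For reference, the expectation is taken over the hypergeometric model of randomness, as given in the Nguyen/Epps/Bailey reference above; in LaTeX notation, with row sums a_i, column sums b_j, and total n:

E[I] = \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{n_{ij}=\max(1,\,a_i+b_j-n)}^{\min(a_i,\,b_j)}
  \frac{n_{ij}}{n} \log\frac{n \cdot n_{ij}}{a_i b_j}
  \cdot \frac{a_i!\, b_j!\, (n-a_i)!\, (n-b_j)!}{n!\, n_{ij}!\, (a_i-n_{ij})!\, (b_j-n_{ij})!\, (n-a_i-b_j+n_{ij})!}

The factorial products in this sum are why the class caches log-factorials via lfac.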
-
adjustedJointMI
public double adjustedJointMI()
Get the adjusted mutual information using the joint version.
- Returns:
- Adjusted mutual information
-
adjustedArithmeticMI
public double adjustedArithmeticMI()
Get the adjusted mutual information using the arithmetic version.
- Returns:
- Adjusted mutual information
-
adjustedGeometricMI
public double adjustedGeometricMI()
Get the adjusted mutual information using the geometric version.
- Returns:
- Adjusted mutual information
-
adjustedMinMI
public double adjustedMinMI()
Get the adjusted mutual information using the min version.
- Returns:
- Adjusted mutual information
-
adjustedMaxMI
public double adjustedMaxMI()
Get the adjusted mutual information using the max version.
- Returns:
- Adjusted mutual information
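All five adjusted variants follow the adjustment-for-chance pattern from the Nguyen/Epps/Bailey and Romano et al. references above. A hedged sketch, with h1, h2, hj, mi, emi illustrative as before and the per-variant bounds assumed to mirror the NMI normalizations:

// Illustrative only: AMI = (MI - E[MI]) / (bound - E[MI]),
// where the bound matches the corresponding NMI normalization.
static double adjusted(double mi, double emi, double bound) {
  return (mi - emi) / (bound - emi);
}
// adjusted(mi, emi, Math.max(h1, h2))   ~ adjustedMaxMI()
// adjusted(mi, emi, Math.min(h1, h2))   ~ adjustedMinMI()
// adjusted(mi, emi, 0.5 * (h1 + h2))    ~ adjustedArithmeticMI()
// adjusted(mi, emi, Math.sqrt(h1 * h2)) ~ adjustedGeometricMI()
// adjusted(mi, emi, hj)                 ~ adjustedJointMI()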
-