Class Entropy
- java.lang.Object
-
- elki.evaluation.clustering.Entropy
-
@Reference(authors="M. Meil\u0103",title="Comparing clusterings by the variation of information",booktitle="Learning theory and kernel machines",url="https://doi.org/10.1007/978-3-540-45167-9_14",bibkey="DBLP:conf/colt/Meila03") @Reference(authors="X. V. Nguyen, J. Epps, J. Bailey",title="Information theoretic measures for clusterings comparison: is a correction for chance necessary?",booktitle="Proc. 26th Ann. Int. Conf. on Machine Learning (ICML \'09)",url="https://doi.org/10.1145/1553374.1553511",bibkey="DBLP:conf/icml/NguyenEB09") @Reference(authors="S. Romano, J. Bailey, X. V. Nguyen, K. Verspoor",title="Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance",booktitle="Proc. 31th Int. Conf. on Machine Learning (ICML 2014)",url="http://proceedings.mlr.press/v32/romano14.html",bibkey="DBLP:conf/icml/RomanoBNV14") public class Entropy extends java.lang.Object
Entropy-based measures, implemented using natural logarithms.
Computing the expected mutual information is well optimized, but nevertheless takes O(n) logarithm computations; for n up to 10000 we will perform this more expensive evaluation, for larger values it will (currently) not be computed. In theory, as n grows, the expected value tends to 0 anyway.
FIXME: this measure does not yet support noise clusters.
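Since the constructor is protected, an Entropy instance is normally obtained from a ClusterContingencyTable. A minimal usage sketch; the constructor flags and the process(...)/getEntropy() accessors are assumptions about the surrounding ELKI API, not documented on this page:

    import elki.data.Clustering;
    import elki.evaluation.clustering.ClusterContingencyTable;
    import elki.evaluation.clustering.Entropy;

    public final class EntropyExample {
      /** Compare two clusterings by arithmetic NMI (assumed table API). */
      public static double arithmeticNMI(Clustering<?> c1, Clustering<?> c2) {
        // Assumed flags: no self-pairing, do not break noise clusters.
        ClusterContingencyTable table = new ClusterContingencyTable(false, false);
        table.process(c1, c2);           // fill the contingency table from both clusterings
        Entropy e = table.getEntropy();  // assumed accessor returning this class
        return e.arithmeticNMI();        // values close to 1 indicate agreement
      }
    }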
Key References:
M. Meilă
Comparing clusterings by the variation of information
Learning theory and kernel machines

X. V. Nguyen, J. Epps, J. Bailey
Information theoretic measures for clusterings comparison: is a correction for chance necessary?
Proc. 26th Ann. Int. Conf. on Machine Learning (ICML '09)

S. Romano, J. Bailey, X. V. Nguyen, K. Verspoor
Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance
Proc. 31st Int. Conf. on Machine Learning (ICML 2014)
- Since:
- 0.5.0
- Author:
- Sascha Goldhofer, Erich Schubert
-
-
Field Summary
Fields

Modifier and Type | Field | Description
protected double | entropyFirst | Entropy in first
protected double | entropyJoint | Joint entropy
protected double | entropySecond | Entropy in second
protected double | expectedMutualInformation | Expected mutual information
protected double | mutualInformation | Mutual information (computed directly)
protected double | variationOfInformation | Variation of information (computed directly)
protected double | vibound | log(n), for bounding VI
-
Constructor Summary
Constructors

Modifier | Constructor | Description
protected | Entropy(ClusterContingencyTable table) | Constructor.
-
Method Summary
All Methods | Static Methods | Instance Methods | Concrete Methods

Modifier and Type | Method | Description
double | adjustedArithmeticMI() | Get the adjusted mutual information using the arithmetic version.
double | adjustedGeometricMI() | Get the adjusted mutual information using the geometric version.
double | adjustedJointMI() | Get the adjusted mutual information using the joint version.
double | adjustedMaxMI() | Get the adjusted mutual information using the max version.
double | adjustedMinMI() | Get the adjusted mutual information using the min version.
double | arithmeticNMI() | Get the arithmetic averaged normalized mutual information.
private static double | computeEntropyFirst(int[][] contingency, int r, int c, double byn, double mlogn, double[] logs) | Compute entropy of first clustering.
private static double | computeEntropySecond(int[][] contingency, int r, int c, double byn, double mlogn, double[] logs) | Compute entropy of second clustering.
private void | computeMIFull(int[][] contingency, int r, int c, int n, int m, double byn, double mlogn, double[] logs) | Full computation of mutual information measures, including AMI/EMI.
private void | computeMILarge(int[][] contingency, int r, int c, double byn, double mlogn) | Compute mutual information measures, but skip the expensive computation of AMI/EMI for large data sets, where they do not differ much.
double | conditionalEntropyFirst() | Get the conditional entropy of the first clustering (not normalized, 0 = equal).
double | conditionalEntropySecond() | Get the conditional entropy of the second clustering (not normalized, 0 = equal).
double | entropyFirst() | Get the entropy of the first clustering (not normalized, 0 = equal).
double | entropyJoint() | Get the joint entropy of both clusterings (not normalized, 0 = equal).
double | entropyPowers() | Get the Powers entropy (normalized, 0 = equal): Powers = 1 - NMI_Sum.
double | entropySecond() | Get the entropy of the second clustering (not normalized, 0 = equal).
double | expectedMutualInformation() | Get the expected mutual information.
double | geometricNMI() | Get the geometric mean normalized mutual information (using the square root).
double | jointNMI() | Get the joint-normalized mutual information.
private static double | lfac(int i, double[] lfac) | Get log(fac(i)) from the cache.
private static double | log(int i, double[] logs) | Cached access to log values.
private static int | maxClusterSize(int[][] contingency, int r, int c) | Get the maximum cluster size of a contingency table.
double | maxNMI() | Get the max-normalized mutual information.
double | minNMI() | Get the min-normalized mutual information.
double | mutualInformation() | Get the mutual information (not normalized, larger values indicate agreement).
double | normalizedInformationDistance() | Get the normalized information distance (normalized, small values are good).
double | normalizedVariationOfInformation() | Get the normalized variation of information (normalized, small values are good).
double | upperBoundMI() | Get an upper bound for the mutual information (for scaling).
double | upperBoundVI() | Get an upper bound for the VI (for scaling).
double | variationOfInformation() | Get the variation of information (not normalized, small values are good).
-
-
-
Field Detail
-
entropyFirst
protected double entropyFirst
Entropy in first
-
entropySecond
protected double entropySecond
Entropy in second
-
entropyJoint
protected double entropyJoint
Joint entropy
-
mutualInformation
protected double mutualInformation
Mutual information (computed directly)
-
variationOfInformation
protected double variationOfInformation
Variation of information (computed directly)
-
expectedMutualInformation
protected double expectedMutualInformation
Expected mutual information
-
vibound
protected double vibound
log(n), for bounding VI
-
-
Constructor Detail
-
Entropy
protected Entropy(ClusterContingencyTable table)
Constructor.
- Parameters:
- table - Contingency table
-
-
Method Detail
-
computeMILarge
private void computeMILarge(int[][] contingency, int r, int c, double byn, double mlogn)
Compute mutual information measures, but skip the expensive computation of AMI/EMI for large data sets, where they do not differ much.
- Parameters:
- contingency - Contingency table
- r - Rows
- c - Columns
- byn - 1/N factor
- mlogn - -log(N)
-
computeMIFull
private void computeMIFull(int[][] contingency, int r, int c, int n, int m, double byn, double mlogn, double[] logs)
Full computation of mutual information measures, including AMI/EMI.
- Parameters:
- contingency - Contingency table
- r - Rows
- c - Columns
- n - Total size
- m - Maximum cluster size
- byn - 1/N factor
- mlogn - -log(N)
- logs - Logarithm cache
-
log
private static double log(int i, double[] logs)
Cached access to log values.
- Parameters:
- i - Parameter
- logs - Logarithm cache
- Returns:
- Logarithm
-
lfac
private static double lfac(int i, double[] lfac)
Get log(fac(i)) from the cache.
- Parameters:
- i - Factorial argument
- lfac - Cache
- Returns:
- Cached or computed value.
-
maxClusterSize
private static int maxClusterSize(int[][] contingency, int r, int c)
Get the maximum cluster size of a contingency table.
- Parameters:
- contingency - Contingency table
- r - Rows
- c - Columns
- Returns:
- Maximum cluster size
-
computeEntropyFirst
private static double computeEntropyFirst(int[][] contingency, int r, int c, double byn, double mlogn, double[] logs)
Compute entropy of first clustering.
- Parameters:
- contingency - Contingency table
- r - Rows
- c - Columns
- byn - 1 / N
- mlogn - -log(N)
- logs - log value cache
- Returns:
- entropy of first clustering
-
computeEntropySecond
private static double computeEntropySecond(int[][] contingency, int r, int c, double byn, double mlogn, double[] logs)
Compute entropy of second clustering.
- Parameters:
- contingency - Contingency table
- r - Rows
- c - Columns
- byn - 1 / N
- mlogn - -log(N)
- logs - log value cache
- Returns:
- entropy of second clustering
-
entropyFirst
public double entropyFirst()
Get the entropy of the first clustering (not normalized, 0 = equal).
- Returns:
- Entropy of first clustering
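Writing U and V for the two clusterings, n_ij for the contingency table cells, and N for the number of points, this is the Shannon entropy over the row marginals n_i. (entropySecond() is the analogue over the column marginals):

    H(U) = -\sum_{i} \frac{n_{i\cdot}}{N} \log \frac{n_{i\cdot}}{N}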
-
entropySecond
public double entropySecond()
Get the entropy of the second clustering (not normalized, 0 = equal).
- Returns:
- Entropy of second clustering
-
entropyJoint
public double entropyJoint()
Get the joint entropy of both clusterings (not normalized, 0 = equal).
- Returns:
- Joint entropy of both clusterings
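Summing over all cells n_ij of the contingency table:

    H(U,V) = -\sum_{i,j} \frac{n_{ij}}{N} \log \frac{n_{ij}}{N}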
-
conditionalEntropyFirst
public double conditionalEntropyFirst()
Get the conditional entropy of the first clustering (not normalized, 0 = equal).
- Returns:
- Conditional entropy of first clustering
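By the chain rule, this follows from the stored joint and marginal entropies (conditionalEntropySecond() is symmetric):

    H(U \mid V) = H(U,V) - H(V)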
-
conditionalEntropySecond
public double conditionalEntropySecond()
Get the conditional entropy of the second clustering (not normalized, 0 = equal).
- Returns:
- Conditional entropy of second clustering
-
entropyPowers
public double entropyPowers()
Get the Powers entropy (normalized, 0 = equal): Powers = 1 - NMI_Sum.
- Returns:
- Powers
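Expanding NMI_Sum (see arithmeticNMI()):

    \mathrm{Powers}(U,V) = 1 - \frac{2\, I(U;V)}{H(U) + H(V)}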
-
mutualInformation
public double mutualInformation()
Get the mutual information (not normalized, larger values indicate agreement). Beware of its interpretability issues; use an adjusted or at least normalized version instead.
- Returns:
- Mutual information
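In terms of the entropies above, respectively the contingency counts:

    I(U;V) = H(U) + H(V) - H(U,V) = \sum_{i,j} \frac{n_{ij}}{N} \log \frac{N\, n_{ij}}{n_{i\cdot}\, n_{\cdot j}}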
-
upperBoundMI
public double upperBoundMI()
Get an upper bound for the mutual information (for scaling).
- Returns:
- Upper bound, using the individual entropies
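Mutual information never exceeds either individual entropy, so bounds built from the entropies are always valid; which combination this method returns is not specified on this page:

    I(U;V) \le \min(H(U), H(V))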
-
jointNMI
@Reference(authors="Y. Y. Yao", title="Information-Theoretic Measures for Knowledge Discovery and Data Mining", booktitle="Entropy Measures, Maximum Entropy Principle and Emerging Applications", url="https://doi.org/10.1007/978-3-540-36212-8_6", bibkey="doi:10.1007/978-3-540-36212-8_6") public double jointNMI()
Get the joint-normalized mutual information. Values close to 1 are good.Reference:
Y. Y. Yao
Information-Theoretic Measures for Knowledge Discovery and Data Mining
Entropy Measures, Maximum Entropy Principle and Emerging Applications- Returns:
- Joint Normalized Mutual information
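That is:

    \mathrm{NMI}_{\mathrm{joint}}(U,V) = \frac{I(U;V)}{H(U,V)}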
-
minNMI
@Reference(authors="Tarald O. Kv\u00e5lseth", title="Entropy and Correlation: Some Comments", booktitle="IEEE Trans. Systems, Man, and Cybernetics 17(3)", url="https://doi.org/10.1109/TSMC.1987.4309069", bibkey="DBLP:journals/tsmc/Kvalseth87") public double minNMI()
Get the min-normalized mutual information. Values close to 1 are good.Reference:
T. O. Kvålseth
Entropy and Correlation: Some Comments
IEEE Trans. Systems, Man, and Cybernetics 17(3)- Returns:
- Min Normalized Mutual information
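That is:

    \mathrm{NMI}_{\min}(U,V) = \frac{I(U;V)}{\min(H(U), H(V))}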
-
maxNMI
@Reference(authors="Tarald O. Kv\u00e5lseth", title="Entropy and Correlation: Some Comments", booktitle="IEEE Trans. Systems, Man, and Cybernetics 17(3)", url="https://doi.org/10.1109/TSMC.1987.4309069", bibkey="DBLP:journals/tsmc/Kvalseth87") public double maxNMI()
Get the max-normalized mutual information. Values close to 1 are good.Reference:
T. O. Kvålseth
Entropy and Correlation: Some Comments
IEEE Trans. Systems, Man, and Cybernetics 17(3)- Returns:
- Max Normalized Mutual information
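That is:

    \mathrm{NMI}_{\max}(U,V) = \frac{I(U;V)}{\max(H(U), H(V))}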
-
arithmeticNMI
@Reference(authors="Tarald O. Kv\u00e5lseth",title="Entropy and Correlation: Some Comments",booktitle="IEEE Trans. Systems, Man, and Cybernetics 17(3)",url="https://doi.org/10.1109/TSMC.1987.4309069",bibkey="DBLP:journals/tsmc/Kvalseth87") @Reference(authors="A. Rosenberg, J. Hirschberg",title="V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure",booktitle="EMNLP-CoNLL 2007",url="https://www.aclweb.org/anthology/D07-1043/",bibkey="DBLP:conf/emnlp/RosenbergH07") public double arithmeticNMI()
Get the arithmetic averaged normalized mutual information. Values close to 1 are good. This is equivalent to the V-Measure for beta=1.Reference:
T. O. Kvålseth
Entropy and Correlation: Some Comments
IEEE Trans. Systems, Man, and Cybernetics 17(3)A. Rosenberg, J. Hirschberg
V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure
EMNLP-CoNLL 2007- Returns:
- Sum Normalized Mutual information
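Normalizing by the arithmetic mean of the two entropies:

    \mathrm{NMI}_{\mathrm{sum}}(U,V) = \frac{2\, I(U;V)}{H(U) + H(V)}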
-
geometricNMI
@Reference(authors="A. Strehl, J. Ghosh", title="Cluster Ensembles -A Knowledge Reuse Framework for Combining Multiple Partitions", booktitle="J. Mach. Learn. Res. 3", url="http://jmlr.org/papers/v3/strehl02a.html", bibkey="DBLP:journals/jmlr/StrehlG02") public double geometricNMI()
Get the geometric mean normalized mutual information (using the square root). Values close to 1 are good. This was proposed, for example, by:A. Strehl, J. Ghosh
Cluster Ensembles -A Knowledge Reuse Framework for Combining Multiple Partitions
J. Mach. Learn. Res. 3- Returns:
- Sqrt Normalized Mutual information
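That is:

    \mathrm{NMI}_{\mathrm{sqrt}}(U,V) = \frac{I(U;V)}{\sqrt{H(U)\, H(V)}}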
-
variationOfInformation
public double variationOfInformation()
Get the variation of information (not normalized, small values are good).
- Returns:
- Variation of information
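Equivalently, in terms of the conditional entropies or the mutual information:

    \mathrm{VI}(U,V) = H(U \mid V) + H(V \mid U) = H(U) + H(V) - 2\, I(U;V)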
-
upperBoundVI
public double upperBoundVI()
Get an upper bound for the VI (for scaling).
- Returns:
- Upper bound, based on the number of points and clusters.
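Both of the following standard bounds hold (cf. Meilă, see the key references); the vibound field caches log(n) for this purpose:

    \mathrm{VI}(U,V) \le \log N, \qquad \mathrm{VI}(U,V) \le 2 \log \max(k_U, k_V)

where k_U and k_V are the numbers of clusters in the two clusterings.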
-
normalizedVariationOfInformation
public double normalizedVariationOfInformation()
Get the normalized variation of information (normalized, small values are good). This is 1 - jointNMI().
- Returns:
- Normalized Variation of information
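Since VI(U,V) = H(U,V) - I(U;V), this is the VI scaled by the joint entropy:

    1 - \frac{I(U;V)}{H(U,V)} = \frac{\mathrm{VI}(U,V)}{H(U,V)}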
-
normalizedInformationDistance
public double normalizedInformationDistance()
Get the normalized information distance (normalized, small values are good). This is 1 - maxNMI().
- Returns:
- Normalized information distance
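That is:

    \mathrm{NID}(U,V) = 1 - \frac{I(U;V)}{\max(H(U), H(V))}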
-
expectedMutualInformation
public double expectedMutualInformation()
Get the expected mutual information.
- Returns:
- Expected mutual information
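Under the hypergeometric (permutation) model of Nguyen et al. (see the key references), with row marginals a_i and column marginals b_j of the contingency table:

    E[I] = \sum_{i,j} \sum_{n_{ij}=\max(1,\, a_i+b_j-N)}^{\min(a_i,\, b_j)} \frac{n_{ij}}{N} \log\frac{N\, n_{ij}}{a_i b_j} \cdot \frac{a_i!\, b_j!\, (N-a_i)!\, (N-b_j)!}{N!\, n_{ij}!\, (a_i-n_{ij})!\, (b_j-n_{ij})!\, (N-a_i-b_j+n_{ij})!}

The lfac(int, double[]) cache above serves exactly these log-factorial terms.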
-
adjustedJointMI
public double adjustedJointMI()
Get the adjusted mutual information using the joint version.
- Returns:
- Adjusted mutual information
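All five adjusted variants follow the scheme of Nguyen et al. and Romano et al. (see the key references), differing only in the upper bound ub used for normalization (joint, arithmetic, geometric, min, or max of the entropies):

    \mathrm{AMI}(U,V) = \frac{I(U;V) - E[I]}{\mathrm{ub}(U,V) - E[I]}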
-
adjustedArithmeticMI
public double adjustedArithmeticMI()
Get the adjusted mutual information using the arithmetic version.
- Returns:
- Adjusted mutual information
-
adjustedGeometricMI
public double adjustedGeometricMI()
Get the adjusted mutual information using the geometric version.
- Returns:
- Adjusted mutual information
-
adjustedMinMI
public double adjustedMinMI()
Get the adjusted mutual information using the min version.
- Returns:
- Adjusted mutual information
-
adjustedMaxMI
public double adjustedMaxMI()
Get the adjusted mutual information using the max version.
- Returns:
- Adjusted mutual information
-
-