Package elki.math

Class PearsonCorrelation


  • public class PearsonCorrelation
    extends java.lang.Object
    Class to compute the Pearson correlation coefficient (PCC), also known as the Pearson product-moment correlation coefficient (PPMCC).

    This computes Var(X), Var(Y) and Cov(X, Y), all of which can be obtained from this class. If you need more than two variables, use CovarianceMatrix, which uses slightly more memory (because it stores arrays) but essentially performs the same computation.

    This class uses a numerically more stable approach than the popular \( E[XY]-E[X]E[Y] \) based version.
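
    The contrast between the two approaches can be illustrated with a minimal, self-contained sketch. The class StableCorrelationSketch and its methods are hypothetical names introduced here; the code shows a generic Welford-style incremental update of weighted residuals and is not guaranteed to match the ELKI implementation line for line.

    ```java
    public class StableCorrelationSketch {
      // Running weighted sums of x and y, and the total weight.
      private double sumX, sumY, sumWe;

      // Aggregated weighted deviations from the running means (not raw sums of squares).
      private double sumXX, sumYY, sumXY;

      /** Add one observation with weight w (zero weights are skipped). */
      public void put(double x, double y, double w) {
        if (w == 0.) {
          return;
        }
        if (sumWe <= 0.) { // first observation: no deviations yet
          sumX = x * w;
          sumY = y * w;
          sumWe = w;
          return;
        }
        // Deviation from the previous mean, scaled by the previous weight sum.
        final double deltaX = x * sumWe - sumX;
        final double deltaY = y * sumWe - sumY;
        final double oldWe = sumWe;
        sumWe += w;
        final double f = w / (sumWe * oldWe);
        sumXX += f * deltaX * deltaX;
        sumYY += f * deltaY * deltaY;
        sumXY += f * deltaX * deltaY;
        sumX += x * w;
        sumY += y * w;
      }

      /** Correlation from the aggregates; NaN if either variance is zero. */
      public double correlation() {
        return sumXY / Math.sqrt(sumXX * sumYY);
      }

      /** Textbook E[XY] - E[X]E[Y] version, shown only for comparison. */
      public static double naive(double[] x, double[] y) {
        final double n = x.length;
        double sx = 0., sy = 0., sxx = 0., syy = 0., sxy = 0.;
        for (int i = 0; i < x.length; i++) {
          sx += x[i];
          sy += y[i];
          sxx += x[i] * x[i];
          syy += y[i] * y[i];
          sxy += x[i] * y[i];
        }
        final double cov = sxy / n - (sx / n) * (sy / n);
        final double varX = sxx / n - (sx / n) * (sx / n);
        final double varY = syy / n - (sy / n) * (sy / n);
        return cov / Math.sqrt(varX * varY);
      }

      public static void main(String[] args) {
        // Perfectly correlated data with a large offset, which stresses the naive formula.
        double[] x = { 1e9, 1e9 + 1, 1e9 + 2, 1e9 + 3 };
        double[] y = { 2e9, 2e9 + 2, 2e9 + 4, 2e9 + 6 };
        StableCorrelationSketch s = new StableCorrelationSketch();
        for (int i = 0; i < x.length; i++) {
          s.put(x[i], y[i], 1.);
        }
        System.out.println("stable: " + s.correlation());
        System.out.println("naive:  " + naive(x, y));
      }
    }
    ```

    With a large common offset, the naive version subtracts two nearly equal large numbers and can lose most of its significant digits, whereas the incremental version only ever works with deviations from the running mean.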

    Since:
    0.5.0
    Author:
    Erich Schubert
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private double sumWe
      Weight sum.
      private double sumX
      Running (weighted) sum for X, from which the mean of X is derived.
      private double sumXX
      Aggregation for squared residuals - we are not using sum-of-squares!
      private double sumXY
      Aggregation for squared residuals - we are not using sum-of-squares!
      private double sumY
      Running (weighted) sum for Y, from which the mean of Y is derived.
      private double sumYY
      Aggregation for squared residuals - we are not using sum-of-squares!
    • Field Detail

      • sumXX

        private double sumXX
        Aggregation for squared residuals - we are not using sum-of-squares!
      • sumYY

        private double sumYY
        Aggregation for squared residuals - we are not using sum-of-squares!
      • sumXY

        private double sumXY
        Aggregation for squared residuals - we are not using sum-of-squares!
      • sumX

        private double sumX
        Running (weighted) sum for X, from which the mean of X is derived.
      • sumY

        private double sumY
        Running (weighted) sum for Y, from which the mean of Y is derived.
      • sumWe

        private double sumWe
        Weight sum.
    • Constructor Detail

      • PearsonCorrelation

        public PearsonCorrelation()
        Constructor.
    • Method Detail

      • put

        public void put(double x,
                        double y,
                        double w)
        Put a single value into the correlation statistic.
        Parameters:
        x - Value in X
        y - Value in Y
        w - Weight
      • put

        public void put(double x,
                        double y)
        Put a single value into the correlation statistic.
        Parameters:
        x - Value in X
        y - Value in Y
      • getCorrelation

        public double getCorrelation()
        Get the Pearson correlation value.
        Returns:
        Correlation value
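
        For reference, the standard definition of the coefficient in terms of the quantities named in the class description is:

        \[ r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} \]

        The sampling correction cancels in this ratio, so the population and sample versions of Var and Cov yield the same value of \( r_{XY} \) as long as both use the same normalization.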
      • getCount

        public double getCount()
        Get the number of points the average is based on.
        Returns:
        number of data points
      • getMeanX

        public double getMeanX()
        Return the mean of X.
        Returns:
        mean
      • getMeanY

        public double getMeanY()
        Return the mean of Y.
        Returns:
        mean
      • getNaiveCovariance

        public double getNaiveCovariance()
        Get the covariance of X and Y (not taking sampling into account)
        Returns:
        Covariance
      • getSampleCovariance

        public double getSampleCovariance()
        Get the covariance of X and Y (with sampling correction)
        Returns:
        Covariance
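
        The difference between the two covariance variants can be sketched as follows. This is a standalone illustration with hypothetical names (CovarianceSketch, sumOfProducts), assuming unweighted data, a divisor of n for the naive version, and a divisor of n - 1 (Bessel's correction) for the sample version; it does not use the ELKI class itself.

        ```java
        public class CovarianceSketch {
          /** Sum of products of deviations from the means (simple two-pass version). */
          private static double sumOfProducts(double[] x, double[] y) {
            final int n = x.length;
            double meanX = 0., meanY = 0.;
            for (int i = 0; i < n; i++) {
              meanX += x[i];
              meanY += y[i];
            }
            meanX /= n;
            meanY /= n;
            double sum = 0.;
            for (int i = 0; i < n; i++) {
              sum += (x[i] - meanX) * (y[i] - meanY);
            }
            return sum;
          }

          /** Covariance without sampling correction: divide by n. */
          public static double naiveCovariance(double[] x, double[] y) {
            return sumOfProducts(x, y) / x.length;
          }

          /** Covariance with sampling correction: divide by n - 1. */
          public static double sampleCovariance(double[] x, double[] y) {
            return sumOfProducts(x, y) / (x.length - 1);
          }

          public static void main(String[] args) {
            double[] x = { 1, 2, 3, 4 };
            double[] y = { 2, 4, 6, 8 };
            System.out.println("naive:  " + naiveCovariance(x, y));
            System.out.println("sample: " + sampleCovariance(x, y));
          }
        }
        ```

        For small data sets the two values differ noticeably; as n grows, the ratio n / (n - 1) approaches 1 and the distinction matters less.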
      • getPopulationVarianceX

        public double getPopulationVarianceX()
        Return the population (naive) variance of X, without the sampling correction.

        Note: often you should be using getSampleVarianceX() instead!

        Returns:
        variance
      • getSampleVarianceX

        public double getSampleVarianceX()
        Return the sample variance of X.
        Returns:
        sample variance
      • getPopulationStddevX

        public double getPopulationStddevX()
        Return the standard deviation of X based on the population (non-sample) variance.

        Note: often you should be using getSampleStddevX() instead!

        Returns:
        standard deviation
      • getSampleStddevX

        public double getSampleStddevX()
        Return the sample standard deviation of X.
        Returns:
        standard deviation
      • getPopulationVarianceY

        public double getPopulationVarianceY()
        Return the population (naive) variance of Y, without the sampling correction.

        Note: often you should be using getSampleVarianceY() instead!

        Returns:
        variance
      • getSampleVarianceY

        public double getSampleVarianceY()
        Return the sample variance of Y.
        Returns:
        sample variance
      • getPopulationStddevY

        public double getPopulationStddevY()
        Return the standard deviation of Y based on the population (non-sample) variance.

        Note: often you should be using getSampleStddevY() instead!

        Returns:
        standard deviation
      • getSampleStddevY

        public double getSampleStddevY()
        Return the sample standard deviation of Y.
        Returns:
        standard deviation
      • reset

        public void reset()
        Reset the statistic, discarding all values accumulated so far.
      • coefficient

        public static double coefficient(double[] x,
                                         double[] y)
        Compute the Pearson product-moment correlation coefficient.
        Parameters:
        x - first data array
        y - second data array
        Returns:
        the Pearson product-moment correlation coefficient for x and y
      • coefficient

        public static double coefficient(NumberVector x,
                                         NumberVector y)
        Compute the Pearson product-moment correlation coefficient for two NumberVectors.
        Parameters:
        x - first NumberVector
        y - second NumberVector
        Returns:
        the Pearson product-moment correlation coefficient for x and y
      • weightedCoefficient

        public static double weightedCoefficient(double[] x,
                                                 double[] y,
                                                 double[] weights)
        Compute the weighted Pearson product-moment correlation coefficient.
        Parameters:
        x - first data array
        y - second data array
        weights - Weights
        Returns:
        the Pearson product-moment correlation coefficient for x and y
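
        What a weighted coefficient computes can be sketched with a standalone two-pass version. The class and method names below (WeightedPearsonSketch, weightedPearson) are hypothetical, and the code assumes non-negative weights and arrays of equal length; it illustrates the definition rather than the library's actual single-pass implementation.

        ```java
        public class WeightedPearsonSketch {
          /** Weighted Pearson correlation of x and y; NaN if either weighted variance is zero. */
          public static double weightedPearson(double[] x, double[] y, double[] w) {
            if (x.length != y.length || x.length != w.length) {
              throw new IllegalArgumentException("Arrays must have the same length.");
            }
            // First pass: weighted means.
            double sumW = 0., meanX = 0., meanY = 0.;
            for (int i = 0; i < x.length; i++) {
              sumW += w[i];
              meanX += w[i] * x[i];
              meanY += w[i] * y[i];
            }
            meanX /= sumW;
            meanY /= sumW;
            // Second pass: weighted (co-)deviations from the means.
            double sxx = 0., syy = 0., sxy = 0.;
            for (int i = 0; i < x.length; i++) {
              final double dx = x[i] - meanX, dy = y[i] - meanY;
              sxx += w[i] * dx * dx;
              syy += w[i] * dy * dy;
              sxy += w[i] * dx * dy;
            }
            // The normalization by the weight sum cancels in the ratio.
            return sxy / Math.sqrt(sxx * syy);
          }

          public static void main(String[] args) {
            double[] x = { 1, 2, 3, 100 };
            double[] y = { 1, 2, 3, -50 };
            // The outlier in the last position is ignored because its weight is zero.
            double[] w = { 1, 1, 1, 0 };
            System.out.println(weightedPearson(x, y, w));
          }
        }
        ```

        With all weights equal, this reduces to the unweighted coefficient; a weight of zero removes the corresponding pair from the computation.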
      • weightedCoefficient

        public static double weightedCoefficient(NumberVector x,
                                                 NumberVector y,
                                                 double[] weights)
        Compute the weighted Pearson product-moment correlation coefficient for two NumberVectors.
        Parameters:
        x - first NumberVector
        y - second NumberVector
        weights - Weights
        Returns:
        the Pearson product-moment correlation coefficient for x and y
      • weightedCoefficient

        public static double weightedCoefficient(NumberVector x,
                                                 NumberVector y,
                                                 NumberVector weights)
        Compute the weighted Pearson product-moment correlation coefficient for two NumberVectors, with weights given as a NumberVector.
        Parameters:
        x - first NumberVector
        y - second NumberVector
        weights - Weights
        Returns:
        the Pearson product-moment correlation coefficient for x and y