Package elki.math

Class MeanVariance

  • Direct Known Subclasses:
    MeanVarianceMinMax

    @Reference(authors="Erich Schubert, Michael Gertz",title="Numerically Stable Parallel Computation of (Co-)Variance",booktitle="Proc. 30th Int. Conf. Scientific and Statistical Database Management (SSDBM 2018)",url="https://doi.org/10.1145/3221269.3223036",bibkey="DBLP:conf/ssdbm/SchubertG18")
    @Reference(authors="E. A. Youngs, E. M. Cramer",title="Some Results Relevant to Choice of Sum and Sum-of-Product Algorithms",booktitle="Technometrics 13(3)",url="https://doi.org/10.1080/00401706.1971.10488826",bibkey="doi:10.1080/00401706.1971.10488826")
    @Reference(authors="B. P. Welford",title="Note on a method for calculating corrected sums of squares and products",booktitle="Technometrics 4(3)",url="https://doi.org/10.2307/1266577",bibkey="doi:10.2307/1266577")
    @Reference(authors="D. H. D. West",title="Updating Mean and Variance Estimates: An Improved Method",booktitle="Communications of the ACM 22(9)",url="https://doi.org/10.1145/359146.359153",bibkey="DBLP:journals/cacm/West79")
    public class MeanVariance
    extends Mean
    Do some simple statistics (mean, variance) using a numerically stable online algorithm.

    This class can be fed data repeatedly using the put() methods; the resulting mean and variance can be queried at any time using Mean.getMean() and getSampleVariance().

    Make sure you understand which variance you need before using getPopulationVariance(): because this class is fed with samples and estimates the mean from those samples, getSampleVariance() is often the more appropriate choice.

    As experimentally studied in

    Erich Schubert, Michael Gertz
    Numerically Stable Parallel Computation of (Co-)Variance
    Proc. 30th Int. Conf. Scientific and Statistical Database Management (SSDBM 2018)

    the current approach is based on:

    E. A. Youngs and E. M. Cramer
    Some Results Relevant to Choice of Sum and Sum-of-Product Algorithms
    Technometrics 13(3), 1971

    We originally experimented with:

    B. P. Welford
    Note on a method for calculating corrected sums of squares and products
    Technometrics 4(3), 1962

    D. H. D. West
    Updating Mean and Variance Estimates: An Improved Method
    Communications of the ACM 22(9), 1979
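
    The single-pass idea behind the cited approaches can be sketched as follows. This is a minimal illustration in the sum-based style, assuming three running quantities (count, sum, and sum of squared deviations); the class name RunningVariance and its members are hypothetical, and the code is not the ELKI source.

```java
/** Hypothetical stand-in for illustration; not the ELKI implementation. */
class RunningVariance {
  double n = 0;   // number of values seen so far
  double sum = 0; // running sum of the values
  double m2 = 0;  // running sum of squared deviations from the mean

  /** Add one value with weight 1, updating all three quantities in one pass. */
  void put(double val) {
    if (n <= 0) { // first value: there is no deviation yet
      n = 1;
      sum = val;
      m2 = 0;
      return;
    }
    double tmp = n * val - sum; // equals n * (val - old mean)
    double oldn = n;
    n += 1;
    sum += val;
    m2 += tmp * tmp / (n * oldn); // adds (oldn / n) * (val - old mean)^2
  }

  double mean() {
    return sum / n;
  }

  double sampleVariance() {
    return m2 / (n - 1);
  }

  public static void main(String[] args) {
    RunningVariance rv = new RunningVariance();
    for (double v : new double[] { 2, 4, 4, 4, 5, 5, 7, 9 }) {
      rv.put(v);
    }
    System.out.println("mean=" + rv.mean() + " sampleVariance=" + rv.sampleVariance());
  }
}
```

    The values are never stored, and the squared term is formed from a difference rather than from raw squares, which is what avoids the cancellation problems of the textbook "sum of squares minus squared sum" formula.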

    Since:
    0.2
    Author:
    Erich Schubert
    • Field Detail

      • m2

        protected double m2
        n times the variance (the sum of squared deviations from the mean)
    • Constructor Detail

      • MeanVariance

        public MeanVariance()
        Empty constructor
      • MeanVariance

        public MeanVariance​(MeanVariance other)
        Constructor from other instance
        Parameters:
        other - other instance to copy data from.
    • Method Detail

      • put

        public void put​(double val)
        Add a single value with weight 1.0
        Overrides:
        put in class Mean
        Parameters:
        val - Value
      • put

        public void put​(double val,
                        double weight)
        Add data with a given weight.
        Overrides:
        put in class Mean
        Parameters:
        val - data
        weight - weight
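
        A weighted update can be sketched in the same style. This assumes that n tracks the total weight and that a value with weight w should behave like w copies of that value; the class WeightedRunningVariance is hypothetical and only illustrates the arithmetic.

```java
/** Hypothetical weighted accumulator for illustration; not the ELKI implementation. */
class WeightedRunningVariance {
  double n = 0;   // total weight seen so far
  double sum = 0; // weighted sum of the values
  double m2 = 0;  // weighted sum of squared deviations from the mean

  /** Add one value with the given weight (assumed to be non-negative). */
  void put(double val, double weight) {
    if (weight == 0.) {
      return; // a zero weight contributes nothing
    }
    if (n <= 0) { // first contribution
      n = weight;
      sum = val * weight;
      m2 = 0;
      return;
    }
    double wval = val * weight;
    double tmp = n * wval - sum * weight; // equals weight * n * (val - old mean)
    double oldn = n;
    n += weight;
    sum += wval;
    m2 += tmp * tmp / (weight * n * oldn); // adds weight * (oldn / n) * (val - old mean)^2
  }

  public static void main(String[] args) {
    WeightedRunningVariance w = new WeightedRunningVariance();
    w.put(1, 1);
    w.put(3, 2); // behaves like adding the value 3 twice
    w.put(5, 1);
    System.out.println("mean=" + (w.sum / w.n) + " m2=" + w.m2);
  }
}
```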
      • put

        public void put​(Mean other)
        Join the data of another MeanVariance instance.
        Overrides:
        put in class Mean
        Parameters:
        other - Data to join with
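
        Joining two accumulators amounts to combining their counts, sums, and squared deviations, plus a correction term for the difference between the two means. The sketch below assumes both sides carry n, sum, and m2; MergeableVariance is a hypothetical name and the code is not taken from ELKI.

```java
/** Hypothetical mergeable accumulator for illustration; not the ELKI implementation. */
class MergeableVariance {
  double n = 0;   // number of values
  double sum = 0; // sum of values
  double m2 = 0;  // sum of squared deviations from the mean

  /** Add a single value with weight 1. */
  void put(double val) {
    if (n <= 0) {
      n = 1;
      sum = val;
      m2 = 0;
      return;
    }
    double tmp = n * val - sum;
    double oldn = n;
    n += 1;
    sum += val;
    m2 += tmp * tmp / (n * oldn);
  }

  /** Merge the statistics of another accumulator into this one. */
  void put(MergeableVariance other) {
    if (other.n <= 0) {
      return; // nothing to merge
    }
    if (n <= 0) { // this side is empty: copy the other side
      n = other.n;
      sum = other.sum;
      m2 = other.m2;
      return;
    }
    // equals n * other.n * (other mean - this mean)
    double tmp = n * other.sum - sum * other.n;
    double oldn = n;
    n += other.n;
    sum += other.sum;
    // both partial m2 values plus the between-group correction
    m2 += other.m2 + tmp * tmp / (other.n * n * oldn);
  }

  public static void main(String[] args) {
    MergeableVariance a = new MergeableVariance();
    MergeableVariance b = new MergeableVariance();
    for (double v : new double[] { 2, 4, 4, 4 }) a.put(v);
    for (double v : new double[] { 5, 5, 7, 9 }) b.put(v);
    a.put(b);
    System.out.println("n=" + a.n + " mean=" + (a.sum / a.n) + " m2=" + a.m2);
  }
}
```

        Because the merge needs only the three summary values, partial results computed on separate chunks of data can be combined without revisiting the data.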
      • put

        public MeanVariance put​(double[] vals)
        Add values with weight 1.0
        Overrides:
        put in class Mean
        Parameters:
        vals - Values
        Returns:
        this
      • put

        public MeanVariance put​(double[] vals,
                                double[] weights)
        Add values with the given weights.
        Overrides:
        put in class Mean
        Parameters:
        vals - Values
        weights - Weights
        Returns:
        this
      • getPopulationVariance

        public double getPopulationVariance()
        Return the population variance (scaled by 1/N).

        Note: often you should be using getSampleVariance() instead!

        Returns:
        variance
      • getSampleVariance

        public double getSampleVariance()
        Return sample variance (scaled by 1/(N-1)).
        Returns:
        sample variance
      • getSumOfSquares

        public double getSumOfSquares()
        Get the sum of squared deviations from the mean.
        Returns:
        sum of squared deviations
      • getPopulationStddev

        public double getPopulationStddev()
        Return standard deviation using the population variance (scaled by 1/N).

        Note: often, you should be using getSampleStddev() instead!

        Returns:
        stddev
      • getSampleStddev

        public double getSampleStddev()
        Return sample standard deviation (scaled by 1/(N-1)).
        Returns:
        stddev
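
        The population and sample variants differ only in the divisor applied to the sum of squared deviations. A minimal sketch, assuming m2 is that sum and n is the number of values; the helper class VarianceScaling and its method names are hypothetical.

```java
/** Hypothetical helpers showing the two scalings; not the ELKI implementation. */
class VarianceScaling {
  /** Population variance: divide by n. */
  static double populationVariance(double m2, double n) {
    return m2 / n;
  }

  /** Sample variance: divide by n - 1 (this sketch does not guard against n <= 1). */
  static double sampleVariance(double m2, double n) {
    return m2 / (n - 1);
  }

  static double populationStddev(double m2, double n) {
    return Math.sqrt(populationVariance(m2, n));
  }

  static double sampleStddev(double m2, double n) {
    return Math.sqrt(sampleVariance(m2, n));
  }

  public static void main(String[] args) {
    // The values 2, 4, 4, 4, 5, 5, 7, 9 have mean 5 and squared deviations summing to 32.
    double m2 = 32, n = 8;
    System.out.println("population variance = " + populationVariance(m2, n)); // 32 / 8
    System.out.println("sample variance     = " + sampleVariance(m2, n));     // 32 / 7
    System.out.println("population stddev   = " + populationStddev(m2, n));
    System.out.println("sample stddev       = " + sampleStddev(m2, n));
  }
}
```

        The sample variance is always the larger of the two, and the gap matters most for small n.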
      • newArray

        public static MeanVariance[] newArray​(int dimensionality)
        Create and initialize a new array of MeanVariance.
        Parameters:
        dimensionality - Dimensionality
        Returns:
        New and initialized Array
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class Mean
      • reset

        public MeanVariance reset()
        Description copied from class: Mean
        Reset the value.
        Overrides:
        reset in class Mean
        Returns:
        this accumulator
      • of

        public static MeanVariance[] of​(Relation<? extends NumberVector> relation)
        Compute the mean and variance of each dimension of a relation.
        Parameters:
        relation - Data relation
        Returns:
        Variances, one MeanVariance per dimension
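
        The array return type suggests one accumulator per dimension. The sketch below illustrates that pattern under this assumption, using a plain double[][] (one row per vector) as a stand-in for the relation; PerDimensionStats and its nested Acc class are hypothetical and do not use the ELKI API.

```java
/** Hypothetical per-dimension statistics over plain arrays; not the ELKI implementation. */
class PerDimensionStats {
  /** Minimal accumulator standing in for one array element. */
  static class Acc {
    double n = 0, sum = 0, m2 = 0;

    void put(double val) {
      if (n <= 0) {
        n = 1;
        sum = val;
        m2 = 0;
        return;
      }
      double tmp = n * val - sum;
      double oldn = n;
      n += 1;
      sum += val;
      m2 += tmp * tmp / (n * oldn);
    }

    double mean() {
      return sum / n;
    }

    double sampleVariance() {
      return m2 / (n - 1);
    }
  }

  /** Build one initialized accumulator per column and feed every row into them. */
  static Acc[] of(double[][] rows) {
    int dim = rows.length > 0 ? rows[0].length : 0;
    Acc[] accs = new Acc[dim];
    for (int d = 0; d < dim; d++) {
      accs[d] = new Acc(); // initialize every slot, not just allocate the array
    }
    for (double[] row : rows) {
      for (int d = 0; d < dim; d++) {
        accs[d].put(row[d]);
      }
    }
    return accs;
  }

  public static void main(String[] args) {
    Acc[] stats = of(new double[][] { { 1, 10 }, { 2, 20 }, { 3, 30 } });
    for (int d = 0; d < stats.length; d++) {
      System.out.println("dim " + d + ": mean=" + stats[d].mean()
          + " sampleVariance=" + stats[d].sampleVariance());
    }
  }
}
```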