Class WeightedCovarianceMatrixBuilder

  • All Implemented Interfaces:
    CovarianceMatrixBuilder

    @Title("Weighted Covariance Matrix / PCA")
    @Description("A PCA modification by using weights while building the covariance matrix, to obtain more stable results")
    @Reference(authors="Hans-Peter Kriegel, Peer Kr\u00f6ger, Erich Schubert, Arthur Zimek",
               title="A General Framework for Increasing the Robustness of PCA-based Correlation Clustering Algorithms",
               booktitle="Proc. 20th Intl. Conf. on Scientific and Statistical Database Management (SSDBM)",
               url="https://doi.org/10.1007/978-3-540-69497-7_27",
               bibkey="DBLP:conf/ssdbm/KriegelKSZ08")
    public class WeightedCovarianceMatrixBuilder
    extends java.lang.Object
    implements CovarianceMatrixBuilder
    CovarianceMatrixBuilder with weights.

    This builder uses a weight function to weight points differently while building the covariance matrix. Covariance can be canonically extended with weights, as shown in the article referenced below.

    Reference:

    A General Framework for Increasing the Robustness of PCA-Based Correlation Clustering Algorithms
    Hans-Peter Kriegel and Peer Kröger and Erich Schubert and Arthur Zimek
    Proc. 20th Int. Conf. on Scientific and Statistical Database Management (SSDBM)
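
To illustrate what "extending covariance with weights" can mean, here is a minimal sketch in plain Java. It is not ELKI's implementation: the class name WeightedCovarianceSketch and its method are hypothetical, and normalizing by the sum of weights is an assumption (other normalizations exist). Each point contributes to the weighted mean and to the outer products in proportion to its weight.

```java
/** Hypothetical helper, not part of ELKI: weighted covariance of row vectors. */
final class WeightedCovarianceSketch {
  /**
   * @param data rows are points, columns are dimensions
   * @param weights one non-negative weight per row, with a positive sum
   * @return covariance matrix normalized by the sum of weights (an assumption)
   */
  static double[][] weightedCovariance(double[][] data, double[] weights) {
    final int n = data.length, dim = data[0].length;
    // Weighted mean: each point counts in proportion to its weight.
    double wsum = 0.;
    double[] mean = new double[dim];
    for (int i = 0; i < n; i++) {
      wsum += weights[i];
      for (int d = 0; d < dim; d++) {
        mean[d] += weights[i] * data[i][d];
      }
    }
    for (int d = 0; d < dim; d++) {
      mean[d] /= wsum;
    }
    // Weighted sum of outer products of the centered points.
    double[][] cov = new double[dim][dim];
    for (int i = 0; i < n; i++) {
      for (int a = 0; a < dim; a++) {
        for (int b = 0; b < dim; b++) {
          cov[a][b] += weights[i] * (data[i][a] - mean[a]) * (data[i][b] - mean[b]);
        }
      }
    }
    for (int a = 0; a < dim; a++) {
      for (int b = 0; b < dim; b++) {
        cov[a][b] /= wsum;
      }
    }
    return cov;
  }
}
```

With all weights equal, this reduces to the ordinary (population) covariance; a point with weight zero has no influence on the result.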

    Since:
    0.2
    Author:
    Erich Schubert
    • Field Detail

      • weightfunction

        protected WeightFunction weightfunction
        Holds the weight function.
    • Constructor Detail

      • WeightedCovarianceMatrixBuilder

        public WeightedCovarianceMatrixBuilder​(WeightFunction weightfunction)
        Constructor.
        Parameters:
        weightfunction - Weighting function
    • Method Detail

      • processIds

        public double[][] processIds​(DBIDs ids,
                                     Relation<? extends NumberVector> relation)
        Weighted covariance matrix for a set of IDs. Because no distance information is supplied to this method, the distances needed for weighting are computed internally. Covariance is tied to Euclidean distance, so supporting other distance functions would probably not be meaningful.
        Specified by:
        processIds in interface CovarianceMatrixBuilder
        Parameters:
        ids - Database ids to process
        relation - Relation to process
        Returns:
        Covariance matrix
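
One way to derive weights when no distances are supplied can be sketched as follows. This is an illustration under stated assumptions, not the ELKI code: it assumes the reference point is the centroid of the selected vectors, uses Euclidean distance, and stands in a plain DoubleUnaryOperator on the relative distance (distance divided by the maximum distance) for ELKI's WeightFunction, whose actual signature may differ. The class name CentroidWeightingSketch is hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper, not part of ELKI: distance-based weights around the centroid. */
final class CentroidWeightingSketch {
  /**
   * @param data rows are points, columns are dimensions
   * @param weightOf maps a relative distance in [0, 1] to a weight (stand-in for a weight function)
   * @return one weight per row of data
   */
  static double[] weightsFromCentroid(double[][] data, DoubleUnaryOperator weightOf) {
    final int n = data.length, dim = data[0].length;
    // Assumption: distances are measured from the centroid of the given points.
    double[] centroid = new double[dim];
    for (double[] row : data) {
      for (int d = 0; d < dim; d++) {
        centroid[d] += row[d] / n;
      }
    }
    // Euclidean distance of every point to the centroid, and the maximum.
    double[] dist = new double[n];
    double maxdist = 0.;
    for (int i = 0; i < n; i++) {
      double sum = 0.;
      for (int d = 0; d < dim; d++) {
        double delta = data[i][d] - centroid[d];
        sum += delta * delta;
      }
      dist[i] = Math.sqrt(sum);
      maxdist = Math.max(maxdist, dist[i]);
    }
    // Map relative distances to weights; all points coincide if maxdist is 0.
    double[] weights = new double[n];
    for (int i = 0; i < n; i++) {
      double rel = maxdist > 0. ? dist[i] / maxdist : 0.;
      weights[i] = weightOf.applyAsDouble(rel);
    }
    return weights;
  }
}
```

For example, passing r -> 1.0 - 0.5 * r gives the point nearest the centroid the largest weight and the farthest point half of that. The resulting weights would then be used in a weighted covariance computation.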
      • processQueryResults

        public double[][] processQueryResults​(DoubleDBIDList results,
                                              Relation<? extends NumberVector> database,
                                              int k)
        Compute the covariance matrix for a collection of query results. By default, this simply collects the IDs and delegates to processIds.
        Specified by:
        processQueryResults in interface CovarianceMatrixBuilder
        Parameters:
        results - a collection of QueryResults
        database - the database used
        k - number of elements to process
        Returns:
        Covariance Matrix