Class DependencyDerivator<V extends NumberVector>

  • Type Parameters:
    V - the type of NumberVector handled by this algorithm
    All Implemented Interfaces:
    Algorithm

    @Title("Dependency Derivator: Deriving numerical inter-dependencies on data")
    @Description("Derives an equality-system describing dependencies between attributes in a correlation-cluster")
    @Reference(authors="Elke Achtert, Christian B\u00f6hm, Hans-Peter Kriegel, Peer Kr\u00f6ger, Arthur Zimek",
               title="Deriving Quantitative Dependencies for Correlation Clusters",
               booktitle="Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining (KDD \'06)",
               url="https://doi.org/10.1145/1150402.1150408",
               bibkey="DBLP:conf/kdd/AchtertBKKZ06")
    @Priority(-5)
    public class DependencyDerivator<V extends NumberVector>
    extends java.lang.Object
    implements Algorithm
    Computes quantitative linear dependencies among the attributes of a given dataset, based on a linear correlation PCA.

    Reference:

    Elke Achtert, Christian Böhm, Hans-Peter Kriegel, Peer Kröger, Arthur Zimek
    Deriving Quantitative Dependencies for Correlation Clusters
    Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining (KDD '06)

    Since:
    0.1
    Author:
    Arthur Zimek
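
To illustrate the idea behind the class description, here is a minimal, stand-alone sketch for two-dimensional data. It does not use the ELKI API: the class `LinearDependencySketch` and its method `deriveEquation` are hypothetical names, and the closed-form 2x2 eigen decomposition stands in for the configurable `PCARunner` and `EigenPairFilter`. It assumes that the eigenvector with the smallest eigenvalue (the direction of least variance) is the normal vector of the linear dependency.

```java
import java.util.Locale;

/** Hypothetical stand-alone sketch; not part of ELKI. */
final class LinearDependencySketch {
  /**
   * Returns {a, b, c} such that a*x + b*y = c holds approximately for the
   * points, where (a, b) is the unit eigenvector belonging to the smallest
   * eigenvalue of the covariance matrix and c = (a, b) . centroid.
   */
  static double[] deriveEquation(double[][] points) {
    int n = points.length;
    double mx = 0, my = 0;
    for (double[] p : points) {
      mx += p[0];
      my += p[1];
    }
    mx /= n;
    my /= n;
    // Covariance matrix [[sxx, sxy], [sxy, syy]] around the centroid.
    double sxx = 0, sxy = 0, syy = 0;
    for (double[] p : points) {
      double dx = p[0] - mx, dy = p[1] - my;
      sxx += dx * dx;
      sxy += dx * dy;
      syy += dy * dy;
    }
    sxx /= n;
    sxy /= n;
    syy /= n;
    // Smallest eigenvalue of a symmetric 2x2 matrix in closed form.
    double disc = Math.sqrt(0.25 * (sxx - syy) * (sxx - syy) + sxy * sxy);
    double lambdaMin = 0.5 * (sxx + syy) - disc;
    // Eigenvector for lambdaMin; fall back to the second row if degenerate.
    double a = sxy, b = lambdaMin - sxx;
    if (Math.hypot(a, b) < 1e-12) {
      a = lambdaMin - syy;
      b = sxy;
    }
    if (Math.hypot(a, b) < 1e-12) {
      a = 1; // isotropic data: no unique direction, pick an arbitrary one
      b = 0;
    }
    double norm = Math.hypot(a, b);
    a /= norm;
    b /= norm;
    return new double[] { a, b, a * mx + b * my };
  }

  public static void main(String[] args) {
    // Points on the line y = 2x + 1.
    double[][] points = { { 0, 1 }, { 1, 3 }, { 2, 5 }, { 3, 7 } };
    double[] eq = deriveEquation(points);
    System.out.println(String.format(Locale.ROOT, "%.4f * x + %.4f * y = %.4f", eq[0], eq[1], eq[2]));
  }
}
```

For higher-dimensional data, each eigenvector that the filter classifies as weak would contribute one such equation.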
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • sampleSize

        private final int sampleSize
        The number of samples to draw.
      • pca

        private final PCARunner pca
        The object that performs the PCA.
      • filter

        private final EigenPairFilter filter
        Filter to select eigenvectors.
      • nf

        private final java.text.NumberFormat nf
        Number format for output of solution.
      • randomsample

        private final boolean randomsample
        Flag to use random sampling instead of kNN sampling.
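
The interplay of `sampleSize` and `randomsample` can be sketched in plain Java as follows. This is an illustration only, not the ELKI implementation: the class `SampleSelectionSketch`, the `seed` parameter, the use of squared Euclidean distance, and the choice of a center point as the kNN query are all assumptions. The sketch also assumes that a non-positive sample size means "use all objects".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-alone sketch; not part of ELKI. */
final class SampleSelectionSketch {
  /** Selects row indices either at random or as the nearest neighbors of center. */
  static List<Integer> select(double[][] data, double[] center, int sampleSize,
      boolean randomSample, long seed) {
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < data.length; i++) {
      ids.add(i);
    }
    // Assumption: a non-positive (or too large) sample size keeps all objects.
    if (sampleSize <= 0 || sampleSize >= data.length) {
      return ids;
    }
    if (randomSample) {
      // Random sample: shuffle, then keep the first sampleSize ids.
      Collections.shuffle(ids, new Random(seed));
    } else {
      // kNN sample: order by distance to the center, then keep the closest ids.
      ids.sort(Comparator.comparingDouble((Integer i) -> squaredDistance(data[i], center)));
    }
    return new ArrayList<>(ids.subList(0, sampleSize));
  }

  /** Squared Euclidean distance, a stand-in for the configured distance function. */
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }

  public static void main(String[] args) {
    double[][] data = { { 0, 0 }, { 10, 10 }, { 1, 1 }, { 9, 9 }, { 0.5, 0.5 } };
    System.out.println(select(data, new double[] { 0, 0 }, 3, false, 42L));
    System.out.println(select(data, new double[] { 0, 0 }, 3, true, 42L));
  }
}
```

A kNN sample concentrates on one local neighborhood, whereas a random sample reflects the whole dataset; which is preferable depends on whether the data contains one correlation cluster or several.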
    • Constructor Detail

      • DependencyDerivator

        public DependencyDerivator(NumberVectorDistance<? super V> distance,
                                   java.text.NumberFormat nf,
                                   PCARunner pca,
                                   EigenPairFilter filter,
                                   int sampleSize,
                                   boolean randomsample)
        Constructor.
        Parameters:
        distance - distance function
        nf - Number format
        pca - PCA runner
        filter - Eigenvector filter
        sampleSize - sample size
        randomsample - flag for random sampling
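
As an illustration of what the `nf` parameter influences, the following sketch formats one linear equation with a `java.text.NumberFormat`. The class `EquationFormatSketch` and the `x_1, x_2, ...` output layout are hypothetical and are not taken from ELKI; only the `NumberFormat` calls are standard Java.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical stand-alone sketch; not part of ELKI. */
final class EquationFormatSketch {
  /** Renders coefficients[0] * x_1 + ... = rhs using the given number format. */
  static String format(double[] coefficients, double rhs, NumberFormat nf) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < coefficients.length; i++) {
      if (i > 0) {
        sb.append(" + ");
      }
      sb.append(nf.format(coefficients[i])).append(" * x_").append(i + 1);
    }
    return sb.append(" = ").append(nf.format(rhs)).toString();
  }

  public static void main(String[] args) {
    // A fixed locale keeps the decimal separator independent of the platform.
    NumberFormat nf = NumberFormat.getInstance(Locale.US);
    nf.setMinimumFractionDigits(2);
    nf.setMaximumFractionDigits(2);
    System.out.println(format(new double[] { 0.5, -0.25 }, 1.0, nf));
  }
}
```

Limiting the fraction digits keeps the printed equation system readable when eigenvector components carry many insignificant decimals.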
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • run

        public CorrelationAnalysisSolution run(Relation<V> relation)
        Computes quantitative linear dependencies among the attributes of the given relation, based on a linear correlation PCA.
        Parameters:
        relation - the relation to process
        Returns:
        the CorrelationAnalysisSolution computed by this DependencyDerivator
      • generateModel

        public CorrelationAnalysisSolution generateModel(Relation<V> db,
                                                         DBIDs ids)
        Runs the PCA on the given set of IDs. The centroid is computed from the given IDs.
        Parameters:
        db - the database
        ids - the set of ids
        Returns:
        a matrix of equations describing the dependencies
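
The centroid computation mentioned above amounts to the per-attribute mean over the selected objects. A minimal sketch in plain Java, where the class `CentroidSketch` is hypothetical and plain array indices stand in for ELKI's `DBIDs`:

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch; not part of ELKI. */
final class CentroidSketch {
  /** Computes the mean vector of the rows of data selected by ids. */
  static double[] centroid(double[][] data, int[] ids) {
    if (ids.length == 0) {
      throw new IllegalArgumentException("Cannot compute the centroid of an empty id set.");
    }
    double[] c = new double[data[ids[0]].length];
    for (int id : ids) {
      for (int d = 0; d < c.length; d++) {
        c[d] += data[id][d];
      }
    }
    for (int d = 0; d < c.length; d++) {
      c[d] /= ids.length;
    }
    return c;
  }

  public static void main(String[] args) {
    double[][] data = { { 0, 0 }, { 2, 4 }, { 100, 100 }, { 4, 8 } };
    // Row 2 is not selected and therefore does not influence the centroid.
    System.out.println(Arrays.toString(centroid(data, new int[] { 0, 1, 3 })));
  }
}
```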
      • generateModel

        public CorrelationAnalysisSolution generateModel(Relation<V> relation,
                                                         DBIDs ids,
                                                         double[] centroid)
        Runs the PCA on the given set of IDs and for the given centroid.
        Parameters:
        relation - the database
        ids - the set of ids
        centroid - the centroid
        Returns:
        a matrix of equations describing the dependencies
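
One plausible reading of "a matrix of equations", following the cited paper, is an augmented matrix in which each row holds a weak eigenvector v followed by the right-hand side v · centroid, so that every row encodes the equation v · x = v · centroid. The sketch below builds such a matrix in plain Java. The class `EquationSystemSketch` is hypothetical, the weak eigenvectors are assumed to be given, and how ELKI normalizes or stores the system internally is not shown here.

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch; not part of ELKI. */
final class EquationSystemSketch {
  /**
   * Builds one row per weak eigenvector: its components, followed by the
   * dot product of the eigenvector with the centroid as right-hand side.
   */
  static double[][] augmented(double[][] weakEigenvectors, double[] centroid) {
    int dim = centroid.length;
    double[][] m = new double[weakEigenvectors.length][dim + 1];
    for (int i = 0; i < weakEigenvectors.length; i++) {
      double rhs = 0;
      for (int d = 0; d < dim; d++) {
        m[i][d] = weakEigenvectors[i][d];
        rhs += weakEigenvectors[i][d] * centroid[d];
      }
      m[i][dim] = rhs;
    }
    return m;
  }

  public static void main(String[] args) {
    // A plane parallel to the x/y axes through the centroid: 0*x + 0*y + 1*z = 3.
    double[][] weak = { { 0, 0, 1 } };
    double[] centroid = { 1, 2, 3 };
    System.out.println(Arrays.deepToString(augmented(weak, centroid)));
  }
}
```

Passing the centroid explicitly lets a caller anchor the equations at a point other than the mean of the selected IDs, for example a cluster center computed elsewhere.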