Class SURFINGDependence

  • All Implemented Interfaces:
    Dependence

    @Reference(authors="Elke Achtert, Hans-Peter Kriegel, Erich Schubert, Arthur Zimek",title="Interactive Data Mining with 3D-Parallel-Coordinate-Trees",booktitle="Proc. 2013 ACM Int. Conf. on Management of Data (SIGMOD 2013)",url="https://doi.org/10.1145/2463676.2463696",bibkey="DBLP:conf/sigmod/AchtertKSZ13")
    @Reference(authors="Christian Baumgartner, Claudia Plant, Karin Kailing, Hans-Peter Kriegel, Peer Kr\u00f6ger",title="Subspace Selection for Clustering High-Dimensional Data",booktitle="Proc. IEEE International Conference on Data Mining (ICDM 2004)",url="https://doi.org/10.1109/ICDM.2004.10112",bibkey="DBLP:conf/icdm/BaumgartnerPKKK04")
    @Priority(-100)
    public class SURFINGDependence
    extends java.lang.Object
    implements Dependence
    Compute the similarity of dimensions using the SURFING score. The parameter k for the k nearest neighbors is currently hard-coded to 10% of the set size.

    Note that the complexity is roughly O(n · n · k), so this is a rather slow method; with k fixed at 10% of n, it is effectively cubic: O(0.1 · n³).

    This version cannot use index support, as the API operates without database attachment. However, it should be possible to implement some trivial sorted-list indexes to get a reasonable speedup!
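    To make the description above concrete, the following is a minimal, self-contained sketch of a SURFING-style quality score for two variables treated as a 2-D subspace. It is not the ELKI implementation: the class name SurfingSketch, both method names, the Euclidean distance, and the exact form of the score (half the summed absolute deviation of the k-nearest-neighbor distances from their mean, divided by the number of below-mean distances times the mean) are assumptions for illustration, following the quality measure of the ICDM 2004 paper as commonly stated. The sketch sorts all distances per point, so it runs in O(n² log n) rather than the O(n · n · k) mentioned above.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of ELKI. */
public class SurfingSketch {
  /** Brute-force distance from each point to its k-th nearest neighbor in the 2-D space (x, y). */
  static double[] knnDistances(double[] x, double[] y, int k) {
    final int n = x.length;
    double[] result = new double[n];
    double[] buf = new double[n - 1]; // distances from point i to all other points
    for (int i = 0; i < n; i++) {
      int c = 0;
      for (int j = 0; j < n; j++) {
        if (j == i) {
          continue; // skip the query point itself
        }
        double dx = x[i] - x[j], dy = y[i] - y[j];
        buf[c++] = Math.sqrt(dx * dx + dy * dy);
      }
      Arrays.sort(buf);
      result[i] = buf[k - 1];
    }
    return result;
  }

  /** SURFING-style score in [0, 1]; 0 means all k-NN distances are equal (no structure detected). */
  static double score(double[] x, double[] y) {
    final int n = x.length;
    if (n < 2 || y.length != n) {
      throw new IllegalArgumentException("Need two arrays of equal length >= 2.");
    }
    final int k = Math.max(1, n / 10); // assumption: k is 10% of the set size, at least 1
    double[] d = knnDistances(x, y, k);
    double mean = 0.;
    for (double v : d) {
      mean += v;
    }
    mean /= n;
    double diff = 0.;
    int below = 0; // number of k-NN distances strictly below the mean
    for (double v : d) {
      diff += Math.abs(mean - v);
      if (v < mean) {
        below++;
      }
    }
    diff *= 0.5;
    return (below == 0 || mean == 0.) ? 0. : diff / (below * mean);
  }
}
```

    Under these assumptions, uniformly spaced data yields a score of 0, while data with dense regions and outliers yields a score closer to 1.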

    Reference:

    Elke Achtert, Hans-Peter Kriegel, Erich Schubert, Arthur Zimek
    Interactive Data Mining with 3D-Parallel-Coordinate-Trees
    Proc. 2013 ACM Int. Conf. on Management of Data (SIGMOD 2013)

    Based on:

    Christian Baumgartner, Claudia Plant, Karin Kailing, Hans-Peter Kriegel, Peer Kröger
    Subspace Selection for Clustering High-Dimensional Data
    Proc. IEEE International Conference on Data Mining (ICDM 2004)

    TODO: make the subspace distance function and k parameterizable.

    TODO: results are not convincing, maybe try inserting points.

    Since:
    0.5.5
    Author:
    Robert Rödler, Erich Schubert
    • Constructor Detail

      • SURFINGDependence

        protected SURFINGDependence()
        Constructor. Use static instance instead!
    • Method Detail

      • dependence

        public <A,B> double dependence(NumberArrayAdapter<?,A> adapter1,
                                       A data1,
                                       NumberArrayAdapter<?,B> adapter2,
                                       B data2)
        Description copied from interface: Dependence
        Measure the dependence of two variables.

        This is the more flexible API, which allows using different internal data representations.

        Specified by:
        dependence in interface Dependence
        Type Parameters:
        A - First array type
        B - Second array type
        Parameters:
        adapter1 - First data adapter
        data1 - First data set
        adapter2 - Second data adapter
        data2 - Second data set
        Returns:
        Dependence measure
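
        The adapter-based signature above separates how the data is stored from how it is read. The sketch below illustrates that idea with a hypothetical stand-in interface, SketchAdapter, which is not ELKI's NumberArrayAdapter; its method names and the helper toDoubles are assumptions made for this example only. Two different representations (a double[] and a List of numbers) are read through the same generic code path.

```java
import java.util.List;

/** Hypothetical illustration only; SketchAdapter is a stand-in, not an ELKI type. */
public class AdapterSketch {
  /** Minimal read-only view of some array-like container of type A. */
  interface SketchAdapter<A> {
    int size(A data);

    double getDouble(A data, int index);
  }

  /** Adapter for primitive double arrays. */
  static final SketchAdapter<double[]> DOUBLE_ARRAY = new SketchAdapter<double[]>() {
    @Override
    public int size(double[] data) {
      return data.length;
    }

    @Override
    public double getDouble(double[] data, int index) {
      return data[index];
    }
  };

  /** Adapter for lists of boxed numbers. */
  static final SketchAdapter<List<? extends Number>> NUMBER_LIST = new SketchAdapter<List<? extends Number>>() {
    @Override
    public int size(List<? extends Number> data) {
      return data.size();
    }

    @Override
    public double getDouble(List<? extends Number> data, int index) {
      return data.get(index).doubleValue();
    }
  };

  /** Copy any adapted container into a double[], independent of its representation. */
  static <A> double[] toDoubles(SketchAdapter<A> adapter, A data) {
    double[] out = new double[adapter.size(data)];
    for (int i = 0; i < out.length; i++) {
      out[i] = adapter.getDouble(data, i);
    }
    return out;
  }
}
```

        A method with two independent type parameters, as in the signature above, can then accept one variable as a double[] and the other as a List without copying either into a common format first.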