Class ReferenceBasedOutlierDetection

  • All Implemented Interfaces:
    Algorithm, OutlierAlgorithm

    @Title("An Efficient Reference-based Approach to Outlier Detection in Large Datasets")
    @Description("Computes kNN distances approximately, using reference points with various reference point strategies.")
    @Reference(authors="Y. Pei, O. R. Zaiane, Y. Gao",
               title="An Efficient Reference-based Approach to Outlier Detection in Large Datasets",
               booktitle="Proc. 6th IEEE Int. Conf. on Data Mining (ICDM '06)",
               url="https://doi.org/10.1109/ICDM.2006.17",
               bibkey="DBLP:conf/icdm/PeiZG06")
    public class ReferenceBasedOutlierDetection
    extends java.lang.Object
    implements OutlierAlgorithm
    Reference-Based Outlier Detection algorithm, which approximates kNN distances using reference points.

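    kNN distances are approximated by the difference in distance from a reference point: for objects p, q and a reference point r, |d(p, r) - d(q, r)| is used in place of d(p, q). For this approximation to be of high quality, the triangle inequality is required, because it guarantees |d(p, r) - d(q, r)| <= d(p, q). The algorithm can nevertheless also process non-metric distances.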
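A minimal sketch of the bound behind this approximation, assuming Euclidean distance on plain `double[]` vectors. The class and method names are hypothetical and not part of the ELKI API. The one-dimensional difference never exceeds the true distance when the triangle inequality holds:

```java
/** Hypothetical illustration of the reference-point lower bound; not part of ELKI. */
class ReferenceBound {
    /** Euclidean distance between two vectors of equal dimension. */
    static double dist(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** |d(p, r) - d(q, r)|, a lower bound on d(p, q) for any metric. */
    static double lowerBound(double[] p, double[] q, double[] r) {
        return Math.abs(dist(p, r) - dist(q, r));
    }

    public static void main(String[] args) {
        double[] r = {0, 0};
        double[] p = {3, 4}; // d(p, r) = 5
        double[] q = {8, 6}; // d(q, r) = 10
        // The bound is 5.0; the true distance is sqrt(29), about 5.385.
        System.out.println(lowerBound(p, q, r) + " <= " + dist(p, q));
    }
}
```

Objects at a similar distance from r look like close neighbors even if they are far apart in the original space, which is one reason to combine several reference points.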

    Reference:

    Y. Pei, O. R. Zaiane, Y. Gao
    An Efficient Reference-Based Approach to Outlier Detection in Large Datasets
    Proc. IEEE Int. Conf. on Data Mining (ICDM'06)

    Since:
    0.3
    Author:
    Lisa Reichert, Erich Schubert
    • Constructor Detail

      • ReferenceBasedOutlierDetection

        public ReferenceBasedOutlierDetection(int k,
                                              NumberVectorDistance<? super NumberVector> distance,
                                              ReferencePointsHeuristic refp)
        Constructor with parameters.
        Parameters:
        k - number of neighbors
        distance - distance function
        refp - Reference points heuristic
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • run

        public OutlierResult run(Relation<? extends NumberVector> relation)
        Run the algorithm on the given relation.
        Parameters:
        relation - Relation to process
        Returns:
        Outlier result
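The overall flow of `run` can be sketched without ELKI. This is a hypothetical, simplified reimplementation for illustration only. It assumes Euclidean distance, plain `double[]` data, reference points that have already been chosen, and a final score of one minus the density divided by the maximum density. That score is our reading of the cited paper, and ELKI's exact normalization may differ. For each reference point, the sketch sorts the objects by reference distance and estimates each object's density from its k closest entries in that order. It then keeps the smallest density per object:

```java
import java.util.Arrays;

/** Hypothetical end-to-end sketch of reference-based outlier scoring; not the ELKI code. */
class RbodSketch {
    static double dist(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** k divided by the sum of the k smallest gaps around position idx in a sorted array. */
    static double density(double[] sorted, int idx, int k) {
        int lef = idx - 1, rig = idx + 1;
        double sum = 0;
        for (int i = 0; i < k; i++) {
            if (lef < 0 && rig >= sorted.length) {
                throw new IllegalArgumentException("k must be smaller than the number of objects");
            }
            double ld = lef >= 0 ? sorted[idx] - sorted[lef] : Double.POSITIVE_INFINITY;
            double rd = rig < sorted.length ? sorted[rig] - sorted[idx] : Double.POSITIVE_INFINITY;
            if (ld <= rd) { sum += ld; lef--; } else { sum += rd; rig++; }
        }
        return k / sum; // infinite if all k gaps are zero (exact duplicates)
    }

    /** Outlier scores; they lie in [0, 1] as long as all densities are finite. */
    static double[] scores(double[][] data, double[][] refs, int k) {
        int n = data.length;
        double[] dens = new double[n];
        Arrays.fill(dens, Double.POSITIVE_INFINITY);
        for (double[] r : refs) {
            double[] d = new double[n];
            Integer[] order = new Integer[n];
            for (int i = 0; i < n; i++) { d[i] = dist(data[i], r); order[i] = i; }
            // One-dimensional representation: object ids sorted by distance to r.
            Arrays.sort(order, (a, b) -> Double.compare(d[a], d[b]));
            double[] sorted = new double[n];
            for (int j = 0; j < n; j++) sorted[j] = d[order[j]];
            // Keep the smallest density seen over all reference points.
            for (int j = 0; j < n; j++) {
                dens[order[j]] = Math.min(dens[order[j]], density(sorted, j, k));
            }
        }
        double max = Arrays.stream(dens).max().getAsDouble();
        double[] score = new double[n];
        for (int i = 0; i < n; i++) score[i] = 1.0 - dens[i] / max;
        return score;
    }
}
```

For example, `RbodSketch.scores(new double[][] {{0}, {1}, {2}, {3}, {10}}, new double[][] {{0}}, 2)` gives the isolated value 10 the highest score.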
      • computeDistanceVector

        protected DoubleDBIDList computeDistanceVector(NumberVector refPoint,
                                                       Relation<? extends NumberVector> database,
                                                       PrimitiveDistanceQuery<? super NumberVector> distFunc)
        Computes, for each object, its distance to a single reference point, yielding a one-dimensional representation of the data set.
        Parameters:
        refPoint - Reference Point Feature Vector
        database - database to work on
        distFunc - Distance function to use
        Returns:
        list containing, for each database object, its object id and its distance to the reference point
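A hypothetical stand-in for this method on plain arrays, assuming Euclidean distance and integer object ids in place of ELKI's `DoubleDBIDList`. The sketch sorts the entries by distance. This assumes, as the `computeDensity` description suggests, that an object's approximate neighbors are its adjacent entries in the reference distance vector:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for computeDistanceVector on plain arrays; not ELKI API. */
class DistanceVectorSketch {
    /** One entry of the one-dimensional representation. */
    static final class DistId {
        final double dist; // distance to the reference point
        final int id;      // index of the object in the data array

        DistId(double dist, int id) {
            this.dist = dist;
            this.id = id;
        }
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Distance of every object to refPoint, sorted ascending so neighbors are adjacent. */
    static List<DistId> computeDistanceVector(double[] refPoint, double[][] data) {
        List<DistId> out = new ArrayList<>(data.length);
        for (int id = 0; id < data.length; id++) {
            out.add(new DistId(euclidean(data[id], refPoint), id));
        }
        out.sort((a, b) -> Double.compare(a.dist, b.dist));
        return out;
    }
}
```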
      • updateDensities

        protected void updateDensities(WritableDoubleDataStore rbod_score,
                                       DoubleDBIDList referenceDists)
        Update the density estimates for each object.
        Parameters:
        rbod_score - Density storage
        referenceDists - Distances from current reference point
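Read alongside the cited paper, the per-object update and the final score can be summarized as follows. This is our reading of Pei et al.; the symbols are ours, and ELKI's bookkeeping may differ in detail. Let r_1, ..., r_R be the reference points and D_r(p) the density computed for object p from the distances to r. Each update keeps the minimum over the reference points seen so far, and the score normalizes by the largest density:

```latex
D_{r}(p) = \frac{k}{\sum_{q \in \mathrm{kNN}_{r}(p)} \lvert d(p, r) - d(q, r) \rvert},
\qquad
D(p) = \min_{1 \le j \le R} D_{r_j}(p),
\qquad
\mathrm{ROS}(p) = 1 - \frac{D(p)}{\max_{x} D(x)}
```

Under the triangle inequality, each one-dimensional gap underestimates the true distance. The minimum density therefore corresponds to the largest, and hence tightest, kNN distance estimate.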
      • computeDensity

        protected double computeDensity(DoubleDBIDList referenceDists,
                                        DoubleDBIDListIter iter,
                                        int index)
        Computes the density of an object. The density is based on the distances to the object's k nearest neighbors, where both the neighbors and the distances are approximated. Instead of a regular nearest-neighbor search, the neighbors of an object are taken to be the objects with a similar distance to the reference point, i.e., the objects that lie closest to it in the reference distance vector.
        Parameters:
        referenceDists - vector of the reference distances
        iter - Iterator to this list (will be reused)
        index - index of the current object
        Returns:
        density for one object and reference point
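The neighbor walk described above amounts to expanding outward from the object's position in the sorted reference distance vector, always taking whichever side is closer. Below is a hypothetical sketch on a plain sorted `double[]`. It assumes the density is k divided by the sum of the k one-dimensional gaps, i.e. the inverse of the average approximate kNN distance as in the cited paper. It is not the ELKI code:

```java
/** Hypothetical sketch of the approximate density of one object; not the ELKI code. */
class DensitySketch {
    /**
     * @param sorted distances to one reference point, in ascending order
     * @param index  position of the current object in {@code sorted}
     * @param k      number of neighbors
     * @return k divided by the sum of the k smallest gaps to other entries
     */
    static double computeDensity(double[] sorted, int index, int k) {
        final double x = sorted[index];
        int lef = index - 1, rig = index + 1; // next candidate on each side
        double sum = 0;
        for (int i = 0; i < k; i++) {
            if (lef < 0 && rig >= sorted.length) {
                throw new IllegalArgumentException("not enough objects for k neighbors");
            }
            double ld = lef >= 0 ? x - sorted[lef] : Double.POSITIVE_INFINITY;
            double rd = rig < sorted.length ? sorted[rig] - x : Double.POSITIVE_INFINITY;
            // Take the closer of the two candidates, as in a merge step.
            if (ld <= rd) {
                sum += ld;
                lef--;
            } else {
                sum += rd;
                rig++;
            }
        }
        return k / sum;
    }

    public static void main(String[] args) {
        double[] sorted = {0, 1, 2, 3, 10};
        // The last object's two nearest gaps are 7 and 8, so its density is 2 / 15.
        System.out.println(computeDensity(sorted, 4, 2));
    }
}
```

Because the list is sorted, the k approximate neighbors are found in O(k) steps per object instead of a full nearest-neighbor search.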