Class KDEOS<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    Algorithm, OutlierAlgorithm

    @Title("KDEOS: Kernel Density Estimator Outlier Score")
    @Reference(authors="Erich Schubert, Arthur Zimek, Hans-Peter Kriegel",
               title="Generalized Outlier Detection with Flexible Kernel Density Estimates",
               booktitle="Proc. 14th SIAM International Conference on Data Mining (SDM 2014)",
               url="https://doi.org/10.1137/1.9781611973440.63",
               bibkey="DBLP:conf/sdm/SchubertZK14")
    public class KDEOS<O>
    extends java.lang.Object
    implements OutlierAlgorithm
    Generalized Outlier Detection with Flexible Kernel Density Estimates.

    This is an outlier detection inspired by LOF, but using kernel density estimation (KDE) from statistics. Unfortunately, for higher dimensional data, kernel density estimation itself becomes difficult. At this point, the kdeos.idim parameter can become useful, which allows to either disable dimensionality adjustment completely (0) or to set it to a lower dimensionality than the data representation. This may sound like a hack at first, but real data is often of lower intrinsic dimensionality, and embedded into a higher data representation. Adjusting the kernel to account for the representation seems to yield worse results than using a lower, intrinsic, dimensionality.

    If your data set has many duplicates, the kdeos.kernel.minbw parameter sets a minimum kernel bandwidth, which may improve results in these cases, as it prevents kernels from degenerating to single points.

    Reference:

    Erich Schubert, Arthur Zimek, Hans-Peter Kriegel
    Generalized Outlier Detection with Flexible Kernel Density Estimates
    Proc. 14th SIAM International Conference on Data Mining (SDM 2014)

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • CUTOFF

        private static final double CUTOFF
        Significance cutoff when computing kernel density.
        See Also:
        Constant Field Values
      • distance

        protected Distance<? super O> distance
        Distance function used.
      • kmin

        protected int kmin
        Minimum number of neighbors to use.
      • kmax

        protected int kmax
        Maximum number of neighbors to use.
      • scale

        protected double scale
        Kernel scaling parameter.
      • minBandwidth

        protected double minBandwidth
        Kernel minimum bandwidth.
      • idim

        protected int idim
        Intrinsic dimensionality.
    • Constructor Detail

      • KDEOS

        public KDEOS​(Distance<? super O> distance,
                     int kmin,
                     int kmax,
                     KernelDensityFunction kernel,
                     double minBandwidth,
                     double scale,
                     int idim)
        Constructor.
        Parameters:
        distance - Distance function
        kmin - Minimum number of neighbors
        kmax - Maximum number of neighbors
        kernel - Kernel function
        minBandwidth - Minimum bandwidth
        scale - Kernel scaling parameter
        idim - Intrinsic dimensionality (use 0 to use real dimensionality)
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • run

        public OutlierResult run​(Relation<O> rel)
        Run the KDEOS outlier detection algorithm.
        Parameters:
        rel - Relation to process
        Returns:
        Outlier detection result
      • estimateDensities

        protected void estimateDensities​(Relation<O> rel,
                                         KNNSearcher<DBIDRef> knnq,
                                         DBIDs ids,
                                         WritableDataStore<double[]> densities)
        Perform the kernel density estimation step.
        Parameters:
        rel - Relation to query
        knnq - kNN query
        ids - IDs to process
        densities - Density storage
      • dimensionality

        private int dimensionality​(Relation<O> rel)
        Ugly hack to allow using this implementation without having a well-defined dimensionality.
        Parameters:
        rel - Data relation
        Returns:
        Dimensionality