Class KDEOS<O>
- java.lang.Object
- elki.outlier.lof.KDEOS<O>
- Type Parameters:
O - Object type
- All Implemented Interfaces:
Algorithm, OutlierAlgorithm
@Title("KDEOS: Kernel Density Estimator Outlier Score")
@Reference(authors="Erich Schubert, Arthur Zimek, Hans-Peter Kriegel",
    title="Generalized Outlier Detection with Flexible Kernel Density Estimates",
    booktitle="Proc. 14th SIAM International Conference on Data Mining (SDM 2014)",
    url="https://doi.org/10.1137/1.9781611973440.63",
    bibkey="DBLP:conf/sdm/SchubertZK14")
public class KDEOS<O> extends java.lang.Object implements OutlierAlgorithm
Generalized Outlier Detection with Flexible Kernel Density Estimates.
This is an outlier detection method inspired by LOF, but using kernel density estimation (KDE) from statistics. Unfortunately, kernel density estimation itself becomes difficult for higher-dimensional data. In that case, the kdeos.idim parameter can be useful: it allows you to either disable dimensionality adjustment completely (0) or set it to a lower dimensionality than that of the data representation. This may sound like a hack at first, but real data is often of lower intrinsic dimensionality and merely embedded into a higher-dimensional representation. Adjusting the kernel to account for the representation dimensionality seems to yield worse results than using a lower, intrinsic dimensionality.
If your data set has many duplicates, the kdeos.kernel.minbw parameter sets a minimum kernel bandwidth, which may improve results in these cases, as it prevents kernels from degenerating to single points.
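The effect of kdeos.kernel.minbw and kdeos.idim can be illustrated with a minimal, self-contained sketch. This is not the ELKI implementation: the class name KdeSketch, its method names, the choice of a Gaussian kernel, and the normalization by the bandwidth raised to the power idim are all assumptions made for illustration only.

```java
/**
 * Illustrative sketch only (hypothetical names, not ELKI code):
 * a kernel density estimate with a minimum bandwidth and an
 * adjustable dimensionality for the normalization term.
 */
final class KdeSketch {
  /** Standard Gaussian density, used here as the kernel (assumption). */
  static double gaussian(double u) {
    return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
  }

  /**
   * Density estimate from the distances to the neighbors.
   *
   * @param dists distances from the query point to its neighbors
   * @param bandwidth raw kernel bandwidth (how it is chosen is left open here)
   * @param minBandwidth lower bound applied to the bandwidth
   * @param idim dimensionality used for normalization; 0 disables the adjustment
   * @return average kernel value, divided by bandwidth^idim
   */
  static double density(double[] dists, double bandwidth, double minBandwidth, int idim) {
    // Clamp the bandwidth so that duplicates (all distances 0) cannot
    // collapse the kernel to a single point.
    final double h = Math.max(bandwidth, minBandwidth);
    // Normalization h^idim; with idim = 0 this stays 1 (no adjustment).
    double norm = 1.0;
    for (int i = 0; i < idim; i++) {
      norm *= h;
    }
    double sum = 0.0;
    for (double d : dists) {
      sum += gaussian(d / h);
    }
    return sum / (dists.length * norm);
  }
}
```

With three duplicate neighbors (all distances 0) and a raw bandwidth of 0, the unclamped estimate is NaN because of the division 0/0, whereas a minimum bandwidth such as 0.5 yields a finite value. Choosing idim = 0 leaves the average kernel value unscaled.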
Reference:
Erich Schubert, Arthur Zimek, Hans-Peter Kriegel
Generalized Outlier Detection with Flexible Kernel Density Estimates
Proc. 14th SIAM International Conference on Data Mining (SDM 2014)
- Since:
- 0.7.0
- Author:
- Erich Schubert
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
- private static double CUTOFF
Significance cutoff when computing kernel density.
- protected Distance<? super O> distance
Distance function used.
- protected int idim
Intrinsic dimensionality.
- protected KernelDensityFunction kernel
Kernel function to use for density estimation.
- protected int kmax
Maximum number of neighbors to use.
- protected int kmin
Minimum number of neighbors to use.
- private static Logging LOG
Class logger.
- protected double minBandwidth
Kernel minimum bandwidth.
- protected double scale
Kernel scaling parameter.
-
Constructor Summary
Constructors (Constructor, Description)
- KDEOS(Distance<? super O> distance, int kmin, int kmax, KernelDensityFunction kernel, double minBandwidth, double scale, int idim)
Constructor.
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description)
- protected void computeOutlierScores(KNNSearcher<DBIDRef> knnq, DBIDs ids, WritableDataStore<double[]> densities, WritableDoubleDataStore kdeos, DoubleMinMax minmax)
Compute the final KDEOS scores.
- private int dimensionality(Relation<O> rel)
Ugly hack to allow using this implementation without having a well-defined dimensionality.
- protected void estimateDensities(Relation<O> rel, KNNSearcher<DBIDRef> knnq, DBIDs ids, WritableDataStore<double[]> densities)
Perform the kernel density estimation step.
- TypeInformation[] getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
- OutlierResult run(Relation<O> rel)
Run the KDEOS outlier detection algorithm.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.outlier.OutlierAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
CUTOFF
private static final double CUTOFF
Significance cutoff when computing kernel density.
- See Also:
- Constant Field Values
-
kernel
protected KernelDensityFunction kernel
Kernel function to use for density estimation.
-
kmin
protected int kmin
Minimum number of neighbors to use.
-
kmax
protected int kmax
Maximum number of neighbors to use.
-
scale
protected double scale
Kernel scaling parameter.
-
minBandwidth
protected double minBandwidth
Kernel minimum bandwidth.
-
idim
protected int idim
Intrinsic dimensionality.
-
-
Constructor Detail
-
KDEOS
public KDEOS(Distance<? super O> distance, int kmin, int kmax, KernelDensityFunction kernel, double minBandwidth, double scale, int idim)
Constructor.
- Parameters:
distance - Distance function
kmin - Minimum number of neighbors
kmax - Maximum number of neighbors
kernel - Kernel function
minBandwidth - Minimum bandwidth
scale - Kernel scaling parameter
idim - Intrinsic dimensionality (use 0 to use real dimensionality)
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
- Specified by:
getInputTypeRestriction in interface Algorithm
- Returns:
- Type restriction
-
run
public OutlierResult run(Relation<O> rel)
Run the KDEOS outlier detection algorithm.
- Parameters:
rel - Relation to process
- Returns:
- Outlier detection result
-
estimateDensities
protected void estimateDensities(Relation<O> rel, KNNSearcher<DBIDRef> knnq, DBIDs ids, WritableDataStore<double[]> densities)
Perform the kernel density estimation step.
- Parameters:
rel - Relation to query
knnq - kNN query
ids - IDs to process
densities - Density storage
-
dimensionality
private int dimensionality(Relation<O> rel)
Ugly hack to allow using this implementation without having a well-defined dimensionality.
- Parameters:
rel - Data relation
- Returns:
- Dimensionality
-
computeOutlierScores
protected void computeOutlierScores(KNNSearcher<DBIDRef> knnq, DBIDs ids, WritableDataStore<double[]> densities, WritableDoubleDataStore kdeos, DoubleMinMax minmax)
Compute the final KDEOS scores.
- Parameters:
knnq - kNN query
ids - IDs to process
densities - Density estimates
kdeos - Score outputs
minmax - Minimum and maximum scores
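This page does not spell out how the scores are derived from the stored densities. The following is a minimal sketch under the assumption (not confirmed by this page) that each point's density is compared to the densities of its neighbors as a z-score, and that the z-scores are averaged over the neighborhood sizes from kmin to kmax, which would explain why densities holds a double[] per object. The class KdeosScoreSketch and its methods are hypothetical names, and any final normalization of the score is omitted.

```java
/**
 * Illustrative sketch only (hypothetical names, not ELKI code):
 * combine per-k densities into one score via averaged z-scores.
 */
final class KdeosScoreSketch {
  /**
   * z-score of a point's density relative to its neighbors' densities.
   * Positive values mean the point is less dense than its neighbors.
   */
  static double zScore(double own, double[] neighbors) {
    double mean = 0.0;
    for (double v : neighbors) {
      mean += v;
    }
    mean /= neighbors.length;
    double var = 0.0;
    for (double v : neighbors) {
      var += (v - mean) * (v - mean);
    }
    var /= neighbors.length;
    final double sd = Math.sqrt(var);
    if (sd == 0.0) {
      // All neighbor densities are identical: avoid dividing by zero.
      if (mean == own) {
        return 0.0;
      }
      return mean > own ? Double.POSITIVE_INFINITY : Double.NEGATIVE_INFINITY;
    }
    return (mean - own) / sd;
  }

  /**
   * Average the z-scores over all neighborhood sizes.
   *
   * @param own own[j] is the point's density for the j-th neighborhood size
   * @param neighbors neighbors[j] holds the neighbors' densities for that size
   */
  static double averageZScore(double[] own, double[][] neighbors) {
    double sum = 0.0;
    for (int j = 0; j < own.length; j++) {
      sum += zScore(own[j], neighbors[j]);
    }
    return sum / own.length;
  }
}
```

A caller would additionally track the smallest and largest score seen across all objects, which is the role the minmax parameter plays in the signature above.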
-
-