Package elki.outlier.distance
Class ReferenceBasedOutlierDetection
- java.lang.Object
-
- elki.outlier.distance.ReferenceBasedOutlierDetection
-
- All Implemented Interfaces:
Algorithm, OutlierAlgorithm
@Title("An Efficient Reference-based Approach to Outlier Detection in Large Datasets")
@Description("Computes kNN distances approximately, using reference points with various reference point strategies.")
@Reference(authors="Y. Pei, O. R. Zaiane, Y. Gao", title="An Efficient Reference-based Approach to Outlier Detection in Large Datasets", booktitle="Proc. 6th IEEE Int. Conf. on Data Mining (ICDM \'06)", url="https://doi.org/10.1109/ICDM.2006.17", bibkey="DBLP:conf/icdm/PeiZG06")
public class ReferenceBasedOutlierDetection
extends java.lang.Object
implements OutlierAlgorithm
Reference-Based Outlier Detection, an algorithm that computes kNN distances approximately, using reference points.
The kNN distance is approximated by the difference in distance from a reference point. The approximation is only of high quality if the distance function satisfies the triangle inequality, but the algorithm can also process non-metric distances.
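To illustrate why the triangle inequality matters for the approximation described above, here is a minimal sketch in plain Java. It is not ELKI code: the class `ReferenceBoundSketch` and its methods are hypothetical names, and Euclidean distance is assumed purely for illustration. For a metric distance, the difference of the two reference distances can never exceed the true distance between the objects, so it acts as a lower bound.

```java
// Illustrative sketch only; class and method names are hypothetical, not ELKI API.
final class ReferenceBoundSketch {
  /** Euclidean distance between two vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Approximate distance of x and y: the difference of their distances to reference point r. */
  static double referenceDifference(double[] x, double[] y, double[] r) {
    return Math.abs(euclidean(x, r) - euclidean(y, r));
  }

  public static void main(String[] args) {
    double[] x = { 0, 0 }, y = { 3, 4 }, r = { 0, 10 };
    System.out.println("true distance:        " + euclidean(x, y));
    System.out.println("reference difference: " + referenceDifference(x, y, r));
  }
}
```

With a non-metric distance this bound is not guaranteed, which is why the quality of the approximation may degrade in that case.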
Reference:
Y. Pei, O. R. Zaiane, Y. Gao
An Efficient Reference-Based Approach to Outlier Detection in Large Datasets
Proc. IEEE Int. Conf. on Data Mining (ICDM'06)
- Since:
- 0.3
- Author:
- Lisa Reichert, Erich Schubert
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | ReferenceBasedOutlierDetection.Par | Parameterization class.
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected NumberVectorDistance<? super NumberVector> | distance | Distance function used.
protected int | k | Holds the number of neighbors to use for density estimation.
protected ReferencePointsHeuristic | refp | Stores the reference point strategy.
-
Constructor Summary
Constructors
Constructor | Description
ReferenceBasedOutlierDetection(int k, NumberVectorDistance<? super NumberVector> distance, ReferencePointsHeuristic refp) | Constructor with parameters.
-
Method Summary
Methods
Modifier and Type | Method | Description
protected double | computeDensity(DoubleDBIDList referenceDists, DoubleDBIDListIter iter, int index) | Computes the density of an object.
protected DoubleDBIDList | computeDistanceVector(NumberVector refPoint, Relation<? extends NumberVector> database, PrimitiveDistanceQuery<? super NumberVector> distFunc) | Computes for each object the distance to one reference point.
TypeInformation[] | getInputTypeRestriction() | Get the input type restriction used for negotiating the data query.
OutlierResult | run(Relation<? extends NumberVector> relation) | Run the algorithm on the given relation.
protected void | updateDensities(WritableDoubleDataStore rbod_score, DoubleDBIDList referenceDists) | Update the density estimates for each object.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.outlier.OutlierAlgorithm
autorun
-
-
-
-
Field Detail
-
distance
protected NumberVectorDistance<? super NumberVector> distance
Distance function used.
-
k
protected int k
Holds the number of neighbors to use for density estimation.
-
refp
protected ReferencePointsHeuristic refp
Stores the reference point strategy.
-
-
Constructor Detail
-
ReferenceBasedOutlierDetection
public ReferenceBasedOutlierDetection(int k, NumberVectorDistance<? super NumberVector> distance, ReferencePointsHeuristic refp)
Constructor with parameters.
- Parameters:
k - number of neighbors
distance - distance function
refp - reference points heuristic
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
- Specified by:
getInputTypeRestriction in interface Algorithm
- Returns:
- Type restriction
-
run
public OutlierResult run(Relation<? extends NumberVector> relation)
Run the algorithm on the given relation.
- Parameters:
relation - Relation to process
- Returns:
- Outlier result
-
computeDistanceVector
protected DoubleDBIDList computeDistanceVector(NumberVector refPoint, Relation<? extends NumberVector> database, PrimitiveDistanceQuery<? super NumberVector> distFunc)
Computes the distance from each object to one reference point, yielding a one-dimensional representation of the data set.
- Parameters:
refPoint - reference point feature vector
database - database to work on
distFunc - distance function to use
- Returns:
- list containing, for each database object, the object id and the distance to the reference point
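The one-dimensional representation can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the ELKI implementation: the class `DistanceVectorSketch` and its `Entry` type are hypothetical, plain arrays stand in for `Relation` and `DoubleDBIDList`, Euclidean distance replaces the configurable distance function, and the result is assumed to be sorted by distance so that objects with similar reference distances end up adjacent.

```java
import java.util.Arrays;

// Illustrative sketch; names are hypothetical and plain arrays stand in for
// ELKI's Relation and DoubleDBIDList types.
final class DistanceVectorSketch {
  /** One entry of the distance vector: object index plus distance to the reference point. */
  static final class Entry {
    final int id;
    final double dist;

    Entry(int id, double dist) {
      this.id = id;
      this.dist = dist;
    }
  }

  /** Euclidean distance, assumed here in place of a configurable distance function. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Computes (id, distance) pairs for one reference point, sorted by ascending distance. */
  static Entry[] distanceVector(double[][] data, double[] refPoint) {
    Entry[] entries = new Entry[data.length];
    for (int i = 0; i < data.length; i++) {
      entries[i] = new Entry(i, euclidean(data[i], refPoint));
    }
    Arrays.sort(entries, (a, b) -> Double.compare(a.dist, b.dist));
    return entries;
  }

  public static void main(String[] args) {
    double[][] data = { { 5, 0 }, { 1, 0 }, { 3, 0 } };
    for (Entry e : distanceVector(data, new double[] { 0, 0 })) {
      System.out.println(e.id + " -> " + e.dist);
    }
  }
}
```

Keeping the object index next to each distance is what allows a later step to write per-object results back after sorting.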
-
updateDensities
protected void updateDensities(WritableDoubleDataStore rbod_score, DoubleDBIDList referenceDists)
Update the density estimates for each object.
- Parameters:
rbod_score - Density storage
referenceDists - Distances from current reference point
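This page does not say how estimates from several reference points are combined. The sketch below assumes, following one reading of the cited paper, that the smallest density seen over all reference points is kept for each object. The class `UpdateDensitiesSketch` and the array-based storage are hypothetical stand-ins for `WritableDoubleDataStore`, and `NaN` is assumed to mark "no estimate yet".

```java
import java.util.Arrays;

// Illustrative sketch; the min-combination rule is an assumption, not taken from this page.
final class UpdateDensitiesSketch {
  /**
   * Merges the densities computed for one reference point into the running scores.
   *
   * @param scores running per-object density estimates (NaN = not yet set)
   * @param ids object index for each entry of the current reference distance list
   * @param densities density of each entry with respect to the current reference point
   */
  static void update(double[] scores, int[] ids, double[] densities) {
    for (int i = 0; i < ids.length; i++) {
      double old = scores[ids[i]];
      // Keep the minimum; a NaN score means this is the first estimate.
      if (Double.isNaN(old) || densities[i] < old) {
        scores[ids[i]] = densities[i];
      }
    }
  }

  public static void main(String[] args) {
    double[] scores = new double[3];
    Arrays.fill(scores, Double.NaN);
    update(scores, new int[] { 0, 1, 2 }, new double[] { 0.5, 2.0, 1.0 });
    update(scores, new int[] { 2, 0, 1 }, new double[] { 0.2, 0.7, 1.5 });
    System.out.println(Arrays.toString(scores));
  }
}
```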
-
computeDensity
protected double computeDensity(DoubleDBIDList referenceDists, DoubleDBIDListIter iter, int index)
Computes the density of an object. The density is derived from the distances to the k nearest neighbors; both the neighbors and the distances are computed approximately. Instead of a regular nearest-neighbor search, the nearest neighbors of an object are taken to be those objects that have a similar distance to the reference point, that is, the objects that lie close to it in the reference distance vector.
- Parameters:
referenceDists - vector of the reference distances
iter - iterator for this list (will be reused)
index - index of the current object
- Returns:
- density for one object and reference point
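The approximate neighbor search can be sketched as a two-sided scan over the sorted reference distances. This is a sketch under assumptions, not the ELKI code: the class `ReferenceDensitySketch` is hypothetical, a sorted `double[]` replaces `DoubleDBIDList` and its iterator, and the density is assumed to be the number of neighbors divided by the sum of their approximate distances (the reciprocal of the mean), which is one reading of the cited paper.

```java
// Illustrative sketch; the density formula (neighbors / sum of differences) is an assumption.
final class ReferenceDensitySketch {
  /**
   * Approximate density of the object at position index.
   *
   * @param sortedDists distances to one reference point, sorted ascending
   * @param index position of the current object in sortedDists
   * @param k number of neighbors to use
   */
  static double density(double[] sortedDists, int index, int k) {
    final int n = sortedDists.length;
    final double x = sortedDists[index];
    int left = index - 1, right = index + 1;
    double sum = 0.;
    int found = 0;
    // Expand to whichever side currently offers the smaller difference.
    while (found < k && (left >= 0 || right < n)) {
      double dl = left >= 0 ? x - sortedDists[left] : Double.POSITIVE_INFINITY;
      double dr = right < n ? sortedDists[right] - x : Double.POSITIVE_INFINITY;
      if (dl <= dr) {
        sum += dl;
        left--;
      } else {
        sum += dr;
        right++;
      }
      found++;
    }
    // All neighbors at difference zero (or no neighbors at all): treat as infinitely dense.
    return sum > 0. ? found / sum : Double.POSITIVE_INFINITY;
  }

  public static void main(String[] args) {
    double[] dists = { 1, 2, 4, 10 };
    System.out.println("density at index 1: " + density(dists, 1, 2));
    System.out.println("density at index 3: " + density(dists, 3, 2));
  }
}
```

Because the list is sorted, the k approximate neighbors are found in O(k) steps per object, with no full nearest-neighbor search. The isolated value at the end of the list ends up with a much lower density than the values clustered at the start.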
-
-