Package elki.outlier.distance
Class ReferenceBasedOutlierDetection
- java.lang.Object
  - elki.outlier.distance.ReferenceBasedOutlierDetection
- All Implemented Interfaces:
  Algorithm, OutlierAlgorithm
@Title("An Efficient Reference-based Approach to Outlier Detection in Large Datasets")
@Description("Computes kNN distances approximately, using reference points with various reference point strategies.")
@Reference(authors="Y. Pei, O. R. Zaiane, Y. Gao",
    title="An Efficient Reference-based Approach to Outlier Detection in Large Datasets",
    booktitle="Proc. 6th IEEE Int. Conf. on Data Mining (ICDM \'06)",
    url="https://doi.org/10.1109/ICDM.2006.17",
    bibkey="DBLP:conf/icdm/PeiZG06")
public class ReferenceBasedOutlierDetection
extends java.lang.Object
implements OutlierAlgorithm
Reference-Based Outlier Detection algorithm, which computes kNN distances approximately using reference points. The distance between two objects is approximated by the difference of their distances to a reference point. For this approximation to be of high quality, the distance function must satisfy the triangle inequality; the algorithm can also process non-metric distances, but the approximation may then be poor.
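The approximation described above can be illustrated with a minimal, self-contained sketch. It does not use any ELKI classes; the class name ReferenceBoundSketch and its methods are hypothetical and exist only for illustration, and Euclidean distance is assumed as the metric. For a metric distance, the triangle inequality guarantees that |d(x, r) - d(y, r)| never exceeds d(x, y), so the reference-based value is a lower bound of the true distance.

```java
// Illustrative sketch only; not ELKI code. All names are hypothetical.
final class ReferenceBoundSketch {
  /** Euclidean distance between two vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Approximates d(x, y) by |d(x, r) - d(y, r)| for a reference point r. */
  static double referenceApproximation(double[] x, double[] y, double[] r) {
    return Math.abs(euclidean(x, r) - euclidean(y, r));
  }

  public static void main(String[] args) {
    double[] x = { 0, 0 }, y = { 3, 4 }, r = { 0, 8 };
    System.out.println("true distance: " + euclidean(x, y));
    System.out.println("approximation: " + referenceApproximation(x, y, r));
  }
}
```

How tight the bound is depends on where the reference point lies relative to the two objects, which is presumably why the class accepts a configurable reference point strategy.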
Reference:
Y. Pei, O. R. Zaiane, Y. Gao
An Efficient Reference-Based Approach to Outlier Detection in Large Datasets
Proc. 6th IEEE Int. Conf. on Data Mining (ICDM '06)
- Since:
- 0.3
- Author:
- Lisa Reichert, Erich Schubert
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | ReferenceBasedOutlierDetection.Par | Parameterization class.
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected NumberVectorDistance<? super NumberVector> | distance | Distance function used.
protected int | k | Holds the number of neighbors to use for density estimation.
protected ReferencePointsHeuristic | refp | Stores the reference point strategy.
-
Constructor Summary
Constructors
Constructor | Description
ReferenceBasedOutlierDetection(int k, NumberVectorDistance<? super NumberVector> distance, ReferencePointsHeuristic refp) | Constructor with parameters.
-
Method Summary
All Methods | Instance Methods | Concrete Methods
Modifier and Type | Method | Description
protected double | computeDensity(DoubleDBIDList referenceDists, DoubleDBIDListIter iter, int index) | Computes the density of an object.
protected DoubleDBIDList | computeDistanceVector(NumberVector refPoint, Relation<? extends NumberVector> database, PrimitiveDistanceQuery<? super NumberVector> distFunc) | Computes for each object the distance to one reference point.
TypeInformation[] | getInputTypeRestriction() | Get the input type restriction used for negotiating the data query.
OutlierResult | run(Relation<? extends NumberVector> relation) | Run the algorithm on the given relation.
protected void | updateDensities(WritableDoubleDataStore rbod_score, DoubleDBIDList referenceDists) | Update the density estimates for each object.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.outlier.OutlierAlgorithm
autorun
-
-
-
-
Field Detail
-
distance
protected NumberVectorDistance<? super NumberVector> distance
Distance function used.
-
k
protected int k
Holds the number of neighbors to use for density estimation.
-
refp
protected ReferencePointsHeuristic refp
Stores the reference point strategy.
-
-
Constructor Detail
-
ReferenceBasedOutlierDetection
public ReferenceBasedOutlierDetection(int k, NumberVectorDistance<? super NumberVector> distance, ReferencePointsHeuristic refp)
Constructor with parameters.
- Parameters:
k - number of neighbors
distance - distance function
refp - reference points heuristic
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
- Specified by:
getInputTypeRestriction in interface Algorithm
- Returns:
- Type restriction
-
run
public OutlierResult run(Relation<? extends NumberVector> relation)
Run the algorithm on the given relation.
- Parameters:
relation - Relation to process
- Returns:
- Outlier result
-
computeDistanceVector
protected DoubleDBIDList computeDistanceVector(NumberVector refPoint, Relation<? extends NumberVector> database, PrimitiveDistanceQuery<? super NumberVector> distFunc)
Computes for each object the distance to one reference point. This yields a one-dimensional representation of the data set.
- Parameters:
refPoint - Reference point feature vector
database - database to work on
distFunc - Distance function to use
- Returns:
- list containing, for each database object, the object id and its distance to the reference point
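The one-dimensional representation can be sketched without ELKI types as follows. The class ReferenceDistanceVectorSketch, its Entry pair type, and the use of plain double[] vectors with Euclidean distance are assumptions made for illustration. Sorting the result by distance is also an assumption of this sketch; it is done so that objects with similar reference distances end up adjacent.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch only; not ELKI code. All names are hypothetical.
final class ReferenceDistanceVectorSketch {
  /** Pair of an object index and its distance to the reference point. */
  static final class Entry {
    final int id;
    final double dist;

    Entry(int id, double dist) {
      this.id = id;
      this.dist = dist;
    }
  }

  /** Euclidean distance between two vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Computes the distance of every object to refPoint, sorted ascending. */
  static Entry[] distanceVector(double[] refPoint, double[][] data) {
    Entry[] result = new Entry[data.length];
    for (int i = 0; i < data.length; i++) {
      result[i] = new Entry(i, euclidean(refPoint, data[i]));
    }
    Arrays.sort(result, Comparator.comparingDouble((Entry e) -> e.dist));
    return result;
  }

  public static void main(String[] args) {
    double[][] data = { { 0, 0 }, { 3, 4 }, { 1, 0 } };
    for (Entry e : distanceVector(new double[] { 0, 0 }, data)) {
      System.out.println(e.id + " " + e.dist);
    }
  }
}
```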
-
updateDensities
protected void updateDensities(WritableDoubleDataStore rbod_score, DoubleDBIDList referenceDists)
Update the density estimates for each object.
- Parameters:
rbod_score - Density storage
referenceDists - Distances from current reference point
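The page does not state how estimates from several reference points are combined. The sketch below assumes the rule from the referenced paper by Pei et al., where an object keeps the minimum density observed over all reference points. The class DensityUpdateSketch and the array-based storage are hypothetical stand-ins for the WritableDoubleDataStore and DoubleDBIDList used by the real method.

```java
import java.util.Arrays;

// Illustrative sketch only; not ELKI code. All names are hypothetical.
final class DensityUpdateSketch {
  /** Creates a score array initialized so that any density replaces it. */
  static double[] newScores(int n) {
    double[] scores = new double[n];
    Arrays.fill(scores, Double.POSITIVE_INFINITY);
    return scores;
  }

  /**
   * Keeps, per object, the smallest density seen so far (assumed rule).
   * ids[i] is the object whose density for the current reference point
   * is densities[i].
   */
  static void update(double[] scores, int[] ids, double[] densities) {
    for (int i = 0; i < ids.length; i++) {
      int id = ids[i];
      if (densities[i] < scores[id]) {
        scores[id] = densities[i];
      }
    }
  }

  public static void main(String[] args) {
    double[] scores = newScores(3);
    update(scores, new int[] { 2, 0, 1 }, new double[] { 0.5, 2.0, 1.0 });
    update(scores, new int[] { 0, 1, 2 }, new double[] { 3.0, 0.25, 0.5 });
    System.out.println(Arrays.toString(scores));
  }
}
```

Taking the minimum means that an object counts as dense only if it appears dense from every reference point.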
-
computeDensity
protected double computeDensity(DoubleDBIDList referenceDists, DoubleDBIDListIter iter, int index)
Computes the density of an object. The density of an object is derived from the distances to its k nearest neighbors. Neighbors and distances are computed approximately: instead of a regular nearest-neighbor search, the neighbors of an object are those objects that have a similar distance to the reference point. The k nearest neighbors of an object are therefore the objects that lie closest to it in the reference distance vector.
- Parameters:
referenceDists - vector of the reference distances
iter - Iterator to this list (will be reused)
index - index of the current object
- Returns:
- density for one object and reference point
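A minimal sketch of this neighbor search, assuming the reference distances are already sorted ascending and held in a plain double[]. The class ReferenceDensitySketch is hypothetical, and the density formula (the inverse of the mean approximate distance to the k neighbors) follows the definition in the referenced paper rather than anything stated on this page. Two cursors move outward from the object's position and always take the closer side.

```java
// Illustrative sketch only; not ELKI code. All names are hypothetical.
final class ReferenceDensitySketch {
  /**
   * Density of the object at position index in a sorted reference
   * distance vector: k divided by the sum of the k smallest differences
   * in reference distance (assumed formula). Returns infinity if all
   * k neighbors have the same reference distance as the object.
   */
  static double density(double[] sortedDists, int index, int k) {
    if (k <= 0 || k >= sortedDists.length) {
      throw new IllegalArgumentException("k must be in [1, n - 1]");
    }
    final double own = sortedDists[index];
    int left = index - 1, right = index + 1;
    double sum = 0.;
    for (int found = 0; found < k; found++) {
      // Difference to the next candidate on each side, if one exists.
      double dl = left >= 0 ? own - sortedDists[left] : Double.POSITIVE_INFINITY;
      double dr = right < sortedDists.length ? sortedDists[right] - own : Double.POSITIVE_INFINITY;
      if (dl <= dr) {
        sum += dl;
        left--;
      } else {
        sum += dr;
        right++;
      }
    }
    return k / sum;
  }

  public static void main(String[] args) {
    double[] dists = { 0, 1, 3, 7 };
    System.out.println(density(dists, 1, 2));
  }
}
```

Because the vector is sorted, each density costs O(k) rather than a scan over all objects, which is where the approach gets its efficiency.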
-
-