Package elki.outlier.clustering
Class KMeansOutlierDetection<O extends NumberVector>
- java.lang.Object
  - elki.outlier.clustering.KMeansOutlierDetection<O>
- Type Parameters:
O - Object type
- All Implemented Interfaces:
Algorithm, OutlierAlgorithm
public class KMeansOutlierDetection<O extends NumberVector> extends java.lang.Object implements OutlierAlgorithm
Outlier detection using k-means clustering. Scores are assigned by each object's distance to the nearest center.
We do not have a clear reference for this approach, but removing the objects that have the largest distance from their cluster center appears to be a best practice in some areas. It is mentioned, for example, in the book by Han, Kamber and Pei, but our implementation goes beyond their approach in how it handles singleton objects (objects that form a cluster of their own). To cite this approach, please cite the ELKI version you used (see the ELKI publication list for citation information and BibTeX templates).
- Since:
- 0.7.0
- Author:
- Erich Schubert
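The basic idea described above, scoring each object by its distance to the nearest center, can be sketched in plain Java. This is a minimal illustration and not the ELKI implementation: the class name `NearestCenterScoring` and the methods `score` and `euclidean` are hypothetical, the cluster centers are assumed to have been computed by a k-means run beforehand, and Euclidean distance is an assumption. The singleton handling mentioned above is not modelled.

```java
import java.util.Arrays;

/** Illustrative sketch only; hypothetical names, not part of ELKI. */
final class NearestCenterScoring {
  /** Euclidean distance between two vectors of equal length (assumed metric). */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Outlier score of each point: its distance to the nearest center. */
  static double[] score(double[][] points, double[][] centers) {
    double[] scores = new double[points.length];
    for (int i = 0; i < points.length; i++) {
      double best = Double.POSITIVE_INFINITY;
      for (double[] c : centers) {
        best = Math.min(best, euclidean(points[i], c));
      }
      scores[i] = best; // larger score = farther from every center
    }
    return scores;
  }

  public static void main(String[] args) {
    // Centers are given here; in practice they would come from k-means.
    double[][] centers = { { 0.0, 0.0 }, { 10.0, 10.0 } };
    double[][] points = { { 0.0, 1.0 }, { 10.0, 10.0 }, { 3.0, 4.0 } };
    System.out.println(Arrays.toString(score(points, centers)));
  }
}
```

Objects with the largest scores are the outlier candidates. A point that coincides with a center receives a score of zero.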
-
-
Nested Class Summary
Nested Classes
static class KMeansOutlierDetection.Rule
Outlier scoring rule
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Constructor Summary
Constructors
KMeansOutlierDetection(KMeans<O,?> clusterer, KMeansOutlierDetection.Rule rule)
Constructor.
-
Method Summary
private void distanceScoring(Clustering<?> c, Relation<O> relation, NumberVectorDistance<? super O> distfunc, WritableDoubleDataStore scores, DoubleMinMax mm)
Simple distance-based scoring function.
TypeInformation[] getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
OutlierResult run(Relation<O> relation)
Run the outlier detection algorithm.
private void singletonsScoring(Clustering<?> c, Relation<O> relation, NumberVectorDistance<? super O> distfunc, WritableDoubleDataStore scores, DoubleMinMax mm)
Distance-based scoring that takes singletons into account.
private void varianceScoring(Clustering<?> c, Relation<O> relation, NumberVectorDistance<? super O> distfunc, WritableDoubleDataStore scores, DoubleMinMax mm)
Variance-based scoring function.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.outlier.OutlierAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
clusterer
KMeans<O extends NumberVector,?> clusterer
K-Means clustering algorithm to use
-
rule
KMeansOutlierDetection.Rule rule
Outlier scoring rule
-
-
Constructor Detail
-
KMeansOutlierDetection
public KMeansOutlierDetection(KMeans<O,?> clusterer, KMeansOutlierDetection.Rule rule)
Constructor.
- Parameters:
clusterer - Clustering algorithm
rule - Decision rule
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
- Specified by:
getInputTypeRestriction in interface Algorithm
- Returns:
- Type restriction
-
run
public OutlierResult run(Relation<O> relation)
Run the outlier detection algorithm.
- Parameters:
relation - Relation
- Returns:
- Outlier detection result
-
distanceScoring
private void distanceScoring(Clustering<?> c, Relation<O> relation, NumberVectorDistance<? super O> distfunc, WritableDoubleDataStore scores, DoubleMinMax mm)
Simple distance-based scoring function.
- Parameters:
c - Clustering
relation - Data relation
distfunc - Distance function
scores - Scores output
mm - Minimum and maximum
-
singletonsScoring
private void singletonsScoring(Clustering<?> c, Relation<O> relation, NumberVectorDistance<? super O> distfunc, WritableDoubleDataStore scores, DoubleMinMax mm)
Distance-based scoring that takes singletons into account.
- Parameters:
c - Clustering
relation - Data relation
distfunc - Distance function
scores - Scores output
mm - Minimum and maximum
-
varianceScoring
private void varianceScoring(Clustering<?> c, Relation<O> relation, NumberVectorDistance<? super O> distfunc, WritableDoubleDataStore scores, DoubleMinMax mm)
Variance-based scoring function.
- Parameters:
c - Clustering
relation - Data relation
distfunc - Distance function
scores - Scores output
mm - Minimum and maximum
-
-