Package elki.outlier.distance
Class SOS<O>
- java.lang.Object
-
- elki.outlier.distance.SOS<O>
-
- Type Parameters:
O- Object type
- All Implemented Interfaces:
Algorithm,OutlierAlgorithm
@Title("SOS: Stochastic Outlier Selection") @Reference(authors="J. Janssens, F. Husz\u00e1r, E. Postma, J. van den Herik", title="Stochastic Outlier Selection", booktitle="TiCC TR 2012\u2013001", url="https://www.tilburguniversity.edu/upload/b7bac5b2-9b00-402a-9261-7849aa019fbb_sostr.pdf", bibkey="tr/tilburg/JanssensHPv12") public class SOS<O> extends java.lang.Object implements OutlierAlgorithm
Stochastic Outlier Selection.Reference:
J. Janssens, F. Huszár, E. Postma, J. van den Herik
Stochastic Outlier Selection
TiCC TR 2012–001- Since:
- 0.7.5
- Author:
- Erich Schubert
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description protected Distance<? super O>distanceDistance function used.private static LoggingLOGClass logger.protected doubleperplexityPerplexityprotected static doublePERPLEXITY_ERRORThreshold for optimizing perplexity.protected static intPERPLEXITY_MAXITERMaximum number of iterations when optimizing perplexity.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected static doublecomputeH(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double mbeta)Compute H (observed perplexity) for row i, and the row pij_i.static doublecomputePi(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double perplexity, double logPerp)Compute row p[i], using binary search on the kernel bandwidth sigma to obtain the desired perplexity.protected static doubleestimateInitialBeta(DBIDRef ignore, DoubleDBIDListIter it, double perplexity)Estimate beta from the distances in a row.TypeInformation[]getInputTypeRestriction()Get the input type restriction used for negotiating the data query.static voidnominateNeighbors(DBIDIter ignore, DBIDArrayIter di, double[] p, double norm, WritableDoubleDataStore scores)Vote for neighbors not being outliers.OutlierResultrun(Relation<O> relation)Run the algorithm.static doublesumOfProbabilities(DBIDIter ignore, DBIDArrayIter di, double[] p)Compute the sum of probabilities, stop at first 0, ignore query object.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.outlier.OutlierAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
PERPLEXITY_ERROR
protected static final double PERPLEXITY_ERROR
Threshold for optimizing perplexity.- See Also:
- Constant Field Values
-
PERPLEXITY_MAXITER
protected static final int PERPLEXITY_MAXITER
Maximum number of iterations when optimizing perplexity.- See Also:
- Constant Field Values
-
perplexity
protected double perplexity
Perplexity
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:AlgorithmGet the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestrictionin interfaceAlgorithm- Returns:
- Type restriction
-
run
public OutlierResult run(Relation<O> relation)
Run the algorithm.- Parameters:
relation- data relation- Returns:
- outlier detection result
-
sumOfProbabilities
public static double sumOfProbabilities(DBIDIter ignore, DBIDArrayIter di, double[] p)
Compute the sum of probabilities, stop at first 0, ignore query object. Note: while SOS ensures the 'ignore' object is not added in the first place, KNNSOS cannot do so efficiently (yet).- Parameters:
ignore- Object to ignore.di- Object listp- Probabilities- Returns:
- Sum.
-
nominateNeighbors
public static void nominateNeighbors(DBIDIter ignore, DBIDArrayIter di, double[] p, double norm, WritableDoubleDataStore scores)
Vote for neighbors not being outliers. The key method of SOS.- Parameters:
ignore- Object to ignoredi- Neighbor object IDs.p- Probabilitiesnorm- Normalization factor (1/sum)scores- Output score storage
-
computePi
public static double computePi(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double perplexity, double logPerp)
Compute row p[i], using binary search on the kernel bandwidth sigma to obtain the desired perplexity.- Parameters:
ignore- Object to skipit- Distances iteratorp- Output rowperplexity- Desired perplexitylogPerp- Log of desired perplexity- Returns:
- Beta
-
estimateInitialBeta
@Reference(authors="Erich Schubert, Michael Gertz", title="Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection: A Remedy Against the Curse of Dimensionality?", booktitle="Proc. Int. Conf. Similarity Search and Applications, SISAP 2017", url="https://doi.org/10.1007/978-3-319-68474-1_13", bibkey="DBLP:conf/sisap/SchubertG17") protected static double estimateInitialBeta(DBIDRef ignore, DoubleDBIDListIter it, double perplexity)
Estimate beta from the distances in a row.This lacks a thorough mathematical argument, but is a handcrafted heuristic to avoid numerical problems. The average distance is usually too large, so we scale the average distance by 2*N/perplexity. Then estimate beta as 1/x.
- Parameters:
ignore- Object to skipit- Distance iteratorperplexity- Desired perplexity- Returns:
- Estimated beta.
-
computeH
protected static double computeH(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double mbeta)
Compute H (observed perplexity) for row i, and the row pij_i.- Parameters:
ignore- Object to skipit- Distances listp- Output probabilitiesmbeta--1. / (2 * sigma * sigma)- Returns:
- Observed perplexity
-
-