Package elki.outlier.distance
Class SOS<O>
- java.lang.Object
-
- elki.outlier.distance.SOS<O>
-
- Type Parameters:
O
- Object type
- All Implemented Interfaces:
Algorithm
,OutlierAlgorithm
@Title("SOS: Stochastic Outlier Selection") @Reference(authors="J. Janssens, F. Husz\u00e1r, E. Postma, J. van den Herik", title="Stochastic Outlier Selection", booktitle="TiCC TR 2012\u2013001", url="https://www.tilburguniversity.edu/upload/b7bac5b2-9b00-402a-9261-7849aa019fbb_sostr.pdf", bibkey="tr/tilburg/JanssensHPv12") public class SOS<O> extends java.lang.Object implements OutlierAlgorithm
Stochastic Outlier Selection.Reference:
J. Janssens, F. Huszár, E. Postma, J. van den Herik
Stochastic Outlier Selection
TiCC TR 2012–001- Since:
- 0.7.5
- Author:
- Erich Schubert
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description protected Distance<? super O>
distance
Distance function used.private static Logging
LOG
Class logger.protected double
perplexity
Perplexityprotected static double
PERPLEXITY_ERROR
Threshold for optimizing perplexity.protected static int
PERPLEXITY_MAXITER
Maximum number of iterations when optimizing perplexity.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected static double
computeH(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double mbeta)
Compute H (observed perplexity) for row i, and the row pij_i.static double
computePi(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double perplexity, double logPerp)
Compute row p[i], using binary search on the kernel bandwidth sigma to obtain the desired perplexity.protected static double
estimateInitialBeta(DBIDRef ignore, DoubleDBIDListIter it, double perplexity)
Estimate beta from the distances in a row.TypeInformation[]
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.static void
nominateNeighbors(DBIDIter ignore, DBIDArrayIter di, double[] p, double norm, WritableDoubleDataStore scores)
Vote for neighbors not being outliers.OutlierResult
run(Relation<O> relation)
Run the algorithm.static double
sumOfProbabilities(DBIDIter ignore, DBIDArrayIter di, double[] p)
Compute the sum of probabilities, stop at first 0, ignore query object.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.outlier.OutlierAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
PERPLEXITY_ERROR
protected static final double PERPLEXITY_ERROR
Threshold for optimizing perplexity.- See Also:
- Constant Field Values
-
PERPLEXITY_MAXITER
protected static final int PERPLEXITY_MAXITER
Maximum number of iterations when optimizing perplexity.- See Also:
- Constant Field Values
-
perplexity
protected double perplexity
Perplexity
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:Algorithm
Get the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestriction
in interfaceAlgorithm
- Returns:
- Type restriction
-
run
public OutlierResult run(Relation<O> relation)
Run the algorithm.- Parameters:
relation
- data relation- Returns:
- outlier detection result
-
sumOfProbabilities
public static double sumOfProbabilities(DBIDIter ignore, DBIDArrayIter di, double[] p)
Compute the sum of probabilities, stop at first 0, ignore query object. Note: while SOS ensures the 'ignore' object is not added in the first place, KNNSOS cannot do so efficiently (yet).- Parameters:
ignore
- Object to ignore.di
- Object listp
- Probabilities- Returns:
- Sum.
-
nominateNeighbors
public static void nominateNeighbors(DBIDIter ignore, DBIDArrayIter di, double[] p, double norm, WritableDoubleDataStore scores)
Vote for neighbors not being outliers. The key method of SOS.- Parameters:
ignore
- Object to ignoredi
- Neighbor object IDs.p
- Probabilitiesnorm
- Normalization factor (1/sum)scores
- Output score storage
-
computePi
public static double computePi(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double perplexity, double logPerp)
Compute row p[i], using binary search on the kernel bandwidth sigma to obtain the desired perplexity.- Parameters:
ignore
- Object to skipit
- Distances iteratorp
- Output rowperplexity
- Desired perplexitylogPerp
- Log of desired perplexity- Returns:
- Beta
-
estimateInitialBeta
@Reference(authors="Erich Schubert, Michael Gertz", title="Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection: A Remedy Against the Curse of Dimensionality?", booktitle="Proc. Int. Conf. Similarity Search and Applications, SISAP 2017", url="https://doi.org/10.1007/978-3-319-68474-1_13", bibkey="DBLP:conf/sisap/SchubertG17") protected static double estimateInitialBeta(DBIDRef ignore, DoubleDBIDListIter it, double perplexity)
Estimate beta from the distances in a row.This lacks a thorough mathematical argument, but is a handcrafted heuristic to avoid numerical problems. The average distance is usually too large, so we scale the average distance by 2*N/perplexity. Then estimate beta as 1/x.
- Parameters:
ignore
- Object to skipit
- Distance iteratorperplexity
- Desired perplexity- Returns:
- Estimated beta.
-
computeH
protected static double computeH(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double mbeta)
Compute H (observed perplexity) for row i, and the row pij_i.- Parameters:
ignore
- Object to skipit
- Distances listp
- Output probabilitiesmbeta
--1. / (2 * sigma * sigma)
- Returns:
- Observed perplexity
-
-