Class SOS<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    Algorithm, OutlierAlgorithm

    @Title("SOS: Stochastic Outlier Selection")
    @Reference(authors="J. Janssens, F. Husz\u00e1r, E. Postma, J. van den Herik",
               title="Stochastic Outlier Selection",
               booktitle="TiCC TR 2012\u2013001",
               url="https://www.tilburguniversity.edu/upload/b7bac5b2-9b00-402a-9261-7849aa019fbb_sostr.pdf",
               bibkey="tr/tilburg/JanssensHPv12")
    public class SOS<O>
    extends java.lang.Object
    implements OutlierAlgorithm
    Stochastic Outlier Selection.

    Reference:

    J. Janssens, F. Huszár, E. Postma, J. van den Herik
    Stochastic Outlier Selection
    TiCC TR 2012–001

    Since:
    0.7.5
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • PERPLEXITY_ERROR

        protected static final double PERPLEXITY_ERROR
        Threshold for optimizing perplexity.
        See Also:
        Constant Field Values
      • PERPLEXITY_MAXITER

        protected static final int PERPLEXITY_MAXITER
        Maximum number of iterations when optimizing perplexity.
        See Also:
        Constant Field Values
      • distance

        protected Distance<? super O> distance
        Distance function used.
      • perplexity

        protected double perplexity
        Perplexity
    • Constructor Detail

      • SOS

        public SOS​(Distance<? super O> distance,
                   double h)
        Constructor.
        Parameters:
        distance - Distance function
        h - Perplexity
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • run

        public OutlierResult run​(Relation<O> relation)
        Run the algorithm.
        Parameters:
        relation - data relation
        Returns:
        outlier detection result
      • sumOfProbabilities

        public static double sumOfProbabilities​(DBIDIter ignore,
                                                DBIDArrayIter di,
                                                double[] p)
        Compute the sum of probabilities, stop at first 0, ignore query object. Note: while SOS ensures the 'ignore' object is not added in the first place, KNNSOS cannot do so efficiently (yet).
        Parameters:
        ignore - Object to ignore.
        di - Object list
        p - Probabilities
        Returns:
        Sum.
      • nominateNeighbors

        public static void nominateNeighbors​(DBIDIter ignore,
                                             DBIDArrayIter di,
                                             double[] p,
                                             double norm,
                                             WritableDoubleDataStore scores)
        Vote for neighbors not being outliers. The key method of SOS.
        Parameters:
        ignore - Object to ignore
        di - Neighbor object IDs.
        p - Probabilities
        norm - Normalization factor (1/sum)
        scores - Output score storage
      • computePi

        public static double computePi​(DBIDRef ignore,
                                       DoubleDBIDListIter it,
                                       double[] p,
                                       double perplexity,
                                       double logPerp)
        Compute row p[i], using binary search on the kernel bandwidth sigma to obtain the desired perplexity.
        Parameters:
        ignore - Object to skip
        it - Distances iterator
        p - Output row
        perplexity - Desired perplexity
        logPerp - Log of desired perplexity
        Returns:
        Beta
      • estimateInitialBeta

        @Reference(authors="Erich Schubert, Michael Gertz",
                   title="Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection: A Remedy Against the Curse of Dimensionality?",
                   booktitle="Proc. Int. Conf. Similarity Search and Applications, SISAP 2017",
                   url="https://doi.org/10.1007/978-3-319-68474-1_13",
                   bibkey="DBLP:conf/sisap/SchubertG17")
        protected static double estimateInitialBeta​(DBIDRef ignore,
                                                    DoubleDBIDListIter it,
                                                    double perplexity)
        Estimate beta from the distances in a row.

        This lacks a thorough mathematical argument, but is a handcrafted heuristic to avoid numerical problems. The average distance is usually too large, so we scale the average distance by 2*N/perplexity. Then estimate beta as 1/x.

        Parameters:
        ignore - Object to skip
        it - Distance iterator
        perplexity - Desired perplexity
        Returns:
        Estimated beta.
      • computeH

        protected static double computeH​(DBIDRef ignore,
                                         DoubleDBIDListIter it,
                                         double[] p,
                                         double mbeta)
        Compute H (observed perplexity) for row i, and the row pij_i.
        Parameters:
        ignore - Object to skip
        it - Distances list
        p - Output probabilities
        mbeta - -1. / (2 * sigma * sigma)
        Returns:
        Observed perplexity