Class ComputeKNNOutlierScores<O extends NumberVector>

  • Type Parameters:
    O - Vector type

    @Reference(authors="Erich Schubert, Remigius Wojdanowski, Arthur Zimek, Hans-Peter Kriegel",
               title="On Evaluation of Outlier Rankings and Outlier Scores",
               booktitle="Proc. 12th SIAM Int. Conf. on Data Mining (SDM 2012)",
               url="https://doi.org/10.1137/1.9781611972825.90",
               bibkey="DBLP:conf/sdm/SchubertWZK12")
    public class ComputeKNNOutlierScores<O extends NumberVector>
    extends AbstractDistanceBasedApplication<O>
    Application that runs a series of kNN-based algorithms on a data set, for building an ensemble in a second step. The output file consists of a label and one score value for each object.

    Since some algorithms can be too slow to run on large data sets and for large values of k, they can be disabled. For example, -disable '(LDOF|DWOF|COF|FastABOD)' disables these four methods completely. Alternatively, you can use the parameter -ksquaremax to limit the maximum k for these four methods separately.

    For methods where k=1 does not make sense, this value will be skipped, and the procedure will commence at 1+stepsize.
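
    The skipping rule above can be illustrated with a minimal sketch. It assumes the k range is an arithmetic sequence (start, step, stop) and that each method has its own admissible interval; the class KRangeSketch and the method effectiveKs are hypothetical names for illustration and are not part of this class.

```java
import java.util.ArrayList;
import java.util.List;

public class KRangeSketch {
  /**
   * Collect the values of k that a method would actually be run with.
   * Hypothetical helper: start/step/stop describe the configured k range,
   * mink/maxk the interval in which the method is meaningful.
   */
  static List<Integer> effectiveKs(int start, int step, int stop, int mink, int maxk) {
    if (step <= 0) {
      throw new IllegalArgumentException("step must be positive");
    }
    List<Integer> ks = new ArrayList<>();
    for (int k = start; k <= stop; k += step) {
      // Values outside the method's admissible interval are skipped.
      if (k >= mink && k <= maxk) {
        ks.add(k);
      }
    }
    return ks;
  }

  public static void main(String[] args) {
    // Range 1, 3, 5, 7, 9 for a method that needs k >= 2:
    // k=1 is skipped, so the first run uses 1 + stepsize = 3.
    System.out.println(effectiveKs(1, 2, 9, 2, 9));
  }
}
```

    Under these assumptions, a method that requires k >= 2 simply starts at the next value of the sequence, and a per-method upper bound such as the one for O(k^2) methods truncates the sequence in the same way.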

    Reference:

    Erich Schubert, Remigius Wojdanowski, Arthur Zimek, Hans-Peter Kriegel
    On Evaluation of Outlier Rankings and Outlier Scores
    Proc. 12th SIAM Int. Conf. on Data Mining (SDM 2012)

    Since:
    0.5.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Our logger class.
      • outfile

        java.nio.file.Path outfile
        Output file
      • bylabel

        ByLabelOutlier bylabel
        Label-based outlier detection, used as the reference.
      • disable

        java.util.regex.Pattern disable
        Pattern for disabling (skipping) methods.
      • ksquarestop

        int ksquarestop
        Maximum k for O(k^2) methods.
      • timelimit

        long timelimit
        Time limit for computation, in milliseconds (not strictly enforced).
    • Constructor Detail

      • ComputeKNNOutlierScores

        public ComputeKNNOutlierScores​(InputStep inputstep,
                                       Distance<? super O> distance,
                                       IntGenerator krange,
                                       ByLabelOutlier bylabel,
                                       java.nio.file.Path outfile,
                                       ScalingFunction scaling,
                                       java.util.regex.Pattern disable,
                                       int ksquarestop,
                                       long timelimit)
        Constructor.
        Parameters:
        inputstep - Input step
        distance - Distance function
        krange - K parameter range
        bylabel - By label outlier (reference)
        outfile - Output file
        scaling - Scaling function
        disable - Pattern for disabling methods
        ksquarestop - Maximum k for O(k^2) methods
        timelimit - Time limit in seconds
    • Method Detail

      • writeResult

        void writeResult​(java.lang.Appendable out,
                         DBIDs ids,
                         OutlierResult result,
                         ScalingFunction scaling,
                         java.lang.String label)
        Write a single output line.
        Parameters:
        out - Output target (any Appendable)
        ids - DBIDs
        result - Outlier result
        scaling - Scaling function
        label - Identification label
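
        As a rough illustration of "a label and one score value for each object", the sketch below writes one such line to an Appendable. It is not the implementation of writeResult: the space-separated layout, the use of a plain double[] in place of OutlierResult and DBIDs, and the DoubleUnaryOperator standing in for ScalingFunction are all assumptions, and WriteLineSketch is a hypothetical class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.DoubleUnaryOperator;

public class WriteLineSketch {
  /** Write the label followed by one scaled score per object (assumed layout). */
  static void writeLine(Appendable out, String label, double[] scores,
      DoubleUnaryOperator scaling) throws IOException {
    out.append(label);
    for (double score : scores) {
      out.append(' ').append(Double.toString(scaling.applyAsDouble(score)));
    }
    out.append('\n');
  }

  /** Convenience wrapper that formats a line into a String. */
  static String formatLine(String label, double[] scores, DoubleUnaryOperator scaling) {
    StringBuilder buf = new StringBuilder();
    try {
      writeLine(buf, label, scores, scaling);
    } catch (IOException e) {
      // A StringBuilder does not throw, but Appendable declares IOException.
      throw new UncheckedIOException(e);
    }
    return buf.toString();
  }

  public static void main(String[] args) {
    // Hypothetical label; scores are doubled by the stand-in scaling function.
    System.out.print(formatLine("KNN-003", new double[] { 1.0, 2.5 }, x -> x * 2));
  }
}
```

        Accepting an Appendable rather than a concrete stream keeps the writer usable with a file writer, a StringBuilder, or System.out alike.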
      • runForEachK

        private void runForEachK​(java.lang.String prefix,
                                 int mink,
                                 int maxk,
                                 java.util.function.IntFunction<OutlierResult> runner,
                                 java.util.function.BiConsumer<java.lang.String,​OutlierResult> out)
        Iterate over the k range.
        Parameters:
        prefix - Prefix string
        mink - Minimum value of k for this method
        maxk - Maximum value of k for this method
        runner - Runner to run
        out - Output function
      • isDisabled

        protected boolean isDisabled​(java.lang.String name)
        Test if a given algorithm is disabled.
        Parameters:
        name - Algorithm name
        Returns:
        true if disabled
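
        A minimal sketch of such a test, using only java.util.regex. It assumes that the pattern must match the whole algorithm name and that a missing pattern disables nothing; the actual matching rule of this class may differ, and DisableSketch is a hypothetical class.

```java
import java.util.regex.Pattern;

public class DisableSketch {
  /**
   * Test whether an algorithm name is disabled by the pattern.
   * Assumption: full-string match; a null pattern disables nothing.
   */
  static boolean isDisabled(Pattern disable, String name) {
    return disable != null && disable.matcher(name).matches();
  }

  public static void main(String[] args) {
    Pattern disable = Pattern.compile("(LDOF|DWOF|COF|FastABOD)");
    System.out.println(isDisabled(disable, "COF")); // matches one alternative
    System.out.println(isDisabled(disable, "LOF")); // not listed in the pattern
  }
}
```

        With full-string matching, a pattern such as COF does not accidentally disable a method whose name merely contains those letters; a substring search via find() would behave differently.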
      • main

        public static void main​(java.lang.String[] args)
        Main method.
        Parameters:
        args - Command line parameters.