Class GreedyEnsembleExperiment


  • @Reference(authors="Erich Schubert, Remigius Wojdanowski, Arthur Zimek, Hans-Peter Kriegel",
               title="On Evaluation of Outlier Rankings and Outlier Scores",
               booktitle="Proc. 12th SIAM Int. Conf. on Data Mining (SDM 2012)",
               url="https://doi.org/10.1137/1.9781611972825.90",
               bibkey="DBLP:conf/sdm/SchubertWZK12")
    public class GreedyEnsembleExperiment
    extends AbstractApplication
    Class to load an outlier detection summary file, as produced by ComputeKNNOutlierScores, and compute a naive ensemble for it. Based on this initial estimation, an optimized ensemble is built using a greedy strategy. Starting with only the best candidate as the initial ensemble, the most diverse candidate is investigated at each step. If it improves the ensemble towards the (estimated) target vector, it is added; otherwise it is discarded.

    This approach is naive, and it may be surprising that it can improve results. The reason is probably that diversity will result in a comparable ensemble, while the reduced ensemble size is actually responsible for the improvements, by being more decisive and less noisy due to dropping "unhelpful" members.

    This still leaves quite a bit of room for improvement. If you build upon this basic approach, please acknowledge our proof of concept work.
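
    The greedy strategy described above can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the ELKI implementation: the class name GreedyEnsembleSketch and all of its methods are hypothetical, Euclidean distance and mean voting stand in for the configurable distance and voting, and the estimated target vector is assumed to mark the top rate * n objects of the naive ensemble.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, self-contained sketch; not the ELKI implementation. */
final class GreedyEnsembleSketch {
  /** Euclidean distance (assumption; the real class takes a configurable distance). */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Mean vote over the selected members (assumption; the real voting is configurable). */
  static double[] mean(double[][] scores, List<Integer> members) {
    double[] m = new double[scores[0].length];
    for (int idx : members) {
      for (int i = 0; i < m.length; i++) {
        m[i] += scores[idx][i] / members.size();
      }
    }
    return m;
  }

  /** Estimated target: 1 for the top rate * n objects of the naive ensemble, else 0 (assumption). */
  static double[] estimateTarget(double[] naive, double rate) {
    int k = Math.min(naive.length, Math.max(1, (int) Math.round(rate * naive.length)));
    Integer[] order = new Integer[naive.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    // Sort object indices by descending naive score.
    Arrays.sort(order, (a, b) -> Double.compare(naive[b], naive[a]));
    double[] target = new double[naive.length];
    for (int i = 0; i < k; i++) {
      target[order[i]] = 1.0;
    }
    return target;
  }

  /** Returns the indices of the selected members; scores[d][i] is detector d's score for object i. */
  static List<Integer> greedyEnsemble(double[][] scores, double rate) {
    List<Integer> remaining = new ArrayList<>();
    for (int i = 0; i < scores.length; i++) {
      remaining.add(i);
    }
    // Naive ensemble over all members, used to estimate the target vector.
    double[] target = estimateTarget(mean(scores, remaining), rate);

    // Initial ensemble: the single candidate closest to the target.
    int best = 0;
    for (int i = 1; i < scores.length; i++) {
      if (distance(scores[i], target) < distance(scores[best], target)) {
        best = i;
      }
    }
    List<Integer> ensemble = new ArrayList<>();
    ensemble.add(best);
    remaining.remove(Integer.valueOf(best));

    while (!remaining.isEmpty()) {
      double[] current = mean(scores, ensemble);
      // Most diverse candidate: farthest from the current ensemble.
      int pick = remaining.get(0);
      for (int c : remaining) {
        if (distance(scores[c], current) > distance(scores[pick], current)) {
          pick = c;
        }
      }
      remaining.remove(Integer.valueOf(pick));
      ensemble.add(pick);
      // Keep the candidate only if the ensemble moves closer to the target.
      if (distance(mean(scores, ensemble), target) >= distance(current, target)) {
        ensemble.remove(ensemble.size() - 1);
      }
    }
    return ensemble;
  }

  public static void main(String[] args) {
    double[][] scores = {
        {0.9, 0.3, 0.0, 0.0},
        {0.9, 0.0, 0.3, 0.1},
        {0.1, 0.9, 0.8, 0.9}, // disagrees with the other two detectors
    };
    System.out.println(greedyEnsemble(scores, 0.25)); // prints [0, 1]
  }
}
```

    In this toy input the third detector is tried first because it is the most diverse, but it moves the ensemble away from the target and is discarded; the second detector is then accepted.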

    Reference:

    Erich Schubert, Remigius Wojdanowski, Arthur Zimek, Hans-Peter Kriegel
    On Evaluation of Outlier Rankings and Outlier Scores
    Proc. 12th SIAM Int. Conf. on Data Mining (SDM 2012)

    Since:
    0.5.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Static class logger.
      • inputstep

        private InputStep inputstep
        The data input part.
      • refine_truth

        boolean refine_truth
        Variant in which the truth vector is also updated.
      • prescaling

        ScalingFunction prescaling
        Outlier scaling to apply during preprocessing.
      • scaling

        ScalingFunction scaling
        Outlier scaling to apply to constructed ensembles.
      • rate

        double rate
        Expected rate of outliers.
      • minvote

        int minvote
        Minimum votes.
    • Constructor Detail

      • GreedyEnsembleExperiment

        public GreedyEnsembleExperiment(InputStep inputstep,
                                        EnsembleVoting voting,
                                        GreedyEnsembleExperiment.Distance distance,
                                        ScalingFunction prescaling,
                                        ScalingFunction scaling,
                                        double rate)
        Constructor.
        Parameters:
        inputstep - Input step
        voting - Ensemble voting
        distance - Distance function
        prescaling - Scaling to apply to input data
        scaling - Scaling to apply to ensemble members
        rate - Expected rate of outliers
    • Method Detail

      • singleEnsemble

        protected void singleEnsemble(double[] ensemble,
                                      NumberVector vec)
        Build a single-element "ensemble".
        Parameters:
        ensemble - Output array for the ensemble scores
        vec - Score vector of the single member
      • applyPrescaling

        public static Relation<NumberVector> applyPrescaling(ScalingFunction scaling,
                                                             Relation<NumberVector> relation,
                                                             DBIDs skip)
        Prescale each vector (except when in skip) with the given scaling function.
        Parameters:
        scaling - Scaling function
        relation - Relation to read
        skip - DBIDs to pass unmodified
        Returns:
        New relation
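
        The behavior described above can be illustrated with a small stand-alone sketch. It does not use the ELKI types: a Map from integer ids to double[] stands in for Relation<NumberVector>, a Set of ids stands in for DBIDs, and a DoubleUnaryOperator stands in for ScalingFunction. The class name PrescalingSketch is hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical stand-in types; not the ELKI Relation, DBIDs, or ScalingFunction. */
final class PrescalingSketch {
  /** Returns a new map in which every vector is scaled, except those whose id is in skip. */
  static Map<Integer, double[]> applyPrescaling(DoubleUnaryOperator scaling,
      Map<Integer, double[]> relation, Set<Integer> skip) {
    Map<Integer, double[]> result = new LinkedHashMap<>();
    for (Map.Entry<Integer, double[]> e : relation.entrySet()) {
      double[] vec = e.getValue();
      if (skip.contains(e.getKey())) {
        result.put(e.getKey(), vec); // passed through unmodified
        continue;
      }
      // Scale into a fresh array so the input relation is left untouched.
      double[] scaled = new double[vec.length];
      for (int i = 0; i < vec.length; i++) {
        scaled[i] = scaling.applyAsDouble(vec[i]);
      }
      result.put(e.getKey(), scaled);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<Integer, double[]> relation = new LinkedHashMap<>();
    relation.put(1, new double[] {1.0, 2.0});
    relation.put(2, new double[] {3.0});
    Map<Integer, double[]> out =
        applyPrescaling(x -> x * 2, relation, Collections.singleton(2));
    System.out.println(out.get(1)[1]); // prints 4.0
    System.out.println(out.get(2)[0]); // prints 3.0 (id 2 was skipped)
  }
}
```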
      • applyScaling

        private static void applyScaling(double[] raw,
                                         ScalingFunction scaling)
      • updateEstimations

        protected void updateEstimations(int[] outliers,
                                         int numoutliers,
                                         double[] weights,
                                         double[] truth)
      • gain

        double gain(double score,
                    double ref,
                    double optimal)
        Compute the gain coefficient.
        Parameters:
        score - New score
        ref - Reference score
        optimal - Maximum score possible
        Returns:
        Gain
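
        The documentation above names the inputs but not the formula. One plausible reading, shown here purely as an assumption, is the fraction of the gap between the reference score and the optimal score that the new score closes. The class name GainSketch is hypothetical.

```java
/** Hypothetical sketch; the formula is an assumption, not taken from the documentation. */
final class GainSketch {
  /**
   * Fraction of the gap between ref and optimal that score closes:
   * 0 when score equals ref, 1 when score equals optimal, negative when score is worse than ref.
   * Assumes optimal != ref.
   */
  static double gain(double score, double ref, double optimal) {
    return 1 - (optimal - score) / (optimal - ref);
  }

  public static void main(String[] args) {
    System.out.println(gain(0.75, 0.5, 1.0)); // prints 0.5
    System.out.println(gain(0.25, 0.5, 1.0)); // prints -0.5
  }
}
```

        A normalized form like this makes improvements comparable across evaluation measures with different baselines, which is presumably why a reference score is passed in.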
      • main

        public static void main(java.lang.String[] args)
        Main method.
        Parameters:
        args - Command line parameters.