java.lang.Object
- elki.application.AbstractApplication
- - elki.application.greedyensemble.GreedyEnsembleExperiment

```
@Reference(authors="Erich Schubert, Remigius Wojdanowski, Arthur Zimek, Hans-Peter Kriegel",
           title="On Evaluation of Outlier Rankings and Outlier Scores",
           booktitle="Proc. 12th SIAM Int. Conf. on Data Mining (SDM 2012)",
           url="https://doi.org/10.1137/1.9781611972825.90",
           bibkey="DBLP:conf/sdm/SchubertWZK12")
public class GreedyEnsembleExperiment
extends AbstractApplication
```
Class to load an outlier detection summary file, as produced by ComputeKNNOutlierScores, and compute a naive ensemble for it. Based on this initial estimation, an optimized ensemble is built using a greedy strategy. Starting with only the best candidate as the initial ensemble, the most diverse candidate is investigated at each step. If it moves the ensemble closer to the (estimated) target vector, it is added; otherwise it is discarded.
This approach is naive, and it may be surprising that it can improve results. The reason is probably that diversity will result in a comparable ensemble, while the reduced ensemble size is actually responsible for the improvements, by being more decisive and less noisy due to dropping "unhelpful" members.
This still leaves quite a bit of room for improvement. If you build upon this basic approach, please acknowledge our proof of concept work.
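
The greedy strategy described above can be sketched in plain Java. This is a minimal illustration, not the ELKI implementation: the class name `GreedyEnsembleSketch`, the use of Euclidean distance, and the mean as combination rule are assumptions made for the example (the actual class selects these through its `Distance` mode and `EnsembleVoting` settings), and the target vector is simply passed in rather than estimated from a naive ensemble.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the greedy selection loop; all names are illustrative.
class GreedyEnsembleSketch {
  /** Euclidean distance between two score vectors of equal length (assumed distance). */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Combine the selected members by averaging their scores (assumed voting rule). */
  static double[] combine(double[][] scores, List<Integer> members) {
    double[] out = new double[scores[0].length];
    for (int m : members) {
      for (int i = 0; i < out.length; i++) {
        out[i] += scores[m][i] / members.size();
      }
    }
    return out;
  }

  /** Returns the indices of the accepted members, in order of acceptance. */
  static List<Integer> greedy(double[][] scores, double[] target) {
    List<Integer> pool = new ArrayList<>();
    for (int i = 0; i < scores.length; i++) {
      pool.add(i);
    }
    // Initial ensemble: the single candidate closest to the target vector.
    int best = 0;
    for (int c : pool) {
      if (distance(scores[c], target) < distance(scores[best], target)) {
        best = c;
      }
    }
    pool.remove(Integer.valueOf(best));
    List<Integer> members = new ArrayList<>();
    members.add(best);
    double[] current = combine(scores, members);
    while (!pool.isEmpty()) {
      // Investigate the most diverse candidate: farthest from the current ensemble.
      int diverse = pool.get(0);
      for (int c : pool) {
        if (distance(scores[c], current) > distance(scores[diverse], current)) {
          diverse = c;
        }
      }
      pool.remove(Integer.valueOf(diverse));
      // Tentatively add it; keep it only if the ensemble moves closer to the target.
      members.add(diverse);
      double[] trial = combine(scores, members);
      if (distance(trial, target) < distance(current, target)) {
        current = trial;
      } else {
        members.remove(members.size() - 1); // discard the candidate
      }
    }
    return members;
  }

  public static void main(String[] args) {
    double[][] scores = { { 0.8, 0.2, 0.0 }, { 1.0, 0.0, 0.4 }, { 0.0, 1.0, 1.0 } };
    double[] target = { 1.0, 0.0, 0.0 };
    System.out.println(greedy(scores, target)); // prints [0, 1]
  }
}
```

In this toy input, candidate 2 is the most diverse but pulls the ensemble away from the target, so it is discarded; candidate 1 is less diverse but improves the ensemble, so it is kept.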
Reference:
Erich Schubert, Remigius Wojdanowski, Arthur Zimek, Hans-Peter Kriegel
On Evaluation of Outlier Rankings and Outlier Scores
Proc. 12th SIAM Int. Conf. on Data Mining (SDM 2012)

Since:

0.5.0

Author:

Erich Schubert

Nested Class Summary

Nested Classes
Modifier and Type	Class	Description
`static class`	`GreedyEnsembleExperiment.Distance`	Distance modes.
`static class`	`GreedyEnsembleExperiment.Par`	Parameterization class.

Field Summary

Fields
Modifier and Type	Field	Description
`(package private) GreedyEnsembleExperiment.Distance`	`distance`	Distance in use.
`private InputStep`	`inputstep`	The data input part.
`private static Logging`	`LOG`	Get static logger.
`(package private) int`	`minvote`	Minimum votes.
`(package private) ScalingFunction`	`prescaling`	Outlier scaling to apply during preprocessing.
`(package private) double`	`rate`	Expected rate of outliers.
`(package private) boolean`	`refine_truth`	Variant, where the truth vector is also updated.
`(package private) ScalingFunction`	`scaling`	Outlier scaling to apply to constructed ensembles.
`(package private) EnsembleVoting`	`voting`	Ensemble voting method.

Fields inherited from class elki.application.AbstractApplication
REFERENCE, VERSION

Constructor Summary

Constructors
Constructor	Description
`GreedyEnsembleExperiment(InputStep inputstep, EnsembleVoting voting, GreedyEnsembleExperiment.Distance distance, ScalingFunction prescaling, ScalingFunction scaling, double rate)`	Constructor.

Method Summary

Modifier and Type	Method	Description
`static Relation<NumberVector>`	`applyPrescaling(ScalingFunction scaling, Relation<NumberVector> relation, DBIDs skip)`	Prescale each vector (except when in `skip`) with the given scaling function.
`private static void`	`applyScaling(double[] raw, ScalingFunction scaling)`
`(package private) double`	`gain(double score, double ref, double optimal)`	Compute the gain coefficient.
`private PrimitiveDistance<NumberVector>`	`getDistance(double[] estimated_weights)`
`static void`	`main(java.lang.String[] args)`	Main method.
`void`	`run()`	Runs the application.
`protected void`	`singleEnsemble(double[] ensemble, NumberVector vec)`	Build a single-element "ensemble".
`protected void`	`updateEstimations(int[] outliers, int numoutliers, double[] weights, double[] truth)`

Methods inherited from class elki.application.AbstractApplication
printErrorMessage, runCLIApplication, usage

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail
- LOG
```
private static final Logging LOG
```
  Get static logger.
- inputstep
```
private InputStep inputstep
```
  The data input part.
- refine_truth
```
boolean refine_truth
```
  Variant, where the truth vector is also updated.
- voting
```
EnsembleVoting voting
```
  Ensemble voting method.
- prescaling
```
ScalingFunction prescaling
```
  Outlier scaling to apply during preprocessing.
- scaling
```
ScalingFunction scaling
```
  Outlier scaling to apply to constructed ensembles.
- rate
```
double rate
```
  Expected rate of outliers.
- minvote
```
int minvote
```
  Minimum votes.
- distance
```
GreedyEnsembleExperiment.Distance distance
```
  Distance in use.

Constructor Detail

GreedyEnsembleExperiment

```
public GreedyEnsembleExperiment(InputStep inputstep,
                                EnsembleVoting voting,
                                GreedyEnsembleExperiment.Distance distance,
                                ScalingFunction prescaling,
                                ScalingFunction scaling,
                                double rate)
```

Constructor.

Parameters:

inputstep - Input step

voting - Ensemble voting

distance - Distance function

prescaling - Scaling to apply to input data

scaling - Scaling to apply to ensemble members

rate - Expected rate of outliers

Method Detail

run
```
public void run()
```
Description copied from class: AbstractApplication

Runs the application.

Specified by:

run in class AbstractApplication

singleEnsemble

```
protected void singleEnsemble(double[] ensemble,
                              NumberVector vec)
```

Build a single-element "ensemble".

Parameters:

ensemble - Output array that receives the ensemble scores

vec - Score vector of the single ensemble member

applyPrescaling

```
public static Relation<NumberVector> applyPrescaling(ScalingFunction scaling,
                                                     Relation<NumberVector> relation,
                                                     DBIDs skip)
```

Prescale each vector (except when in skip) with the given scaling function.

Parameters:

scaling - Scaling function

relation - Relation to read

skip - DBIDs to pass unmodified

Returns:

New relation
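
The behavior described for `applyPrescaling` can be illustrated without ELKI types. The sketch below is an assumption-laden stand-in, not the library code: a `Map<Integer, double[]>` replaces `Relation<NumberVector>`, a `Set<Integer>` replaces `DBIDs`, a `DoubleUnaryOperator` replaces `ScalingFunction`, and it assumes the scaling is applied to each vector component independently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for applyPrescaling using only JDK types.
class PrescalingSketch {
  static Map<Integer, double[]> applyPrescaling(DoubleUnaryOperator scaling,
      Map<Integer, double[]> relation, Set<Integer> skip) {
    Map<Integer, double[]> result = new LinkedHashMap<>();
    for (Map.Entry<Integer, double[]> e : relation.entrySet()) {
      double[] vec = e.getValue();
      if (skip.contains(e.getKey())) {
        // Vectors listed in skip are passed through unmodified.
        result.put(e.getKey(), vec);
        continue;
      }
      // Scale every component into a fresh array; the input is left untouched.
      double[] scaled = new double[vec.length];
      for (int i = 0; i < vec.length; i++) {
        scaled[i] = scaling.applyAsDouble(vec[i]);
      }
      result.put(e.getKey(), scaled);
    }
    return result;
  }
}
```

A typical reason to skip an entry would be a vector that is not an outlier score at all, such as a ground-truth labeling stored alongside the results; whether the application uses `skip` this way is an assumption here.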

applyScaling

```
private static void applyScaling(double[] raw,
                                 ScalingFunction scaling)
```

updateEstimations

```
protected void updateEstimations(int[] outliers,
                                 int numoutliers,
                                 double[] weights,
                                 double[] truth)
```

getDistance

```
private PrimitiveDistance<NumberVector> getDistance(double[] estimated_weights)
```

gain
```
double gain(double score,
            double ref,
            double optimal)
```
Compute the gain coefficient.

Parameters:

score - New score

ref - Reference score

optimal - Maximum score possible

Returns:

Gain
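
The page does not state the formula behind the gain coefficient. One plausible reading of the three parameters, shown here purely as an assumption, is a normalized improvement: the fraction of the gap between the reference score and the optimal score that the new score closes. The class name `GainSketch` is hypothetical.

```java
// Hypothetical sketch: gain as normalized improvement over a reference score.
class GainSketch {
  /**
   * Assumed definition: 0 means no improvement over ref, 1 means the optimal
   * score was reached, and negative values mean the new score is worse than ref.
   */
  static double gain(double score, double ref, double optimal) {
    return (score - ref) / (optimal - ref);
  }

  public static void main(String[] args) {
    System.out.println(gain(0.75, 0.5, 1.0)); // prints 0.5
  }
}
```

Under this reading, a reference score equal to the optimal score leaves the gain undefined (division by zero), so callers would need to handle that case separately.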

main
```
public static void main(java.lang.String[] args)
```
Main method.

Parameters:

args - Command line parameters.

