Package elki.application.greedyensemble
Class GreedyEnsembleExperiment
- java.lang.Object
  - elki.application.AbstractApplication
    - elki.application.greedyensemble.GreedyEnsembleExperiment
@Reference(authors="Erich Schubert, Remigius Wojdanowski, Arthur Zimek, Hans-Peter Kriegel", title="On Evaluation of Outlier Rankings and Outlier Scores", booktitle="Proc. 12th SIAM Int. Conf. on Data Mining (SDM 2012)", url="https://doi.org/10.1137/1.9781611972825.90", bibkey="DBLP:conf/sdm/SchubertWZK12")
public class GreedyEnsembleExperiment
extends AbstractApplication
Class to load an outlier detection summary file, as produced by ComputeKNNOutlierScores, and compute a naive ensemble for it. Based on this initial estimation, an optimized ensemble is built using a greedy strategy. Starting with only the best candidate as the initial ensemble, the most diverse remaining candidate is investigated at each step. If adding it moves the ensemble closer to the (estimated) target vector, it is added; otherwise it is discarded.

This approach is naive, and it may be surprising that it can improve results. The reason is probably that the diversity criterion keeps the ensemble comparable to the full one, while the reduced ensemble size is what actually improves results: dropping "unhelpful" members makes the ensemble more decisive and less noisy.

This still leaves quite a bit of room for improvement. If you build upon this basic approach, please acknowledge our proof-of-concept work.
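The greedy selection described above can be illustrated with a small, self-contained sketch on plain arrays. It does not use ELKI's API, and several choices are assumptions rather than what GreedyEnsembleExperiment necessarily does: the ensemble is the unweighted mean of its members (instead of a configurable EnsembleVoting), distances are squared Euclidean (instead of the configurable Distance mode), candidates are assumed to be prescaled to comparable ranges, and the estimated target vector marks the top ceil(rate * n) objects of the naive ensemble as outliers (1.0) and all others as inliers (0.0). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GreedyEnsembleSketch {
  // Assumption: ensemble = unweighted mean of the members' score vectors.
  public static double[] mean(List<double[]> members, int n) {
    double[] out = new double[n];
    for (double[] m : members) {
      for (int i = 0; i < n; i++) out[i] += m[i];
    }
    for (int i = 0; i < n; i++) out[i] /= members.size();
    return out;
  }

  // Assumption: squared Euclidean distance between two score vectors.
  public static double dist(double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      s += d * d;
    }
    return s;
  }

  // Assumption: top ceil(rate * n) objects of the naive ensemble become 1.0, all others 0.0.
  public static double[] estimateTarget(double[] naive, double rate) {
    int n = naive.length;
    int k = (int) Math.ceil(rate * n);
    Integer[] idx = new Integer[n];
    for (int i = 0; i < n; i++) idx[i] = i;
    Arrays.sort(idx, (x, y) -> Double.compare(naive[y], naive[x])); // descending score
    double[] target = new double[n];
    for (int j = 0; j < k && j < n; j++) target[idx[j]] = 1.0;
    return target;
  }

  // Returns the indices of the selected candidates, in the order they were added.
  public static List<Integer> greedy(double[][] candidates, double rate) {
    int n = candidates[0].length;
    double[] target = estimateTarget(mean(Arrays.asList(candidates), n), rate);
    List<Integer> pool = new ArrayList<>();
    for (int c = 0; c < candidates.length; c++) pool.add(c);
    // Start with the single candidate closest to the estimated target.
    int best = pool.get(0);
    for (int c : pool) {
      if (dist(candidates[c], target) < dist(candidates[best], target)) best = c;
    }
    pool.remove(Integer.valueOf(best));
    List<Integer> chosen = new ArrayList<>(List.of(best));
    List<double[]> members = new ArrayList<>();
    members.add(candidates[best]);
    double[] ensemble = mean(members, n);
    double current = dist(ensemble, target);
    while (!pool.isEmpty()) {
      // Most diverse remaining candidate: the one farthest from the current ensemble.
      int div = pool.get(0);
      for (int c : pool) {
        if (dist(candidates[c], ensemble) > dist(candidates[div], ensemble)) div = c;
      }
      pool.remove(Integer.valueOf(div)); // each candidate is investigated only once
      members.add(candidates[div]);
      double[] trial = mean(members, n);
      double d = dist(trial, target);
      if (d < current) { // closer to the estimated target: keep the candidate
        chosen.add(div);
        ensemble = trial;
        current = d;
      } else { // otherwise discard it again
        members.remove(members.size() - 1);
      }
    }
    return chosen;
  }

  public static void main(String[] args) {
    double[][] candidates = {
      {0.8, 0.6, 0.2, 0.0},
      {0.6, 0.9, 0.0, 0.2},
      {0.1, 0.2, 0.9, 0.8},
    };
    System.out.println(greedy(candidates, 0.5)); // prints [1, 0]
  }
}
```

In the toy run, candidate 1 starts the ensemble because it is closest to the estimated target. Candidate 2 ranks the opposite objects highly, so it is the most diverse choice and is tried next, but it moves the ensemble away from the target and is discarded. Candidate 0 is then added because it does bring the ensemble closer.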
Reference:
Erich Schubert, Remigius Wojdanowski, Arthur Zimek, Hans-Peter Kriegel
On Evaluation of Outlier Rankings and Outlier Scores
Proc. 12th SIAM Int. Conf. on Data Mining (SDM 2012)

Since:
0.5.0
Author:
Erich Schubert

Nested Class Summary
Modifier and Type | Class | Description
static class | GreedyEnsembleExperiment.Distance | Distance modes.
static class | GreedyEnsembleExperiment.Par | Parameterization class.

Field Summary
Modifier and Type | Field | Description
(package private) GreedyEnsembleExperiment.Distance | distance | Distance in use.
private InputStep | inputstep | The data input part.
private static Logging | LOG | Static logger.
(package private) int | minvote | Minimum votes.
(package private) ScalingFunction | prescaling | Outlier scaling to apply during preprocessing.
(package private) double | rate | Expected rate of outliers.
(package private) boolean | refine_truth | Variant in which the truth vector is also updated.
(package private) ScalingFunction | scaling | Outlier scaling to apply to constructed ensembles.
(package private) EnsembleVoting | voting | Ensemble voting method.

Fields inherited from class elki.application.AbstractApplication
REFERENCE, VERSION

Constructor Summary
Constructor | Description
GreedyEnsembleExperiment(InputStep inputstep, EnsembleVoting voting, GreedyEnsembleExperiment.Distance distance, ScalingFunction prescaling, ScalingFunction scaling, double rate) | Constructor.

Method Summary
Modifier and Type | Method | Description
static Relation<NumberVector> | applyPrescaling(ScalingFunction scaling, Relation<NumberVector> relation, DBIDs skip) | Prescale each vector (except those in skip) with the given scaling function.
private static void | applyScaling(double[] raw, ScalingFunction scaling) |
(package private) double | gain(double score, double ref, double optimal) | Compute the gain coefficient.
private PrimitiveDistance<NumberVector> | getDistance(double[] estimated_weights) |
static void | main(java.lang.String[] args) | Main method.
void | run() | Runs the application.
protected void | singleEnsemble(double[] ensemble, NumberVector vec) | Build a single-element "ensemble".
protected void | updateEstimations(int[] outliers, int numoutliers, double[] weights, double[] truth) |

Methods inherited from class elki.application.AbstractApplication
printErrorMessage, runCLIApplication, usage

Field Detail

LOG
private static final Logging LOG
Static logger.

inputstep
private InputStep inputstep
The data input part.

refine_truth
boolean refine_truth
Variant in which the truth vector is also updated.

voting
EnsembleVoting voting
Ensemble voting method.

prescaling
ScalingFunction prescaling
Outlier scaling to apply during preprocessing.

scaling
ScalingFunction scaling
Outlier scaling to apply to constructed ensembles.

rate
double rate
Expected rate of outliers.

minvote
int minvote
Minimum votes.

distance
GreedyEnsembleExperiment.Distance distance
Distance in use.

Constructor Detail

GreedyEnsembleExperiment
public GreedyEnsembleExperiment(InputStep inputstep, EnsembleVoting voting, GreedyEnsembleExperiment.Distance distance, ScalingFunction prescaling, ScalingFunction scaling, double rate)
Constructor.
Parameters:
inputstep - Input step
voting - Ensemble voting
distance - Distance function
prescaling - Scaling to apply to input data
scaling - Scaling to apply to ensemble members
rate - Expected rate of outliers

Method Detail

run
public void run()
Description copied from class: AbstractApplication
Runs the application.
Specified by:
run in class AbstractApplication

singleEnsemble
protected void singleEnsemble(double[] ensemble, NumberVector vec)
Build a single-element "ensemble".
Parameters:
ensemble -
vec -

applyPrescaling
public static Relation<NumberVector> applyPrescaling(ScalingFunction scaling, Relation<NumberVector> relation, DBIDs skip)
Prescale each vector (except those in skip) with the given scaling function.
Parameters:
scaling - Scaling function
relation - Relation to read
skip - DBIDs to pass unmodified
Returns:
New relation
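The contract of applyPrescaling (scale every vector except those in skip, and return a new relation rather than modifying the input) can be sketched without ELKI. In this hypothetical sketch, a map from string ids to double arrays stands in for Relation<NumberVector> and DBIDs, and a min-max normalization fitted separately on each vector stands in for the ScalingFunction; neither substitution is taken from ELKI.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PrescalingSketch {
  // Hypothetical stand-in for a ScalingFunction: min-max normalization to [0, 1],
  // fitted on the single vector being scaled. A constant vector maps to all zeros.
  public static double[] minMax(double[] raw) {
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
    for (double v : raw) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    double[] out = new double[raw.length];
    for (int i = 0; i < raw.length; i++) {
      out[i] = max > min ? (raw[i] - min) / (max - min) : 0.0;
    }
    return out;
  }

  // Builds a new id -> vector map; ids contained in skip are copied unmodified.
  // Neither the input map nor its arrays are changed.
  public static Map<String, double[]> applyPrescaling(Map<String, double[]> relation, Set<String> skip) {
    Map<String, double[]> result = new LinkedHashMap<>();
    for (Map.Entry<String, double[]> e : relation.entrySet()) {
      double[] copy = e.getValue().clone();
      result.put(e.getKey(), skip.contains(e.getKey()) ? copy : minMax(copy));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, double[]> relation = new LinkedHashMap<>();
    relation.put("truth", new double[] {0, 1, 0}); // hypothetical label row that must stay as-is
    relation.put("knn", new double[] {2, 6, 4});   // hypothetical raw outlier scores
    Map<String, double[]> scaled = applyPrescaling(relation, Set.of("truth"));
    System.out.println(Arrays.toString(scaled.get("knn"))); // prints [0.0, 1.0, 0.5]
  }
}
```

Returning a fresh map mirrors the documented "New relation" return value, so callers can keep the unscaled input around for later comparison.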

applyScaling
private static void applyScaling(double[] raw, ScalingFunction scaling)

updateEstimations
protected void updateEstimations(int[] outliers, int numoutliers, double[] weights, double[] truth)

getDistance
private PrimitiveDistance<NumberVector> getDistance(double[] estimated_weights)

gain
double gain(double score, double ref, double optimal)
Compute the gain coefficient.
Parameters:
score - New score
ref - Reference score
optimal - Maximum score possible
Returns:
Gain
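The formula behind the gain coefficient is not stated here. One plausible reading of the three parameters, given purely as an assumption, is the fraction of the gap between the reference score and the maximum possible score that the new score closes: gain = 1 - (optimal - score) / (optimal - ref). A minimal sketch under that assumption (the class name is hypothetical):

```java
public class GainSketch {
  // Assumed definition: share of the gap between ref and optimal closed by score.
  // 1.0 = optimum reached, 0.0 = no change over ref, negative = worse than ref.
  public static double gain(double score, double ref, double optimal) {
    return 1 - ((optimal - score) / (optimal - ref));
  }

  public static void main(String[] args) {
    System.out.println(gain(0.75, 0.5, 1.0)); // prints 0.5
  }
}
```

Under this definition the ratio is not finite when ref already equals optimal, so a caller would need to handle that case separately.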

main
public static void main(java.lang.String[] args)
Main method.
Parameters:
args - Command line parameters.