Package elki.datasource.filter.transform
Class PerturbationFilter<V extends NumberVector>
java.lang.Object
  elki.datasource.filter.AbstractConversionFilter<I,O>
    elki.datasource.filter.AbstractVectorConversionFilter<V,V>
      elki.datasource.filter.transform.PerturbationFilter<V>
- All Implemented Interfaces:
- ObjectFilter
@Title("Data Perturbation for Outlier Detection Ensembles")
@Description("A filter to perturb a dataset on read by an additive noise component, implemented for use in an outlier ensemble (this reference).")
@Reference(authors="A. Zimek, R. J. G. B. Campello, J. Sander", title="Data Perturbation for Outlier Detection Ensembles", booktitle="Proc. 26th International Conference on Scientific and Statistical Database Management (SSDBM), Aalborg, Denmark, 2014", url="https://doi.org/10.1145/2618243.2618257", bibkey="DBLP:conf/ssdbm/ZimekCS14")
public class PerturbationFilter<V extends NumberVector> extends AbstractVectorConversionFilter<V,V>
A filter to perturb the values by adding micro-noise. The added noise is generated, attribute-wise, by a Gaussian with mean=0 and a specified standard deviation or by a uniform distribution with a specified range. The standard deviation or the range can be scaled, attribute-wise, to a given percentage of the original standard deviation in the data distribution (assuming a Gaussian distribution there), or to a percentage of the extent of each attribute (maximumValue - minimumValue).
This filter has a potentially wide use but has been implemented for the following publication:
Reference:
A. Zimek, R. J. G. B. Campello, J. Sander
Data Perturbation for Outlier Detection Ensembles
Proc. 26th Int. Conf. on Scientific and Statistical Database Management (SSDBM 2014)
- Since:
- 0.7.0
- Author:
- Arthur Zimek
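The two noise models in the class description can be illustrated with a minimal standalone sketch. It does not use ELKI; the class `NoiseSketch` and its method names are hypothetical. It assumes that the percentage is passed as a fraction (0.01 for 1%) and that the uniform noise is centered on zero, neither of which is stated on this page.

```java
import java.util.Random;

// Hypothetical illustration class; not part of ELKI.
final class NoiseSketch {
  /** Gaussian noise with mean 0 and standard deviation fraction * attributeStddev. */
  public static double gaussianNoise(Random rnd, double fraction, double attributeStddev) {
    return rnd.nextGaussian() * fraction * attributeStddev;
  }

  /**
   * Uniform noise drawn from a window of width fraction * (max - min).
   * Centering the window on 0 is an assumption of this sketch.
   */
  public static double uniformNoise(Random rnd, double fraction, double min, double max) {
    double width = fraction * (max - min);
    return (rnd.nextDouble() - 0.5) * width;
  }

  public static void main(String[] args) {
    Random rnd = new Random(42L);
    double value = 10.0;
    // Perturb one attribute value with each noise model.
    System.out.println(value + gaussianNoise(rnd, 0.01, 2.0));
    System.out.println(value + uniformNoise(rnd, 0.01, 0.0, 100.0));
  }
}
```

In both cases the noise scale is tied to the spread of the attribute itself, so attributes with very different value ranges receive proportionally comparable jitter.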
-
-
Nested Class Summary
Modifier and Type, Class: Description
static class PerturbationFilter.NoiseDistribution: Nature of the noise distribution.
static class PerturbationFilter.Par<V extends NumberVector>: Parameterization class.
static class PerturbationFilter.ScalingReference: Scaling reference options.
-
Field Summary
Modifier and Type, Field: Description
private int dimensionality: Stores the dimensionality from the preprocessing.
private static Logging LOG: Class logger
private double[] maxima: Stores the maximum in each dimension.
private double[] minima: Stores the minimum in each dimension.
private MeanVarianceMinMax[] mvs: Temporary storage used during initialization.
private PerturbationFilter.NoiseDistribution noisedistribution: Nature of the noise distribution.
private double percentage: Percentage of the variance of the random noise generation, given the variance of the corresponding attribute in the data.
private java.util.Random RANDOM: Random object to generate the attribute-wise seeds for the noise.
private java.util.Random[] randomPerAttribute: The random objects to generate noise distributions independently for each attribute.
private PerturbationFilter.ScalingReference scalingreference: Which reference to use for scaling the noise.
private double[] scalingreferencevalues: Stores the scaling reference in each dimension.
-
Fields inherited from class elki.datasource.filter.AbstractVectorConversionFilter
factory
-
-
Constructor Summary
Constructor: Description
PerturbationFilter(java.lang.Long seed, double percentage, PerturbationFilter.ScalingReference scalingreference, double[] minima, double[] maxima, PerturbationFilter.NoiseDistribution noisedistribution): Constructor.
-
Method Summary
Modifier and Type, Method: Description
protected SimpleTypeInformation<? super V> convertedType(SimpleTypeInformation<V> in): Get the output type from the input type after conversion.
protected V filterSingleObject(V featureVector): Normalize a single instance.
protected SimpleTypeInformation<? super V> getInputTypeRestriction(): Get the input type restriction used for negotiating the data query.
protected Logging getLogger(): Class logger.
protected void prepareComplete(): Complete the initialization phase.
protected void prepareProcessInstance(V featureVector): Process a single object during initialization.
protected boolean prepareStart(SimpleTypeInformation<V> in): Return "true" when the normalization needs initialization (two-pass filtering!).
-
Methods inherited from class elki.datasource.filter.AbstractVectorConversionFilter
initializeOutputType
-
Methods inherited from class elki.datasource.filter.AbstractConversionFilter
filter, toString
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger
-
scalingreference
private PerturbationFilter.ScalingReference scalingreference
Which reference to use for scaling the noise.
-
noisedistribution
private PerturbationFilter.NoiseDistribution noisedistribution
Nature of the noise distribution.
-
RANDOM
private final java.util.Random RANDOM
Random object to generate the attribute-wise seeds for the noise.
-
percentage
private double percentage
Percentage of the variance of the random noise generation, given the variance of the corresponding attribute in the data.
-
mvs
private MeanVarianceMinMax[] mvs
Temporary storage used during initialization.
-
scalingreferencevalues
private double[] scalingreferencevalues
Stores the scaling reference in each dimension.
-
randomPerAttribute
private java.util.Random[] randomPerAttribute
The random objects to generate noise distributions independently for each attribute.
-
maxima
private double[] maxima
Stores the maximum in each dimension.
-
minima
private double[] minima
Stores the minimum in each dimension.
-
dimensionality
private int dimensionality
Stores the dimensionality from the preprocessing.
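The relationship between the `RANDOM` and `randomPerAttribute` fields can be sketched as follows: a master generator hands out one seed per attribute, so each attribute gets its own independent noise stream. This is a standalone illustration, not the ELKI source; the class `PerAttributeRandomSketch` is hypothetical, and drawing the seeds with `nextLong()` is an assumption.

```java
import java.util.Random;

// Hypothetical illustration class; not part of ELKI.
final class PerAttributeRandomSketch {
  /**
   * Creates one generator per attribute. A null seed means "use a random seed",
   * mirroring the documented constructor parameter.
   */
  public static Random[] perAttribute(Long seed, int dimensionality) {
    Random master = (seed == null) ? new Random() : new Random(seed);
    Random[] perAttribute = new Random[dimensionality];
    for (int d = 0; d < dimensionality; d++) {
      // Assumption: each attribute generator is seeded by the next long of the master.
      perAttribute[d] = new Random(master.nextLong());
    }
    return perAttribute;
  }

  public static void main(String[] args) {
    Random[] first = perAttribute(123L, 3);
    Random[] second = perAttribute(123L, 3);
    // With a fixed seed, the per-attribute noise streams are reproducible.
    for (int d = 0; d < 3; d++) {
      System.out.println(first[d].nextGaussian() == second[d].nextGaussian());
    }
  }
}
```

Separate generators per attribute keep the noise in one dimension from depending on how many values were drawn for another.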
-
-
Constructor Detail
-
PerturbationFilter
public PerturbationFilter(java.lang.Long seed, double percentage, PerturbationFilter.ScalingReference scalingreference, double[] minima, double[] maxima, PerturbationFilter.NoiseDistribution noisedistribution)
Constructor.
- Parameters:
- seed - Seed value, may be null for a random seed.
- percentage - Relative amount of jitter to add
- scalingreference - Scaling reference
- minima - Preset minimum values. May be null.
- maxima - Preset maximum values. May be null.
- noisedistribution - Nature of the noise distribution.
-
-
Method Detail
-
prepareStart
protected boolean prepareStart(SimpleTypeInformation<V> in)
Description copied from class: AbstractConversionFilter
Return "true" when the normalization needs initialization (two-pass filtering!).
- Overrides:
- prepareStart in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Parameters:
- in - Input type information
- Returns:
- true or false
-
prepareProcessInstance
protected void prepareProcessInstance(V featureVector)
Description copied from class: AbstractConversionFilter
Process a single object during initialization.
- Overrides:
- prepareProcessInstance in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Parameters:
- featureVector - Object to process
-
prepareComplete
protected void prepareComplete()
Description copied from class: AbstractConversionFilter
Complete the initialization phase.
- Overrides:
- prepareComplete in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
-
getInputTypeRestriction
protected SimpleTypeInformation<? super V> getInputTypeRestriction()
Description copied from class: AbstractConversionFilter
Get the input type restriction used for negotiating the data query.
- Specified by:
- getInputTypeRestriction in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Returns:
- Type restriction
-
filterSingleObject
protected V filterSingleObject(V featureVector)
Description copied from class: AbstractConversionFilter
Normalize a single instance. You can implement this as UnsupportedOperationException if you override both public "normalize" functions!
- Specified by:
- filterSingleObject in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Parameters:
- featureVector - Database object to normalize
- Returns:
- Normalized database object
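The two-pass flow implied by prepareProcessInstance, prepareComplete and filterSingleObject can be sketched on plain double arrays. This is a simplified standalone illustration, not the ELKI implementation: the class `TwoPassPerturbationSketch` is hypothetical, it covers only Gaussian noise scaled by the per-attribute standard deviation, it uses a single generator instead of one per attribute, and it assumes the population standard deviation and a fractional percentage.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration class; not part of ELKI.
final class TwoPassPerturbationSketch {
  private final double fraction; // noise scale relative to each attribute's stddev (assumed fraction)
  private final Random rnd;
  private long count;
  private double[] mean, m2, stddev;

  public TwoPassPerturbationSketch(long seed, double fraction) {
    this.rnd = new Random(seed);
    this.fraction = fraction;
  }

  /** Pass 1: update the running mean and sum of squared deviations (Welford's method). */
  public void prepareProcessInstance(double[] v) {
    if (mean == null) {
      mean = new double[v.length];
      m2 = new double[v.length];
    }
    count++;
    for (int d = 0; d < v.length; d++) {
      double delta = v[d] - mean[d];
      mean[d] += delta / count;
      m2[d] += delta * (v[d] - mean[d]);
    }
  }

  /** End of pass 1: derive the per-attribute population standard deviation. */
  public void prepareComplete() {
    stddev = new double[m2.length];
    for (int d = 0; d < m2.length; d++) {
      stddev[d] = Math.sqrt(m2[d] / count);
    }
  }

  /** Pass 2: return a perturbed copy of the vector; the input is left untouched. */
  public double[] filterSingleObject(double[] v) {
    double[] out = new double[v.length];
    for (int d = 0; d < v.length; d++) {
      out[d] = v[d] + rnd.nextGaussian() * fraction * stddev[d];
    }
    return out;
  }

  /** Copy of the scaling reference computed in pass 1. */
  public double[] stddev() {
    return stddev.clone();
  }

  public static void main(String[] args) {
    double[][] data = { { 1.0, 10.0 }, { 3.0, 30.0 } };
    TwoPassPerturbationSketch f = new TwoPassPerturbationSketch(7L, 0.01);
    for (double[] v : data) {
      f.prepareProcessInstance(v);
    }
    f.prepareComplete();
    for (double[] v : data) {
      System.out.println(Arrays.toString(f.filterSingleObject(v)));
    }
  }
}
```

The first pass is needed because the noise scale depends on statistics of the whole dataset, which are unknown until every object has been seen once.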
-
convertedType
protected SimpleTypeInformation<? super V> convertedType(SimpleTypeInformation<V> in)
Description copied from class: AbstractConversionFilter
Get the output type from the input type after conversion.
- Specified by:
- convertedType in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Parameters:
- in - input type restriction
- Returns:
- output type restriction
-
getLogger
protected Logging getLogger()
Description copied from class: AbstractConversionFilter
Class logger.
- Specified by:
- getLogger in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Returns:
- Logger
-
-