Package elki.datasource.filter.transform
Class PerturbationFilter<V extends NumberVector>
- java.lang.Object
-
- elki.datasource.filter.AbstractConversionFilter<I,O>
-
- elki.datasource.filter.AbstractVectorConversionFilter<V,V>
-
- elki.datasource.filter.transform.PerturbationFilter<V>
-
- All Implemented Interfaces:
ObjectFilter
@Title("Data Perturbation for Outlier Detection Ensembles")
@Description("A filter to perturb a dataset on read by an additive noise component, implemented for use in an outlier ensemble (this reference).")
@Reference(authors="A. Zimek, R. J. G. B. Campello, J. Sander", title="Data Perturbation for Outlier Detection Ensembles", booktitle="Proc. 26th International Conference on Scientific and Statistical Database Management (SSDBM), Aalborg, Denmark, 2014", url="https://doi.org/10.1145/2618243.2618257", bibkey="DBLP:conf/ssdbm/ZimekCS14")
public class PerturbationFilter<V extends NumberVector> extends AbstractVectorConversionFilter<V,V>
A filter to perturb the values by adding micro-noise. The added noise is generated, attribute-wise, by a Gaussian with mean=0 and a specified standard deviation or by a uniform distribution with a specified range. The standard deviation or the range can be scaled, attribute-wise, to a given percentage of the original standard deviation in the data distribution (assuming a Gaussian distribution there), or to a percentage of the extent of each attribute (maximumValue - minimumValue).
This filter has a potentially wide use but has been implemented for the following publication:
Reference:
A. Zimek, R. J. G. B. Campello, J. Sander
Data Perturbation for Outlier Detection Ensembles
Proc. 26th Int. Conf. on Scientific and Statistical Database Management (SSDBM 2014)
- Since:
- 0.7.0
- Author:
- Arthur Zimek
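The noise model described above can be sketched in a few lines of plain Java. This is a minimal illustration and not the ELKI implementation: the class NoiseSketch, its methods, and its enum are hypothetical names. It assumes that the percentage is given as a fraction (0.1 for 10%) and that uniform noise is centered at zero, neither of which this page states.

```java
import java.util.Random;

/** Hypothetical sketch (not ELKI code): attribute-wise additive noise. */
final class NoiseSketch {
  enum Noise { GAUSSIAN, UNIFORM }

  /** Noise scale per attribute: a fraction of a reference spread (stddev or max - min). */
  static double[] scales(double[] reference, double percentage) {
    double[] s = new double[reference.length];
    for (int d = 0; d < s.length; d++) {
      s[d] = reference[d] * percentage; // assumption: percentage is a fraction
    }
    return s;
  }

  /** Returns a perturbed copy of x; rnd[d] is the random generator of attribute d. */
  static double[] perturb(double[] x, double[] scale, Noise kind, Random[] rnd) {
    double[] out = new double[x.length];
    for (int d = 0; d < x.length; d++) {
      double noise = (kind == Noise.GAUSSIAN)
          ? rnd[d].nextGaussian() * scale[d]         // mean 0, stddev scale[d]
          : (rnd[d].nextDouble() - 0.5) * scale[d];  // uniform in [-scale/2, scale/2)
      out[d] = x[d] + noise;
    }
    return out;
  }

  public static void main(String[] args) {
    Random[] rnd = { new Random(1L), new Random(2L) };
    double[] spread = { 2.0, 4.0 };            // e.g. per-attribute stddev
    double[] scale = scales(spread, 0.5);      // noise at 50% of the spread
    double[] y = perturb(new double[] { 1.0, 10.0 }, scale, Noise.UNIFORM, rnd);
    System.out.println(y[0] + ", " + y[1]);
  }
}
```

With a scale of zero the input vector is returned unchanged, which is a convenient sanity check when wiring such a step into a pipeline.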
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | PerturbationFilter.NoiseDistribution | Nature of the noise distribution.
static class | PerturbationFilter.Par<V extends NumberVector> | Parameterization class.
static class | PerturbationFilter.ScalingReference | Scaling reference options.
-
Field Summary
Fields
Modifier and Type | Field | Description
private int | dimensionality | Stores the dimensionality from the preprocessing.
private static Logging | LOG | Class logger
private double[] | maxima | Stores the maximum in each dimension.
private double[] | minima | Stores the minimum in each dimension.
private MeanVarianceMinMax[] | mvs | Temporary storage used during initialization.
private PerturbationFilter.NoiseDistribution | noisedistribution | Nature of the noise distribution.
private double | percentage | Percentage of the variance of the random noise generation, given the variance of the corresponding attribute in the data.
private java.util.Random | RANDOM | Random object to generate the attribute-wise seeds for the noise.
private java.util.Random[] | randomPerAttribute | The random objects to generate noise distributions independently for each attribute.
private PerturbationFilter.ScalingReference | scalingreference | Which reference to use for scaling the noise.
private double[] | scalingreferencevalues | Stores the scaling reference in each dimension.
-
Fields inherited from class elki.datasource.filter.AbstractVectorConversionFilter
factory
-
-
Constructor Summary
Constructors
Constructor | Description
PerturbationFilter(java.lang.Long seed, double percentage, PerturbationFilter.ScalingReference scalingreference, double[] minima, double[] maxima, PerturbationFilter.NoiseDistribution noisedistribution) | Constructor.
-
Method Summary
Methods
Modifier and Type | Method | Description
protected SimpleTypeInformation<? super V> | convertedType(SimpleTypeInformation<V> in) | Get the output type from the input type after conversion.
protected V | filterSingleObject(V featureVector) | Normalize a single instance.
protected SimpleTypeInformation<? super V> | getInputTypeRestriction() | Get the input type restriction used for negotiating the data query.
protected Logging | getLogger() | Class logger.
protected void | prepareComplete() | Complete the initialization phase.
protected void | prepareProcessInstance(V featureVector) | Process a single object during initialization.
protected boolean | prepareStart(SimpleTypeInformation<V> in) | Return "true" when the normalization needs initialization (two-pass filtering!).
-
Methods inherited from class elki.datasource.filter.AbstractVectorConversionFilter
initializeOutputType
-
Methods inherited from class elki.datasource.filter.AbstractConversionFilter
filter, toString
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger
-
scalingreference
private PerturbationFilter.ScalingReference scalingreference
Which reference to use for scaling the noise.
-
noisedistribution
private PerturbationFilter.NoiseDistribution noisedistribution
Nature of the noise distribution.
-
RANDOM
private final java.util.Random RANDOM
Random object to generate the attribute-wise seeds for the noise.
-
percentage
private double percentage
Percentage of the variance of the random noise generation, given the variance of the corresponding attribute in the data.
-
mvs
private MeanVarianceMinMax[] mvs
Temporary storage used during initialization.
-
scalingreferencevalues
private double[] scalingreferencevalues
Stores the scaling reference in each dimension.
-
randomPerAttribute
private java.util.Random[] randomPerAttribute
The random objects to generate noise distributions independently for each attribute.
-
maxima
private double[] maxima
Stores the maximum in each dimension.
-
minima
private double[] minima
Stores the minimum in each dimension.
-
dimensionality
private int dimensionality
Stores the dimensionality from the preprocessing.
-
-
Constructor Detail
-
PerturbationFilter
public PerturbationFilter(java.lang.Long seed, double percentage, PerturbationFilter.ScalingReference scalingreference, double[] minima, double[] maxima, PerturbationFilter.NoiseDistribution noisedistribution)
Constructor.
- Parameters:
seed - Seed value, may be null for a random seed.
percentage - Relative amount of jitter to add
scalingreference - Scaling reference
minima - Preset minimum values. May be null.
maxima - Preset maximum values. May be null.
noisedistribution - Nature of the noise distribution.
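The seed parameter and the RANDOM and randomPerAttribute fields suggest a master generator that hands out one seed per attribute. The following is a rough sketch of that pattern, not code taken from ELKI: SeedSketch and perAttribute are hypothetical names, and falling back to an unseeded java.util.Random when the seed is null is an assumption about what "random seed" means here.

```java
import java.util.Random;

/** Hypothetical sketch (not ELKI code): one independent generator per attribute. */
final class SeedSketch {
  /**
   * Builds dim generators from an optional master seed.
   * A null seed yields a non-reproducible master generator (assumption).
   */
  static Random[] perAttribute(Long seed, int dim) {
    Random master = (seed == null) ? new Random() : new Random(seed);
    Random[] rnd = new Random[dim];
    for (int d = 0; d < dim; d++) {
      // Each attribute gets its own seed drawn from the master generator.
      rnd[d] = new Random(master.nextLong());
    }
    return rnd;
  }

  public static void main(String[] args) {
    Random[] a = perAttribute(42L, 3);
    Random[] b = perAttribute(42L, 3);
    // The same master seed reproduces the same per-attribute streams.
    System.out.println(a[0].nextDouble() == b[0].nextDouble());
  }
}
```

Giving each attribute its own generator keeps the noise in one dimension independent of how many values were drawn in another, while a fixed master seed keeps the whole perturbation reproducible.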
-
-
Method Detail
-
prepareStart
protected boolean prepareStart(SimpleTypeInformation<V> in)
Description copied from class: AbstractConversionFilter
Return "true" when the normalization needs initialization (two-pass filtering!).
- Overrides:
prepareStart in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Parameters:
in - Input type information
- Returns:
- true or false
-
prepareProcessInstance
protected void prepareProcessInstance(V featureVector)
Description copied from class: AbstractConversionFilter
Process a single object during initialization.
- Overrides:
prepareProcessInstance in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Parameters:
featureVector - Object to process
-
prepareComplete
protected void prepareComplete()
Description copied from class: AbstractConversionFilter
Complete the initialization phase.
- Overrides:
prepareComplete in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
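The three prepare methods describe a first pass that gathers per-attribute statistics before any object is filtered. A minimal stand-alone sketch of such a pass is shown here. It is not the ELKI implementation: TwoPassSketch and its methods are hypothetical, it does not use ELKI's MeanVarianceMinMax, and the choice of the sample standard deviation (dividing by n - 1) is an assumption.

```java
import java.util.Arrays;

/** Hypothetical sketch (not ELKI code): first pass collecting per-attribute statistics. */
final class TwoPassSketch {
  private long count;
  private final double[] mean, m2, min, max;

  TwoPassSketch(int dim) {
    mean = new double[dim];
    m2 = new double[dim];
    min = new double[dim];
    max = new double[dim];
    Arrays.fill(min, Double.POSITIVE_INFINITY);
    Arrays.fill(max, Double.NEGATIVE_INFINITY);
  }

  /** Corresponds to processing one object during initialization (Welford update). */
  void observe(double[] x) {
    count++;
    for (int d = 0; d < x.length; d++) {
      double delta = x[d] - mean[d];
      mean[d] += delta / count;
      m2[d] += delta * (x[d] - mean[d]);
      min[d] = Math.min(min[d], x[d]);
      max[d] = Math.max(max[d], x[d]);
    }
  }

  /** Sample standard deviation per attribute (assumption: n - 1 in the denominator). */
  double[] stddev() {
    double[] s = new double[mean.length];
    for (int d = 0; d < s.length; d++) {
      s[d] = count > 1 ? Math.sqrt(m2[d] / (count - 1)) : 0.0;
    }
    return s;
  }

  /** Extent per attribute: maximumValue - minimumValue. */
  double[] extent() {
    double[] e = new double[min.length];
    for (int d = 0; d < e.length; d++) {
      e[d] = max[d] - min[d];
    }
    return e;
  }

  public static void main(String[] args) {
    TwoPassSketch stats = new TwoPassSketch(2);
    stats.observe(new double[] { 1.0, 0.0 });
    stats.observe(new double[] { 3.0, 10.0 });
    stats.observe(new double[] { 5.0, 20.0 });
    System.out.println(Arrays.toString(stats.stddev()));
    System.out.println(Arrays.toString(stats.extent()));
  }
}
```

Once the first pass is complete, either stddev() or extent() could serve as the per-attribute scaling reference for the noise added in the second pass.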
-
getInputTypeRestriction
protected SimpleTypeInformation<? super V> getInputTypeRestriction()
Description copied from class: AbstractConversionFilter
Get the input type restriction used for negotiating the data query.
- Specified by:
getInputTypeRestriction in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Returns:
- Type restriction
-
filterSingleObject
protected V filterSingleObject(V featureVector)
Description copied from class: AbstractConversionFilter
Normalize a single instance. You can implement this as UnsupportedOperationException if you override both public "normalize" functions!
- Specified by:
filterSingleObject in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Parameters:
featureVector - Database object to normalize
- Returns:
- Normalized database object
-
convertedType
protected SimpleTypeInformation<? super V> convertedType(SimpleTypeInformation<V> in)
Description copied from class: AbstractConversionFilter
Get the output type from the input type after conversion.
- Specified by:
convertedType in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Parameters:
in - input type restriction
- Returns:
- output type restriction
-
getLogger
protected Logging getLogger()
Description copied from class: AbstractConversionFilter
Class logger.
- Specified by:
getLogger in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
- Returns:
- Logger
-
-