Class GeneratorMain
- java.lang.Object
-
- elki.data.synthetic.bymodel.GeneratorMain
-
public class GeneratorMain extends java.lang.Object
Generate a data set according to a given model.
The key idea of this generator is to re-generate points if they are more likely to belong to a different cluster than the one they were generated for. The benefit is that the resulting data set should closely follow the specified model.
The drawbacks are that, on the one hand, specifications might be unsatisfiable. For this, a retry count is kept, and an AbortException is thrown when the maximum number of retries is exceeded. On the other hand, the model might not be exactly as specified. When the generator reports a "Density correction factor estimation" that differs from 1.0, this indicates that the result is not exact.
Finally, rejecting points introduces effects where one generator can influence others, so the random generator results will not be stable with respect to changes such as the addition of new dimensions if any rejects are involved. This generator is therefore not ideal for generating data sets for scalability tests on the number of dimensions, although if clusters overlap little enough (so that no rejects happen) the results should be as expected.
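The re-generation idea described above can be sketched roughly as follows. This is a minimal illustration and not the ELKI implementation: the class RejectionSketch, its nested Cluster type, the restriction to 1-dimensional Gaussian clusters, the unweighted density comparison, and the use of IllegalStateException as a stand-in for AbortException are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of re-generating points; not part of ELKI. */
class RejectionSketch {
  /** Hypothetical stand-in for a cluster generator: a 1-dimensional Gaussian. */
  static final class Cluster {
    final double mean, stddev;

    Cluster(double mean, double stddev) {
      this.mean = mean;
      this.stddev = stddev;
    }

    /** Draw one point from this cluster. */
    double sample(Random rnd) {
      return mean + stddev * rnd.nextGaussian();
    }

    /** Gaussian probability density at x. */
    double density(double x) {
      double z = (x - mean) / stddev;
      return Math.exp(-0.5 * z * z) / (stddev * Math.sqrt(2 * Math.PI));
    }
  }

  /** True if no other cluster has a strictly higher density at x. */
  static boolean isMostLikely(Cluster[] clusters, int own, double x) {
    double d = clusters[own].density(x);
    for (int i = 0; i < clusters.length; i++) {
      if (i != own && clusters[i].density(x) > d) {
        return false;
      }
    }
    return true;
  }

  /**
   * Draw n points for clusters[own], drawing again whenever another cluster
   * explains the point better. Gives up after maxRetries rejected points.
   */
  static List<Double> generate(Cluster[] clusters, int own, int n, int maxRetries, Random rnd) {
    List<Double> out = new ArrayList<>(n);
    int retries = 0;
    while (out.size() < n) {
      double x = clusters[own].sample(rnd);
      if (isMostLikely(clusters, own, x)) {
        out.add(x);
      } else if (++retries > maxRetries) {
        // Assumption: stands in for the AbortException mentioned above.
        throw new IllegalStateException("Maximum number of retries exceeded.");
      }
    }
    return out;
  }
}
```

In this sketch, well-separated clusters rarely trigger a reject, while heavily overlapping clusters use up the retry budget quickly. Because every rejected point consumes extra random numbers, all later points shift, which is the instability noted above.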
- Since:
- 0.2
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes
private class GeneratorMain.AssignLabelsByDensity
Reassign objects in certain labels; but also always test against the model.
private class GeneratorMain.AssignLabelsByDistance
Reassign objects in certain labels; but also always test against the model.
private class GeneratorMain.TestModel
Reject objects with a higher density in another generator.
-
Field Summary
Fields
protected java.util.ArrayList<GeneratorInterface> generators
List of clusters to generate.
private static Logging LOG
Class logger.
protected java.util.regex.Pattern relabelClusters
Pattern selecting which clusters (e.g., "Noise") to relabel by the second-best cluster.
protected boolean relabelDistance
Relabel objects by distance.
protected boolean testAgainstModel
Controls whether points are tested against the model during generation.
-
Constructor Summary
Constructors
GeneratorMain()
-
Method Summary
All Methods
void addCluster(GeneratorInterface c)
Add a cluster to the cluster list.
MultipleObjectsBundle generate()
Main loop to generate data set.
java.util.List<GeneratorInterface> getGenerators()
Access the generators.
private void initLabelsAndModels(java.util.ArrayList<GeneratorInterface> generators, ClassLabel[] labels, Model[] models, java.util.regex.Pattern reassign)
Initialize cluster labels and models.
boolean isTestAgainstModel()
Return value of the testAgainstModel flag.
void setReassignByDistance(boolean bydistance)
Relabel objects by distance, instead of by density.
void setReassignPattern(java.util.regex.Pattern reassign)
Set the reassignment pattern.
void setTestAgainstModel(boolean testAgainstModel)
Set the value of the testAgainstModel flag.
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
generators
protected java.util.ArrayList<GeneratorInterface> generators
List of clusters to generate.
-
testAgainstModel
protected boolean testAgainstModel
Controls whether points are tested against the model during generation.
-
relabelClusters
protected java.util.regex.Pattern relabelClusters
Pattern selecting which clusters (e.g., "Noise") to relabel by the second-best cluster.
-
relabelDistance
protected boolean relabelDistance
Relabel objects by distance.
-
-
Method Detail
-
addCluster
public void addCluster(GeneratorInterface c)
Add a cluster to the cluster list.
- Parameters:
c - cluster to add
-
generate
public MultipleObjectsBundle generate()
Main loop to generate data set.
- Returns:
- Generated data set
-
initLabelsAndModels
private void initLabelsAndModels(java.util.ArrayList<GeneratorInterface> generators, ClassLabel[] labels, Model[] models, java.util.regex.Pattern reassign)
Initialize cluster labels and models.
Clusters that are set to "reassign" will have their labels set to null, or if there is only one possible reassignment, to this target label.
- Parameters:
generators - Cluster generators
labels - Labels (output)
models - Models (output)
reassign - Pattern for clusters to reassign.
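The labeling rule described for initLabelsAndModels can be sketched as follows. This is only an illustration under stated assumptions: the class LabelInitSketch and its initLabels method are hypothetical, plain strings stand in for ClassLabel, models are left out, and matching the pattern with find() rather than matches() is a guess, not something the documentation states.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of the label initialization rule; not part of ELKI. */
class LabelInitSketch {
  /**
   * Returns one label per cluster name. Clusters matching the reassign
   * pattern get null, or the single remaining cluster name if exactly one
   * cluster does not match the pattern.
   */
  static String[] initLabels(String[] names, Pattern reassign) {
    boolean[] relabel = new boolean[names.length];
    String target = null; // last cluster that keeps its own label
    int targets = 0; // number of clusters that keep their own label
    for (int i = 0; i < names.length; i++) {
      // Assumption: substring match via find(); the real code may differ.
      relabel[i] = reassign != null && reassign.matcher(names[i]).find();
      if (!relabel[i]) {
        target = names[i];
        targets++;
      }
    }
    String[] labels = new String[names.length];
    for (int i = 0; i < names.length; i++) {
      if (!relabel[i]) {
        labels[i] = names[i];
      } else {
        // Only one possible reassignment: use it directly, else decide later.
        labels[i] = (targets == 1) ? target : null;
      }
    }
    return labels;
  }
}
```

In this sketch a null label marks a cluster whose points still have to be assigned to another cluster one by one, for example by density or by distance.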
-
isTestAgainstModel
public boolean isTestAgainstModel()
Return value of the testAgainstModel flag.
- Returns:
- value of testAgainstModel
-
setTestAgainstModel
public void setTestAgainstModel(boolean testAgainstModel)
Set the value of the testAgainstModel flag.
- Parameters:
testAgainstModel - New value
-
getGenerators
public java.util.List<GeneratorInterface> getGenerators()
Access the generators.
- Returns:
- generators
-
setReassignPattern
public void setReassignPattern(java.util.regex.Pattern reassign)
Set the reassignment pattern.
- Parameters:
reassign - Reassignment pattern.
-
setReassignByDistance
public void setReassignByDistance(boolean bydistance)
Relabel objects by distance, instead of by density.
- Parameters:
bydistance - Whether to relabel by distance (true) instead of by density.
-
-