Class GeneratorMain


  • public class GeneratorMain
    extends java.lang.Object
    Generate a data set according to a given model.

    The key idea of this generator is to re-generate points that are more likely to belong to a different cluster than the one they were generated for. The benefit is that the resulting data set should closely follow the specified model.

    This approach has three drawbacks. First, specifications might be unsatisfiable. To handle this, a retry count is kept, and an AbortException is thrown when the maximum number of retries is exceeded.

    Second, the model might not be exactly as specified. When the generator reports a "Density correction factor estimation" that differs from 1.0, this indicates that the result is not exact.

    Third, rejecting points introduces effects where one generator can influence others, so the random generator results are not stable with respect to the addition of new dimensions and similar changes if any rejects are involved. This generator is therefore not ideal for generating data sets for scalability tests on the number of dimensions, although if clusters overlap little enough (so that no rejects happen), the results should be as expected.
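
    The re-generation idea and the retry limit described above can be illustrated with a minimal, self-contained sketch. It is not the ELKI implementation: the class RejectionSketch, its methods, the one-dimensional normal clusters, the per-point retry limit, and the use of IllegalStateException in place of AbortException are all assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical illustration of rejection-based generation; not ELKI code.
class RejectionSketch {
  /** Density of a one-dimensional normal distribution at x. */
  static double density(double x, double mean, double stddev) {
    double z = (x - mean) / stddev;
    return Math.exp(-0.5 * z * z) / (stddev * Math.sqrt(2 * Math.PI));
  }

  /** Index of the cluster with the highest density at x (ties go to the lower index). */
  static int bestCluster(double x, double[] means, double[] stddevs) {
    int best = 0;
    for (int i = 1; i < means.length; i++) {
      if (density(x, means[i], stddevs[i]) > density(x, means[best], stddevs[best])) {
        best = i;
      }
    }
    return best;
  }

  /**
   * Draw one point for cluster c, re-drawing while another cluster is more likely.
   * Allows one initial draw plus maxRetries retries (assumed per-point limit).
   */
  static double drawPoint(int c, double[] means, double[] stddevs, Random rnd, int maxRetries) {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      double x = means[c] + stddevs[c] * rnd.nextGaussian();
      if (bestCluster(x, means, stddevs) == c) {
        return x; // point is most likely under its own cluster: accept
      }
    }
    // Stand-in for the AbortException mentioned in the class description.
    throw new IllegalStateException("Too many rejected points for cluster " + c);
  }

  public static void main(String[] args) {
    double[] means = {0.0, 3.0};
    double[] stddevs = {1.0, 1.0};
    Random rnd = new Random(0L);
    System.out.println(drawPoint(0, means, stddevs, rnd, 100));
  }
}
```

    With well-separated clusters almost no draws are rejected. Two clusters with identical parameters make the specification unsatisfiable for one of them in this sketch, which triggers the exception.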

    Since:
    0.2
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • generators

        protected java.util.ArrayList<GeneratorInterface> generators
        List of clusters to generate.
      • testAgainstModel

        protected boolean testAgainstModel
        Controls whether points are tested against the model during generation.
      • relabelClusters

        protected java.util.regex.Pattern relabelClusters
        Pattern selecting which clusters (e.g., "Noise") to relabel to the second-best cluster.
      • relabelDistance

        protected boolean relabelDistance
        Relabel objects by distance.
    • Constructor Detail

      • GeneratorMain

        public GeneratorMain()
    • Method Detail

      • addCluster

        public void addCluster(GeneratorInterface c)
        Add a cluster to the cluster list.
        Parameters:
        c - cluster to add
      • generate

        public MultipleObjectsBundle generate()
        Main loop to generate the data set.
        Returns:
        Generated data set
      • initLabelsAndModels

        private void initLabelsAndModels(java.util.ArrayList<GeneratorInterface> generators,
                                         ClassLabel[] labels,
                                         Model[] models,
                                         java.util.regex.Pattern reassign)
        Initialize cluster labels and models.

        Clusters that match the "reassign" pattern have their labels set to null or, if there is only one possible reassignment target, to the label of that target.

        Parameters:
        generators - Cluster generators
        labels - Labels (output)
        models - Models (output)
        reassign - Pattern for clusters to reassign.
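
        The label rule described above can be sketched as follows. This is a simplified illustration, not the actual method body: LabelInitSketch and initLabels are hypothetical names, plain strings stand in for ClassLabel, models are omitted, and matching cluster names with Matcher.find() is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the label initialization rule; not ELKI code.
class LabelInitSketch {
  /**
   * Returns one label per cluster name. Clusters matching the reassign pattern
   * get null, unless exactly one non-matching cluster exists, in which case
   * they get that cluster's label.
   */
  static String[] initLabels(String[] names, Pattern reassign) {
    String[] labels = new String[names.length];
    String onlyTarget = null;
    int targets = 0;
    for (int i = 0; i < names.length; i++) {
      if (reassign != null && reassign.matcher(names[i]).find()) {
        continue; // to be reassigned later: label stays null for now
      }
      labels[i] = names[i];
      onlyTarget = names[i];
      targets++;
    }
    if (targets == 1) {
      // Only one possible reassignment target: use its label directly.
      for (int i = 0; i < labels.length; i++) {
        if (labels[i] == null) {
          labels[i] = onlyTarget;
        }
      }
    }
    return labels;
  }

  public static void main(String[] args) {
    String[] labels = initLabels(new String[] {"A", "B", "Noise"}, Pattern.compile("Noise"));
    System.out.println(java.util.Arrays.toString(labels));
  }
}
```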
      • isTestAgainstModel

        public boolean isTestAgainstModel()
        Return value of the testAgainstModel flag.
        Returns:
        value of testAgainstModel
      • setTestAgainstModel

        public void setTestAgainstModel(boolean testAgainstModel)
        Set the value of the testAgainstModel flag.
        Parameters:
        testAgainstModel - New value
      • getGenerators

        public java.util.List<GeneratorInterface> getGenerators()
        Access the generators.
        Returns:
        generators
      • setReassignPattern

        public void setReassignPattern(java.util.regex.Pattern reassign)
        Set the reassignment pattern.
        Parameters:
        reassign - Reassignment pattern.
      • setReassignByDistance

        public void setReassignByDistance(boolean bydistance)
        Relabel objects by distance instead of by density.
        Parameters:
        bydistance - true to relabel by distance, false to relabel by density
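
        The difference between relabeling by density and by distance can be illustrated with a small sketch. It is not the ELKI implementation: RelabelSketch and chooseTarget are hypothetical names, the clusters are one-dimensional normal distributions, and "distance" is assumed to mean the absolute distance to the cluster mean.

```java
// Hypothetical illustration of density-based vs. distance-based relabeling; not ELKI code.
class RelabelSketch {
  /** Density of a one-dimensional normal distribution at x. */
  static double density(double x, double mean, double stddev) {
    double z = (x - mean) / stddev;
    return Math.exp(-0.5 * z * z) / (stddev * Math.sqrt(2 * Math.PI));
  }

  /**
   * Pick the replacement cluster for a point x that came from cluster skip.
   * By distance: the nearest mean wins. By density: the highest density wins.
   * Returns -1 if there is no other cluster.
   */
  static int chooseTarget(double x, double[] means, double[] stddevs, int skip, boolean byDistance) {
    int best = -1;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < means.length; i++) {
      if (i == skip) {
        continue; // never relabel to the original cluster
      }
      double score = byDistance ? -Math.abs(x - means[i]) : density(x, means[i], stddevs[i]);
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Cluster 0 is narrow, cluster 1 is wide, cluster 2 plays the role of "Noise".
    double[] means = {0.0, 10.0, 4.0};
    double[] stddevs = {1.0, 10.0, 100.0};
    System.out.println(chooseTarget(4.0, means, stddevs, 2, true));
    System.out.println(chooseTarget(4.0, means, stddevs, 2, false));
  }
}
```

        In this example the point at 4.0 is closer to the mean of the narrow cluster 0 but has a higher density under the wide cluster 1, so the two modes choose different targets.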