Class ByLabelClustering

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<Model>>
    Direct Known Subclasses:
    ByLabelOrAllInOneClustering

    @Title("Clustering by label")
    @Description("Cluster points by a (pre-assigned!) label. For comparing results with a reference clustering.")
    @Priority(-100)
    public class ByLabelClustering
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<Model>>
    Pseudo clustering using labels.

    This "algorithm" puts elements into the same cluster when their labels agree. That is, it simply reproduces a predefined clustering. It is mostly useful for testing and evaluation, e.g., for comparing the result of a real algorithm to a reference result or gold standard.
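    The grouping idea can be illustrated with a minimal plain-Java sketch. It is not ELKI's implementation. It assumes that objects are identified by the indices 0..n-1 and that each object carries exactly one String label. The names LabelGrouping and groupByLabel are hypothetical. Objects whose labels are equal (per String.equals) end up in the same list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for label-based grouping; objects are indices 0..n-1.
class LabelGrouping {
  // Puts every object into the cluster named by its (exact) label.
  static Map<String, List<Integer>> groupByLabel(List<String> labels) {
    // LinkedHashMap keeps clusters in order of first appearance.
    Map<String, List<Integer>> clusters = new LinkedHashMap<>();
    for (int id = 0; id < labels.size(); id++) {
      clusters.computeIfAbsent(labels.get(id), k -> new ArrayList<>()).add(id);
    }
    return clusters;
  }

  public static void main(String[] args) {
    List<String> labels = List.of("setosa", "virginica", "setosa", "versicolor");
    // prints {setosa=[0, 2], virginica=[1], versicolor=[3]}
    System.out.println(groupByLabel(labels));
  }
}
```

    The LinkedHashMap is used only so that clusters print in order of first appearance. A plain HashMap, the return type of the assignment methods documented below, would produce the same grouping.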

    If an object should be assigned to multiple clusters, its label must list the cluster labels separated by blanks, and the flag ByLabelClustering.Par.MULTIPLE_ID must be set.
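    Multiple assignment can be sketched the same way. This is again a hypothetical stand-in rather than ELKI code, and MultiLabelGrouping and groupByLabels are illustrative names. The sketch makes three assumptions: "blanks" means one or more space characters, an object with an empty label is left unassigned, and a token repeated within one label counts only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of multiple assignment: a label such as "a b"
// places the object into both cluster "a" and cluster "b".
class MultiLabelGrouping {
  static Map<String, List<Integer>> groupByLabels(List<String> labels) {
    Map<String, List<Integer>> clusters = new LinkedHashMap<>();
    for (int id = 0; id < labels.size(); id++) {
      String trimmed = labels.get(id).trim();
      if (trimmed.isEmpty()) {
        continue; // assumption: objects with an empty label are left unassigned
      }
      // Split on runs of spaces; the set drops a token repeated within one label.
      for (String token : new LinkedHashSet<>(Arrays.asList(trimmed.split(" +")))) {
        clusters.computeIfAbsent(token, k -> new ArrayList<>()).add(id);
      }
    }
    return clusters;
  }

  public static void main(String[] args) {
    // prints {a=[0, 1], b=[1, 2], c=[3]}
    System.out.println(groupByLabels(List.of("a", "a b", "b", "c")));
  }
}
```

    In this sketch, object 1 is a member of both cluster "a" and cluster "b". As a result, the clusters overlap and no longer form a partition.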

    TODO: handling of data sets with no labels?

    Since:
    0.2
    Author:
    Erich Schubert
    • Field Summary

      Fields
      Modifier and Type                 Field          Description
      private boolean                   multiple       Allow multiple cluster assignment.
      private java.util.regex.Pattern   noisepattern   Pattern to recognize noise clusters by.
    • Constructor Summary

      Constructors 
      Constructor Description
      ByLabelClustering()
      Constructor without parameters
      ByLabelClustering(boolean multiple, java.util.regex.Pattern noisepattern)
      Constructor.
    • Field Detail

      • multiple

        private boolean multiple
        Allow multiple cluster assignment.
      • noisepattern

        private java.util.regex.Pattern noisepattern
        Pattern to recognize noise clusters by.
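    The following sketch shows one way a noise pattern might be applied. The documentation only states that the pattern is used to recognize noise clusters, so the sketch relies on three assumptions: the pattern is tested against each cluster's label, a null pattern disables noise detection, and a partial match (Matcher.find()) is sufficient. NoiseLabels and isNoise are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: decide whether a cluster label denotes noise.
class NoiseLabels {
  // A null pattern means no cluster is treated as noise (assumption).
  static boolean isNoise(Pattern noisepattern, String label) {
    // Assumption: a match anywhere in the label suffices (find(), not matches()).
    return noisepattern != null && noisepattern.matcher(label).find();
  }

  public static void main(String[] args) {
    Pattern noise = Pattern.compile("^noise", Pattern.CASE_INSENSITIVE);
    for (String label : List.of("Noise", "noise-2", "cluster1")) {
      System.out.println(label + " -> " + (isNoise(noise, label) ? "noise" : "cluster"));
    }
  }
}
```

    If the pattern should match only whole labels, anchor it with ^ and $, or use matcher(label).matches() instead of find().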
    • Constructor Detail

      • ByLabelClustering

        public ByLabelClustering(boolean multiple,
                                 java.util.regex.Pattern noisepattern)
        Constructor.
        Parameters:
        multiple - Allow multiple cluster assignments
        noisepattern - Pattern to recognize noise clusters by
      • ByLabelClustering

        public ByLabelClustering()
        Constructor without parameters
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • run

        public Clustering<Model> run(Relation<?> relation)
        Run the actual clustering algorithm.
        Parameters:
        relation - The data input to cluster
      • singleAssignment

        private java.util.HashMap<java.lang.String,DBIDs> singleAssignment(Relation<?> data)
        Assigns each object of the database to a single cluster according to its label.
        Parameters:
        data - the database storing the objects
        Returns:
        a mapping of labels to ids
      • multipleAssignment

        private java.util.HashMap<java.lang.String,DBIDs> multipleAssignment(Relation<?> data)
        Assigns the objects of the database to multiple clusters according to their labels.
        Parameters:
        data - the database storing the objects
        Returns:
        a mapping of labels to ids
      • assign

        private void assign(java.util.HashMap<java.lang.String,DBIDs> labelMap,
                            java.lang.String label,
                            DBIDRef id)
        Assigns the specified id to the entry of labelMap that corresponds to the given label.
        Parameters:
        labelMap - the mapping of label to ids
        label - the label of the object to be assigned
        id - the id of the object to be assigned
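    The overall flow of run() can be approximated in plain Java. The sketch first builds a label-to-ids map, splitting labels on blanks only when multiple is set. It then emits one cluster per distinct label and flags a cluster as noise if noisepattern matches its label. This is a hypothetical sketch, not ELKI's code. ByLabelSketch, its Cluster record (which requires Java 16+), and the use of Matcher.find() are all assumptions. ELKI's actual result type is Clustering<Model>. The comments relate each step loosely to the private methods documented above, and that correspondence is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical approximation of run(); not ELKI code.
class ByLabelSketch {
  // Minimal stand-in for a cluster: label, member indices, noise flag.
  record Cluster(String name, List<Integer> ids, boolean noise) {}

  static List<Cluster> run(List<String> labels, boolean multiple, Pattern noisepattern) {
    // Step 1 (cf. singleAssignment / multipleAssignment): label -> member ids.
    Map<String, List<Integer>> labelMap = new LinkedHashMap<>();
    for (int id = 0; id < labels.size(); id++) {
      String label = labels.get(id);
      List<String> names = multiple
          ? Arrays.stream(label.trim().split(" +")).distinct().toList()
          : List.of(label);
      for (String name : names) {
        // cf. assign(): add the id to the group for this label, creating it if needed.
        labelMap.computeIfAbsent(name, k -> new ArrayList<>()).add(id);
      }
    }
    // Step 2: one cluster per distinct label, flagged as noise if the pattern matches.
    List<Cluster> result = new ArrayList<>();
    for (Map.Entry<String, List<Integer>> e : labelMap.entrySet()) {
      boolean noise = noisepattern != null && noisepattern.matcher(e.getKey()).find();
      result.add(new Cluster(e.getKey(), e.getValue(), noise));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> labels = List.of("a", "a b", "noise", "b");
    for (Cluster c : run(labels, true, Pattern.compile("^noise$"))) {
      System.out.println(c);
    }
  }
}
```

    With multiple set to false, the label "x y" names a single cluster. With multiple set to true, it would place the object into both "x" and "y".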