Class ClusterMergeHistory

  • Direct Known Subclasses:
    ClusterDensityMergeHistory, ClusterPrototypeMergeHistory

    public class ClusterMergeHistory
    extends java.lang.Object
    Merge history representing a hierarchical clustering.

    In this model, each merge is represented by a four-tuple of (i,j,d,s) where i and j are the two previous clusters, d is the merge distance, and s is the resulting cluster size. The first n cluster ids are the initial data points, subsequent numbers n to 2n-1 indicate the clusters generated by performing the subsequent merges. It is not allowed to reference a later merge, however it is not guaranteed that the distances d come in increasing order because of irreducible linkages (centroid and median linkage).

    As some algorithms (such as the extraction by dendrogram height) rely on this order, they might produce unexpected results then. Also, some algorithms such as NNchain can produce merge histories without back references for other linkages because they operate locally. These are automatically reordered by ClusterMergeHistoryBuilder.optimizeOrder().

    Since:
    0.8.0
    Author:
    Erich Schubert
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected double[] distances
      Distance to the parent object.
      protected ArrayDBIDs ids
      The initial DBIDs
      (package private) boolean isSquared
      Flag to indicate squared values
      protected int[] merges
      Store merge order (two cluster references per merge).
      protected int[] positions
      Positions for layouting.
      protected int[] sizes
      Cluster size storage.
    • Constructor Summary

      Constructors 
      Constructor Description
      ClusterMergeHistory​(ArrayDBIDs ids, int[] merges, double[] distances, int[] sizes, boolean isSquared)
      Constructor.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      DBIDVar assignVar​(int i, DBIDVar var)
      Access the i'th singleton via a variable.
      ArrayDBIDs getDBIDs()
      Get the object ids in this clustering.
      int getMergeA​(int i)
      Get the first partner of merge i.
      int getMergeB​(int i)
      Get the second partner of merge i.
      double getMergeHeight​(int i)
      Get merge distance / height
      int[] getPositions()
      Get / compute the positions.
      int getSize​(int i)
      Get the size of the cluster merged in step i.
      boolean isSquared()
      Indicate whether the stored values are squared values.
      int numMerges()
      Number of merges, usually n-1.
      int size()
      Number of elements clustered.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • distances

        protected double[] distances
        Distance to the parent object.
      • sizes

        protected int[] sizes
        Cluster size storage.
      • merges

        protected int[] merges
        Store merge order (two cluster references per merge).
      • positions

        protected int[] positions
        Positions for layouting.
      • isSquared

        boolean isSquared
        Flag to indicate squared values
    • Constructor Detail

      • ClusterMergeHistory

        public ClusterMergeHistory​(ArrayDBIDs ids,
                                   int[] merges,
                                   double[] distances,
                                   int[] sizes,
                                   boolean isSquared)
        Constructor.
        Parameters:
        ids - Initial object ids
        merges - Merge history 2*(N-1) values
        distances - Distances
        sizes - Cluster sizes
        isSquared - If distances are squared distances
    • Method Detail

      • assignVar

        public DBIDVar assignVar​(int i,
                                 DBIDVar var)
        Access the i'th singleton via a variable.

        Note: variables are used here to avoid reallocations.

        TODO: use an array iterator instead?

        Parameters:
        i - Index
        var - Variable
        Returns:
        Value
      • getMergeA

        public int getMergeA​(int i)
        Get the first partner of merge i.
        Parameters:
        i - Merge number
        Returns:
        First cluster id to be merged
      • getMergeB

        public int getMergeB​(int i)
        Get the second partner of merge i.
        Parameters:
        i - Merge number
        Returns:
        Second cluster id to be merged
      • getMergeHeight

        public double getMergeHeight​(int i)
        Get merge distance / height
        Parameters:
        i - Merge index
        Returns:
        Height
      • getSize

        public int getSize​(int i)
        Get the size of the cluster merged in step i.
        Parameters:
        i - Step number
        Returns:
        Cluster size
      • size

        public int size()
        Number of elements clustered.
        Returns:
        Size of ids
      • numMerges

        public int numMerges()
        Number of merges, usually n-1.

        Note: currently untested to use incomplete results.

        Returns:
        Number of merges
      • isSquared

        public boolean isSquared()
        Indicate whether the stored values are squared values.
        Returns:
        boolean flag
      • getDBIDs

        public ArrayDBIDs getDBIDs()
        Get the object ids in this clustering.
        Returns:
        object ids
      • getPositions

        public int[] getPositions()
        Get / compute the positions.
        Returns:
        Dendrogram positions