Class ClusterMergeHistoryBuilder


  • public class ClusterMergeHistoryBuilder
    extends java.lang.Object
    Class to help building a pointer hierarchy.
    Since:
    0.7.1
    Author:
    Erich Schubert
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected int[] csize
      Cluster size storage.
      protected ArrayDBIDs ids
      The DBIDs in this result.
      protected boolean isSquared
      Flag to indicate squared distances.
      private static Logging LOG
      Class logger.
      protected int mergecount
      Merge counter (for merge ordering).
      protected double[] mergeDistance
      Distance to the parent object.
      protected int[] merges
      Store merge order.
      protected int[] parent
      Parent for union-find, may be uninitialized.
      protected ArrayModifiableDBIDs prototypes
      Prototype storage, may be null.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      int add​(int i, double dist, int j)
      A more robust "add" operation (involving a union-find) where we may use arbitrary objects i and j to refer to clusters, not only the largest ID in each cluster.
      private int addRecursive​(int[] order, int size, byte[] seen, int it, int n)
      Recursively add merges (children first) to the order, to obtain a monotone ordering.
      private boolean checkMonotone()
      Check if merge distances are monotone.
      ClusterMergeHistory complete()
      Finalize the result.
      ClusterDensityMergeHistory complete​(WritableDoubleDataStore coredists)
      Build a result with additional coredists information.
      int getSize​(int id)
      Get the cluster size of the current object.
      int[] optimizeOrder()
      Find an optimized processing order, where merges are by ascending depth where possible (not guaranteed for non-reducible linkages).
      void setSize​(int id, int size)
      Set the cluster size of an object.
      int strictAdd​(int source, double distance, int target)
      Add a merge to the pointer representation.
      int strictAdd​(int source, double distance, int target, DBIDRef prototype)
      Add an element to the pointer representation.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • ids

        protected final ArrayDBIDs ids
        The DBIDs in this result.
      • mergeDistance

        protected double[] mergeDistance
        Distance to the parent object.
      • csize

        protected int[] csize
        Cluster size storage.
      • merges

        protected int[] merges
        Store merge order.
      • parent

        protected int[] parent
        Parent for union-find, may be uninitialized.
      • mergecount

        protected int mergecount
        Merge counter (for merge ordering).
      • isSquared

        protected boolean isSquared
        Flag to indicate squared distances.
    • Constructor Detail

      • ClusterMergeHistoryBuilder

        public ClusterMergeHistoryBuilder​(ArrayDBIDs ids,
                                          boolean isSquared)
        Constructor.
        Parameters:
        ids - IDs
        isSquared - Flag to indicate squared distances
    • Method Detail

      • add

        public int add​(int i,
                       double dist,
                       int j)
        A more robust "add" operation (involving a union-find) where we may use arbitrary objects i and j to refer to clusters, not only the largest ID in each cluster.

        TODO: further improve the union-find used here?

        Parameters:
        i - First cluster
        dist - Link distance
        j - Second cluster
        Returns:
        new cluster id
      • strictAdd

        public int strictAdd​(int source,
                             double distance,
                             int target)
        Add a merge to the pointer representation. This API requires that the source object is not linked yet, and has a smaller ID than the target, because of the pointer structure representation used by SLINK.
        Parameters:
        source - Current object
        distance - Link distance
        target - Parent
        Returns:
      • strictAdd

        public int strictAdd​(int source,
                             double distance,
                             int target,
                             DBIDRef prototype)
        Add an element to the pointer representation.
        Parameters:
        source - Current object
        distance - Link distance
        target - Parent
        prototype - Cluster prototype
      • complete

        public ClusterMergeHistory complete()
        Finalize the result.
        Returns:
        Completed result
      • getSize

        public int getSize​(int id)
        Get the cluster size of the current object.
        Parameters:
        id - cluster id
        Returns:
        Cluster size (initially 1).
      • setSize

        public void setSize​(int id,
                            int size)
        Set the cluster size of an object.
        Parameters:
        id - Object to set
        size - Cluster size
      • optimizeOrder

        public int[] optimizeOrder()
        Find an optimized processing order, where merges are by ascending depth where possible (not guaranteed for non-reducible linkages).

        E.g., (a,b,2), (c,d,1), (ab,cd,3) is a valid merge sequence, but not well ordered because the n largest splits do not come last.

        Returns:
        reverse map to the optimized order, or null if well ordered, in case you need to remap additional data structures.
      • checkMonotone

        private boolean checkMonotone()
        Check if merge distances are monotone.
        Returns:
        boolean result
      • addRecursive

        private int addRecursive​(int[] order,
                                 int size,
                                 byte[] seen,
                                 int it,
                                 int n)
        Recursively add merges (children first) to the order, to obtain a monotone ordering. As we processed entries with smaller distances first, this usually will not require much recursion, and subtrees with smaller height should already come first. Using the max height helps with inversions in irreducible linkages.
        Parameters:
        order - Output order
        size - Current size
        seen - Mask indicating processed entries
        it - Current entry
        n - Number of primary objects
        Returns:
        Size after additions