Class NearestNeighborAffinityMatrixBuilder<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    AffinityMatrixBuilder<O>
    Direct Known Subclasses:
    IntrinsicNearestNeighborAffinityMatrixBuilder

    @Reference(authors="L. J. P. van der Maaten",
               title="Accelerating t-SNE using Tree-Based Algorithms",
               booktitle="Journal of Machine Learning Research 15",
               url="http://dl.acm.org/citation.cfm?id=2697068",
               bibkey="DBLP:journals/jmlr/Maaten14")
    public class NearestNeighborAffinityMatrixBuilder<O>
    extends PerplexityAffinityMatrixBuilder<O>
    Build sparse affinity matrix using the nearest neighbors only.

    Reference:

    L. J. P. van der Maaten
    Accelerating t-SNE using Tree-Based Algorithms
    Journal of Machine Learning Research 15

    TODO: this implementation currently differs in one major point: we do not symmetrize the sparse \(p_{ij}\) matrix.

    Since:
    0.7.5
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • numberOfNeighbours

        protected int numberOfNeighbours
        Number of neighbors to use.
    • Constructor Detail

      • NearestNeighborAffinityMatrixBuilder

        public NearestNeighborAffinityMatrixBuilder​(Distance<? super O> distance,
                                                    double perplexity)
        Constructor.
        Parameters:
        distance - Distance function
        perplexity - Desired perplexity (will use 3*perplexity neighbors)
      • NearestNeighborAffinityMatrixBuilder

        public NearestNeighborAffinityMatrixBuilder​(Distance<? super O> distance,
                                                    double perplexity,
                                                    int neighbors)
        Constructor.
        Parameters:
        distance - Distance function
        perplexity - Desired perplexity
        neighbors - Number of neighbors to use
    • Method Detail

      • computePij

        protected void computePij​(DBIDRange ids,
                                  KNNSearcher<DBIDRef> knnq,
                                  boolean square,
                                  int numberOfNeighbours,
                                  double[][] pij,
                                  int[][] indices,
                                  double initialScale)
        Compute the sparse pij using the nearest neighbors only.
        Parameters:
        ids - ID range
        knnq - kNN query
        square - Use squared distances
        numberOfNeighbours - Number of neighbors to get
        pij - Output of distances
        indices - Output of indexes
        initialScale - Initial scaling factor
      • convertNeighbors

        protected void convertNeighbors​(DBIDRange ids,
                                        DBIDRef ix,
                                        boolean square,
                                        KNNList neighbours,
                                        DoubleArray dist,
                                        IntegerArray ind)
        Load a neighbor query result into a double and and integer array, also removing the query point. This is necessary, because we have to modify the distances. TODO: sort by index, not distance
        Parameters:
        ids - Indexes
        ix - Current Object
        square - Use squared distances
        neighbours - Neighbor list
        dist - Output distance array
        ind - Output index array
      • computeSigma

        protected static double computeSigma​(int i,
                                             DoubleArray pij_row,
                                             double perplexity,
                                             double log_perp,
                                             double[] pij_i)
        Compute row pij[i], using binary search on the kernel bandwidth sigma to obtain the desired perplexity.
        Parameters:
        i - Current point
        pij_row - Distance matrix row pij[i]
        perplexity - Desired perplexity
        log_perp - Log of desired perplexity
        pij_i - Output row
        Returns:
        beta
      • computeH

        protected static double computeH​(DoubleArray dist_i,
                                         double[] pij_row,
                                         double mbeta)
        Compute H (observed perplexity) for row i, and the row pij_i.
        Parameters:
        dist_i - Distances to neighbors
        pij_row - Row pij[i] (output)
        mbeta - -1. / (2 * sigma * sigma)
        Returns:
        Observed perplexity
      • containsIndex

        protected static int containsIndex​(int[] is,
                                           int i)
        Check if the index array contains i. TODO: sort arrays, use binary search!
        Parameters:
        is - Array to search
        i - Index to search
        Returns:
        Position of index i, or -1 if not found.