Class TSNE<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    Algorithm
    Direct Known Subclasses:
    BarnesHutTSNE

    @Title("t-SNE")
    @Reference(authors="L. J. P. van der Maaten, G. E. Hinton",
               title="Visualizing High-Dimensional Data Using t-SNE",
               booktitle="Journal of Machine Learning Research 9",
               url="http://www.jmlr.org/papers/v9/vandermaaten08a.html",
               bibkey="journals/jmlr/MaatenH08")
    @Alias({"t-SNE","tSNE"})
    public class TSNE<O>
    extends AbstractProjectionAlgorithm<Relation<DoubleVector>>
    t-Stochastic Neighbor Embedding is a projection technique designed for visualization that tries to preserve the nearest neighbor structure.

    Reference:

    L. J. P. van der Maaten, G. E. Hinton
    Visualizing High-Dimensional Data Using t-SNE
    Journal of Machine Learning Research 9

    Since:
    0.7.5
    Author:
    Erich Schubert, Dominik Acker
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • MIN_QIJ

        protected static final double MIN_QIJ
        Minimum value for qij entries (even when duplicate)
        See Also:
        Constant Field Values
      • EARLY_EXAGGERATION

        protected static final double EARLY_EXAGGERATION
        Early exaggeration factor.

        Barnes-Hut tSNE implementation used 12.

        See Also:
        Constant Field Values
      • EARLY_EXAGGERATION_ITERATIONS

        protected static final int EARLY_EXAGGERATION_ITERATIONS
        Number of iterations to apply early exaggeration.
        See Also:
        Constant Field Values
      • INITIAL_SOLUTION_SCALE

        protected static final double INITIAL_SOLUTION_SCALE
        Scale of the initial solution.
        See Also:
        Constant Field Values
      • MIN_GAIN

        protected static final double MIN_GAIN
        Minimum gain in learning rate.
        See Also:
        Constant Field Values
      • projectedDistances

        protected long projectedDistances
        Number of distance computations performed in projected space.
      • dim

        protected int dim
        Desired projection dimensionality
      • learningRate

        protected double learningRate
        Initial learning rate.
      • initialMomentum

        protected double initialMomentum
        Final momentum.
      • finalMomentum

        protected double finalMomentum
        Final momentum.
      • momentumSwitch

        protected int momentumSwitch
        Iteration when to switch momentum.
      • iterations

        protected int iterations
        Number of iterations.
    • Constructor Detail

      • TSNE

        public TSNE​(AffinityMatrixBuilder<? super O> affinity,
                    int dim,
                    RandomFactory random)
        Constructor with default values.
        Parameters:
        affinity - Affinity matrix builder
        dim - Output dimensionality
        random - Random generator
      • TSNE

        public TSNE​(AffinityMatrixBuilder<? super O> affinity,
                    int dim,
                    double finalMomentum,
                    double learningRate,
                    int iterations,
                    RandomFactory random,
                    boolean keep)
        Constructor.
        Parameters:
        affinity - Affinity matrix builder
        dim - Output dimensionality
        finalMomentum - Final momentum
        learningRate - Learning rate
        iterations - Number of iterations
        random - Random generator
        keep - Keep the original data (or remove it)
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Returns:
        Type restriction
      • autorun

        public Relation<DoubleVector> autorun​(Database database)
        Description copied from interface: Algorithm
        Try to auto-run the algorithm on a database by calling a method called run, with an optional Database first, and with data relations as specified by Algorithm.getInputTypeRestriction().
        Parameters:
        database - the database to run the algorithm on
        Returns:
        the Result computed by this algorithm
      • run

        public Relation<DoubleVector> run​(Relation<O> relation)
        Perform tSNE projection.
        Parameters:
        relation - Input relation
        Returns:
        Output relation
      • randomInitialSolution

        protected static double[][] randomInitialSolution​(int size,
                                                          int dim,
                                                          java.util.Random random)
        Generate a random initial solution.
        Parameters:
        size - Data set size
        dim - Output dimensionality
        random - Random generator
        Returns:
        Initial solution matrix
      • optimizetSNE

        protected void optimizetSNE​(AffinityMatrix pij,
                                    double[][] sol)
        Perform the actual tSNE optimization.
        Parameters:
        pij - Initial affinity matrix
        sol - Solution output array (preinitialized)
      • computeQij

        protected double computeQij​(double[][] qij,
                                    double[][] solution)
        Compute the qij of the solution, and the sum.
        Parameters:
        qij - Qij matrix (output)
        solution - Solution matrix (input)
        Returns:
        qij sum
      • sqDist

        protected double sqDist​(double[] v1,
                                double[] v2)
        Squared distance, in projection space.
        Parameters:
        v1 - First vector
        v2 - Second vector
        Returns:
        Squared distance
      • computeGradient

        protected void computeGradient​(AffinityMatrix pij,
                                       double[][] qij,
                                       double qij_isum,
                                       double[][] sol,
                                       double[] meta)
        Compute the gradients.
        Parameters:
        pij - Desired affinity matrix
        qij - Projected affinity matrix
        qij_isum - Normalization factor
        sol - Current solution coordinates
        meta - Point metadata
      • updateSolution

        protected void updateSolution​(double[][] sol,
                                      double[] meta,
                                      int it)
        Update the current solution on iteration.
        Parameters:
        sol - Solution matrix
        meta - Metadata array (gradient, momentum, learning rate)
        it - Iteration number, to choose momentum factor.