Class RandomNormalGenerated

  • All Implemented Interfaces:
    KMeansInitialization

    @Priority(-101)
    @Reference(authors="R. C. Jancey",
               title="Multidimensional group analysis",
               booktitle="Australian Journal of Botany 14(1)",
               url="https://doi.org/10.1071/BT9660127",
               bibkey="doi:10.1071/BT9660127")
    public class RandomNormalGenerated
    extends AbstractKMeansInitialization
    Initialize k-means by generating random vectors (normally distributed with \(N(\mu,\sigma)\) in each dimension).
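As a worked formula, here is one plausible reading of this description. The per-dimension mean and standard deviation are estimated from the n data points, and each coordinate of each of the k means is then drawn independently. Using the sample standard deviation (divisor n - 1) is an assumption of this sketch, and the class may use a different estimator.

```latex
\hat\mu_d = \frac{1}{n}\sum_{i=1}^{n} x_{i,d}, \qquad
\hat\sigma_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{i,d}-\hat\mu_d\right)^2}

m_{j,d} = \hat\mu_d + \hat\sigma_d\, z_{j,d}, \qquad z_{j,d} \sim N(0,1), \qquad j = 1,\dots,k,\; d = 1,\dots,D
```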

    This is a different interpretation of the work of Jancey, who gave few details beyond the means being "introduced into known but arbitrary positions", but apparently worked with standardized scores. In contrast to RandomUniformGenerated, which samples uniformly over the entire value range of the data, this class draws from a normal distribution whose mean and standard deviation are estimated from the data. The resulting means should therefore be more central, and thus somewhat less likely to end up with empty clusters (at least if one assumes there is no correlation among attributes). It is still not competitive with better initialization methods.
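To make the contrast concrete, the following sketch puts the two sampling rules side by side for a single center. The class and method names are hypothetical, and the per-dimension minimum/maximum and mean/standard deviation are assumed to be already computed. This illustrates the idea only and is not the ELKI implementation of either class.

```java
import java.util.Random;

/** Hypothetical helpers contrasting uniform and normal generation of one center. */
final class CenterSamplers {
  private CenterSamplers() {}

  /** Uniform over [min_d, max_d) in every dimension (the RandomUniformGenerated idea). */
  static double[] uniformCenter(double[] min, double[] max, Random rnd) {
    double[] c = new double[min.length];
    for (int d = 0; d < c.length; d++) {
      c[d] = min[d] + rnd.nextDouble() * (max[d] - min[d]);
    }
    return c;
  }

  /** Normal N(mu_d, sigma_d) in every dimension (the RandomNormalGenerated idea). */
  static double[] normalCenter(double[] mu, double[] sigma, Random rnd) {
    double[] c = new double[mu.length];
    for (int d = 0; d < c.length; d++) {
      // nextGaussian() draws from N(0, 1); shift and scale to N(mu_d, sigma_d)
      c[d] = mu[d] + sigma[d] * rnd.nextGaussian();
    }
    return c;
  }
}
```

The uniform rule places centers anywhere in the bounding box, including sparse corners, while the normal rule concentrates them around the estimated mean.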

    Warning: this still tends to produce empty clusters in many situations and is one of the least effective initialization strategies; it is not recommended for use.
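One way to observe the empty-cluster problem is to assign every point to its nearest initial mean and count the means that receive no points. The sketch below does this with squared Euclidean distance. The class `EmptyClusterCheck` is a hypothetical diagnostic, not part of ELKI.

```java
/** Hypothetical diagnostic: counts initial means that get no points under nearest-mean assignment. */
final class EmptyClusterCheck {
  private EmptyClusterCheck() {}

  static int countEmpty(double[][] data, double[][] means) {
    int[] sizes = new int[means.length];
    for (double[] x : data) {
      int best = 0;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int j = 0; j < means.length; j++) {
        double dist = 0.0;
        for (int d = 0; d < x.length; d++) {
          double diff = x[d] - means[j][d];
          dist += diff * diff; // squared Euclidean distance
        }
        if (dist < bestDist) { // ties go to the lower index
          bestDist = dist;
          best = j;
        }
      }
      sizes[best]++;
    }
    int empty = 0;
    for (int s : sizes) {
      if (s == 0) {
        empty++;
      }
    }
    return empty;
  }
}
```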

    Reference:

    R. C. Jancey
    Multidimensional group analysis
    Australian Journal of Botany 14(1)

    Since:
    0.7.5
    Author:
    Erich Schubert
    • Constructor Detail

      • RandomNormalGenerated

        public RandomNormalGenerated(RandomFactory rnd)
        Constructor.
        Parameters:
        rnd - Random generator.
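The random generator passed to the constructor determines which means are drawn. The following sketch shows how a fixed seed yields reproducible initial means. It assumes a plain `long` seed, used to create a fresh `java.util.Random` on each call, as a stand-in for `RandomFactory`. The class `SeededNormalInit` is hypothetical and does not describe how `RandomFactory` itself behaves.

```java
import java.util.Random;

/** Hypothetical stand-in: a fixed seed plays the role of the RandomFactory argument. */
final class SeededNormalInit {
  private final long seed;

  SeededNormalInit(long seed) {
    this.seed = seed;
  }

  /** Draws k means from N(mu_d, sigma_d); a fresh Random per call makes repeated calls identical. */
  double[][] generate(double[] mu, double[] sigma, int k) {
    Random rnd = new Random(seed);
    double[][] means = new double[k][mu.length];
    for (int j = 0; j < k; j++) {
      for (int d = 0; d < mu.length; d++) {
        means[j][d] = mu[d] + sigma[d] * rnd.nextGaussian();
      }
    }
    return means;
  }
}
```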
    • Method Detail

      • chooseInitialMeans

        public double[][] chooseInitialMeans(Relation<? extends NumberVector> relation,
                                             int k,
                                             NumberVectorDistance<?> distance)
        Description copied from interface: KMeansInitialization
        Choose initial means.
        Parameters:
        relation - Relation
        k - Number of means (clusters) to choose
        distance - Distance function
        Returns:
        Array of the k chosen initial means for k-means
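For readers without an ELKI `Relation` at hand, here is a minimal plain-array analogue of this method. It estimates the per-dimension mean and standard deviation, then draws k means from the corresponding normal distributions. The class name, the `double[][]` input, and the sample standard deviation (divisor n - 1) are assumptions of this sketch. The `distance` argument is omitted because the generation step shown here does not use one.

```java
import java.util.Random;

/** Hypothetical plain-array analogue of chooseInitialMeans (not the ELKI implementation). */
final class NormalGeneratedInit {
  private NormalGeneratedInit() {}

  static double[][] chooseInitialMeans(double[][] data, int k, Random rnd) {
    int n = data.length;
    int dim = data[0].length;
    double[] mu = new double[dim];
    double[] sigma = new double[dim];
    // Per-dimension mean.
    for (double[] x : data) {
      for (int d = 0; d < dim; d++) {
        mu[d] += x[d];
      }
    }
    for (int d = 0; d < dim; d++) {
      mu[d] /= n;
    }
    // Per-dimension sample standard deviation (n - 1 divisor, an assumption of this sketch).
    for (double[] x : data) {
      for (int d = 0; d < dim; d++) {
        double diff = x[d] - mu[d];
        sigma[d] += diff * diff;
      }
    }
    for (int d = 0; d < dim; d++) {
      sigma[d] = n > 1 ? Math.sqrt(sigma[d] / (n - 1)) : 0.0;
    }
    // Each coordinate of each mean is drawn independently from N(mu_d, sigma_d).
    double[][] means = new double[k][dim];
    for (int j = 0; j < k; j++) {
      for (int d = 0; d < dim; d++) {
        means[j][d] = mu[d] + sigma[d] * rnd.nextGaussian();
      }
    }
    return means;
  }
}
```

Because each dimension is sampled independently, correlations between attributes are ignored. This matches the caveat in the class description.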