Class ClusteringVectorParser

  • All Implemented Interfaces:
    BundleStreamSource, Parser, StreamingParser

    public class ClusteringVectorParser
    extends AbstractStreamingParser
    Parser for simple clustering results in vector form, as written by ClusteringVectorDumper.

    This allows reading the output of multiple clustering runs and analyzing the results using ELKI algorithms.

    The input format is very simple: each line contains a sequence of cluster assignments in integer form, and an optional label:

     0 0 1 1 0 First
     0 0 0 1 2 Second
     
    represents two clusterings for 5 objects. The first clustering has two clusters; the second has three.
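
    The line format described above can be illustrated with a minimal, standalone sketch. It is not the ELKI implementation and does not use any ELKI classes. The class name ClusteringVectorLineSketch and its members are hypothetical. It also assumes that the integer assignments come first and that any remaining tokens form the label, as in the example; the actual parser may tokenize differently, since it is configured through CSVReaderFormat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of the line format; not part of ELKI. */
public class ClusteringVectorLineSketch {
  /** One parsed line: cluster assignments plus an optional label (null if absent). */
  public static final class ParsedLine {
    public final int[] assignments;
    public final String label;

    ParsedLine(int[] assignments, String label) {
      this.assignments = assignments;
      this.label = label;
    }

    /** Number of distinct cluster numbers used on this line. */
    public int numClusters() {
      Set<Integer> distinct = new HashSet<>();
      for (int a : assignments) {
        distinct.add(a);
      }
      return distinct.size();
    }
  }

  /**
   * Parse one line. Assumption: leading integer tokens are cluster
   * assignments, everything after the first non-integer token is the label.
   */
  public static ParsedLine parseLine(String line) {
    String[] tokens = line.trim().split("\\s+");
    List<Integer> ids = new ArrayList<>();
    int i = 0;
    for (; i < tokens.length; i++) {
      try {
        ids.add(Integer.parseInt(tokens[i]));
      } catch (NumberFormatException e) {
        break; // first non-integer token starts the label
      }
    }
    int[] assignments = new int[ids.size()];
    for (int j = 0; j < assignments.length; j++) {
      assignments[j] = ids.get(j);
    }
    // Join the remaining tokens into the label; empty means "no label".
    String label = String.join(" ", Arrays.copyOfRange(tokens, i, tokens.length));
    return new ParsedLine(assignments, label.isEmpty() ? null : label);
  }

  public static void main(String[] args) {
    for (String line : new String[] { " 0 0 1 1 0 First", " 0 0 0 1 2 Second" }) {
      ParsedLine p = parseLine(line);
      System.out.println(p.label + ": " + p.assignments.length + " objects, " + p.numClusters() + " clusters");
    }
    // prints:
    // First: 5 objects, 2 clusters
    // Second: 5 objects, 3 clusters
  }
}
```

    In this sketch a line without a trailing label yields a null label, which mirrors the "optional label" wording above.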

    TODO: this parser is currently quite hacky and could use a cleanup.

    TODO: support noise, via negative cluster numbers?

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • numterms

        int numterms
        Number of different terms observed.
      • buf1

        it.unimi.dsi.fastutil.ints.IntArrayList buf1
        Buffers, will be reused.
      • range

        DBIDRange range
        Range of the DBID values.
      • lbl

        java.util.ArrayList<java.lang.String> lbl
        Buffer for labels.
      • haslbl

        boolean haslbl
        Flag if labels are present.
    • Constructor Detail

      • ClusteringVectorParser

        public ClusteringVectorParser(CSVReaderFormat format)
        Constructor.
        Parameters:
        format - Input format