Class LibSVMFormatParser<V extends SparseNumberVector>

  • Type Parameters:
    V - Vector type
    All Implemented Interfaces:
    BundleStreamSource, Parser, StreamingParser

    @Title("libSVM Format Parser")
    public class LibSVMFormatParser<V extends SparseNumberVector>
    extends SparseNumberVectorLabelParser<V>
    Parser to read libSVM format files.

    The libSVM format is roughly specified in its README as:

     <label> <index1>:<value1> <index2>:<value2> ...
    i.e., a mandatory integer class label at the beginning, followed by a classic sparse vector representation of the data. Indexes are integers starting at 1. Note that ELKI uses 0-based indexing, so these are mapped to index-1 to avoid always having a constant-0 dimension 0.
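
    The index mapping described above can be illustrated with a minimal sketch. This is not the ELKI implementation: the class LibSvmLineSketch, its nested Entry type, and the parseLine method are hypothetical names introduced only for this example, and a plain sorted map stands in for the sparse vector type V.

```java
import java.util.Map;
import java.util.TreeMap;

public class LibSvmLineSketch {
  /** Hypothetical holder for one parsed line: class label plus 0-based sparse entries. */
  static final class Entry {
    final int label;
    final Map<Integer, Double> values = new TreeMap<>();

    Entry(int label) {
      this.label = label;
    }
  }

  /** Parses "<label> <index>:<value> ..." and shifts each 1-based index to index-1. */
  static Entry parseLine(String line) {
    String[] tokens = line.trim().split("\\s+");
    // The first token is the mandatory integer class label.
    Entry entry = new Entry(Integer.parseInt(tokens[0]));
    for (int i = 1; i < tokens.length; i++) {
      int colon = tokens[i].indexOf(':');
      if (colon < 0) {
        throw new IllegalArgumentException("Expected index:value, got: " + tokens[i]);
      }
      int index = Integer.parseInt(tokens[i].substring(0, colon));
      double value = Double.parseDouble(tokens[i].substring(colon + 1));
      // libSVM indexes start at 1; store them 0-based.
      entry.values.put(index - 1, value);
    }
    return entry;
  }

  public static void main(String[] args) {
    Entry e = parseLine("1 1:0.5 3:2.0");
    System.out.println(e.label + " " + e.values); // prints "1 {0=0.5, 2=2.0}"
  }
}
```

    In this sketch, the libSVM index 1 ends up in dimension 0 and index 3 in dimension 2, so dimension 0 carries data instead of being constantly 0.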

    The libSVM FAQ states that comments may also be placed in the file, introduced by a hash (#), but they must not contain colons and are not officially supported.
    ELKI simply stops parsing a line when it encounters a #.
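
    The comment handling can be sketched as follows, assuming that everything from the first # to the end of the line is discarded. The class LibSvmCommentSketch and the method stripComment are hypothetical; the actual parser relies on COMMENT_PATTERN, whose regular expression is not shown here, so this sketch uses a plain character search instead and additionally trims surrounding whitespace.

```java
public class LibSvmCommentSketch {
  /** Returns the trimmed part of the line before the first '#', or the whole trimmed line if there is none. */
  static String stripComment(String line) {
    int hash = line.indexOf('#');
    String data = hash >= 0 ? line.substring(0, hash) : line;
    return data.trim();
  }

  public static void main(String[] args) {
    System.out.println(stripComment("1 1:0.5 3:2.0 # first sample")); // prints "1 1:0.5 3:2.0"
  }
}
```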

    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.

      • WHITESPACE_PATTERN

        public static final java.util.regex.Pattern WHITESPACE_PATTERN
        LibSVM uses whitespace and colons for separation.

      • COMMENT_PATTERN

        public static final java.util.regex.Pattern COMMENT_PATTERN
        Comment pattern.
    • Constructor Detail

      • LibSVMFormatParser

        public LibSVMFormatParser(SparseNumberVector.Factory<V> factory)
        Parameters:
        factory - Vector factory