Package elki.datasource.parser
Class LibSVMFormatParser<V extends SparseNumberVector>
- java.lang.Object
-
- elki.datasource.parser.AbstractStreamingParser
-
- elki.datasource.parser.NumberVectorLabelParser<V>
-
- elki.datasource.parser.SparseNumberVectorLabelParser<V>
-
- elki.datasource.parser.LibSVMFormatParser<V>
-
- Type Parameters:
V
- Vector type
- All Implemented Interfaces:
BundleStreamSource, Parser, StreamingParser
@Title("libSVM Format Parser") public class LibSVMFormatParser<V extends SparseNumberVector> extends SparseNumberVectorLabelParser<V>
Parser to read libSVM format files. The libSVM README specifies the format roughly as:
<label> <index1>:<value1> <index2>:<value2> ...
i.e., a mandatory integer class label at the beginning, followed by a classic sparse vector representation of the data. Indexes are integers starting at 1. ELKI uses 0-based indexing, so each index is mapped to index-1; otherwise dimension 0 would always be a constant 0.
The libSVM FAQ states that you can also put comments into the file, separated by a hash (#), but they must not contain colons and are not officially supported. ELKI will simply stop parsing a line when encountering a #.
- Since:
- 0.7.0
- Author:
- Erich Schubert
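The line format described above can be illustrated with a minimal, self-contained sketch. It is not ELKI's implementation: the class LibSvmLineSketch, its Row type, and the parseLine method are hypothetical names introduced only for this example, and the comment handling is assumed to be a plain cut at the first #. It shows the integer label, the index:value pairs, and the shift from 1-based libSVM indexes to 0-based dimensions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; this is not ELKI's parser code.
class LibSvmLineSketch {
  /** Parsed line: integer class label plus sparse values keyed by 0-based dimension. */
  static final class Row {
    final int label;
    final Map<Integer, Double> values = new LinkedHashMap<>();

    Row(int label) {
      this.label = label;
    }
  }

  /** Parses one line of the form "<label> <index1>:<value1> <index2>:<value2> ...". */
  static Row parseLine(String line) {
    // Assumption: everything from the first '#' onward is a comment and is dropped.
    int hash = line.indexOf('#');
    String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
    if (data.isEmpty()) {
      return null; // Blank or comment-only line: nothing to parse.
    }
    String[] tokens = data.split("\\s+");
    Row row = new Row(Integer.parseInt(tokens[0]));
    for (int i = 1; i < tokens.length; i++) {
      int colon = tokens[i].indexOf(':');
      if (colon < 0) {
        throw new IllegalArgumentException("Expected index:value but got " + tokens[i]);
      }
      int index = Integer.parseInt(tokens[i].substring(0, colon));
      double value = Double.parseDouble(tokens[i].substring(colon + 1));
      // libSVM indexes start at 1; shift to 0-based dimensions.
      row.values.put(index - 1, value);
    }
    return row;
  }

  public static void main(String[] args) {
    Row row = parseLine("1 1:0.5 3:2.0 # trailing comment");
    System.out.println(row.label + " " + row.values); // prints "1 {0=0.5, 2=2.0}"
  }
}
```

In this sketch, libSVM index 1 ends up in dimension 0 and index 3 in dimension 2, so no dimension is wasted on a constant 0.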
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static class
LibSVMFormatParser.Par<V extends SparseNumberVector>
Parameterization class.
-
Nested classes/interfaces inherited from interface elki.datasource.bundle.BundleStreamSource
BundleStreamSource.Event
-
-
Field Summary
Fields
Modifier and Type, Field, Description
static java.util.regex.Pattern
COMMENT_PATTERN
Comment pattern.
private static Logging
LOG
Class logger.
static java.util.regex.Pattern
WHITESPACE_PATTERN
LibSVM uses whitespace and colons for separation.
-
Fields inherited from class elki.datasource.parser.SparseNumberVectorLabelParser
labels, sparsefactory, values
-
Fields inherited from class elki.datasource.parser.NumberVectorLabelParser
attributes, columnnames, curlbl, curvec, factory, haslabels, maxdim, meta, mindim, nextevent, unique, warnedDim, warnedPrecision
-
Fields inherited from class elki.datasource.parser.AbstractStreamingParser
reader, tokenizer
-
-
Constructor Summary
Constructors
Constructor, Description
LibSVMFormatParser(SparseNumberVector.Factory<V> factory)
Constructor.
-
Method Summary
Modifier and Type, Method, Description
protected Logging
getLogger()
Get the logger for this class.
protected boolean
parseLineInternal()
Internal method for parsing a single line.
-
Methods inherited from class elki.datasource.parser.SparseNumberVectorLabelParser
getTypeInformation
-
Methods inherited from class elki.datasource.parser.NumberVectorLabelParser
buildMeta, cleanup, createVector, data, getMeta, initStream, isLabelColumn, nextEvent
-
Methods inherited from class elki.datasource.parser.AbstractStreamingParser
asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
WHITESPACE_PATTERN
public static final java.util.regex.Pattern WHITESPACE_PATTERN
LibSVM uses whitespace and colons for separation.
-
COMMENT_PATTERN
public static final java.util.regex.Pattern COMMENT_PATTERN
Comment pattern.
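The two pattern fields suggest that a line is tokenized by dropping the comment and then splitting on whitespace and colons together. The sketch below illustrates that idea with java.util.regex only. The regular expressions are assumptions made for this example, since the actual values of WHITESPACE_PATTERN and COMMENT_PATTERN are not stated here, and LibSvmTokenSketch and tokenize are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative sketch only; the regexes below are assumed, not taken from ELKI.
class LibSvmTokenSketch {
  // Assumed separator: any run of whitespace and/or colons.
  static final Pattern SEPARATOR = Pattern.compile("[\\s:]+");
  // Assumed comment: everything from the first '#' to the end of the line.
  static final Pattern COMMENT = Pattern.compile("#.*$");

  /** Returns the flat token sequence: label, index, value, index, value, ... */
  static String[] tokenize(String line) {
    String data = COMMENT.matcher(line).replaceFirst("").trim();
    return data.isEmpty() ? new String[0] : SEPARATOR.split(data);
  }

  public static void main(String[] args) {
    // prints "[-1, 2, 1.5, 7, 3]"
    System.out.println(Arrays.toString(tokenize("-1 2:1.5 7:3 # note")));
  }
}
```

Because colons and whitespace are treated as one separator class in this sketch, a well-formed line yields an odd number of tokens: the label followed by index and value pairs.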
-
-
Constructor Detail
-
LibSVMFormatParser
public LibSVMFormatParser(SparseNumberVector.Factory<V> factory)
Constructor.
- Parameters:
factory
- Vector factory
-
-
Method Detail
-
parseLineInternal
protected boolean parseLineInternal()
Description copied from class: NumberVectorLabelParser
Internal method for parsing a single line. Used by both line-based parsing and block parsing, which avoids building the metadata for each line.
- Overrides:
parseLineInternal in class SparseNumberVectorLabelParser<V extends SparseNumberVector>
- Returns:
true when a valid line was read, false on a label row.
-
getLogger
protected Logging getLogger()
Description copied from class: AbstractStreamingParser
Get the logger for this class.
- Overrides:
getLogger in class SparseNumberVectorLabelParser<V extends SparseNumberVector>
- Returns:
- Logger.
-
-