Package elki.datasource.parser
Class LibSVMFormatParser<V extends SparseNumberVector>
- java.lang.Object
  - elki.datasource.parser.AbstractStreamingParser
    - elki.datasource.parser.NumberVectorLabelParser<V>
      - elki.datasource.parser.SparseNumberVectorLabelParser<V>
        - elki.datasource.parser.LibSVMFormatParser<V>
- Type Parameters:
  V - Vector type
- All Implemented Interfaces:
  BundleStreamSource, Parser, StreamingParser
@Title("libSVM Format Parser") public class LibSVMFormatParser<V extends SparseNumberVector> extends SparseNumberVectorLabelParser<V>
Parser to read libSVM format files. The libSVM README specifies the format roughly as:
<label> <index1>:<value1> <index2>:<value2> ...
That is, a mandatory integer class label at the beginning, followed by a classic sparse vector representation of the data. Indexes are integers starting at 1. Because ELKI uses 0-based indexing, each index is mapped to index - 1, so that the vectors do not always carry a constant-0 dimension 0. The libSVM FAQ states that you can also put comments into the file, introduced by a hash (#), but they must not contain colons and are not officially supported.
ELKI simply stops parsing a line when it encounters a #.
- Since:
  0.7.0
- Author:
  Erich Schubert
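To make the line format concrete, here is a minimal Java sketch of parsing a single libSVM line as described above: the first token is the label, each following token is split at its colon, 1-based indexes are shifted to 0-based, and everything from the first # onward is ignored. This is not ELKI's implementation; the class LibSvmLineSketch, its Row type, and parseLine are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch of parsing one libSVM line; not ELKI's code. */
class LibSvmLineSketch {
    /** Parsed result: the class label plus 0-based sparse entries. */
    public static final class Row {
        public final String label;
        public final int[] indices;   // 0-based (libSVM index - 1)
        public final double[] values;

        Row(String label, int[] indices, double[] values) {
            this.label = label;
            this.indices = indices;
            this.values = values;
        }
    }

    /** Returns null for lines that are blank or contain only a comment. */
    public static Row parseLine(String line) {
        // Stop at the first '#': the rest of the line is a comment.
        int hash = line.indexOf('#');
        String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (data.isEmpty()) {
            return null;
        }
        String[] tokens = data.split("\\s+");
        String label = tokens[0]; // kept as a string, e.g. "+1"
        int n = tokens.length - 1;
        int[] indices = new int[n];
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            String tok = tokens[i + 1];
            int colon = tok.indexOf(':');
            if (colon < 1) {
                throw new IllegalArgumentException("Expected index:value, got " + tok);
            }
            int index = Integer.parseInt(tok.substring(0, colon));
            if (index < 1) {
                throw new IllegalArgumentException("libSVM indexes start at 1: " + tok);
            }
            indices[i] = index - 1; // map 1-based libSVM index to 0-based
            values[i] = Double.parseDouble(tok.substring(colon + 1));
        }
        return new Row(label, indices, values);
    }

    public static void main(String[] args) {
        Row r = parseLine("+1 1:0.5 3:2.0 # trailing comment");
        // prints "+1 [0, 2] [0.5, 2.0]"
        System.out.println(r.label + " " + Arrays.toString(r.indices) + " " + Arrays.toString(r.values));
    }
}
```

The label is kept as a string so that values such as +1 pass through unchanged; a stricter parser could validate it as an integer, as the format description requires.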
Nested Class Summary
Nested Classes
- static class LibSVMFormatParser.Par<V extends SparseNumberVector>: Parameterization class.
Nested classes/interfaces inherited from interface elki.datasource.bundle.BundleStreamSource
BundleStreamSource.Event
Field Summary
Fields
- static java.util.regex.Pattern COMMENT_PATTERN: Comment pattern.
- private static Logging LOG: Class logger.
- static java.util.regex.Pattern WHITESPACE_PATTERN: LibSVM uses whitespace and colons for separation.
Fields inherited from class elki.datasource.parser.SparseNumberVectorLabelParser
labels, sparsefactory, values
Fields inherited from class elki.datasource.parser.NumberVectorLabelParser
attributes, columnnames, curlbl, curvec, factory, haslabels, maxdim, meta, mindim, nextevent, unique, warnedDim, warnedPrecision
Fields inherited from class elki.datasource.parser.AbstractStreamingParser
reader, tokenizer
Constructor Summary
Constructors
- LibSVMFormatParser(SparseNumberVector.Factory<V> factory): Constructor.
Method Summary
Methods
- protected Logging getLogger(): Get the logger for this class.
- protected boolean parseLineInternal(): Internal method for parsing a single line.
Methods inherited from class elki.datasource.parser.SparseNumberVectorLabelParser
getTypeInformation
Methods inherited from class elki.datasource.parser.NumberVectorLabelParser
buildMeta, cleanup, createVector, data, getMeta, initStream, isLabelColumn, nextEvent
Methods inherited from class elki.datasource.parser.AbstractStreamingParser
asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
Field Detail
LOG
private static final Logging LOG
Class logger.

WHITESPACE_PATTERN
public static final java.util.regex.Pattern WHITESPACE_PATTERN
LibSVM uses whitespace and colons for separation.

COMMENT_PATTERN
public static final java.util.regex.Pattern COMMENT_PATTERN
Comment pattern.
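The two pattern fields suggest a regex-based tokenizer. The sketch below assumes that WHITESPACE_PATTERN matches runs of whitespace and colons, and that COMMENT_PATTERN matches a # through the end of the line. The actual pattern strings used by ELKI are not given on this page, so both regexes, and the class LibSvmPatternsSketch, are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/** Hypothetical tokenizer sketch; the regex strings are assumptions, not ELKI's. */
class LibSvmPatternsSketch {
    // Assumed: whitespace and colons both act as separators.
    static final Pattern WHITESPACE_PATTERN = Pattern.compile("[\\s:]+");
    // Assumed: a comment runs from '#' to the end of the line.
    static final Pattern COMMENT_PATTERN = Pattern.compile("#.*");

    static String[] tokenize(String line) {
        // Remove the comment first, then split what remains.
        String data = COMMENT_PATTERN.matcher(line).replaceFirst("").trim();
        if (data.isEmpty()) {
            return new String[0];
        }
        return WHITESPACE_PATTERN.split(data);
    }

    public static void main(String[] args) {
        // prints "[-1, 2, 0.25, 7, 1]"
        System.out.println(Arrays.toString(tokenize("-1 2:0.25 7:1 # note")));
    }
}
```

Because colons are treated like whitespace, the tokens after the label alternate between index and value, so a well-formed line yields an odd number of tokens. A parser built this way cannot tell "2:0.25" from "2 0.25", so it would need an extra check if strict format validation matters.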
Constructor Detail
LibSVMFormatParser
public LibSVMFormatParser(SparseNumberVector.Factory<V> factory)
Constructor.
- Parameters:
  factory - Vector factory
Method Detail
parseLineInternal
protected boolean parseLineInternal()
Description copied from class: NumberVectorLabelParser
Internal method for parsing a single line, used by both line-based and block parsing. This avoids building the metadata for each line.
- Overrides:
  parseLineInternal in class SparseNumberVectorLabelParser<V extends SparseNumberVector>
- Returns:
  true when a valid line was read, false on a label row.
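Sparse libSVM lines do not declare the dimensionality of the data, so a line-by-line parser has to derive it while reading. The following sketch assumes the dimensionality is the largest 1-based index seen, reads lines one at a time, and reports per line whether a data row was produced. The class LibSvmStreamSketch and its convention of returning false for blank or comment-only lines are hypothetical; ELKI's parseLineInternal instead documents false as meaning a label row.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical streaming sketch; not ELKI's parseLineInternal. */
class LibSvmStreamSketch {
    int rows = 0;
    int maxDim = 0; // largest 1-based libSVM index seen so far

    /** Returns true if the line produced a data row, false if blank or comment-only. */
    boolean parseLine(String line) {
        int hash = line.indexOf('#');
        String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (data.isEmpty()) {
            return false;
        }
        String[] tokens = data.split("\\s+");
        for (int i = 1; i < tokens.length; i++) { // token 0 is the label
            int colon = tokens[i].indexOf(':');
            if (colon < 1) {
                throw new IllegalArgumentException("Expected index:value, got " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            // 0-based positions run up to index - 1, so index itself is the needed dimensionality.
            maxDim = Math.max(maxDim, index);
        }
        rows++;
        return true;
    }

    public static void main(String[] args) throws IOException {
        String file = "+1 1:0.5 3:2.0\n# header comment\n\n-1 5:1.0\n";
        LibSvmStreamSketch p = new LibSvmStreamSketch();
        try (BufferedReader in = new BufferedReader(new StringReader(file))) {
            for (String line; (line = in.readLine()) != null; ) {
                p.parseLine(line);
            }
        }
        // prints "2 rows, dimensionality 5"
        System.out.println(p.rows + " rows, dimensionality " + p.maxDim);
    }
}
```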

getLogger
protected Logging getLogger()
Description copied from class: AbstractStreamingParser
Get the logger for this class.
- Overrides:
  getLogger in class SparseNumberVectorLabelParser<V extends SparseNumberVector>
- Returns:
  Logger.