Package elki.datasource.parser
Class SparseNumberVectorLabelParser<V extends SparseNumberVector>
- java.lang.Object
  - elki.datasource.parser.AbstractStreamingParser
    - elki.datasource.parser.NumberVectorLabelParser<V>
      - elki.datasource.parser.SparseNumberVectorLabelParser<V>
- Type Parameters:
V - vector type
- All Implemented Interfaces:
BundleStreamSource, Parser, StreamingParser
- Direct Known Subclasses:
LibSVMFormatParser
@Title("Sparse Vector Label Parser")
@Description("Parser for the following line format:\nA single line provides a single point. Entries are separated by whitespace. The values will be parsed as floats (resulting in a set of SparseFloatVectors).\nA line is expected in the following format:\nThe first entry of each line is the number of attributes with coordinate value not zero. Subsequent entries are of the form (index, value), where index is the number of the corresponding dimension, and value is the value of the corresponding attribute. Any pair of two subsequent substrings not containing whitespace is tried to be read as int and float. If this fails for the first of the pair (interpreted as index), it will be appended to a label. (Thus, any label must not be parseable as Integer.) If the float component is not parseable, an exception will be thrown. Empty lines and lines beginning with \"#\" will be ignored.")
public class SparseNumberVectorLabelParser<V extends SparseNumberVector> extends NumberVectorLabelParser<V>
Parser for parsing one point per line, with attributes separated by whitespace. Several labels may be given per point. A label must not be parseable as double. Lines starting with "#" will be ignored.
A line is expected in the following format: The first entry of each line is the number of attributes with a non-zero coordinate value. Subsequent entries are of the form
index value
each, where index is the number of the corresponding dimension, and value is the value of the corresponding attribute. A complete line could then look like this:
3 7 12.34 8 56.78 11 1.234 objectlabel
where 3 indicates that three attributes are set, 7, 8, 11 are the attribute indexes, and objectlabel is a non-numerical object label.
An index can be specified to identify an entry to be treated as class label. This index counts all entries (numeric values and labels alike), starting with 0.
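The line format described above can be illustrated with a small standalone sketch. This is not the ELKI implementation: the class `SparseLineSketch`, its nested `Parsed` type, and the methods `parseLine` and `tryParseInt` are hypothetical names introduced only for this example, and it uses plain Java collections instead of ELKI's vector factories and tokenizer. It assumes that a token which cannot be read as an integer index is treated as a label, and it ignores the optional class label index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SparseLineSketch {
  /** Hypothetical result holder: declared count, sparse values, and labels. */
  static final class Parsed {
    int declared;
    final Map<Integer, Double> values = new TreeMap<>();
    final List<String> labels = new ArrayList<>();
  }

  /** Try to read a token as an integer index; null if it is not one. */
  static Integer tryParseInt(String token) {
    try {
      return Integer.valueOf(token);
    } catch (NumberFormatException e) {
      return null;
    }
  }

  /** Parse one line; returns null for empty lines and "#" comment lines. */
  static Parsed parseLine(String line) {
    String trimmed = line.trim();
    if (trimmed.isEmpty() || trimmed.startsWith("#")) {
      return null;
    }
    String[] tok = trimmed.split("\\s+");
    Parsed out = new Parsed();
    // First entry: declared number of non-zero attributes.
    out.declared = Integer.parseInt(tok[0]);
    int i = 1;
    while (i < tok.length) {
      Integer index = tryParseInt(tok[i]);
      if (index == null) {
        // Not an index: append the token to the labels (sketch assumption).
        out.labels.add(tok[i]);
        i++;
        continue;
      }
      if (i + 1 >= tok.length) {
        throw new IllegalArgumentException("Missing value for index " + index);
      }
      // Throws NumberFormatException if the value is not numeric.
      out.values.put(index, Double.parseDouble(tok[i + 1]));
      i += 2;
    }
    return out;
  }

  public static void main(String[] args) {
    Parsed p = parseLine("3 7 12.34 8 56.78 11 1.234 objectlabel");
    System.out.println(p.declared + " " + p.values + " " + p.labels);
  }
}
```

The sketch keeps the declared count alongside the parsed pairs rather than enforcing that they match. Whether and how the real parser validates that count is not stated here, so this is left to the caller.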
- Since:
- 0.2
- Author:
- Arthur Zimek
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- static class SparseNumberVectorLabelParser.Par<V extends SparseNumberVector>: Parameterization class.
Nested classes/interfaces inherited from interface elki.datasource.bundle.BundleStreamSource
BundleStreamSource.Event
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- (package private) java.util.ArrayList<java.lang.String> labels: (Reused) label buffer.
- private static Logging LOG: Class logger.
- protected SparseNumberVector.Factory<V> sparsefactory: Same as NumberVectorLabelParser.factory, but subtype.
- (package private) it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap values: (Reused) set of values for the number vector.
Fields inherited from class elki.datasource.parser.NumberVectorLabelParser
attributes, columnnames, curlbl, curvec, factory, haslabels, maxdim, meta, mindim, nextevent, unique, warnedDim, warnedPrecision
-
Fields inherited from class elki.datasource.parser.AbstractStreamingParser
reader, tokenizer
-
-
Constructor Summary
Constructors (Constructor, Description):
- SparseNumberVectorLabelParser(CSVReaderFormat format, long[] labelIndices, SparseNumberVector.Factory<V> factory): Constructor.
- SparseNumberVectorLabelParser(java.util.regex.Pattern colSep, java.lang.String quoteChars, java.util.regex.Pattern comment, long[] labelIndices, SparseNumberVector.Factory<V> factory): Constructor.
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
- protected Logging getLogger(): Get the logger for this class.
- protected SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim): Get a prototype object for the given dimensionality.
- protected boolean parseLineInternal(): Internal method for parsing a single line.
Methods inherited from class elki.datasource.parser.NumberVectorLabelParser
buildMeta, cleanup, createVector, data, getMeta, initStream, isLabelColumn, nextEvent
-
Methods inherited from class elki.datasource.parser.AbstractStreamingParser
asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
sparsefactory
protected SparseNumberVector.Factory<V extends SparseNumberVector> sparsefactory
Same as NumberVectorLabelParser.factory, but subtype.
-
values
it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap values
(Reused) set of values for the number vector.
-
labels
java.util.ArrayList<java.lang.String> labels
(Reused) label buffer.
-
-
Constructor Detail
-
SparseNumberVectorLabelParser
public SparseNumberVectorLabelParser(CSVReaderFormat format, long[] labelIndices, SparseNumberVector.Factory<V> factory)
Constructor.
- Parameters:
format - Input format
labelIndices - Indices to use as labels
factory - Vector factory
-
SparseNumberVectorLabelParser
public SparseNumberVectorLabelParser(java.util.regex.Pattern colSep, java.lang.String quoteChars, java.util.regex.Pattern comment, long[] labelIndices, SparseNumberVector.Factory<V> factory)
Constructor.
- Parameters:
colSep - Column separator
quoteChars - Quotation character
comment - Comment pattern
labelIndices - Indices to use as labels
factory - Vector factory
-
-
Method Detail
-
parseLineInternal
protected boolean parseLineInternal()
Description copied from class: NumberVectorLabelParser
Internal method for parsing a single line. Used by both line based parsing as well as block parsing. This saves the building of meta data for each line.
- Overrides:
parseLineInternal in class NumberVectorLabelParser<V extends SparseNumberVector>
- Returns:
true when a valid line was read, false on a label row.
-
getTypeInformation
protected SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim)
Description copied from class: NumberVectorLabelParser
Get a prototype object for the given dimensionality.
- Overrides:
getTypeInformation in class NumberVectorLabelParser<V extends SparseNumberVector>
- Parameters:
mindim - Minimum dimensionality
maxdim - Maximum dimensionality
- Returns:
- Prototype object
-
getLogger
protected Logging getLogger()
Description copied from class: AbstractStreamingParser
Get the logger for this class.
- Overrides:
getLogger in class NumberVectorLabelParser<V extends SparseNumberVector>
- Returns:
- Logger.