Package elki.datasource.parser
Class NumberVectorLabelParser<V extends NumberVector>
- java.lang.Object
  - elki.datasource.parser.AbstractStreamingParser
    - elki.datasource.parser.NumberVectorLabelParser<V>
- Type Parameters:
  V - the type of NumberVector used
- All Implemented Interfaces:
  BundleStreamSource, Parser, StreamingParser
- Direct Known Subclasses:
  BitVectorLabelParser, CategorialDataAsNumberVectorParser, SparseNumberVectorLabelParser, TermFrequencyParser
public class NumberVectorLabelParser<V extends NumberVector> extends AbstractStreamingParser
Parser for a simple CSV-like format, with columns separated by the given pattern (default: whitespace). Several labels may be given per point. A label must not be parseable as a double. Lines starting with "#" will be ignored.
An index can be specified to identify an entry to be treated as class label. This index counts all entries (numeric and labels as well) starting with 0.
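The format rules above can be illustrated with a small standalone sketch. It is not ELKI's implementation: the class name LabelFormatSketch, the Row holder, and the skipping of blank lines are assumptions made for illustration, and the class label index is left out. Tokens that parse as a double become vector values; everything else is kept as a label.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mimics the documented input format, not ELKI's parser. */
final class LabelFormatSketch {
  /** One parsed row: numeric attributes plus any labels (hypothetical holder). */
  static final class Row {
    final List<Double> values = new ArrayList<>();
    final List<String> labels = new ArrayList<>();
  }

  /** Returns null for comment lines starting with "#" and for blank lines. */
  static Row parseLine(String line) {
    if (line.startsWith("#")) {
      return null; // documented: comment lines are ignored
    }
    String trimmed = line.trim();
    if (trimmed.isEmpty()) {
      return null; // assumption: blank lines carry no data
    }
    Row row = new Row();
    for (String token : trimmed.split("\\s+")) { // default separator: whitespace
      try {
        row.values.add(Double.parseDouble(token));
      } catch (NumberFormatException e) {
        row.labels.add(token); // not parseable as double: treat as label
      }
    }
    return row;
  }

  public static void main(String[] args) {
    String[] input = {
        "# sepal and petal sizes",
        "5.1 3.5 1.4 0.2 Iris-setosa",
        "7.0 3.2 4.7 1.4 Iris-versicolor flagged" };
    for (String line : input) {
      Row row = parseLine(line);
      if (row != null) {
        System.out.println(row.values + " " + row.labels);
      }
    }
  }
}
```

Note that Double.parseDouble also accepts tokens such as "NaN" or "1d", so in this sketch a label spelled that way would end up among the numeric values.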
- Since:
  0.1
- Author:
  Arthur Zimek, Erich Schubert
Nested Class Summary
- static class NumberVectorLabelParser.Par<V extends NumberVector>
  Parameterization class.
- Nested classes/interfaces inherited from interface elki.datasource.bundle.BundleStreamSource
  BundleStreamSource.Event
Field Summary
- protected DoubleArray attributes
  Double array storing the numerical attributes during parsing.
- protected java.util.List<java.lang.String> columnnames
  Column names.
- protected LabelList curlbl
  Current labels.
- protected V curvec
  Current vector.
- protected NumberVector.Factory<V> factory
  Vector factory class.
- protected boolean haslabels
  Whether or not the data set has labels.
- private long[] labelIndices
  Keeps the indices of the attributes to be treated as a string label.
- (package private) java.util.ArrayList<java.lang.String> labels
  (Reused) store for labels.
- private static Logging LOG
  Logging class.
- protected int maxdim
  Maximum dimensionality reported.
- protected BundleMeta meta
  Metadata.
- protected int mindim
  Minimum dimensionality reported.
- (package private) BundleStreamSource.Event nextevent
  Event to report next.
- (package private) it.unimi.dsi.fastutil.objects.ObjectOpenHashSet<java.lang.String> unique
  For String unification.
- (package private) boolean warnedDim
  Emit a dimensionality change warning once.
- (package private) boolean warnedPrecision
  Emit a double-precision limit warning once.
- Fields inherited from class elki.datasource.parser.AbstractStreamingParser
  reader, tokenizer
Constructor Summary
- NumberVectorLabelParser(NumberVector.Factory<V> factory)
  Constructor with defaults.
- NumberVectorLabelParser(CSVReaderFormat format, long[] labelIndices, NumberVector.Factory<V> factory)
  Constructor.
- NumberVectorLabelParser(java.util.regex.Pattern colSep, java.lang.String quoteChars, java.util.regex.Pattern comment, long[] labelIndices, NumberVector.Factory<V> factory)
  Constructor.
Method Summary
- protected void buildMeta()
  Update the meta element.
- void cleanup()
  Perform cleanup operations after parsing.
- protected V createVector()
  Creates a database object of type V.
- java.lang.Object data(int rnum)
  Access a particular object and representation.
- protected Logging getLogger()
  Get the logger for this class.
- BundleMeta getMeta()
  Get the current meta data.
- (package private) SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim)
  Get a prototype object for the given dimensionality.
- void initStream(java.io.InputStream in)
  Init the streaming parser for the given input stream.
- protected boolean isLabelColumn(int col)
  Test if the current column is marked as label column.
- BundleStreamSource.Event nextEvent()
  Get the next event.
- protected boolean parseLineInternal()
  Internal method for parsing a single line.
- Methods inherited from class elki.datasource.parser.AbstractStreamingParser
  asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
Field Detail
-
LOG
private static final Logging LOG
Logging class.
-
labelIndices
private long[] labelIndices
Keeps the indices of the attributes to be treated as a string label.
-
factory
protected NumberVector.Factory<V> factory
Vector factory class.
-
mindim
protected int mindim
Minimum dimensionality reported.
-
maxdim
protected int maxdim
Maximum dimensionality reported.
-
meta
protected BundleMeta meta
Metadata.
-
columnnames
protected java.util.List<java.lang.String> columnnames
Column names.
-
haslabels
protected boolean haslabels
Whether or not the data set has labels.
-
curvec
protected V curvec
Current vector.
-
curlbl
protected LabelList curlbl
Current labels.
-
attributes
protected DoubleArray attributes
Double array storing the numerical attributes during parsing.
-
labels
final java.util.ArrayList<java.lang.String> labels
(Reused) store for labels.
-
unique
it.unimi.dsi.fastutil.objects.ObjectOpenHashSet<java.lang.String> unique
For String unification.
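String unification here presumably means mapping equal label strings to a single shared instance, so that repeated labels do not each hold their own copy. A minimal sketch of that idea follows, using java.util.HashMap rather than the fastutil set declared above; the class name StringUnifierSketch and the method unify are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: canonicalizes equal strings to one shared instance. */
final class StringUnifierSketch {
  // Maps each distinct string to the first instance seen with that content.
  private final Map<String, String> pool = new HashMap<>();

  /** Returns the canonical instance for s, registering s if it is new. */
  String unify(String s) {
    String previous = pool.putIfAbsent(s, s);
    return previous != null ? previous : s;
  }
}
```

After unification, a data set with many rows but few distinct labels keeps only one String object per distinct label.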
-
nextevent
BundleStreamSource.Event nextevent
Event to report next.
-
warnedPrecision
boolean warnedPrecision
Emit a double-precision limit warning once.
-
warnedDim
boolean warnedDim
Emit a dimensionality change warning once.
-
-
Constructor Detail
-
NumberVectorLabelParser
public NumberVectorLabelParser(CSVReaderFormat format, long[] labelIndices, NumberVector.Factory<V> factory)
Constructor.
- Parameters:
  format - Input format
  labelIndices - Column indexes that are not numeric.
  factory - Vector factory
-
NumberVectorLabelParser
public NumberVectorLabelParser(NumberVector.Factory<V> factory)
Constructor with defaults.
- Parameters:
  factory - Vector factory
-
NumberVectorLabelParser
public NumberVectorLabelParser(java.util.regex.Pattern colSep, java.lang.String quoteChars, java.util.regex.Pattern comment, long[] labelIndices, NumberVector.Factory<V> factory)
Constructor.
- Parameters:
  colSep - Column separator
  quoteChars - Quote character
  comment - Comment pattern
  labelIndices - Column indexes that are not numeric.
  factory - Vector factory
-
-
Method Detail
-
isLabelColumn
protected boolean isLabelColumn(int col)
Test if the current column is marked as label column.
- Parameters:
  col - Column number
- Returns:
  true when a label column.
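The labelIndices field is declared as long[], which suggests a packed bit set with one bit per column, although this page does not confirm the encoding. Under that assumption, a column test could look like the following sketch; the class LabelMaskSketch and the helper maskOf are hypothetical and not part of ELKI.

```java
/** Illustrative only: a long[] used as a bit set of label column indices. */
final class LabelMaskSketch {
  /** Builds a mask with one bit set per (non-negative) label column. */
  static long[] maskOf(int... labelColumns) {
    int max = -1;
    for (int c : labelColumns) {
      max = Math.max(max, c);
    }
    long[] mask = new long[(max >> 6) + 1]; // 64 columns per long word
    for (int c : labelColumns) {
      mask[c >>> 6] |= 1L << c; // the shift uses only the low 6 bits of c
    }
    return mask;
  }

  /** True if the bit for the given column is set; false if out of range. */
  static boolean isLabelColumn(long[] mask, int col) {
    int word = col >>> 6;
    return word < mask.length && (mask[word] & (1L << col)) != 0;
  }
}
```

For example, maskOf(0, 4, 70) yields a two-word mask in which columns 0, 4 and 70 test as label columns and every other column does not.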
-
initStream
public void initStream(java.io.InputStream in)
Description copied from interface: StreamingParser
Init the streaming parser for the given input stream.
- Specified by:
  initStream in interface StreamingParser
- Overrides:
  initStream in class AbstractStreamingParser
- Parameters:
  in - the stream to parse objects from
-
getMeta
public BundleMeta getMeta()
Description copied from interface: BundleStreamSource
Get the current meta data.
- Returns:
  Metadata
-
nextEvent
public BundleStreamSource.Event nextEvent()
Description copied from interface: BundleStreamSource
Get the next event.
- Returns:
  Event type
-
cleanup
public void cleanup()
Description copied from interface: Parser
Perform cleanup operations after parsing.
- Specified by:
  cleanup in interface Parser
- Overrides:
  cleanup in class AbstractStreamingParser
-
buildMeta
protected void buildMeta()
Update the meta element.
-
data
public java.lang.Object data(int rnum)
Description copied from interface: BundleStreamSource
Access a particular object and representation.
- Parameters:
  rnum - Representation number
- Returns:
  Contained data
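Taken together, nextEvent() and data(int) describe a pull-style protocol: the consumer asks for the next event and reads the current object only when one is available. The sketch below shows that loop against a small mock, not against ELKI itself. The Event constant names are assumed to mirror BundleStreamSource.Event, which this page does not enumerate, and MockSource and drain are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: a mock of the pull-style event protocol. */
final class EventLoopSketch {
  /** Assumed event names; not taken from the listing above. */
  enum Event { META_CHANGED, NEXT_OBJECT, END_OF_STREAM }

  /** Minimal stand-in for a stream source with a single representation. */
  static final class MockSource {
    private final Iterator<double[]> rows;
    private double[] current;
    private boolean metaSent = false;

    MockSource(List<double[]> data) {
      rows = data.iterator();
    }

    /** Reports metadata once, then one event per row, then end of stream. */
    Event nextEvent() {
      if (!metaSent) {
        metaSent = true;
        return Event.META_CHANGED;
      }
      if (!rows.hasNext()) {
        return Event.END_OF_STREAM;
      }
      current = rows.next();
      return Event.NEXT_OBJECT;
    }

    /** Returns the current object for the given representation number. */
    Object data(int rnum) {
      if (rnum != 0) {
        throw new IndexOutOfBoundsException("only representation 0 exists");
      }
      return current;
    }
  }

  /** Pulls events until the stream ends and collects every object. */
  static List<double[]> drain(MockSource src) {
    List<double[]> out = new ArrayList<>();
    for (Event ev = src.nextEvent(); ev != Event.END_OF_STREAM; ev = src.nextEvent()) {
      if (ev == Event.NEXT_OBJECT) {
        out.add((double[]) src.data(0));
      }
      // On META_CHANGED a real consumer would re-read the metadata here.
    }
    return out;
  }
}
```

With the real parser, the same loop would sit between initStream(in) and cleanup(), with getMeta() consulted whenever the metadata changes.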
-
parseLineInternal
protected boolean parseLineInternal()
Internal method for parsing a single line. Used by both line-based parsing and block parsing, which avoids rebuilding the metadata for each line.
- Returns:
  true when a valid line was read, false on a label row.
-
createVector
protected V createVector()
Creates a database object of type V.
- Returns:
  a vector of type V containing the given attribute values
-
getTypeInformation
SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim)
Get a prototype object for the given dimensionality.
- Parameters:
  mindim - Minimum dimensionality
  maxdim - Maximum dimensionality
- Returns:
  Prototype object
-
getLogger
protected Logging getLogger()
Description copied from class: AbstractStreamingParser
Get the logger for this class.
- Specified by:
  getLogger in class AbstractStreamingParser
- Returns:
  Logger.
-
-