Package elki.datasource.parser
Class NumberVectorLabelParser<V extends NumberVector>
java.lang.Object
  elki.datasource.parser.AbstractStreamingParser
    elki.datasource.parser.NumberVectorLabelParser<V>
- Type Parameters:
V - the type of NumberVector used
- All Implemented Interfaces:
BundleStreamSource, Parser, StreamingParser
- Direct Known Subclasses:
BitVectorLabelParser, CategorialDataAsNumberVectorParser, SparseNumberVectorLabelParser, TermFrequencyParser
public class NumberVectorLabelParser<V extends NumberVector> extends AbstractStreamingParser
Parser for a simple CSV-like format, with columns separated by the given pattern (default: whitespace). Several labels may be given per point. A label must not be parseable as a double. Lines starting with "#" are ignored.
An index can be specified to identify the entry to be treated as the class label. This index counts all entries (numeric values and labels alike), starting at 0.
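The format rules above can be illustrated with a small stand-alone sketch. It is not ELKI code: the class `LabelLineSketch`, its `Row` type, and `parseLine` are hypothetical names, and using `Double.parseDouble` to decide whether a token is numeric is an assumption made for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Stand-alone illustration; not ELKI code. All names here are hypothetical. */
class LabelLineSketch {
  /** Column separator assumed for this sketch: one or more whitespace characters. */
  static final Pattern WHITESPACE = Pattern.compile("\\s+");

  /** Result of parsing one line: numeric attributes plus string labels. */
  static final class Row {
    final List<Double> values = new ArrayList<>();
    final List<String> labels = new ArrayList<>();
  }

  /** Returns null for empty or comment lines, otherwise the parsed row. */
  static Row parseLine(String line, Pattern colSep) {
    String trimmed = line.trim();
    if (trimmed.isEmpty() || trimmed.startsWith("#")) {
      return null; // lines starting with "#" are ignored
    }
    Row row = new Row();
    for (String token : colSep.split(trimmed)) {
      try {
        row.values.add(Double.parseDouble(token));
      } catch (NumberFormatException e) {
        row.labels.add(token); // not parseable as double: keep as a label
      }
    }
    return row;
  }

  public static void main(String[] args) {
    Row row = parseLine("5.1 3.5 1.4 0.2 Iris-setosa", WHITESPACE);
    System.out.println(row.values + " " + row.labels);
  }
}
```

The sketch shows why a label must not be parseable as a double: a token such as `3` would end up among the numeric attributes unless its column is explicitly marked as a label column.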
- Since:
- 0.1
- Author:
- Arthur Zimek, Erich Schubert
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | NumberVectorLabelParser.Par<V extends NumberVector> | Parameterization class.
Nested classes/interfaces inherited from interface elki.datasource.bundle.BundleStreamSource
BundleStreamSource.Event
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected DoubleArray | attributes | Double array storing the numerical attributes during parsing.
protected java.util.List<java.lang.String> | columnnames | Column names.
protected LabelList | curlbl | Current labels.
protected V | curvec | Current vector.
protected NumberVector.Factory<V> | factory | Vector factory class.
protected boolean | haslabels | Whether or not the data set has labels.
private long[] | labelIndices | Keeps the indices of the attributes to be treated as a string label.
(package private) java.util.ArrayList<java.lang.String> | labels | (Reused) store for labels.
private static Logging | LOG | Logging class.
protected int | maxdim | Maximum dimensionality reported.
protected BundleMeta | meta | Metadata.
protected int | mindim | Minimum dimensionality reported.
(package private) BundleStreamSource.Event | nextevent | Event to report next.
(package private) it.unimi.dsi.fastutil.objects.ObjectOpenHashSet<java.lang.String> | unique | For String unification.
(package private) boolean | warnedDim | Emit a dimensionality change warning once.
(package private) boolean | warnedPrecision | Emit a double-precision limit warning once.
Fields inherited from class elki.datasource.parser.AbstractStreamingParser
reader, tokenizer
-
-
Constructor Summary
Constructors
Constructor | Description
NumberVectorLabelParser(NumberVector.Factory<V> factory) | Constructor with defaults.
NumberVectorLabelParser(CSVReaderFormat format, long[] labelIndices, NumberVector.Factory<V> factory) | Constructor.
NumberVectorLabelParser(java.util.regex.Pattern colSep, java.lang.String quoteChars, java.util.regex.Pattern comment, long[] labelIndices, NumberVector.Factory<V> factory) | Constructor.
-
Method Summary
All Methods | Instance Methods | Concrete Methods
Modifier and Type | Method | Description
protected void | buildMeta() | Update the meta element.
void | cleanup() | Perform cleanup operations after parsing.
protected V | createVector() | Creates a database object of type V.
java.lang.Object | data(int rnum) | Access a particular object and representation.
protected Logging | getLogger() | Get the logger for this class.
BundleMeta | getMeta() | Get the current meta data.
(package private) SimpleTypeInformation<V> | getTypeInformation(int mindim, int maxdim) | Get a prototype object for the given dimensionality.
void | initStream(java.io.InputStream in) | Init the streaming parser for the given input stream.
protected boolean | isLabelColumn(int col) | Test if the current column is marked as label column.
BundleStreamSource.Event | nextEvent() | Get the next event.
protected boolean | parseLineInternal() | Internal method for parsing a single line.
Methods inherited from class elki.datasource.parser.AbstractStreamingParser
asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Logging class.
-
labelIndices
private long[] labelIndices
Keeps the indices of the attributes to be treated as a string label.
-
factory
protected NumberVector.Factory<V> factory
Vector factory class.
-
mindim
protected int mindim
Minimum dimensionality reported.
-
maxdim
protected int maxdim
Maximum dimensionality reported.
-
meta
protected BundleMeta meta
Metadata.
-
columnnames
protected java.util.List<java.lang.String> columnnames
Column names.
-
haslabels
protected boolean haslabels
Whether or not the data set has labels.
-
curvec
protected V curvec
Current vector.
-
curlbl
protected LabelList curlbl
Current labels.
-
attributes
protected DoubleArray attributes
Double array storing the numerical attributes during parsing.
-
labels
final java.util.ArrayList<java.lang.String> labels
(Reused) store for labels.
-
unique
it.unimi.dsi.fastutil.objects.ObjectOpenHashSet<java.lang.String> unique
For String unification.
-
nextevent
BundleStreamSource.Event nextevent
Event to report next.
-
warnedPrecision
boolean warnedPrecision
Emit a double-precision limit warning once.
-
warnedDim
boolean warnedDim
Emit a dimensionality change warning once.
-
-
Constructor Detail
-
NumberVectorLabelParser
public NumberVectorLabelParser(CSVReaderFormat format, long[] labelIndices, NumberVector.Factory<V> factory)
Constructor.
- Parameters:
format - Input format
labelIndices - Column indexes that are not numeric.
factory - Vector factory
-
NumberVectorLabelParser
public NumberVectorLabelParser(NumberVector.Factory<V> factory)
Constructor with defaults.
- Parameters:
factory - Vector factory
-
NumberVectorLabelParser
public NumberVectorLabelParser(java.util.regex.Pattern colSep, java.lang.String quoteChars, java.util.regex.Pattern comment, long[] labelIndices, NumberVector.Factory<V> factory)
Constructor.
- Parameters:
colSep - Column separator
quoteChars - Quote character
comment - Comment pattern
labelIndices - Column indexes that are not numeric.
factory - Vector factory
-
-
Method Detail
-
isLabelColumn
protected boolean isLabelColumn(int col)
Test if the current column is marked as label column.
- Parameters:
col - Column number
- Returns:
true when a label column.
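A minimal sketch of such a column test follows. This page does not state how the `long[] labelIndices` field encodes the column numbers; the sketch assumes a packed bit mask in which bit i marks the 0-based column i. That encoding, the class `LabelColumnSketch`, and the helper `pack` are assumptions for illustration, not ELKI's documented behavior.

```java
import java.util.BitSet;

/** Stand-alone illustration; not ELKI code. All names here are hypothetical. */
class LabelColumnSketch {
  /** Assumed encoding: bit i of the packed long[] marks column i as a label column. */
  static boolean isLabelColumn(long[] labelIndices, int col) {
    return labelIndices != null && BitSet.valueOf(labelIndices).get(col);
  }

  /** Packs 0-based column numbers into a long[] bit mask. */
  static long[] pack(int... columns) {
    BitSet bits = new BitSet();
    for (int c : columns) {
      bits.set(c);
    }
    return bits.toLongArray();
  }

  public static void main(String[] args) {
    long[] mask = pack(0, 4); // columns 0 and 4 are labels
    System.out.println(isLabelColumn(mask, 4));
    System.out.println(isLabelColumn(mask, 1));
    System.out.println(isLabelColumn(null, 0)); // no label columns configured
  }
}
```

Because the index counts all entries starting with 0, marking column 4 in a line such as `5.1 3.5 1.4 0.2 7` would force the last entry to be read as a label even though it is parseable as a number.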
-
initStream
public void initStream(java.io.InputStream in)
Description copied from interface: StreamingParser
Init the streaming parser for the given input stream.
- Specified by:
initStream in interface StreamingParser
- Overrides:
initStream in class AbstractStreamingParser
- Parameters:
in - the stream to parse objects from
-
getMeta
public BundleMeta getMeta()
Description copied from interface: BundleStreamSource
Get the current meta data.
- Returns:
Metadata
-
nextEvent
public BundleStreamSource.Event nextEvent()
Description copied from interface: BundleStreamSource
Get the next event.
- Returns:
Event type
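The event-driven access pattern can be sketched without ELKI on the classpath. Everything below is hypothetical: the local `Event` enum is only modeled on `BundleStreamSource.Event` (its constant names should be checked against that interface's documentation), and `ToySource` stands in for a streaming parser. The rule that a metadata event precedes the first object of a new dimensionality is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Stand-alone illustration; not ELKI code. All names here are hypothetical. */
class EventLoopSketch {
  /** Assumed event kinds, modeled on BundleStreamSource.Event. */
  enum Event { META_CHANGED, NEXT_OBJECT, END_OF_STREAM }

  /** Toy stream source: reports a meta change whenever the row length changes. */
  static final class ToySource {
    private final Iterator<double[]> rows;
    private double[] current; // object exposed by data()
    private double[] pending; // row already read, reported on the next call
    private int dim = -1;

    ToySource(List<double[]> rows) {
      this.rows = rows.iterator();
    }

    Event nextEvent() {
      if (pending != null) {
        current = pending;
        pending = null;
        return Event.NEXT_OBJECT;
      }
      if (!rows.hasNext()) {
        return Event.END_OF_STREAM;
      }
      double[] row = rows.next();
      if (row.length != dim) {
        dim = row.length;
        pending = row; // report the metadata change first, the object afterwards
        return Event.META_CHANGED;
      }
      current = row;
      return Event.NEXT_OBJECT;
    }

    int dimensionality() {
      return dim;
    }

    double[] data() {
      return current;
    }
  }

  /** Pulls events until the stream ends and records what was seen. */
  static List<String> drain(ToySource src) {
    List<String> log = new ArrayList<>();
    for (Event ev = src.nextEvent(); ev != Event.END_OF_STREAM; ev = src.nextEvent()) {
      switch (ev) {
      case META_CHANGED:
        log.add("meta:dim=" + src.dimensionality());
        break;
      case NEXT_OBJECT:
        log.add("object:len=" + src.data().length);
        break;
      default:
        break;
      }
    }
    return log;
  }
}
```

A consumer written this way re-reads the metadata only when told it changed, and otherwise fetches one object per event.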
-
cleanup
public void cleanup()
Description copied from interface: Parser
Perform cleanup operations after parsing.
- Specified by:
cleanup in interface Parser
- Overrides:
cleanup in class AbstractStreamingParser
-
buildMeta
protected void buildMeta()
Update the meta element.
-
data
public java.lang.Object data(int rnum)
Description copied from interface: BundleStreamSource
Access a particular object and representation.
- Parameters:
rnum - Representation number
- Returns:
Contained data
-
parseLineInternal
protected boolean parseLineInternal()
Internal method for parsing a single line. Used by both line-based parsing and block parsing. This avoids rebuilding the meta data for each line.
- Returns:
true when a valid line was read, false on a label row.
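One way to read this return contract is sketched below, assuming that a "label row" is a line without any numeric entry, such as a header of column names. That interpretation, the class `HeaderRowSketch`, and its fields are assumptions for illustration; the page itself does not spell out what happens to the labels of such a row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone illustration; not ELKI code. All names here are hypothetical. */
class HeaderRowSketch {
  final List<String> columnNames = new ArrayList<>();
  final List<double[]> vectors = new ArrayList<>();

  /** Returns true when the line produced a vector, false for a label-only or empty line. */
  boolean parseLine(String line) {
    String trimmed = line.trim();
    if (trimmed.isEmpty()) {
      return false;
    }
    String[] tokens = trimmed.split("\\s+");
    double[] values = new double[tokens.length];
    int n = 0; // number of numeric entries found so far
    List<String> labels = new ArrayList<>();
    for (String token : tokens) {
      try {
        values[n] = Double.parseDouble(token);
        n++;
      } catch (NumberFormatException e) {
        labels.add(token);
      }
    }
    if (n == 0) {
      // No numeric entry at all: assume this row carries column names.
      columnNames.clear();
      columnNames.addAll(labels);
      return false;
    }
    vectors.add(Arrays.copyOf(values, n));
    return true;
  }
}
```

A caller can then skip rows for which `parseLine` returns false and only emit an object when it returns true.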
-
createVector
protected V createVector()
Creates a database object of type V.
- Returns:
a vector of type V containing the given attribute values
-
getTypeInformation
SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim)
Get a prototype object for the given dimensionality.
- Parameters:
mindim - Minimum dimensionality
maxdim - Maximum dimensionality
- Returns:
Prototype object
-
getLogger
protected Logging getLogger()
Description copied from class: AbstractStreamingParser
Get the logger for this class.
- Specified by:
getLogger in class AbstractStreamingParser
- Returns:
Logger.
-
-