Package elki.datasource.parser
Class CategorialDataAsNumberVectorParser<V extends NumberVector>
- java.lang.Object
  - elki.datasource.parser.AbstractStreamingParser
    - elki.datasource.parser.NumberVectorLabelParser<V>
      - elki.datasource.parser.CategorialDataAsNumberVectorParser<V>

- Type Parameters:
  V - the type of NumberVector used
- All Implemented Interfaces:
  BundleStreamSource, Parser, StreamingParser
@Description("This parser expects data in roughly the same format as the NumberVectorLabelParser,\nexcept that it will enumerate all unique strings to always produce numerical values.\nThis way, it can for example handle files that contain lines like \'y,n,y,y,n,y,n\'.") public class CategorialDataAsNumberVectorParser<V extends NumberVector> extends NumberVectorLabelParser<V>
A very simple parser for categorical data, which is then encoded as numbers. It is closely modeled after NumberVectorLabelParser. TODO: specify handling for numerical values.

- Since:
  0.6.0
- Author:
  Erich Schubert
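To illustrate the 'y,n,y,y,n,y,n' example from the @Description, here is a minimal standalone sketch (plain Java, not ELKI code) of enumerating unique strings in order of first appearance. It assumes comma-separated input, codes starting at 0, and a java.util.HashMap in place of the fastutil map; ELKI's actual tokenizer, start value (ustart) and treatment of numeric tokens may differ, since the class itself marks numeric handling as TODO. The class name CategoricalEncodingSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not ELKI code: enumerate unique strings so that every
// token of a comma-separated line becomes a number.
class CategoricalEncodingSketch {
  // Shared dictionary: each distinct string gets one code, in order of first appearance.
  private final Map<String, Integer> unique = new HashMap<>();

  double[] encode(String line) {
    String[] tokens = line.split(",");
    double[] values = new double[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      String tok = tokens[i].trim();
      try {
        // Assumption: tokens that already parse as numbers are kept as-is.
        values[i] = Double.parseDouble(tok);
      } catch (NumberFormatException e) {
        // Otherwise look up the string, assigning the next free code (from 0) if it is new.
        Integer code = unique.get(tok);
        if (code == null) {
          code = unique.size();
          unique.put(tok, code);
        }
        values[i] = code;
      }
    }
    return values;
  }

  public static void main(String[] args) {
    CategoricalEncodingSketch enc = new CategoricalEncodingSketch();
    System.out.println(Arrays.toString(enc.encode("y,n,y,y,n,y,n")));
    // prints [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
    System.out.println(Arrays.toString(enc.encode("n,u,y")));
    // prints [1.0, 2.0, 0.0]
  }
}
```

Because the dictionary is shared across lines, a given string maps to the same number throughout the whole file, which keeps the resulting vectors comparable.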
Nested Class Summary

Nested Classes
- static class CategorialDataAsNumberVectorParser.Par<V extends NumberVector>
  Parameterization class.

Nested classes/interfaces inherited from interface elki.datasource.bundle.BundleStreamSource
- BundleStreamSource.Event
Field Summary

Fields
- private static Logging LOG
  Logging class.
- (package private) java.util.regex.Matcher nanpattern
  Pattern for NaN values.
- (package private) it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> unique
  For string unification: maps each unique string to its numeric code.
- (package private) int ustart
  Base for enumerating unique values.

Fields inherited from class elki.datasource.parser.NumberVectorLabelParser
attributes, columnnames, curlbl, curvec, factory, haslabels, labels, maxdim, meta, mindim, nextevent, warnedDim, warnedPrecision

Fields inherited from class elki.datasource.parser.AbstractStreamingParser
reader, tokenizer
Constructor Summary

Constructors
- CategorialDataAsNumberVectorParser(NumberVector.Factory<V> factory)
  Constructor with defaults.
- CategorialDataAsNumberVectorParser(CSVReaderFormat format, long[] labelIndices, NumberVector.Factory<V> factory)
  Constructor.
Method Summary

Methods
- protected Logging getLogger()
  Get the logger for this class.
- BundleStreamSource.Event nextEvent()
  Get the next event.
- protected boolean parseLineInternal()
  Internal method for parsing a single line.

Methods inherited from class elki.datasource.parser.NumberVectorLabelParser
buildMeta, cleanup, createVector, data, getMeta, getTypeInformation, initStream, isLabelColumn

Methods inherited from class elki.datasource.parser.AbstractStreamingParser
asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
Field Detail

- LOG
  private static final Logging LOG
  Logging class.
- unique
  it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> unique
  For string unification: maps each unique string to its numeric code.
- ustart
  int ustart
  Base for enumerating unique values.
- nanpattern
  java.util.regex.Matcher nanpattern
  Pattern for NaN values.
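The three package-private fields suggest how a token that is not a plain number could be handled. The following hypothetical sketch (not ELKI code) shows one way they might cooperate: a single Matcher is reused via reset() for every token, missing-value markers become NaN, and new strings are numbered starting at ustart. The concrete marker pattern below and the value passed for ustart are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not ELKI code: interplay of nanpattern, unique and ustart.
class TokenCodeSketch {
  // Assumed missing-value markers; the real ELKI pattern may differ.
  private final Matcher nanpattern =
      Pattern.compile("\\?|NaN|NA", Pattern.CASE_INSENSITIVE).matcher("");
  // String unification table: each distinct string maps to one code.
  private final Map<String, Integer> unique = new HashMap<>();
  // Base for enumerating unique values (value supplied by the caller here).
  private final int ustart;

  TokenCodeSketch(int ustart) {
    this.ustart = ustart;
  }

  double code(String token) {
    // reset() re-targets the same Matcher to new input instead of allocating one per token.
    if (nanpattern.reset(token).matches()) {
      return Double.NaN;
    }
    Integer id = unique.get(token);
    if (id == null) {
      // New strings are numbered consecutively, starting at ustart.
      id = ustart + unique.size();
      unique.put(token, id);
    }
    return id;
  }
}
```

Storing a Matcher rather than a Pattern trades thread safety for fewer allocations, which is a reasonable choice for a parser that reads one stream sequentially.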
Constructor Detail

- CategorialDataAsNumberVectorParser
  public CategorialDataAsNumberVectorParser(NumberVector.Factory<V> factory)
  Constructor with defaults.
  - Parameters:
    factory - Vector factory
- CategorialDataAsNumberVectorParser
  public CategorialDataAsNumberVectorParser(CSVReaderFormat format, long[] labelIndices, NumberVector.Factory<V> factory)
  Constructor.
  - Parameters:
    format - Input format
    labelIndices - Column indexes that are not numeric (treated as labels).
    factory - Vector factory
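A hypothetical sketch of how label columns could be kept apart from the enumerated attributes. It assumes that labelIndices is a bit set with one bit per column (bit col % 64 of word col / 64), which would explain the long[] type; bitsOf is an illustrative helper and isLabelColumn here is only a stand-in for the inherited method of the same name, not ELKI's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not ELKI code: separate label columns from encoded attributes.
class LabelColumnSketch {
  // Assumption: labelIndices is a bit set; bit (col % 64) of word (col / 64) marks column col.
  private final long[] labelIndices;
  private final Map<String, Integer> unique = new HashMap<>();
  // Labels collected from the most recently parsed line.
  final List<String> labels = new ArrayList<>();

  LabelColumnSketch(long[] labelIndices) {
    this.labelIndices = labelIndices;
  }

  // Illustrative helper: build a bit set marking the given column indexes.
  static long[] bitsOf(int... columns) {
    int max = 0;
    for (int c : columns) {
      max = Math.max(max, c);
    }
    long[] bits = new long[(max >>> 6) + 1];
    for (int c : columns) {
      bits[c >>> 6] |= 1L << (c & 63);
    }
    return bits;
  }

  // Stand-in for the inherited isLabelColumn(int): test the column's bit.
  boolean isLabelColumn(int col) {
    int word = col >>> 6;
    return labelIndices != null && word < labelIndices.length
        && (labelIndices[word] & (1L << (col & 63))) != 0;
  }

  double[] parse(String line) {
    labels.clear();
    String[] tokens = line.split(",");
    double[] values = new double[tokens.length];
    int dim = 0;
    for (int i = 0; i < tokens.length; i++) {
      String tok = tokens[i].trim();
      if (isLabelColumn(i)) {
        labels.add(tok); // label columns stay strings and are not enumerated
        continue;
      }
      double v;
      try {
        v = Double.parseDouble(tok);
      } catch (NumberFormatException e) {
        Integer code = unique.get(tok);
        if (code == null) {
          code = unique.size();
          unique.put(tok, code);
        }
        v = code;
      }
      values[dim++] = v;
    }
    return Arrays.copyOf(values, dim);
  }
}
```

For example, with column 0 marked as a label, the line "alice,y,n,3.5" yields the label "alice" and the attribute vector [0.0, 1.0, 3.5].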
Method Detail

- nextEvent
  public BundleStreamSource.Event nextEvent()
  Description copied from interface: BundleStreamSource
  Get the next event.
  - Specified by:
    nextEvent in interface BundleStreamSource
  - Overrides:
    nextEvent in class NumberVectorLabelParser<V extends NumberVector>
  - Returns:
    Event type
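nextEvent() is a pull-style protocol: the consumer keeps requesting events until the stream ends. The standalone sketch below mirrors that consumer loop with a toy source. The enum constants are assumed to match those of BundleStreamSource.Event (META_CHANGED, NEXT_OBJECT, END_OF_STREAM); ToySource and drain are hypothetical and do not reproduce this parser's actual event sequence.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not ELKI code: consuming a pull-style event stream.
class EventLoopSketch {
  // Assumed to mirror the constant names of BundleStreamSource.Event.
  enum Event { META_CHANGED, NEXT_OBJECT, END_OF_STREAM }

  // Toy source: announces metadata once, then one NEXT_OBJECT per input line.
  static class ToySource {
    private final Iterator<String> lines;
    private boolean metaSent = false;
    private String current;

    ToySource(List<String> lines) {
      this.lines = lines.iterator();
    }

    Event nextEvent() {
      if (!metaSent) {
        metaSent = true;
        return Event.META_CHANGED;
      }
      if (lines.hasNext()) {
        current = lines.next();
        return Event.NEXT_OBJECT;
      }
      return Event.END_OF_STREAM;
    }

    String data() {
      return current;
    }
  }

  // Consumer loop: pull events until END_OF_STREAM is signaled.
  static List<String> drain(ToySource src) {
    List<String> objects = new ArrayList<>();
    for (Event ev = src.nextEvent(); ev != Event.END_OF_STREAM; ev = src.nextEvent()) {
      switch (ev) {
        case META_CHANGED:
          // A real consumer would re-read the column metadata here.
          break;
        case NEXT_OBJECT:
          objects.add(src.data());
          break;
        default:
          break;
      }
    }
    return objects;
  }
}
```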
- parseLineInternal
  protected boolean parseLineInternal()
  Description copied from class: NumberVectorLabelParser
  Internal method for parsing a single line. Used by both line-based parsing and block parsing. This avoids building the metadata anew for each line.
  - Overrides:
    parseLineInternal in class NumberVectorLabelParser<V extends NumberVector>
  - Returns:
    true when a valid line was read, false on a label row.
- getLogger
  protected Logging getLogger()
  Description copied from class: AbstractStreamingParser
  Get the logger for this class.
  - Overrides:
    getLogger in class NumberVectorLabelParser<V extends NumberVector>
  - Returns:
    Logger.