Package elki.datasource.parser
Class CategorialDataAsNumberVectorParser<V extends NumberVector>
- java.lang.Object
  - elki.datasource.parser.AbstractStreamingParser
    - elki.datasource.parser.NumberVectorLabelParser<V>
      - elki.datasource.parser.CategorialDataAsNumberVectorParser<V>
- Type Parameters:
V - the type of NumberVector used
- All Implemented Interfaces:
BundleStreamSource, Parser, StreamingParser
@Description("This parser expects data in roughly the same format as the NumberVectorLabelParser,\nexcept that it will enumerate all unique strings to always produce numerical values.\nThis way, it can for example handle files that contain lines like \'y,n,y,y,n,y,n\'.")
public class CategorialDataAsNumberVectorParser<V extends NumberVector>
extends NumberVectorLabelParser<V>
A very simple parser for categorical data, which it encodes as numbers. It is closely modeled after the number vector parser (NumberVectorLabelParser). TODO: specify handling for numerical values.
- Since:
- 0.6.0
- Author:
- Erich Schubert
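As a rough illustration of the encoding described above, the following self-contained sketch maps each distinct string token to an integer in order of first appearance. It does not use ELKI: the class name CategoryEncoderSketch, the method encodeLine, the comma separator, and the choice to start numbering at 0 are all assumptions made for this example. The actual parser may assign codes differently, in particular for tokens that are already numeric, which the class description leaves as a TODO.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not ELKI code: encodes categorical tokens as numbers. */
final class CategoryEncoderSketch {
  /** Maps each distinct token to its code; shared across all lines. */
  private final Map<String, Integer> unique = new HashMap<>();

  /** Encode one comma-separated line; unseen tokens get the next free code. */
  double[] encodeLine(String line) {
    String[] tokens = line.split(",");
    double[] out = new double[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      String tok = tokens[i].trim();
      Integer id = unique.get(tok);
      if (id == null) {
        id = unique.size(); // assumption: codes start at 0
        unique.put(tok, id);
      }
      out[i] = id;
    }
    return out;
  }

  public static void main(String[] args) {
    CategoryEncoderSketch enc = new CategoryEncoderSketch();
    // "y" is seen first and gets code 0, "n" gets code 1.
    System.out.println(Arrays.toString(enc.encodeLine("y,n,y,y,n,y,n")));
    // prints [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
  }
}
```

Because the map is kept across calls, the same string receives the same code on every later line, which is what makes the resulting vectors comparable.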
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class    CategorialDataAsNumberVectorParser.Par<V extends NumberVector>    Parameterization class.
Nested classes/interfaces inherited from interface elki.datasource.bundle.BundleStreamSource
BundleStreamSource.Event
-
-
Field Summary
Fields
Modifier and Type    Field    Description
private static Logging    LOG    Logging class.
(package private) java.util.regex.Matcher    nanpattern    Pattern for NaN values.
(package private) it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String>    unique    For String unification.
(package private) int    ustart    Base for enumerating unique values.
Fields inherited from class elki.datasource.parser.NumberVectorLabelParser
attributes, columnnames, curlbl, curvec, factory, haslabels, labels, maxdim, meta, mindim, nextevent, warnedDim, warnedPrecision
-
Fields inherited from class elki.datasource.parser.AbstractStreamingParser
reader, tokenizer
-
-
Constructor Summary
Constructors
Constructor    Description
CategorialDataAsNumberVectorParser(NumberVector.Factory<V> factory)    Constructor with defaults.
CategorialDataAsNumberVectorParser(CSVReaderFormat format, long[] labelIndices, NumberVector.Factory<V> factory)    Constructor.
-
Method Summary
Modifier and Type    Method    Description
protected Logging    getLogger()    Get the logger for this class.
BundleStreamSource.Event    nextEvent()    Get the next event
protected boolean    parseLineInternal()    Internal method for parsing a single line.
Methods inherited from class elki.datasource.parser.NumberVectorLabelParser
buildMeta, cleanup, createVector, data, getMeta, getTypeInformation, initStream, isLabelColumn
-
Methods inherited from class elki.datasource.parser.AbstractStreamingParser
asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
-
Field Detail
-
LOG
private static final Logging LOG
Logging class.
-
unique
it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> unique
For String unification.
-
ustart
int ustart
Base for enumerating unique values.
-
nanpattern
java.util.regex.Matcher nanpattern
Pattern for NaN values.
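To make the roles of these three fields more concrete, here is a minimal sketch of one way they could cooperate when a single token is encoded. It is not the ELKI implementation: java.util.HashMap stands in for the fastutil map, and the NaN regular expression, the order of the checks, and the rule "code = ustart + number of strings seen so far" are assumptions chosen for illustration. The class TokenEncoderSketch and its encode method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not ELKI code: one way the three fields could cooperate. */
final class TokenEncoderSketch {
  /** Stand-in for the fastutil map used for string unification. */
  private final Map<String, Integer> unique = new HashMap<>();

  /** Assumed meaning: the first code handed out to a new string. */
  private final int ustart;

  /** Reusable matcher; the regular expression is an assumption for this example. */
  private final Matcher nanpattern =
      Pattern.compile("nan|\\?", Pattern.CASE_INSENSITIVE).matcher("");

  TokenEncoderSketch(int ustart) {
    this.ustart = ustart;
  }

  /** Encode a single token as a double. */
  double encode(String token) {
    // 1. Tokens that look like a missing value become NaN.
    if (nanpattern.reset(token).matches()) {
      return Double.NaN;
    }
    // 2. Tokens that parse as numbers are kept as they are.
    try {
      return Double.parseDouble(token);
    } catch (NumberFormatException e) {
      // Not a number: fall through to the enumeration below.
    }
    // 3. Every other string is enumerated, starting at ustart.
    Integer id = unique.get(token);
    if (id == null) {
      id = ustart + unique.size();
      unique.put(token, id);
    }
    return id;
  }
}
```

Reusing one Matcher through reset(...) avoids allocating a new matcher per token, which is a plausible reason for storing a Matcher rather than a Pattern in a field.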
-
-
Constructor Detail
-
CategorialDataAsNumberVectorParser
public CategorialDataAsNumberVectorParser(NumberVector.Factory<V> factory)
Constructor with defaults.
- Parameters:
factory - Vector factory
-
CategorialDataAsNumberVectorParser
public CategorialDataAsNumberVectorParser(CSVReaderFormat format, long[] labelIndices, NumberVector.Factory<V> factory)
Constructor.
- Parameters:
format - Input format
labelIndices - Column indexes that are numeric.
factory - Vector factory
-
-
Method Detail
-
nextEvent
public BundleStreamSource.Event nextEvent()
Description copied from interface: BundleStreamSource
Get the next event
- Specified by:
nextEvent in interface BundleStreamSource
- Overrides:
nextEvent in class NumberVectorLabelParser<V extends NumberVector>
- Returns:
- Event type
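The pull-style contract of nextEvent() can be illustrated with a small, self-contained event loop. Nothing below is ELKI code: the Event enum is a local stand-in whose constant names are an assumption modeled on BundleStreamSource.Event and should be checked against that interface, and ToySource, its data() accessor, and drain are hypothetical helpers invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch, not ELKI code: consuming a source event by event. */
final class EventLoopSketch {
  /** Local stand-in; constant names are an assumption for this example. */
  enum Event { META_CHANGED, NEXT_OBJECT, END_OF_STREAM }

  /** Toy source that announces its metadata once, then yields one line per event. */
  static final class ToySource {
    private final Iterator<String> lines;
    private boolean metaSent = false;
    private String current;

    ToySource(List<String> lines) {
      this.lines = lines.iterator();
    }

    Event nextEvent() {
      if (!metaSent) {
        metaSent = true;
        return Event.META_CHANGED;
      }
      if (lines.hasNext()) {
        current = lines.next();
        return Event.NEXT_OBJECT;
      }
      return Event.END_OF_STREAM;
    }

    /** The object belonging to the most recent NEXT_OBJECT event. */
    String data() {
      return current;
    }
  }

  /** Pull events until the stream ends, collecting every object seen. */
  static List<String> drain(ToySource src) {
    List<String> out = new ArrayList<>();
    for (Event ev = src.nextEvent(); ev != Event.END_OF_STREAM; ev = src.nextEvent()) {
      if (ev == Event.NEXT_OBJECT) {
        out.add(src.data());
      }
      // A META_CHANGED event would be the place to re-read column metadata.
    }
    return out;
  }
}
```

The caller drives the loop and the source only reports what happened, so a consumer can process arbitrarily large inputs one object at a time.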
-
parseLineInternal
protected boolean parseLineInternal()
Description copied from class: NumberVectorLabelParser
Internal method for parsing a single line. Used by both line-based parsing and block parsing, which avoids rebuilding the metadata for each line.
- Overrides:
parseLineInternal in class NumberVectorLabelParser<V extends NumberVector>
- Returns:
true when a valid line was read, false on a label row.
-
getLogger
protected Logging getLogger()
Description copied from class: AbstractStreamingParser
Get the logger for this class.
- Overrides:
getLogger in class NumberVectorLabelParser<V extends NumberVector>
- Returns:
- Logger.
-
-