public class TermFrequencyParser<V extends SparseNumberVector> extends NumberVectorLabelParser<V>
Parses a file containing term frequencies. The expected format is:

    rowlabel1 term1 <freq> term2 <freq> ...
    rowlabel2 term1 <freq> term3 <freq> ...

Terms must not contain the separator character!
If your data does not contain frequencies, consider using SimpleTransactionParser instead.
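The expected format above can be illustrated with a small standalone sketch. It does not use ELKI and is not the parser's actual implementation: the class `TermLineSketch`, its `Row` type, the whitespace separator, and the choice to sum repeated terms are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of ELKI: splits one line of the documented format. */
final class TermLineSketch {
  /** Parsed row: the row label plus term -> frequency pairs in input order. */
  static final class Row {
    final String label;
    final Map<String, Double> freqs = new LinkedHashMap<>();

    Row(String label) {
      this.label = label;
    }
  }

  /** Parse "rowlabel term1 freq term2 freq ..." with whitespace as the (assumed) separator. */
  static Row parseLine(String line) {
    String[] tok = line.trim().split("\\s+");
    // One label followed by term/frequency pairs gives an odd token count.
    if (tok.length % 2 == 0) {
      throw new IllegalArgumentException("Expected a label followed by term/frequency pairs: " + line);
    }
    Row row = new Row(tok[0]);
    for (int i = 1; i < tok.length; i += 2) {
      double f = Double.parseDouble(tok[i + 1]);
      // Assumption: a term that occurs twice on a line has its frequencies summed.
      row.freqs.merge(tok[i], f, Double::sum);
    }
    return row;
  }

  public static void main(String[] args) {
    Row r = parseLine("doc1 apple 2 banana 1 apple 3");
    System.out.println(r.label + " " + r.freqs);
  }
}
```

Because a term is delimited by the separator, a term such as `new york` would be read as two tokens here, which is why terms must not contain the separator character.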
| Modifier and Type | Class and Description |
|---|---|
| static class | TermFrequencyParser.Parameterizer<V extends SparseNumberVector>: Parameterization class. |

Inherited nested class: BundleStreamSource.Event

| Modifier and Type | Field and Description |
|---|---|
| (package private) it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> | keymap: Map. |
| (package private) java.util.ArrayList<java.lang.String> | labels: (Reused) label buffer. |
| private static Logging | LOG: Class logger. |
| (package private) boolean | normalize: Normalize. |
| (package private) int | numterms: Number of different terms observed. |
| private SparseNumberVector.Factory<V> | sparsefactory: Same as NumberVectorLabelParser.factory, but subtype. |
| (package private) it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap | values: (Reused) set of values for the number vector. |

Inherited fields: attributes, columnnames, curlbl, curvec, factory, haslabels, maxdim, meta, mindim, nextevent, unique, warnedPrecision, reader, tokenizer

| Constructor and Description |
|---|
| TermFrequencyParser(boolean normalize, CSVReaderFormat format, long[] labelIndices, SparseNumberVector.Factory<V> factory): Constructor. |
| TermFrequencyParser(boolean normalize, SparseNumberVector.Factory<V> factory): Constructor. |

| Modifier and Type | Method and Description |
|---|---|
| protected Logging | getLogger(): Get the logger for this class. |
| protected SimpleTypeInformation<V> | getTypeInformation(int mindim, int maxdim): Get a prototype object for the given dimensionality. |
| protected boolean | parseLineInternal(): Internal method for parsing a single line. |

Inherited methods: buildMeta, cleanup, createVector, data, getMeta, initStream, isLabelColumn, nextEvent, asMultipleObjectsBundle, assignDBID, hasDBIDs, parse

private static final Logging LOG
Class logger.

int numterms
Number of different terms observed.

it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> keymap
Map.

boolean normalize
Normalize.

private SparseNumberVector.Factory<V extends SparseNumberVector> sparsefactory
Same as NumberVectorLabelParser.factory, but subtype.

it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap values
(Reused) set of values for the number vector.

java.util.ArrayList<java.lang.String> labels
(Reused) label buffer.
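
The fields keymap, numterms, values, and normalize suggest a common pattern: each newly seen term gets the next free dimension index, and a row is collected as a sparse dimension-to-value map. The following is a minimal sketch of that pattern, not ELKI's code. It uses `java.util` maps in place of the fastutil types, the class `TermIndexSketch` and its methods are hypothetical, and reading "Normalize" as "scale the row so its values sum to 1" is an assumption, since the documentation only says "Normalize".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not ELKI code: assigns dimension indices to terms on first sight. */
final class TermIndexSketch {
  /** Term -> dimension index (assumed to be the purpose of a term map such as keymap). */
  final Map<String, Integer> keymap = new HashMap<>();

  /** Number of different terms observed so far. */
  int numterms = 0;

  /** Return the dimension of a term, allocating the next free index for an unseen term. */
  int dimensionOf(String term) {
    Integer dim = keymap.get(term);
    if (dim == null) {
      dim = numterms++;
      keymap.put(term, dim);
    }
    return dim;
  }

  /** Build a sparse row (dimension -> value) from term frequencies. */
  Map<Integer, Double> toSparseRow(Map<String, Double> freqs, boolean normalize) {
    Map<Integer, Double> values = new TreeMap<>();
    double sum = 0.;
    for (Map.Entry<String, Double> e : freqs.entrySet()) {
      values.merge(dimensionOf(e.getKey()), e.getValue(), Double::sum);
      sum += e.getValue();
    }
    // Assumption: normalizing means dividing by the sum of the row's frequencies.
    if (normalize && sum > 0.) {
      final double total = sum;
      values.replaceAll((dim, v) -> v / total);
    }
    return values;
  }
}
```

Keeping the term map across rows means the same term always lands in the same dimension, so the dimensionality of the data set grows with the vocabulary rather than being fixed up front.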

public TermFrequencyParser(boolean normalize, SparseNumberVector.Factory<V> factory)
Constructor.
Parameters:
normalize - Normalize
factory - Vector type

public TermFrequencyParser(boolean normalize, CSVReaderFormat format, long[] labelIndices, SparseNumberVector.Factory<V> factory)
Constructor.
Parameters:
normalize - Normalize
format - Input format
labelIndices - Indices to use as labels
factory - Vector type

protected boolean parseLineInternal()
Internal method for parsing a single line. (Description copied from NumberVectorLabelParser.)
Overrides: parseLineInternal in class NumberVectorLabelParser<V extends SparseNumberVector>
Returns: true when a valid line was read, false on a label row.

protected SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim)
Get a prototype object for the given dimensionality. (Description copied from NumberVectorLabelParser.)
Overrides: getTypeInformation in class NumberVectorLabelParser<V extends SparseNumberVector>
Parameters:
mindim - Minimum dimensionality
maxdim - Maximum dimensionality

protected Logging getLogger()
Get the logger for this class. (Description copied from AbstractStreamingParser.)
Overrides: getLogger in class NumberVectorLabelParser<V extends SparseNumberVector>

Copyright © 2019 ELKI Development Team.