Class CategorialDataAsNumberVectorParser<V extends NumberVector>

  • Type Parameters:
    V - the type of NumberVector used
    All Implemented Interfaces:
    BundleStreamSource, Parser, StreamingParser

    @Description("This parser expects data in roughly the same format as the NumberVectorLabelParser,\nexcept that it will enumerate all unique strings to always produce numerical values.\nThis way, it can for example handle files that contain lines like \'y,n,y,y,n,y,n\'.")
    public class CategorialDataAsNumberVectorParser<V extends NumberVector>
    extends NumberVectorLabelParser<V>
    A very simple parser for categorical data, which encodes each unique string as a number. It is closely modeled after NumberVectorLabelParser. TODO: specify handling for numerical values.
    Since:
    0.6.0
    Author:
    Erich Schubert
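    The enumeration of unique strings described for this class can be sketched in plain Java as follows. This is a minimal illustration, not the ELKI implementation: the class CategoryEncoder and its methods are hypothetical, a java.util.HashMap stands in for the fastutil map, and every token is treated as a category (how the real parser treats numerical values is left open by the TODO above).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of ELKI): assigns an integer code to each distinct string. */
class CategoryEncoder {
  /** Codes assigned so far, keyed by the string seen in the input. */
  private final Map<String, Integer> unique = new HashMap<>();

  /** First code to hand out; assumed to play the role of the ustart field. */
  private final int ustart;

  CategoryEncoder(int ustart) {
    this.ustart = ustart;
  }

  /** Returns the existing code for a token, or assigns the next free one. */
  int encode(String token) {
    Integer id = unique.get(token);
    if (id == null) {
      id = ustart + unique.size();
      unique.put(token, id);
    }
    return id;
  }

  /** Splits one input line on a literal separator and encodes every column. */
  double[] encodeLine(String line, String separator) {
    String[] tokens = line.split(Pattern.quote(separator));
    double[] values = new double[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      values[i] = encode(tokens[i].trim());
    }
    return values;
  }

  public static void main(String[] args) {
    CategoryEncoder enc = new CategoryEncoder(0);
    // "y" is seen first and gets code 0, "n" gets code 1.
    System.out.println(Arrays.toString(enc.encodeLine("y,n,y,y,n,y,n", ",")));
    // prints [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
  }
}
```

    Codes depend on the order in which strings first appear, so the same file always yields the same encoding, but two different files may assign different numbers to the same string.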
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • unique

        it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> unique
        For String unification.
      • ustart

        int ustart
        Base for enumerating unique values.
      • nanpattern

        java.util.regex.Matcher nanpattern
        Matcher for NaN values.
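        The nanpattern field is declared as a java.util.regex.Matcher, which suggests that a single matcher is reset and reused for each token. A minimal sketch of that idiom follows; the class NaNTokenCheck and the regular expression are hypothetical, since the pattern ELKI actually uses is not stated on this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of ELKI): tests tokens against a reusable NaN matcher. */
class NaNTokenCheck {
  /** Assumed pattern: "nan" in any letter case, or a single question mark. */
  private final Matcher nanpattern =
      Pattern.compile("nan|\\?", Pattern.CASE_INSENSITIVE).matcher("");

  /** Resets the shared matcher to the token and requires the whole token to match. */
  boolean isNaN(String token) {
    return nanpattern.reset(token).matches();
  }

  public static void main(String[] args) {
    NaNTokenCheck check = new NaNTokenCheck();
    System.out.println(check.isNaN("NaN")); // true
    System.out.println(check.isNaN("?"));   // true
    System.out.println(check.isNaN("n"));   // false
  }
}
```

        A Matcher is not thread-safe, so a shared instance like this is only appropriate when each parser is used from one thread at a time.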
    • Constructor Detail

      • CategorialDataAsNumberVectorParser

        public CategorialDataAsNumberVectorParser(NumberVector.Factory<V> factory)
        Constructor with defaults.
        Parameters:
        factory - Vector factory
      • CategorialDataAsNumberVectorParser

        public CategorialDataAsNumberVectorParser(CSVReaderFormat format,
                                                  long[] labelIndices,
                                                  NumberVector.Factory<V> factory)
        Constructor.
        Parameters:
        format - Input format
        labelIndices - Column indexes to be treated as labels (these may contain numeric values).
        factory - Vector factory