Class NumberVectorLabelParser<V extends NumberVector>

    • Field Detail

      • LOG

        private static final Logging LOG
        Logging class.
      • labelIndices

        private long[] labelIndices
        Keeps the indices of the attributes to be treated as a string label.
      • mindim

        protected int mindim
        Minimum dimensionality reported.
      • maxdim

        protected int maxdim
        Maximum dimensionality reported.
      • columnnames

        protected java.util.List<java.lang.String> columnnames
        Column names.
      • haslabels

        protected boolean haslabels
        Whether or not the data set has labels.
      • curvec

        protected V curvec
        Current vector.
      • curlbl

        protected LabelList curlbl
        Current labels.
      • attributes

        protected DoubleArray attributes
        Double array storing the numerical attributes during parsing.
      • labels

        final java.util.ArrayList<java.lang.String> labels
        (Reused) store for labels.
      • unique

        it.unimi.dsi.fastutil.objects.ObjectOpenHashSet<java.lang.String> unique
        For String unification.
      • warnedPrecision

        boolean warnedPrecision
        Emit a double-precision limit warning once.
      • warnedDim

        boolean warnedDim
        Emit a dimensionality change warning once.
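
The mindim, maxdim and warnedDim fields above suggest that the parser tracks the range of row dimensionalities and warns only once when it changes. A minimal, self-contained sketch of that bookkeeping follows. It is an illustration under assumptions, not the parser's actual code: the class DimensionTrackerSketch, the observe method, the warnings counter and the initial values are all hypothetical.

```java
final class DimensionTrackerSketch {
    // Hypothetical initial values: "no row seen yet" is encoded as mindim > maxdim.
    int mindim = Integer.MAX_VALUE;
    int maxdim = 0;
    // Set after the first warning so that later changes stay silent.
    boolean warnedDim = false;
    // Illustration only: counts how many warnings were emitted.
    int warnings = 0;

    /** Records the dimensionality of one parsed row. */
    void observe(int dim) {
        boolean first = mindim > maxdim;
        if (!first && (dim < mindim || dim > maxdim) && !warnedDim) {
            System.err.println("Warning: dimensionality changed to " + dim);
            warnedDim = true;
            warnings++;
        }
        mindim = Math.min(mindim, dim);
        maxdim = Math.max(maxdim, dim);
    }

    /** True while all observed rows had the same number of attributes. */
    boolean isFixedDimensional() {
        return mindim == maxdim;
    }
}
```

With this bookkeeping, a data set where every row has the same length ends with mindim == maxdim, which is the case where a fixed-dimensional type description would be appropriate.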
    • Constructor Detail

      • NumberVectorLabelParser

        public NumberVectorLabelParser(CSVReaderFormat format,
                                       long[] labelIndices,
                                       NumberVector.Factory<V> factory)
        Constructor.
        Parameters:
        format - Input format
        labelIndices - Column indexes that are not numeric.
        factory - Vector factory
      • NumberVectorLabelParser

        public NumberVectorLabelParser(NumberVector.Factory<V> factory)
        Constructor with defaults.
        Parameters:
        factory - Vector factory
      • NumberVectorLabelParser

        public NumberVectorLabelParser(java.util.regex.Pattern colSep,
                                       java.lang.String quoteChars,
                                       java.util.regex.Pattern comment,
                                       long[] labelIndices,
                                       NumberVector.Factory<V> factory)
        Constructor.
        Parameters:
        colSep - Column separator
        quoteChars - Quote characters
        comment - Comment pattern
        labelIndices - Column indexes that are not numeric.
        factory - Vector factory
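
To illustrate the kind of values the colSep and comment parameters take, here is a small sketch using only java.util.regex. The two patterns are hypothetical examples, not necessarily the parser's defaults, and the helper class SeparatorSketch is not part of the library. Quote handling (quoteChars) is deliberately left out.

```java
import java.util.regex.Pattern;

final class SeparatorSketch {
    // Hypothetical column separator: comma, semicolon or whitespace,
    // with optional surrounding whitespace.
    static final Pattern COL_SEP = Pattern.compile("\\s*[,;\\s]\\s*");
    // Hypothetical comment pattern: lines whose first non-blank character is '#'.
    static final Pattern COMMENT = Pattern.compile("^\\s*#.*$");

    /** True when the whole line matches the comment pattern. */
    static boolean isComment(String line) {
        return COMMENT.matcher(line).matches();
    }

    /** Splits a trimmed line into columns using the separator pattern. */
    static String[] columns(String line) {
        return COL_SEP.split(line.trim());
    }
}
```

Patterns of this shape could be passed as the colSep and comment arguments of the five-argument constructor; which patterns suit a given file depends on its format.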
    • Method Detail

      • isLabelColumn

        protected boolean isLabelColumn(int col)
        Test if the given column is marked as a label column.
        Parameters:
        col - Column number
        Returns:
        true when the column is a label column.
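
The long[] type of labelIndices suggests that the label columns are stored as a packed bit set, one bit per column. The sketch below shows how such a test could work. That storage layout is an assumption, and the class LabelColumnSketch and its pack helper are hypothetical, not part of the library.

```java
final class LabelColumnSketch {
    /** Packs column numbers into a long[] bit set, 64 columns per word. */
    static long[] pack(int... cols) {
        int max = -1;
        for (int c : cols) {
            max = Math.max(max, c);
        }
        // Signed shift: max == -1 (no columns) yields an empty array.
        long[] bits = new long[(max >> 6) + 1];
        for (int c : cols) {
            bits[c >>> 6] |= 1L << c; // shift distance is taken modulo 64
        }
        return bits;
    }

    /** True when the bit for the given column is set. */
    static boolean isLabelColumn(long[] labelIndices, int col) {
        int word = col >>> 6;
        return labelIndices != null && word < labelIndices.length
            && (labelIndices[word] & (1L << col)) != 0;
    }
}
```

Columns beyond the end of the array are simply reported as non-label columns, so the bit set only needs to be as long as the highest label column requires.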
      • buildMeta

        protected void buildMeta()
        Update the meta element.
      • data

        public java.lang.Object data(int rnum)
        Description copied from interface: BundleStreamSource
        Access a particular object and representation.
        Parameters:
        rnum - Representation number
        Returns:
        Contained data
      • parseLineInternal

        protected boolean parseLineInternal()
        Internal method for parsing a single line. Used by both line-based parsing and block parsing. This avoids rebuilding the metadata for each line.
        Returns:
        true when a valid line was read, false on a label row.
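
The following sketch illustrates one way a line could be divided into numeric attributes and string labels, returning false for a row without any numeric value (such as a header row of column names). It is a simplified illustration and not the parser's implementation. The class ParseLineSketch, the whitespace-only splitting, and the rule that non-numeric tokens fall back to labels are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ParseLineSketch {
    // Reused between lines, similar in spirit to the attributes and labels fields.
    final List<Double> attributes = new ArrayList<>();
    final List<String> labels = new ArrayList<>();

    /**
     * Parses one line. Returns true when at least one numeric attribute
     * was found, false when the row consists of labels only.
     */
    boolean parseLine(String line, Set<Integer> labelColumns) {
        attributes.clear();
        labels.clear();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        String[] cols = trimmed.split("\\s+");
        for (int i = 0; i < cols.length; i++) {
            if (!labelColumns.contains(i)) {
                try {
                    attributes.add(Double.parseDouble(cols[i]));
                    continue;
                } catch (NumberFormatException e) {
                    // Not a number: treat the token as a label below.
                }
            }
            labels.add(cols[i]);
        }
        return !attributes.isEmpty();
    }
}
```

For example, the line "1.5 2.0 setosa" with no explicit label columns would yield two attributes and one label, while "x y class" would be reported as a label-only row.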
      • createVector

        protected V createVector()
        Creates a database object of type V.
        Returns:
        a vector of type V containing the attribute values collected in attributes
      • getTypeInformation

        SimpleTypeInformation<V> getTypeInformation(int mindim,
                                                    int maxdim)
        Get a prototype object for the given dimensionality.
        Parameters:
        mindim - Minimum dimensionality
        maxdim - Maximum dimensionality
        Returns:
        Prototype object