Class Tokenizer

  • All Implemented Interfaces:
    Iter

    public class Tokenizer
    extends java.lang.Object
    implements Iter
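    String tokenizer that splits a character sequence into tokens at a column separator pattern, with support for quoted tokens.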
    Since:
    0.6.0
    Author:
    Erich Schubert
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private int end
      Current positions of result and iterator.
      private int index
      Current positions of result and iterator.
      private java.lang.CharSequence input
      Data currently processed.
      private static Logging LOG
      Class logger.
      private java.util.regex.Matcher matcher
      Regular expression match helper.
      static java.lang.String QUOTE_CHAR
      Quote characters.
      private char[] quoteChars
      Stores the quotation characters.
      private boolean quoted
      Whether the current token is a quoted string.
      private int send
      End position of the substring to process.
      private int start
      Current positions of result and iterator.
    • Constructor Summary

      Constructors 
      Constructor Description
      Tokenizer(java.util.regex.Pattern colSep, java.lang.String quoteChars)
      Constructor.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      Tokenizer advance()
      Moves the iterator forward to the next entry.
      void cleanup()
      Perform cleanup.
      char getChar(int off)
      Get a single character.
      double getDouble()
      Get current value as double.
      int getEnd()
      Get end of token.
      int getIntBase10()
      Get current value as int.
      int getLength()
      Get length of token.
      long getLongBase10()
      Get current value as long.
      int getStart()
      Get start of token.
      java.lang.String getStrippedSubstring()
      Get the current part as a substring, with leading and trailing whitespace stripped.
      java.lang.String getSubstring()
      Get the current part as a substring.
      void initialize(java.lang.CharSequence input, int begin, int end)
      Initialize parser with a new string.
      boolean isEmpty()
      Test for empty tokens; usually at end of line.
      private char isQuote(int index)
      Detect quote characters.
      boolean isQuoted()
      Test if the current string was quoted.
      boolean valid()
      Returns true if the iterator currently points to a valid object.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • QUOTE_CHAR

        public static final java.lang.String QUOTE_CHAR
        Quote characters.
        See Also:
        Constant Field Values
      • quoteChars

        private char[] quoteChars
        Stores the quotation characters.
      • matcher

        private java.util.regex.Matcher matcher
        Regular expression match helper.
      • input

        private java.lang.CharSequence input
        Data currently processed.
      • send

        private int send
        End position of the substring to process.
      • start

        private int start
        Current positions of result and iterator.
      • end

        private int end
        Current positions of result and iterator.
      • index

        private int index
        Current positions of result and iterator.
      • quoted

        private boolean quoted
        Whether the current token is a quoted string.
    • Constructor Detail

      • Tokenizer

        public Tokenizer(java.util.regex.Pattern colSep,
                         java.lang.String quoteChars)
        Constructor.
        Parameters:
        colSep - Column separator pattern.
        quoteChars - Quotation characters.
    • Method Detail

      • initialize

        public void initialize(java.lang.CharSequence input,
                               int begin,
                               int end)
        Initialize parser with a new string.
        Parameters:
        input - New string to parse.
        begin - Start position of the range of input to process.
        end - End position of the range of input to process.
      • valid

        public boolean valid()
        Description copied from interface: Iter
        Returns true if the iterator currently points to a valid object.
        Specified by:
        valid in interface Iter
        Returns:
        a boolean value, whether the position is valid.
      • advance

        public Tokenizer advance()
        Description copied from interface: Iter
        Moves the iterator forward to the next entry.
        Specified by:
        advance in interface Iter
        Returns:
        The iterator itself.
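        The initialize, valid and advance methods together form the usual Iter loop: initialize the tokenizer on a line, then read tokens while valid() holds. The sketch below illustrates that pattern with a simplified stand-in class. MiniTokenizer and its split helper are hypothetical names, not the ELKI implementation; it ignores quoting, and it assumes the separator pattern never matches the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in (not the ELKI class): iterates over separator-delimited tokens. */
class MiniTokenizer {
  private final Matcher matcher;
  private CharSequence input;
  private int send;  // end of the range to process
  private int start; // start of the current token
  private int end;   // end of the current token (exclusive)
  private int index; // position where the next token begins

  MiniTokenizer(Pattern colSep) {
    this.matcher = colSep.matcher("");
  }

  /** Point the tokenizer at input[begin, end) and move to the first token. */
  void initialize(CharSequence input, int begin, int end) {
    this.input = input;
    this.send = end;
    this.matcher.reset(input).region(begin, end);
    this.index = begin;
    advance();
  }

  boolean valid() {
    return start <= send;
  }

  MiniTokenizer advance() {
    if (index > send) {        // already consumed the tail: become invalid
      start = end = index;
      return this;
    }
    start = index;
    if (matcher.find()) {      // token ends where the next separator starts
      end = matcher.start();
      index = matcher.end();
    } else {                   // tail after the last separator
      end = send;
      index = send + 1;
    }
    return this;
  }

  String getSubstring() {
    return input.subSequence(start, end).toString();
  }

  boolean isEmpty() {
    return start >= end;
  }

  /** Collect all tokens of a line using the initialize / valid / advance loop. */
  static List<String> split(Pattern colSep, CharSequence line) {
    List<String> out = new ArrayList<>();
    MiniTokenizer t = new MiniTokenizer(colSep);
    for (t.initialize(line, 0, line.length()); t.valid(); t.advance()) {
      out.add(t.getSubstring());
    }
    return out;
  }
}
```

        In this sketch a trailing separator yields a final empty token, which matches the isEmpty description ("usually at end of line"). Whether the real class behaves identically in every edge case is not stated on this page.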
      • getSubstring

        public java.lang.String getSubstring()
        Get the current part as substring
        Returns:
        Current value as substring.
      • getStrippedSubstring

        public java.lang.String getStrippedSubstring()
        Get the current part as a substring, with leading and trailing whitespace stripped.
        Returns:
        Current value as substring, without surrounding whitespace.
      • getDouble

        public double getDouble()
        Get current value as double.
        Returns:
        double value
        Throws:
        java.lang.NumberFormatException - when current value cannot be parsed as double
      • getIntBase10

        public int getIntBase10()
        Get current value as int.
        Returns:
        int value
        Throws:
        java.lang.NumberFormatException - when current value cannot be parsed as int.
      • getLongBase10

        public long getLongBase10()
        Get current value as long.
        Returns:
        long value
        Throws:
        java.lang.NumberFormatException - when current value cannot be parsed as long.
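        The base-10 getters read a number from the current token and throw NumberFormatException on malformed input. One way to do this without creating an intermediate String is to parse the digits directly from the character range. The sketch below shows that idea; Base10Sketch and its methods are hypothetical helpers, and how the actual class parses is not documented on this page.

```java
/** Hypothetical helper: parse base-10 integers from a range of a CharSequence. */
class Base10Sketch {
  /** Parse s[start, end) as a signed decimal long. */
  static long parseLongBase10(CharSequence s, int start, int end) {
    int pos = start;
    if (pos >= end) {
      throw new NumberFormatException("Empty token.");
    }
    boolean negative = false;
    char first = s.charAt(pos);
    if (first == '-' || first == '+') {
      negative = (first == '-');
      if (++pos == end) {
        throw new NumberFormatException("Sign without digits.");
      }
    }
    // Accumulate as a negative number so that Long.MIN_VALUE is representable.
    long acc = 0;
    for (; pos < end; pos++) {
      int digit = s.charAt(pos) - '0';
      if (digit < 0 || digit > 9) {
        throw new NumberFormatException("Not a digit at position " + pos);
      }
      // Overflow check for acc * 10 - digit >= Long.MIN_VALUE.
      if (acc < (Long.MIN_VALUE + digit) / 10) {
        throw new NumberFormatException("Value out of range for long.");
      }
      acc = acc * 10 - digit;
    }
    if (!negative) {
      if (acc == Long.MIN_VALUE) {
        throw new NumberFormatException("Value out of range for long.");
      }
      return -acc;
    }
    return acc;
  }

  /** Parse s[start, end) as a signed decimal int, rejecting out-of-range values. */
  static int parseIntBase10(CharSequence s, int start, int end) {
    long v = parseLongBase10(s, start, end);
    if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
      throw new NumberFormatException("Value out of range for int.");
    }
    return (int) v;
  }
}
```

        With a tokenizer, the start and end arguments would correspond to the values reported by getStart() and getEnd() for the current token.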
      • isEmpty

        public boolean isEmpty()
        Test for empty tokens; usually at end of line.
        Returns:
        true when the current token is empty.
      • isQuote

        private char isQuote(int index)
        Detect quote characters.

        TODO: support more than one quote character, make sure opening and closing quotes match then.

        Parameters:
        index - Position
        Returns:
        The quote character found at the position, or 0 when the character is not a quote character.
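        A minimal sketch of quote detection against a set of quote characters follows. It assumes the check reports the matched quote character as a char, with 0 meaning "not a quote", and it adds the opening/closing match mentioned in the TODO. QuoteSketch, quoteAt and isQuoted are hypothetical names for illustration, not the private ELKI method.

```java
/** Hypothetical helper: detect quote characters in a CharSequence. */
class QuoteSketch {
  /** Return the quote character at index, or 0 if there is none (or index is out of range). */
  static char quoteAt(CharSequence input, int index, char[] quoteChars) {
    if (index < 0 || index >= input.length()) {
      return 0;
    }
    char c = input.charAt(index);
    for (char q : quoteChars) {
      if (c == q) {
        return c;
      }
    }
    return 0;
  }

  /** True when input[start, end) opens and closes with the same quote character. */
  static boolean isQuoted(CharSequence input, int start, int end, char[] quoteChars) {
    if (end - start < 2) {
      return false; // too short to hold both an opening and a closing quote
    }
    char open = quoteAt(input, start, quoteChars);
    return open != 0 && input.charAt(end - 1) == open;
  }
}
```

        Requiring the closing quote to equal the opening one means a token such as "it's' (opened with a double quote, closed with a single quote) is not treated as quoted.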
      • isQuoted

        public boolean isQuoted()
        Test if the current string was quoted.
        Returns:
        true when quoted.
      • getStart

        public int getStart()
        Get start of token.
        Returns:
        Start position of the current token.
      • getEnd

        public int getEnd()
        Get end of token.
        Returns:
        End position of the current token.
      • getLength

        public int getLength()
        Get length of token.
        Returns:
        Token length
      • getChar

        public char getChar(int off)
        Get a single character.
        Parameters:
        off - Offset
        Returns:
        Character
      • cleanup

        public void cleanup()
        Perform cleanup.