Class SimpleTransactionParser

  • All Implemented Interfaces:
    BundleStreamSource, Parser, StreamingParser

    public class SimpleTransactionParser
    extends AbstractStreamingParser
    Simple parser for transactional data, such as market baskets.

    To keep the input format simple and readable, all tokens are assumed to be text, separated by whitespace, with each transaction on a separate line.

    An example file containing two transactions looks like this:

     bread butter milk
     paste tomato basil
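
    The format described above can be illustrated with a minimal, standalone sketch. This is not the ELKI implementation: the class TransactionSketch, its ids map, and its parse method are hypothetical names, and skipping blank lines and numbering terms in order of first appearance are assumptions made only for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the input format; not part of ELKI. */
public class TransactionSketch {
  /** Term to integer id, assigned in order of first appearance (assumption). */
  public final Map<String, Integer> ids = new HashMap<>();

  /** Reads one transaction per line; tokens are separated by whitespace. */
  public List<int[]> parse(Reader in) throws IOException {
    List<int[]> transactions = new ArrayList<>();
    BufferedReader reader = new BufferedReader(in);
    for (String line; (line = reader.readLine()) != null;) {
      line = line.trim();
      if (line.isEmpty()) {
        continue; // assumption: blank lines are not transactions
      }
      String[] tokens = line.split("\\s+");
      int[] items = new int[tokens.length];
      for (int i = 0; i < tokens.length; i++) {
        Integer id = ids.get(tokens[i]);
        if (id == null) {
          id = ids.size(); // next unused id
          ids.put(tokens[i], id);
        }
        items[i] = id;
      }
      transactions.add(items);
    }
    return transactions;
  }

  public static void main(String[] args) throws IOException {
    TransactionSketch p = new TransactionSketch();
    List<int[]> tx = p.parse(new StringReader("bread butter milk\npaste tomato basil\n"));
    System.out.println(tx.size() + " transactions, " + p.ids.size() + " distinct terms");
  }
}
```

    In this sketch, the size of the ids map plays a role similar to a count of distinct terms observed; how the actual parser stores and encodes terms may differ.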
     
    TODO: add a parameter to, e.g., use the first or last entry as labels instead of tokens.
    Since:
    0.7.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • numterms

        int numterms
        Number of different terms observed.
      • keymap

        it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> keymap
        Map from terms (strings) to integer values.
      • buf

        it.unimi.dsi.fastutil.longs.LongArrayList buf
        Buffer, will be reused.
    • Constructor Detail

      • SimpleTransactionParser

        public SimpleTransactionParser(CSVReaderFormat format)
        Constructor.
        Parameters:
        format - Input format