Class APRIORI

  • All Implemented Interfaces:
    Algorithm

    @Title("APRIORI: Algorithm for Mining Association Rules")
    @Description("Searches for frequent itemsets")
    @Reference(authors="R. Agrawal, R. Srikant",
               title="Fast Algorithms for Mining Association Rules",
               booktitle="Proc. 20th Int. Conf. on Very Large Data Bases (VLDB \'94)",
               url="http://www.vldb.org/conf/1994/P487.PDF",
               bibkey="DBLP:conf/vldb/AgrawalS94")
    public class APRIORI
    extends AbstractFrequentItemsetAlgorithm
    The APRIORI algorithm for Mining Association Rules.

    Reference:

    R. Agrawal, R. Srikant
    Fast Algorithms for Mining Association Rules
    In Proc. 20th Int. Conf. on Very Large Data Bases (VLDB '94)

    This implementation uses some simple optimizations for 1- and 2-itemsets, but does not implement the original hash tree (yet, please contribute).

    Note: this algorithm scales well to a large number of transactions, but not so well to a large number of frequent itemsets (items). For best results, use domain-specific preprocessing to aggregate items into groups. Use statistics logging to keep track of candidate set sizes.

    Since:
    0.1
    Author:
    Arthur Zimek, Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • STAT

        private final java.lang.String STAT
        Statistics logging prefix.
    • Constructor Detail

      • APRIORI

        public APRIORI​(double minfreq,
                       int minlength,
                       int maxlength)
        Constructor with minimum frequency.
        Parameters:
        minfreq - Minimum frequency
        minlength - Minimum length
        maxlength - Maximum length
      • APRIORI

        public APRIORI​(double minfreq)
        Constructor with minimum frequency.
        Parameters:
        minfreq - Minimum frequency
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Returns:
        Type restriction
      • run

        public FrequentItemsetsResult run​(Relation<BitVector> relation)
        Performs the APRIORI algorithm on the given database.
        Parameters:
        relation - the Relation to process
        Returns:
        the AprioriResult learned by this APRIORI
      • buildFrequentOneItemsets

        protected java.util.List<OneItemset> buildFrequentOneItemsets​(Relation<? extends SparseFeatureVector<?>> relation,
                                                                      int dim,
                                                                      int needed)
        Build the 1-itemsets.
        Parameters:
        relation - Data relation
        dim - Maximum dimensionality
        needed - Minimum support needed
        Returns:
        1-itemsets
      • buildFrequentTwoItemsets

        protected java.util.List<SparseItemset> buildFrequentTwoItemsets​(java.util.List<OneItemset> oneitems,
                                                                         Relation<BitVector> relation,
                                                                         int dim,
                                                                         int needed,
                                                                         DBIDs ids,
                                                                         ArrayModifiableDBIDs survivors)
        Build the 2-itemsets.
        Parameters:
        oneitems - Frequent 1-itemsets
        relation - Data relation
        dim - Maximum dimensionality
        needed - Minimum support needed
        ids - Objects to process
        survivors - Output: objects that had at least two 1-frequent items.
        Returns:
        Frequent 2-itemsets
      • aprioriGenerate

        protected java.util.List<Itemset> aprioriGenerate​(java.util.List<? extends Itemset> supported,
                                                          int length,
                                                          int dim)
        Prunes a given set of candidates to keep only those BitSets where all subsets of bits flipping one bit are frequent already.
        Parameters:
        supported - Support map
        length - Itemset length
        dim - Dimensionality
        Returns:
        itemsets that cannot be pruned by apriori
      • frequentItemsets

        protected java.util.List<? extends Itemset> frequentItemsets​(java.util.List<? extends Itemset> candidates,
                                                                     Relation<BitVector> relation,
                                                                     int needed,
                                                                     DBIDs ids,
                                                                     ArrayModifiableDBIDs survivors,
                                                                     int length)
        Returns the frequent BitSets out of the given BitSets with respect to the given database.
        Parameters:
        candidates - the candidates to be evaluated
        relation - the database to evaluate the candidates on
        needed - Minimum support needed
        ids - Objects to process
        survivors - Output: objects that had at least two 1-frequent items.
        length - Itemset length
        Returns:
        Itemsets with sufficient support
      • frequentItemsetsSparse

        protected java.util.List<SparseItemset> frequentItemsetsSparse​(java.util.List<SparseItemset> candidates,
                                                                       Relation<BitVector> relation,
                                                                       int needed,
                                                                       DBIDs ids,
                                                                       ArrayModifiableDBIDs survivors,
                                                                       int length)
        Returns the frequent BitSets out of the given BitSets with respect to the given database. Optimized implementation for SparseItemset.
        Parameters:
        candidates - the candidates to be evaluated
        relation - the database to evaluate the candidates on
        needed - Minimum support needed
        ids - Objects to process
        survivors - Output: objects that had at least two 1-frequent items.
        length - Itemset length
        Returns:
        Itemsets with sufficient support
      • initializeSearchItemset

        private boolean initializeSearchItemset​(BitVector bv,
                                                int[] scratchi,
                                                int[] iters)
        Initialize the scratch itemset.
        Parameters:
        bv - Bit vector data source
        scratchi - Scratch itemset
        iters - Iterator array
        Returns:
        true if the itemset had minimum length
      • nextSearchItemset

        private boolean nextSearchItemset​(BitVector bv,
                                          int[] scratchi,
                                          int[] iters)
        Advance scratch itemset to the next.
        Parameters:
        bv - Bit vector data source
        scratchi - Scratch itemset
        iters - Iterator array
        Returns:
        true if the itemset had minimum length
      • binarySearch

        private int binarySearch​(java.util.List<SparseItemset> candidates,
                                 SparseItemset scratch,
                                 int begin,
                                 int end)
        Binary-search for the next-larger element.
        Parameters:
        candidates - Candidates to search for
        scratch - Scratch space
        begin - Search interval begin
        end - Search interval end
        Returns:
        Position of first equal-or-larger element
      • debugDumpCandidates

        private java.lang.StringBuilder debugDumpCandidates​(java.lang.StringBuilder msg,
                                                            java.util.List<? extends Itemset> candidates,
                                                            VectorFieldTypeInformation<BitVector> meta)
        Debug method: output all itemsets.
        Parameters:
        msg - Output buffer
        candidates - Itemsets to dump
        meta - Metadata for item labels
        Returns:
        Output buffer