Class FPGrowth

  • All Implemented Interfaces:
    Algorithm

    @Reference(authors="J. Han, J. Pei, Y. Yin",
               title="Mining frequent patterns without candidate generation",
               booktitle="Proc. ACM SIGMOD Int. Conf. Management of Data (SIGMOD 2000)",
               url="https://doi.org/10.1145/342009.335372",
               bibkey="DBLP:conf/sigmod/HanPY00")
    @Priority(200)
    public class FPGrowth
    extends AbstractFrequentItemsetAlgorithm
    FP-Growth is an algorithm for mining the frequent itemsets by using a compressed representation of the database called FPGrowth.FPTree.

    FP-Growth first sorts items by the overall frequency, since having high frequent items appear first in the tree leads to a much smaller tree since frequent subsets will likely share the same path in the tree. FP-Growth is beneficial when you have a lot of (near-) duplicate transactions, and are using a not too high support threshold, as it only prunes single items, not item combinations.

    This implementation is in-memory only, and has not yet been carefully optimized.

    The worst case memory use probably is \(O(\min(n\cdot l,i^l))\) where i is the number of items, l the average itemset length, and n the number of items. The worst case scenario is when every item is frequent, and every transaction is unique. The resulting tree will then be larger than the original data.

    Reference:

    J. Han, J. Pei, Y. Yin
    Mining frequent patterns without candidate generation
    In Proc. ACM SIGMOD Int. Conf. Management of Data (SIGMOD 2000)

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • STAT

        private static final java.lang.String STAT
        Prefix for statistics.
    • Constructor Detail

      • FPGrowth

        public FPGrowth​(double minsupp,
                        int minlength,
                        int maxlength)
        Constructor.
        Parameters:
        minsupp - Minimum support (relative or absolute)
        minlength - Minimum length
        maxlength - Maximum length
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Returns:
        Type restriction
      • countItemSupport

        private int[] countItemSupport​(Relation<BitVector> relation,
                                       int dim)
        Count the support of each 1-item.
        Parameters:
        relation - Data
        dim - Maximum dimensionality
        Returns:
        Item counts
      • buildFPTree

        private FPGrowth.FPTree buildFPTree​(Relation<BitVector> relation,
                                            int[] iidx,
                                            int items)
        Build the actual FP-tree structure.
        Parameters:
        relation - Data
        iidx - Inverse index (dimension to item rank)
        items - Number of items
        Returns:
        FP-tree
      • buildIndex

        private int[] buildIndex​(int[] counts,
                                 int[] positions,
                                 int minsupp)
        Build a forward map, item id (dimension) to frequency position
        Parameters:
        counts - Item counts
        positions - Position index (output)
        minsupp - Minimum support
        Returns:
        Forward index