Class FPGrowth
- java.lang.Object
-
- elki.itemsetmining.AbstractFrequentItemsetAlgorithm
-
- elki.itemsetmining.FPGrowth
-
- All Implemented Interfaces:
Algorithm
@Reference(authors="J. Han, J. Pei, Y. Yin", title="Mining frequent patterns without candidate generation", booktitle="Proc. ACM SIGMOD Int. Conf. Management of Data (SIGMOD 2000)", url="https://doi.org/10.1145/342009.335372", bibkey="DBLP:conf/sigmod/HanPY00") @Priority(200) public class FPGrowth extends AbstractFrequentItemsetAlgorithm
FP-Growth is an algorithm for mining the frequent itemsets by using a compressed representation of the database calledFPGrowth.FPTree
.FP-Growth first sorts items by the overall frequency, since having high frequent items appear first in the tree leads to a much smaller tree since frequent subsets will likely share the same path in the tree. FP-Growth is beneficial when you have a lot of (near-) duplicate transactions, and are using a not too high support threshold, as it only prunes single items, not item combinations.
This implementation is in-memory only, and has not yet been carefully optimized.
The worst case memory use probably is \(O(\min(n\cdot l,i^l))\) where i is the number of items, l the average itemset length, and n the number of items. The worst case scenario is when every item is frequent, and every transaction is unique. The resulting tree will then be larger than the original data.
Reference:
J. Han, J. Pei, Y. Yin
Mining frequent patterns without candidate generation
In Proc. ACM SIGMOD Int. Conf. Management of Data (SIGMOD 2000)- Since:
- 0.7.0
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
FPGrowth.FPNode
A single node of the FP tree.static class
FPGrowth.FPTree
FP-Tree data structurestatic class
FPGrowth.Par
Parameterization class.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Constructor Summary
Constructors Constructor Description FPGrowth(double minsupp, int minlength, int maxlength)
Constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private FPGrowth.FPTree
buildFPTree(Relation<BitVector> relation, int[] iidx, int items)
Build the actual FP-tree structure.private int[]
buildIndex(int[] counts, int[] positions, int minsupp)
Build a forward map, item id (dimension) to frequency positionprivate int[]
countItemSupport(Relation<BitVector> relation, int dim)
Count the support of each 1-item.TypeInformation[]
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.FrequentItemsetsResult
run(Relation<BitVector> relation)
Run the FP-Growth algorithm-
Methods inherited from class elki.itemsetmining.AbstractFrequentItemsetAlgorithm
autorun, getMinimumSupport
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
STAT
private static final java.lang.String STAT
Prefix for statistics.
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:Algorithm
Get the input type restriction used for negotiating the data query.- Returns:
- Type restriction
-
run
public FrequentItemsetsResult run(Relation<BitVector> relation)
Run the FP-Growth algorithm- Parameters:
relation
- Bit vector relation- Returns:
- Frequent patterns found
-
countItemSupport
private int[] countItemSupport(Relation<BitVector> relation, int dim)
Count the support of each 1-item.- Parameters:
relation
- Datadim
- Maximum dimensionality- Returns:
- Item counts
-
buildFPTree
private FPGrowth.FPTree buildFPTree(Relation<BitVector> relation, int[] iidx, int items)
Build the actual FP-tree structure.- Parameters:
relation
- Dataiidx
- Inverse index (dimension to item rank)items
- Number of items- Returns:
- FP-tree
-
buildIndex
private int[] buildIndex(int[] counts, int[] positions, int minsupp)
Build a forward map, item id (dimension) to frequency position- Parameters:
counts
- Item countspositions
- Position index (output)minsupp
- Minimum support- Returns:
- Forward index
-
-