Package elki.itemsetmining
Class APRIORI
- java.lang.Object
-
- elki.itemsetmining.AbstractFrequentItemsetAlgorithm
-
- elki.itemsetmining.APRIORI
-
- All Implemented Interfaces:
Algorithm
@Title("APRIORI: Algorithm for Mining Association Rules") @Description("Searches for frequent itemsets") @Reference(authors="R. Agrawal, R. Srikant", title="Fast Algorithms for Mining Association Rules", booktitle="Proc. 20th Int. Conf. on Very Large Data Bases (VLDB \'94)", url="http://www.vldb.org/conf/1994/P487.PDF", bibkey="DBLP:conf/vldb/AgrawalS94") public class APRIORI extends AbstractFrequentItemsetAlgorithm
The APRIORI algorithm for Mining Association Rules.Reference:
R. Agrawal, R. Srikant
Fast Algorithms for Mining Association Rules
In Proc. 20th Int. Conf. on Very Large Data Bases (VLDB '94)This implementation uses some simple optimizations for 1- and 2-itemsets, but does not implement the original hash tree (yet, please contribute).
Note: this algorithm scales well to a large number of transactions, but not so well to a large number of frequent itemsets (items). For best results, use domain-specific preprocessing to aggregate items into groups. Use statistics logging to keep track of candidate set sizes.
- Since:
- 0.1
- Author:
- Arthur Zimek, Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classAPRIORI.ParParameterization class.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected java.util.List<Itemset>aprioriGenerate(java.util.List<? extends Itemset> supported, int length, int dim)Prunes a given set of candidates to keep only those BitSets where all subsets of bits flipping one bit are frequent already.private intbinarySearch(java.util.List<SparseItemset> candidates, SparseItemset scratch, int begin, int end)Binary-search for the next-larger element.protected java.util.List<OneItemset>buildFrequentOneItemsets(Relation<? extends SparseFeatureVector<?>> relation, int dim, int needed)Build the 1-itemsets.protected java.util.List<SparseItemset>buildFrequentTwoItemsets(java.util.List<OneItemset> oneitems, Relation<BitVector> relation, int dim, int needed, DBIDs ids, ArrayModifiableDBIDs survivors)Build the 2-itemsets.private java.lang.StringBuilderdebugDumpCandidates(java.lang.StringBuilder msg, java.util.List<? extends Itemset> candidates, VectorFieldTypeInformation<BitVector> meta)Debug method: output all itemsets.protected java.util.List<? extends Itemset>frequentItemsets(java.util.List<? extends Itemset> candidates, Relation<BitVector> relation, int needed, DBIDs ids, ArrayModifiableDBIDs survivors, int length)Returns the frequent BitSets out of the given BitSets with respect to the given database.protected java.util.List<SparseItemset>frequentItemsetsSparse(java.util.List<SparseItemset> candidates, Relation<BitVector> relation, int needed, DBIDs ids, ArrayModifiableDBIDs survivors, int length)Returns the frequent BitSets out of the given BitSets with respect to the given database.TypeInformation[]getInputTypeRestriction()Get the input type restriction used for negotiating the data query.private booleaninitializeSearchItemset(BitVector bv, int[] scratchi, int[] iters)Initialize the scratch itemset.private booleannextSearchItemset(BitVector bv, int[] scratchi, int[] iters)Advance scratch itemset to the next.FrequentItemsetsResultrun(Relation<BitVector> relation)Performs the APRIORI algorithm on the given database.-
Methods inherited from class elki.itemsetmining.AbstractFrequentItemsetAlgorithm
autorun, getMinimumSupport
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
STAT
private final java.lang.String STAT
Statistics logging prefix.
-
-
Constructor Detail
-
APRIORI
public APRIORI(double minfreq, int minlength, int maxlength)Constructor with minimum frequency.- Parameters:
minfreq- Minimum frequencyminlength- Minimum lengthmaxlength- Maximum length
-
APRIORI
public APRIORI(double minfreq)
Constructor with minimum frequency.- Parameters:
minfreq- Minimum frequency
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:AlgorithmGet the input type restriction used for negotiating the data query.- Returns:
- Type restriction
-
run
public FrequentItemsetsResult run(Relation<BitVector> relation)
Performs the APRIORI algorithm on the given database.- Parameters:
relation- the Relation to process- Returns:
- the AprioriResult learned by this APRIORI
-
buildFrequentOneItemsets
protected java.util.List<OneItemset> buildFrequentOneItemsets(Relation<? extends SparseFeatureVector<?>> relation, int dim, int needed)
Build the 1-itemsets.- Parameters:
relation- Data relationdim- Maximum dimensionalityneeded- Minimum support needed- Returns:
- 1-itemsets
-
buildFrequentTwoItemsets
protected java.util.List<SparseItemset> buildFrequentTwoItemsets(java.util.List<OneItemset> oneitems, Relation<BitVector> relation, int dim, int needed, DBIDs ids, ArrayModifiableDBIDs survivors)
Build the 2-itemsets.- Parameters:
oneitems- Frequent 1-itemsetsrelation- Data relationdim- Maximum dimensionalityneeded- Minimum support neededids- Objects to processsurvivors- Output: objects that had at least two 1-frequent items.- Returns:
- Frequent 2-itemsets
-
aprioriGenerate
protected java.util.List<Itemset> aprioriGenerate(java.util.List<? extends Itemset> supported, int length, int dim)
Prunes a given set of candidates to keep only those BitSets where all subsets of bits flipping one bit are frequent already.- Parameters:
supported- Support maplength- Itemset lengthdim- Dimensionality- Returns:
- itemsets that cannot be pruned by apriori
-
frequentItemsets
protected java.util.List<? extends Itemset> frequentItemsets(java.util.List<? extends Itemset> candidates, Relation<BitVector> relation, int needed, DBIDs ids, ArrayModifiableDBIDs survivors, int length)
Returns the frequent BitSets out of the given BitSets with respect to the given database.- Parameters:
candidates- the candidates to be evaluatedrelation- the database to evaluate the candidates onneeded- Minimum support neededids- Objects to processsurvivors- Output: objects that had at least two 1-frequent items.length- Itemset length- Returns:
- Itemsets with sufficient support
-
frequentItemsetsSparse
protected java.util.List<SparseItemset> frequentItemsetsSparse(java.util.List<SparseItemset> candidates, Relation<BitVector> relation, int needed, DBIDs ids, ArrayModifiableDBIDs survivors, int length)
Returns the frequent BitSets out of the given BitSets with respect to the given database. Optimized implementation for SparseItemset.- Parameters:
candidates- the candidates to be evaluatedrelation- the database to evaluate the candidates onneeded- Minimum support neededids- Objects to processsurvivors- Output: objects that had at least two 1-frequent items.length- Itemset length- Returns:
- Itemsets with sufficient support
-
initializeSearchItemset
private boolean initializeSearchItemset(BitVector bv, int[] scratchi, int[] iters)
Initialize the scratch itemset.- Parameters:
bv- Bit vector data sourcescratchi- Scratch itemsetiters- Iterator array- Returns:
trueif the itemset had minimum length
-
nextSearchItemset
private boolean nextSearchItemset(BitVector bv, int[] scratchi, int[] iters)
Advance scratch itemset to the next.- Parameters:
bv- Bit vector data sourcescratchi- Scratch itemsetiters- Iterator array- Returns:
trueif the itemset had minimum length
-
binarySearch
private int binarySearch(java.util.List<SparseItemset> candidates, SparseItemset scratch, int begin, int end)
Binary-search for the next-larger element.- Parameters:
candidates- Candidates to search forscratch- Scratch spacebegin- Search interval beginend- Search interval end- Returns:
- Position of first equal-or-larger element
-
debugDumpCandidates
private java.lang.StringBuilder debugDumpCandidates(java.lang.StringBuilder msg, java.util.List<? extends Itemset> candidates, VectorFieldTypeInformation<BitVector> meta)Debug method: output all itemsets.- Parameters:
msg- Output buffercandidates- Itemsets to dumpmeta- Metadata for item labels- Returns:
- Output buffer
-
-