Package elki.itemsetmining
Class APRIORI
- java.lang.Object
-
- elki.itemsetmining.AbstractFrequentItemsetAlgorithm
-
- elki.itemsetmining.APRIORI
-
- All Implemented Interfaces:
Algorithm
@Title("APRIORI: Algorithm for Mining Association Rules") @Description("Searches for frequent itemsets") @Reference(authors="R. Agrawal, R. Srikant", title="Fast Algorithms for Mining Association Rules", booktitle="Proc. 20th Int. Conf. on Very Large Data Bases (VLDB \'94)", url="http://www.vldb.org/conf/1994/P487.PDF", bibkey="DBLP:conf/vldb/AgrawalS94") public class APRIORI extends AbstractFrequentItemsetAlgorithm
The APRIORI algorithm for Mining Association Rules.Reference:
R. Agrawal, R. Srikant
Fast Algorithms for Mining Association Rules
In Proc. 20th Int. Conf. on Very Large Data Bases (VLDB '94)This implementation uses some simple optimizations for 1- and 2-itemsets, but does not implement the original hash tree (yet, please contribute).
Note: this algorithm scales well to a large number of transactions, but not so well to a large number of frequent itemsets (items). For best results, use domain-specific preprocessing to aggregate items into groups. Use statistics logging to keep track of candidate set sizes.
- Since:
- 0.1
- Author:
- Arthur Zimek, Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
APRIORI.Par
Parameterization class.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected java.util.List<Itemset>
aprioriGenerate(java.util.List<? extends Itemset> supported, int length, int dim)
Prunes a given set of candidates to keep only those BitSets where all subsets of bits flipping one bit are frequent already.private int
binarySearch(java.util.List<SparseItemset> candidates, SparseItemset scratch, int begin, int end)
Binary-search for the next-larger element.protected java.util.List<OneItemset>
buildFrequentOneItemsets(Relation<? extends SparseFeatureVector<?>> relation, int dim, int needed)
Build the 1-itemsets.protected java.util.List<SparseItemset>
buildFrequentTwoItemsets(java.util.List<OneItemset> oneitems, Relation<BitVector> relation, int dim, int needed, DBIDs ids, ArrayModifiableDBIDs survivors)
Build the 2-itemsets.private java.lang.StringBuilder
debugDumpCandidates(java.lang.StringBuilder msg, java.util.List<? extends Itemset> candidates, VectorFieldTypeInformation<BitVector> meta)
Debug method: output all itemsets.protected java.util.List<? extends Itemset>
frequentItemsets(java.util.List<? extends Itemset> candidates, Relation<BitVector> relation, int needed, DBIDs ids, ArrayModifiableDBIDs survivors, int length)
Returns the frequent BitSets out of the given BitSets with respect to the given database.protected java.util.List<SparseItemset>
frequentItemsetsSparse(java.util.List<SparseItemset> candidates, Relation<BitVector> relation, int needed, DBIDs ids, ArrayModifiableDBIDs survivors, int length)
Returns the frequent BitSets out of the given BitSets with respect to the given database.TypeInformation[]
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.private boolean
initializeSearchItemset(BitVector bv, int[] scratchi, int[] iters)
Initialize the scratch itemset.private boolean
nextSearchItemset(BitVector bv, int[] scratchi, int[] iters)
Advance scratch itemset to the next.FrequentItemsetsResult
run(Relation<BitVector> relation)
Performs the APRIORI algorithm on the given database.-
Methods inherited from class elki.itemsetmining.AbstractFrequentItemsetAlgorithm
autorun, getMinimumSupport
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
STAT
private final java.lang.String STAT
Statistics logging prefix.
-
-
Constructor Detail
-
APRIORI
public APRIORI(double minfreq, int minlength, int maxlength)
Constructor with minimum frequency.- Parameters:
minfreq
- Minimum frequencyminlength
- Minimum lengthmaxlength
- Maximum length
-
APRIORI
public APRIORI(double minfreq)
Constructor with minimum frequency.- Parameters:
minfreq
- Minimum frequency
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:Algorithm
Get the input type restriction used for negotiating the data query.- Returns:
- Type restriction
-
run
public FrequentItemsetsResult run(Relation<BitVector> relation)
Performs the APRIORI algorithm on the given database.- Parameters:
relation
- the Relation to process- Returns:
- the AprioriResult learned by this APRIORI
-
buildFrequentOneItemsets
protected java.util.List<OneItemset> buildFrequentOneItemsets(Relation<? extends SparseFeatureVector<?>> relation, int dim, int needed)
Build the 1-itemsets.- Parameters:
relation
- Data relationdim
- Maximum dimensionalityneeded
- Minimum support needed- Returns:
- 1-itemsets
-
buildFrequentTwoItemsets
protected java.util.List<SparseItemset> buildFrequentTwoItemsets(java.util.List<OneItemset> oneitems, Relation<BitVector> relation, int dim, int needed, DBIDs ids, ArrayModifiableDBIDs survivors)
Build the 2-itemsets.- Parameters:
oneitems
- Frequent 1-itemsetsrelation
- Data relationdim
- Maximum dimensionalityneeded
- Minimum support neededids
- Objects to processsurvivors
- Output: objects that had at least two 1-frequent items.- Returns:
- Frequent 2-itemsets
-
aprioriGenerate
protected java.util.List<Itemset> aprioriGenerate(java.util.List<? extends Itemset> supported, int length, int dim)
Prunes a given set of candidates to keep only those BitSets where all subsets of bits flipping one bit are frequent already.- Parameters:
supported
- Support maplength
- Itemset lengthdim
- Dimensionality- Returns:
- itemsets that cannot be pruned by apriori
-
frequentItemsets
protected java.util.List<? extends Itemset> frequentItemsets(java.util.List<? extends Itemset> candidates, Relation<BitVector> relation, int needed, DBIDs ids, ArrayModifiableDBIDs survivors, int length)
Returns the frequent BitSets out of the given BitSets with respect to the given database.- Parameters:
candidates
- the candidates to be evaluatedrelation
- the database to evaluate the candidates onneeded
- Minimum support neededids
- Objects to processsurvivors
- Output: objects that had at least two 1-frequent items.length
- Itemset length- Returns:
- Itemsets with sufficient support
-
frequentItemsetsSparse
protected java.util.List<SparseItemset> frequentItemsetsSparse(java.util.List<SparseItemset> candidates, Relation<BitVector> relation, int needed, DBIDs ids, ArrayModifiableDBIDs survivors, int length)
Returns the frequent BitSets out of the given BitSets with respect to the given database. Optimized implementation for SparseItemset.- Parameters:
candidates
- the candidates to be evaluatedrelation
- the database to evaluate the candidates onneeded
- Minimum support neededids
- Objects to processsurvivors
- Output: objects that had at least two 1-frequent items.length
- Itemset length- Returns:
- Itemsets with sufficient support
-
initializeSearchItemset
private boolean initializeSearchItemset(BitVector bv, int[] scratchi, int[] iters)
Initialize the scratch itemset.- Parameters:
bv
- Bit vector data sourcescratchi
- Scratch itemsetiters
- Iterator array- Returns:
true
if the itemset had minimum length
-
nextSearchItemset
private boolean nextSearchItemset(BitVector bv, int[] scratchi, int[] iters)
Advance scratch itemset to the next.- Parameters:
bv
- Bit vector data sourcescratchi
- Scratch itemsetiters
- Iterator array- Returns:
true
if the itemset had minimum length
-
binarySearch
private int binarySearch(java.util.List<SparseItemset> candidates, SparseItemset scratch, int begin, int end)
Binary-search for the next-larger element.- Parameters:
candidates
- Candidates to search forscratch
- Scratch spacebegin
- Search interval beginend
- Search interval end- Returns:
- Position of first equal-or-larger element
-
debugDumpCandidates
private java.lang.StringBuilder debugDumpCandidates(java.lang.StringBuilder msg, java.util.List<? extends Itemset> candidates, VectorFieldTypeInformation<BitVector> meta)
Debug method: output all itemsets.- Parameters:
msg
- Output buffercandidates
- Itemsets to dumpmeta
- Metadata for item labels- Returns:
- Output buffer
-
-