New framework for itemset generation
Abstract
The problem of finding association rules in a large database of sales transactions has been widely studied in the literature. We discuss some of the weaknesses of the large itemset method for association rule generation. We propose a different criterion for evaluating and finding itemsets, referred to as strongly collective itemsets. The support of an itemset and the correlation of the items within it are related, but they are not the same. The proposed criterion stresses the actual correlation of the items with one another rather than their absolute support. Previously proposed methods for finding correlated itemsets are not necessarily applicable to very large databases. We provide an algorithm that achieves very good computational efficiency while maintaining statistical robustness. Because this algorithm relies on relative measures rather than absolute measures such as support, the method can also be applied to find association rules in datasets in which items appear in a sizeable percentage of the transactions (dense datasets), in datasets in which the items have varying density, and even to find negative association rules.
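To illustrate why support and correlation can disagree, the following minimal sketch compares the absolute support of an itemset with a simple relative measure: the ratio of observed support to the support expected if the items occurred independently. This ratio is only an illustrative stand-in and is not the strongly collective criterion itself, which the abstract does not define. The function names and the toy transactions are assumptions made for this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def observed_to_expected(transactions, itemset):
    """Observed support divided by the support expected under independence.

    Illustrative relative measure only (an assumption of this sketch).
    Every item in `itemset` must occur at least once, otherwise the
    expected support is zero and the ratio is undefined.
    """
    expected = 1.0
    for item in itemset:
        expected *= support(transactions, [item])
    return support(transactions, itemset) / expected


# Hypothetical toy data: bread and milk are dense (8 of 10 transactions each),
# caviar and vodka are rare but always bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "caviar", "vodka"},
    {"bread", "milk", "caviar", "vodka"},
    {"bread"},
    {"bread"},
    {"milk"},
    {"milk"},
]

dense_pair = ("bread", "milk")
rare_pair = ("caviar", "vodka")

for pair in (dense_pair, rare_pair):
    print(pair, support(transactions, pair), observed_to_expected(transactions, pair))
```

In this toy data the dense pair has the higher support (0.6) but a ratio slightly below 1, so its items co-occur a little less often than independence would predict. The rare pair has support of only 0.2 but a ratio of about 5. A fixed support threshold would favor the first pair, whereas a relative measure favors the second.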