Pudi, Vikram and Haritsa, Jayant R (2000) Quantifying the utility of the past in mining large databases. In: Information Systems, 25 (5). pp. 323-343.
Incremental mining algorithms that can efficiently derive the current mining output by utilizing previous mining results are attractive to business organizations since data mining is typically a resource-intensive recurring activity. In this paper, we present the DELTA algorithm for the robust and efficient incremental mining of association rules on large market basket databases. DELTA guarantees efficiency by ensuring that, for any dataset, at most three passes over the increment and one pass over the previous database are required to generate the desired rules. Further, it handles "multi-support" environments where the support requirements for the current mining differ from those used in the previous mining, a feature in tune with the exploratory nature of the mining process. We present a performance evaluation of DELTA on large databases over a range of increment sizes and data distributions, as well as change in support requirements. The experimental results show that DELTA can provide significant improvements in execution times over previously proposed incremental algorithms in all these environments. In fact, for many workloads, its performance is close to that achieved by an optimal, but practically infeasible, algorithm.
|Item Type:||Journal Article|
|Additional Information:||The copyright of this article belongs to Elsevier Science Ltd.|
|Keywords:||Data Mining;Association Rule;Hierarchical Association Rule;Incremental Mining|
|Department/Centre:||Division of Information Sciences > Supercomputer Education & Research Centre|
|Date Deposited:||17 Jul 2004|
|Last Modified:||19 Sep 2010 04:14|
Actions (login required)