AI Article Synopsis

  • Frequent itemset mining (FIM) is crucial for association rule mining but is computationally challenging, especially with large and dense datasets.
  • Researchers have developed various methods to approximate itemset support using small subsets of data to maintain both efficiency and accuracy.
  • The new ProbDF algorithm reduces memory use by discarding transaction data after mining small (size-1 and size-2) itemsets, and uses a probabilistic support prediction model for larger itemsets; experiments show this improves performance.
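For context, the support of an itemset is the fraction (or count) of transactions that contain all of its items; this is the quantity that exact FIM algorithms compute and that approximate algorithms like ProbBF/ProbDF estimate. A minimal sketch of exact support counting over a small hypothetical toy dataset (not from the article):

```python
from itertools import combinations

# Hypothetical toy transaction dataset for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]

def support(itemset, transactions):
    """Exact support: fraction of transactions containing every item."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

# Enumerate all frequent 2-itemsets at a 50% support threshold.
items = sorted(set().union(*transactions))
frequent_pairs = [
    pair for pair in combinations(items, 2)
    if support(pair, transactions) >= 0.5
]
```

On dense data the number of candidate itemsets grows combinatorially, which is why this exact counting becomes the bottleneck the article describes.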

Article Abstract

Frequent itemset mining (FIM) is a major component of association rule mining and significantly influences its performance. FIM is computationally intensive and NP-hard. At the core of FIM is the task of computing the support of candidate itemsets. This problem becomes more severe for dense datasets, where support must be computed for millions, or even billions, of candidate itemsets. The rapid growth of data further exacerbates the problem. To achieve high scalability and efficiency, researchers have recently proposed various approaches that approximate the support of an itemset using as small a subset of the transaction data as possible. In addition to efficiency, accuracy is another important metric for these algorithms: they strive to increase true positives and to reduce false negatives and false positives. One such recently proposed approximate FIM algorithm is Probabilistic Breadth-First (ProbBF), which is highly efficient on dense data due to its unique approach of not using transactional data beyond 2-size itemsets. Unlike its counterparts, this algorithm requires no input parameters beyond the traditional support threshold. However, ProbBF is a breadth-first algorithm, and it is well established that breadth-first FIM algorithms consume significantly more memory than depth-first algorithms on dense datasets. Excessive memory consumption also slows an algorithm's run-time performance due to poor locality of reference, thrashing, and aggressive garbage collection. This article proposes a FIM algorithm, ProbDF, that discards transaction data after determining all frequent itemsets of sizes one and two. For frequent itemsets of size three or more, it employs a probabilistic support prediction model (PSPM) to estimate their support. PSPM, first proposed with ProbBF, uses lightweight calculations that exclude transaction data.
Our experiments demonstrate that ProbDF, with its depth-first search strategy tailored to PSPM and other optimizations, is efficient in terms of time and space, and successfully generates the majority of frequent itemsets on real-world benchmark datasets. However, due to the probabilistic nature of ProbDF, some compromise in quality is inevitable.
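The abstract does not reproduce PSPM's actual formula, so the following is only an illustrative stand-in: a simple chain-style conditional-independence estimator that, in the same spirit, predicts the support of a k-itemset (k ≥ 3) from stored 1- and 2-itemset supports without touching transaction data. All names and numbers here are hypothetical assumptions, not the article's model.

```python
def predict_support(items, sup1, sup2):
    """Estimate the support of a k-itemset (k >= 3) from stored
    1- and 2-itemset supports, never scanning transactions.

    Chain approximation (an assumption, not PSPM itself):
    P(i1..ik) ~= P(i1,i2) * prod_j P(ij | i(j-1)),
    with P(ij | i(j-1)) = P(i(j-1), ij) / P(i(j-1)).
    """
    items = sorted(items)
    est = sup2[(items[0], items[1])]
    for prev, cur in zip(items[1:], items[2:]):
        est *= sup2[(prev, cur)] / sup1[prev]
    return est

# Hypothetical support fractions for items a, b, c.
sup1 = {"a": 0.6, "b": 0.5, "c": 0.5}
sup2 = {("a", "b"): 0.4, ("b", "c"): 0.3, ("a", "c"): 0.35}

est = predict_support({"a", "b", "c"}, sup1, sup2)  # 0.4 * (0.3 / 0.5) = 0.24
```

An itemset would then be reported frequent when its predicted support clears the user's threshold; as the abstract notes, any such probabilistic prediction trades some quality (false positives/negatives) for the ability to discard transaction data.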


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623086
DOI: http://dx.doi.org/10.7717/peerj-cs.2334

Publication Analysis

Top Keywords

transaction data: 12
frequent itemsets: 12
dense data: 8
probabilistic support: 8
support prediction: 8
candidate itemsets: 8
fim algorithm: 8
data: 7
support: 7
fim: 6
