Journal of Information Management (資訊管理學報)

蔡玉娟;張簡雅文;
Pages: 113-130
Date: 2005/07
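Abstract: Data mining techniques extract useful information from large databases to support decision making. Association rule mining is the most widely studied and used of these techniques; its bottleneck in discovering frequent itemsets is that the database must be scanned many times and a large number of candidate itemsets must be generated and checked level by level. This study proposes a new matrix-based association rule method, MBAR (Matrix-Based Association Rule), to overcome these drawbacks. MBAR proceeds in four steps: (1) discover L1 and build the matrix TMatrix: the original database is scanned only once to discover L1, and TMatrix is built with the items of L1 as rows and the transaction identifiers (TIDs) as columns; (2) discover the frequent itemsets L2: the rows of TMatrix are ANDed pairwise to discover L2 directly, without generating C2, and the Boolean upper triangular matrix SMatrix of L2 is built; (3) generate all candidate itemsets of length ≧3: the properties of the Boolean upper triangular matrix SMatrix of L2, combined with transitivity among itemsets, are used to generate all candidate itemsets of length ≧3; (4) discover the frequent itemsets of length ≧3: AND operations between the rows of TMatrix are used to discover all frequent itemsets of length ≧3. Experimental results show that MBAR is more efficient and more stable than the FP-Growth association rule algorithm.
Keywords: Data Mining; Association Rules; Matrix; Transitivity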

Matrix-Based Association Rule


Abstract: Association rule algorithms in data mining are used to find associations among products or bundles. The association algorithms proposed in existing studies are constrained to start at level 1, and finding suitable bundles through repeated steps of combination and counting takes a great deal of time. To overcome these bottlenecks, we propose a new Matrix-Based Association Rule (MBAR) method. It finds the large 2-itemsets by intersecting each pair of rows of a matrix made up of the large 1-items and the transaction numbers, without rescanning the database. It then uses transitivity and the properties of the upper triangular matrix formed from the large 2-itemsets to discover all potential candidate k-itemsets, where k≧3. Finally, it finds the large k-itemsets, where k≧3, by intersecting the corresponding rows of the item-by-transaction matrix. Experiments with real-life databases show that MBAR outperforms FP-Growth, a well-known and widely used association rule algorithm.
Keywords: Data Mining; Association Rules; Matrix; Transitivity
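
The four steps summarized in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation, only one plausible reading of the abstract. The function name `mbar_sketch`, the helper `_popcount`, and the use of Python integers as bit rows (bit t set when transaction t contains the item) are assumptions made for illustration. In particular, the abstract does not spell out how transitivity is applied, so the sketch assumes that a frequent itemset is extended with a later item only when SMatrix marks that item as forming a frequent pair with every item already in the set.

```python
from itertools import combinations


def _popcount(bits):
    """Number of set bits, i.e. the support count of a bit row."""
    return bin(bits).count("1")


def mbar_sketch(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets.

    Illustrative reading of the MBAR steps; names and details are assumptions.
    """
    # Step 1: a single pass over the data builds one bit row per item.
    rows = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            rows[item] = rows.get(item, 0) | (1 << tid)

    # L1 = items meeting the minimum support; TMatrix keeps only their rows.
    l1 = sorted(i for i, bits in rows.items() if _popcount(bits) >= min_support)
    tmatrix = [rows[item] for item in l1]
    n = len(l1)
    frequent = {(item,): _popcount(tmatrix[k]) for k, item in enumerate(l1)}

    # Step 2: AND every pair of TMatrix rows to get L2 directly and record
    # the frequent pairs in the Boolean upper triangular SMatrix.
    smatrix = [[False] * n for _ in range(n)]
    level = []  # entries are (index tuple, ANDed bit row)
    for i, j in combinations(range(n), 2):
        bits = tmatrix[i] & tmatrix[j]
        if _popcount(bits) >= min_support:
            smatrix[i][j] = True
            level.append(((i, j), bits))
            frequent[(l1[i], l1[j])] = _popcount(bits)

    # Steps 3-4: SMatrix prunes candidates (assumed transitivity rule),
    # then ANDing TMatrix rows gives each candidate's support.
    while level:
        next_level = []
        for idx, bits in level:
            for c in range(idx[-1] + 1, n):
                if all(smatrix[a][c] for a in idx):
                    new_bits = bits & tmatrix[c]
                    if _popcount(new_bits) >= min_support:
                        new_idx = idx + (c,)
                        next_level.append((new_idx, new_bits))
                        key = tuple(l1[a] for a in new_idx)
                        frequent[key] = _popcount(new_bits)
        level = next_level
    return frequent
```

For the five transactions {A, B, C}, {A, B, D}, {A, C}, {B, C}, {A, B, C} and a minimum support count of 2, the sketch drops D after the single scan and reports {A, B, C} with a support count of 2. The database is read only in step 1; every later support count comes from AND operations on the stored rows.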
