資訊管理學報 (Journal of Information Management)

Authors: 葉進儀; 林彣珊; 郭文熙
Pages: 123-149
Date: 2008/10
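Abstract: Most existing methods for mining association rules in large databases use a support-based pruning strategy to reduce the time spent searching for association rules. With a low support threshold, however, this strategy cannot effectively find potentially valuable patterns, and because the support is so low, the additional resources it requires (for example, memory) become excessive. With a high support threshold, it loses patterns that have low support but high confidence and high correlation. This study first proves that the bond measure has a cross-support property and then uses this property to prune and discard worthless itemsets, which speeds up the algorithm and saves system resources. Moreover, if the bond of an itemset exceeds the minimum bond threshold, the support of that itemset exceeds a certain lower bound, and the confidence of the association rules derived from it also exceeds a certain lower bound; the association rules mined with bond are therefore valuable. Finally, the study applies the algorithm to real transaction data. The experimental results show that the pruning strategy based on the cross-support property of bond reduces the time needed to find large itemsets, and that the items within the large itemsets found are highly correlated.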
Keywords: data mining; association rules; bond; cross-support
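The bounds mentioned in the abstract can be made concrete with a short derivation. It assumes the commonly used definition of bond (conjunctive support divided by disjunctive support); the symbols σ, σ∨ and the threshold θ are introduced here for illustration, and the paper's own definitions and bounds may differ. The key step for the confidence bound is that any transaction containing all of a nonempty A ⊂ X contains at least one item of X, so σ(A) ≤ σ∨(X).

```latex
% Assumed definition: sigma(X) counts transactions containing all of X,
% sigma_or(X) counts transactions containing at least one item of X.
\[
\operatorname{bond}(X) = \frac{\sigma(X)}{\sigma_{\vee}(X)}, \qquad
\sigma(X) = |\{t : X \subseteq t\}|, \quad
\sigma_{\vee}(X) = |\{t : X \cap t \neq \emptyset\}|
\]
% Anti-monotonicity (allows levelwise pruning): for nonempty Y \subseteq X,
% sigma(Y) >= sigma(X) and sigma_or(Y) <= sigma_or(X).
\[
\emptyset \neq Y \subseteq X \;\Rightarrow\; \operatorname{bond}(Y) \ge \operatorname{bond}(X)
\]
% Cross-support bound: sigma(X) <= min_i sigma({i}), sigma_or(X) >= max_j sigma({j}).
\[
\operatorname{bond}(X) \le \frac{\min_{i \in X} \sigma(\{i\})}{\max_{j \in X} \sigma(\{j\})}
\]
% Support lower bound when bond(X) >= theta.
\[
\operatorname{bond}(X) \ge \theta \;\Rightarrow\;
\sigma(X) = \operatorname{bond}(X)\,\sigma_{\vee}(X) \ge \theta \max_{j \in X} \sigma(\{j\})
\]
% Confidence lower bound for the rule A => X \setminus A.
\[
\emptyset \neq A \subset X:\quad
\operatorname{conf}(A \Rightarrow X \setminus A) = \frac{\sigma(X)}{\sigma(A)}
\ge \frac{\sigma(X)}{\sigma_{\vee}(X)} = \operatorname{bond}(X) \ge \theta
\]
```

Under this definition, an itemset whose items have very different supports (a cross-support pattern) is ruled out by the bound above before any database scan.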

Applying Bond-based Algorithm for Mining Association Rules


Abstract: Most current methods for mining association rules in large databases use a support-based pruning strategy to reduce the search space. However, this strategy is inefficient at mining valuable patterns because it consumes a lot of resources when the support threshold is low, and when the support threshold is high, it loses valuable itemsets that have low support but high confidence and high correlation. This paper applies the concept of a bond-based threshold to mine association rules in large databases. We first prove that bond has a cross-support property and then use this property to prune uninteresting itemsets, which improves the efficiency of the algorithm and conserves system resources. If the bond of an itemset is greater than the bond threshold, the support of the itemset is greater than a certain lower bound, the confidence of the association rules generated from it is also greater than a certain lower bound, and its individual items are highly correlated. Therefore, when both bond-based and support-based pruning are used, the resulting association rules are valuable. Our experiments, performed on real data sets, show that this approach reduces the search space and finds valuable patterns whose individual items are highly correlated.
Keywords: data mining; association rules; bond; cross-support
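A minimal Python sketch of levelwise itemset mining that combines support pruning, bond pruning, and cross-support pruning, assuming bond is conjunctive support divided by disjunctive support. This is an Apriori-style illustration, not the authors' algorithm; all function names, parameters (`min_count`, `min_bond`) and the toy transactions are hypothetical.

```python
from itertools import combinations

# Assumption: bond(X) = sigma(X) / sigma_or(X), i.e. the number of transactions
# containing ALL items of X divided by the number containing ANY item of X.


def support_count(itemset, transactions):
    """Conjunctive support: transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def disjunctive_count(itemset, transactions):
    """Disjunctive support: transactions that contain at least one item of itemset."""
    return sum(1 for t in transactions if itemset & t)


def bond(itemset, transactions):
    d = disjunctive_count(itemset, transactions)
    return support_count(itemset, transactions) / d if d else 0.0


def cross_support_bound(itemset, item_counts):
    """Upper bound on bond computed from single-item counts only."""
    counts = [item_counts[i] for i in itemset]
    return min(counts) / max(counts)


def mine_bond_itemsets(transactions, min_count, min_bond):
    """Levelwise search; returns {itemset: bond} for itemsets with
    support count >= min_count and bond >= min_bond."""
    transactions = [frozenset(t) for t in transactions]
    item_counts = {}
    for t in transactions:
        for i in t:
            item_counts[i] = item_counts.get(i, 0) + 1
    level = [frozenset([i]) for i, c in item_counts.items() if c >= min_count]
    result = {x: 1.0 for x in level}  # a single item always has bond 1
    k = 2
    while level:
        prev = set(level)
        candidates = set()
        for x, y in combinations(level, 2):
            c = x | y
            if len(c) != k:
                continue
            # Support and bond are both anti-monotone, so every (k-1)-subset
            # of a surviving k-itemset must itself have survived.
            if any(c - {i} not in prev for i in c):
                continue
            # Cross-support pruning: reject without scanning the transactions.
            if cross_support_bound(c, item_counts) < min_bond:
                continue
            candidates.add(c)
        level = []
        for c in candidates:
            if support_count(c, transactions) >= min_count:
                value = bond(c, transactions)
                if value >= min_bond:
                    level.append(c)
                    result[c] = value
        k += 1
    return result


if __name__ == "__main__":
    data = [
        {"bread", "milk"},
        {"bread", "milk", "eggs"},
        {"bread", "milk"},
        {"bread", "caviar"},
        {"milk"},
        {"bread"},
    ]
    found = mine_bond_itemsets(data, min_count=1, min_bond=0.5)
    for itemset, value in sorted(found.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), round(value, 2))
```

On the toy data, {bread, milk} is kept with bond 3/6 = 0.5, while {bread, eggs} is discarded by the cross-support bound 1/5 without scanning the transactions. Both rules from the kept pair have confidence at least 0.5 (bread ⇒ milk is 3/5, milk ⇒ bread is 3/4), consistent with the confidence lower bound described in the abstract.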
