
Pages: 161-183
Date: 2014/04
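Abstract: In data mining classification problems, most algorithms are designed to train classification models on the assumption that the classes are evenly distributed. In practice, however, imbalanced class distributions are common, so designing classification methods for such datasets is an important research topic. In addition, the rules found by classification models are often trivial and complex; emerging pattern mining can filter them down to the rules that capture the significant differences between two classes and uniquely identify each class. No previous study, however, has mined emerging patterns from imbalanced datasets. This study proposes a new research framework, based on associative classification, that adjusts the weights of the data when computing support in order to mine emerging patterns from imbalanced datasets, and it adds mining of changes in emerging patterns across years. The study uses a real national freeway traffic accident dataset as its empirical basis. This dataset is severely imbalanced: fatal accidents account for less than one percent of all accident records. The authorities have nevertheless long sought to understand the causes of fatal accidents, hoping that countermeasures can improve driving safety and reduce fatal accidents. Using the proposed framework, this study finds the associations among the contributing factors of general accidents and of rare fatal accidents, analyzes the contributing factors across years, and identifies important patterns for the reference of traffic management agencies.
Keywords: associative classification; emerging patterns; imbalanced datasets; freeway accidents; weighted support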

Mining Emerging Patterns from Imbalance Dataset-A Case Study on Freeway Accident Database

Abstract: Traditional associative classification searches for frequent patterns in balanced datasets. However, most real-life datasets are imbalanced, so discovering rare patterns in imbalanced datasets is an important task. Freeways have become the main transportation routes in Taiwan. Because of high speeds and heavy traffic, freeway accidents cause more serious injuries than accidents on other roads. Serious-injury accidents make up only a very small part of the accident data, yet the factors behind these special cases are the most important issue. This study proposes a framework to explore the most significant causes of serious accidents. The framework combines associative classification with emerging pattern mining to discover rare and serious incidents. Each accident is weighted by its severity, so that rare items can be discovered with the proposed support calculation. An experiment on real accident data demonstrates the efficacy of the proposed approach. Based on the analysis of these accidents, we provide some suggestions.
Keywords: Associative Classification;Emerging Patterns;Imbalance Dataset;Freeway Accident;Weight Support;
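
The abstract does not give the paper's support formula, so the following is only a minimal sketch of the general idea: class-based record weights let a pattern tied to a rare class reach a higher support, and the growth rate between two classes (the usual emerging pattern measure) shows how strongly the pattern distinguishes them. The function names, the weights, and the toy records are all hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple

Record = Tuple[FrozenSet[str], str]  # (set of items, class label)


def weighted_support(records: List[Record], pattern: FrozenSet[str],
                     weights: Dict[str, float]) -> float:
    """Weighted share of all records that contain the pattern.

    Each record counts with the weight of its class (an assumed scheme).
    """
    total = sum(weights[label] for _, label in records)
    hit = sum(weights[label] for items, label in records if pattern <= items)
    return hit / total if total else 0.0


def class_support(records: List[Record], pattern: FrozenSet[str],
                  label: str) -> float:
    """Plain support of the pattern among the records of one class."""
    in_class = [items for items, lab in records if lab == label]
    if not in_class:
        return 0.0
    return sum(1 for items in in_class if pattern <= items) / len(in_class)


def growth_rate(records: List[Record], pattern: FrozenSet[str],
                target: str, other: str) -> float:
    """Ratio of the pattern's support in the target class to the other class."""
    s_target = class_support(records, pattern, target)
    s_other = class_support(records, pattern, other)
    if s_other == 0.0:
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_other


# Toy data (invented): 1 fatal record and 9 non-fatal records.
records: List[Record] = [(frozenset({"night", "speeding"}), "fatal")]
records += [(frozenset({"night", "speeding"}), "non_fatal")]
records += [(frozenset({"day"}), "non_fatal")] * 8

pattern = frozenset({"night", "speeding"})
uniform = {"fatal": 1.0, "non_fatal": 1.0}
weights = {"fatal": 9.0, "non_fatal": 1.0}  # assumed severity weights

print(weighted_support(records, pattern, uniform))   # unweighted baseline
print(weighted_support(records, pattern, weights))   # rare class up-weighted
print(growth_rate(records, pattern, "fatal", "non_fatal"))
```

With uniform weights the pattern is supported by 2 of the 10 records; with the assumed weights the single fatal record counts as much as all nine non-fatal records together, so the same pattern clears a much higher minimum-support threshold.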
