Journal of Information Management (資訊管理學報)

葉榮懋;施武榮;徐芳玲;
Pages: 157-176
Date: 2010/01
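Abstract (translated from Chinese): With the rapid advance of information technology, the scale at which data are stored and processed differs greatly from the past. How to extract useful information from massive volumes of data for decision-makers has long been a central concern of data mining. Because decision trees are easy to compute and produce clear rules, they are among the most commonly used classification techniques in data mining. However, when the data volume is large and a nominal attribute has many distinct values, creating one branch per attribute value yields too many branches, which makes the extracted rules too complex to interpret and greatly reduces processing efficiency. This paper develops a method for simplifying decision trees that performs a binary split on the nominal attributes in a database, dividing the data into two branches to reduce excessive and unnecessary branching. The study adopts the first principal component from principal component analysis, which captures most of the variance, and uses the mean of the standardized component scores on that component as the threshold for the binary split of attribute values. This eliminates excess attribute-value branches and makes the explicit knowledge in the tree easy to interpret. Finally, four data sets from the UCI repository are used as test samples; the results show that the proposed method performs well in both decision-tree compactness and classification accuracy.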
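Keywords: decision tree (決策樹); data mining (資料探勘); classification (分類); heuristic method (啟發式方法); principal component analysis (主成分分析)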

A Modified Heuristic Method to Construct the Binary Decision Tree of Nominal Attributes


Abstract: The ability to extract useful information from a large-scale database to aid decision-making is critical in data mining. Classification is an important data mining problem and has been studied extensively as a possible approach to knowledge acquisition. The decision tree has become one of the most commonly used techniques for classifying data because the algorithm for generating it is easy to implement and the resulting rules are clear. However, when a nominal attribute has many distinct values and each value forms its own branch, the tree acquires so many branches that the extracted rules become complicated and hard to interpret, and the efficiency of processing a large data set suffers. This paper proposes a heuristic method that simplifies the decision tree by splitting each nominal attribute into two branches. We use principal component analysis to find a good partition strategy that removes unnecessary branches. Since the first principal component represents most of the variance, the mean of its standardized component scores for each attribute is used as the threshold for splitting examples. The decision tree is thereby reduced to a binary tree from which explicit knowledge can be easily extracted. We also compare the method against other heuristic methods and analyze experimental results on four UCI data sets.
Keywords: Decision tree; Data mining; Classification; Heuristic method; Principal component analysis
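
The binary split described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the values of a nominal attribute are turned into numeric variables for principal component analysis, so the encoding below (each attribute value described by its relative class frequencies) is an assumption made only for illustration, as are all function names. The first principal component is found by power iteration on the correlation matrix, each attribute value receives a component score, and the mean score serves as the threshold that sends the value to one of two branches. This is not the authors' implementation.

```python
from collections import Counter, defaultdict
from math import sqrt


def class_profile(values, labels):
    """ASSUMED encoding: one row per attribute value, one column per class,
    holding the relative frequency of that class among rows with that value."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)
    for v, y in zip(values, labels):
        counts[v][y] += 1
    names = sorted(counts)
    rows = [[counts[v][c] / sum(counts[v].values()) for c in classes]
            for v in names]
    return names, rows


def standardize_columns(rows):
    """Center each column and scale it to unit variance (constant columns become 0)."""
    n = len(rows)
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = sqrt(sum((x - mean) ** 2 for x in col) / n)
        out_cols.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(r) for r in zip(*out_cols)]


def first_component(rows, iterations=200):
    """Leading eigenvector of the correlation matrix via power iteration.
    The start vector is arbitrary; it only must not be orthogonal to the answer."""
    n, p = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(p)]
           for i in range(p)]
    vec = [1.0 / (k + 1) for k in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec


def binary_split(values, labels):
    """Partition the distinct attribute values into two groups using the
    mean first-component score as the threshold."""
    names, rows = class_profile(values, labels)
    z = standardize_columns(rows)
    w = first_component(z)
    scores = [sum(a * b for a, b in zip(r, w)) for r in z]
    threshold = sum(scores) / len(scores)
    left = {v for v, s in zip(names, scores) if s <= threshold}
    right = set(names) - left
    return left, right


# Hypothetical toy data: a four-valued nominal attribute and a two-class label.
values = ["a"] * 4 + ["b"] * 4 + ["c"] * 4 + ["d"] * 4
labels = (["yes", "yes", "yes", "no"] + ["yes"] * 4
          + ["no", "no", "no", "yes"] + ["no"] * 4)
left, right = binary_split(values, labels)
print(sorted(left), sorted(right))
```

On this toy data the values dominated by one class end up in one group and the values dominated by the other class in the second group. Which group is called left or right depends on the sign of the eigenvector, which is arbitrary, so only the partition itself is meaningful.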
