Journal of Information Management (資訊管理學報)

翁慈宗;楊乃玉;
Pages: 54-75
Date: 2018/01
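Abstract: Among the classification algorithms used in data mining, the naïve Bayesian classifier offers high computational efficiency and good classification accuracy, and it has been widely applied in practice. Because the naïve Bayesian classifier makes predictions by computing conditional probabilities, most implementations introduce prior distributions to improve classification accuracy, generally using a Dirichlet or generalized Dirichlet distribution as the prior for adjusting the parameters of the attribute-value probabilities. Previous studies, however, have not introduced a prior distribution for the probabilities of the class values in a data set, which may limit the achievable gain in classification accuracy. This study therefore proposes the latent Dirichlet allocation naïve Bayesian classifier (Latent Dirichlet Allocation Naïve Bayes; LDANB), which uses the latent Dirichlet allocation model to introduce a prior distribution for the class-value probabilities and adjust their parameters so that the estimates better reflect the true underlying concept. The method is evaluated empirically on 20 UCI data sets. The results show that the naïve Bayesian classifier with the latent Dirichlet allocation model outperforms the case in which priors are introduced only for attribute values, and that the generalized Dirichlet distribution outperforms the Dirichlet distribution. Because the generalized Dirichlet distribution has higher computational complexity, however, we recommend that LDANB use a generalized Dirichlet prior for attribute values combined with a Dirichlet prior for class values, which improves classification accuracy at limited computational cost.
Keywords: naïve Bayesian classifier; Dirichlet distribution; generalized Dirichlet distribution; latent Dirichlet allocation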

Latent Dirichlet Allocation Naïve Bayes


Abstract: Purpose - The naïve Bayesian classifier is widely employed for classification tasks because of its computational efficiency and competitive accuracy. The prior distributions of attributes in the naïve Bayesian classifier are implicitly or explicitly assumed to follow either Dirichlet or generalized Dirichlet distributions. However, no previous study has applied prior distributions to the classes in the naïve Bayesian classifier. The aim of this study is to develop a model based on latent Dirichlet allocation (LDA), called LDANB, that introduces prior distributions for both attributes and classes in the naïve Bayesian classifier so that the performance of this classification method can be improved. Design/methodology/approach - The priors for both attributes and classes in the naïve Bayesian classifier can be Laplace's estimate, a Dirichlet distribution, or a generalized Dirichlet distribution. Nine combinations of priors for attributes and classes are explored to investigate their impact on the performance of the naïve Bayesian classifier. Findings - The experimental results on 20 data sets demonstrate that LDANB generally achieves the best classification accuracy when the priors for both attributes and classes are generalized Dirichlet distributions. When computational efficiency is taken into account, a Dirichlet prior can be a proper choice for classes. Research limitations/implications - Multivariate distributions defined on the unit simplex other than the Dirichlet and generalized Dirichlet distributions are not considered. Practical implications - A procedure for introducing Dirichlet and generalized Dirichlet priors for attributes and classes is proposed to improve the performance of the naïve Bayesian classifier. The experimental results show that assuming priors for both attributes and classes is beneficial.
Originality/value - The LDANB model for introducing priors into the naïve Bayesian classifier is novel, and this model can enhance the competitiveness of the naïve Bayesian classifier in classification tasks.
Keywords: Naïve Bayesian classifiers; Dirichlet distribution; generalized Dirichlet distribution; latent Dirichlet allocation
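
Illustrative sketch (not the authors' implementation): the idea of placing priors on both classes and attributes can be shown with a small categorical naïve Bayesian classifier in which every probability is estimated as the posterior mean under a symmetric Dirichlet prior. Setting the prior parameter to 1 corresponds to Laplace's estimate. The function names `fit_nb` and `predict_nb`, the parameters `alpha_class` and `alpha_attr`, and the toy data are assumptions made for this example; the generalized Dirichlet prior and the paper's parameter-selection procedure are not covered.

```python
from collections import Counter, defaultdict
import math


def fit_nb(rows, labels, alpha_class=1.0, alpha_attr=1.0):
    """Fit a categorical naive Bayes model using posterior-mean estimates
    under symmetric Dirichlet priors (alpha = 1 gives Laplace's estimate)."""
    classes = sorted(set(labels))
    n = len(labels)
    class_counts = Counter(labels)
    # Class probabilities with a prior: (n_c + alpha) / (n + alpha * K)
    prior = {
        c: (class_counts[c] + alpha_class) / (n + alpha_class * len(classes))
        for c in classes
    }
    n_attrs = len(rows[0])
    # Observed possible values of each attribute
    values = [sorted({r[j] for r in rows}) for j in range(n_attrs)]
    counts = defaultdict(int)  # (class, attribute index, value) -> count
    for r, c in zip(rows, labels):
        for j, v in enumerate(r):
            counts[(c, j, v)] += 1
    cond = {}
    for c in classes:
        for j in range(n_attrs):
            denom = class_counts[c] + alpha_attr * len(values[j])
            for v in values[j]:
                cond[(c, j, v)] = (counts[(c, j, v)] + alpha_attr) / denom
    return classes, prior, cond


def predict_nb(model, row):
    """Return the class with the highest log posterior score."""
    classes, prior, cond = model

    def log_score(c):
        # Attribute values never seen in training are skipped
        # (a simplifying assumption of this sketch).
        return math.log(prior[c]) + sum(
            math.log(cond[(c, j, v)])
            for j, v in enumerate(row)
            if (c, j, v) in cond
        )

    return max(classes, key=log_score)


# Toy data, invented for illustration only
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = fit_nb(rows, labels, alpha_class=1.0, alpha_attr=1.0)
print(predict_nb(model, ("sunny", "hot")))  # prints "no"
```

Keeping `alpha_class` and `alpha_attr` as separate arguments mirrors the abstract's point that the prior for classes and the prior for attributes can be chosen independently.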
