Journal of Information Management (資訊管理學報)

李瑞庭;林明志;王韻茹;陳國泰;
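Pages: 27-49
Date: 2010/12
Abstract (translated from the Chinese): As data dimensionality grows, existing clustering methods that use all dimensions of the data are no longer suitable for analyzing high-dimensional data. Subspace clustering has therefore received increasing attention in recent years. In this paper, we propose a new method for mining significant subspaces. The proposed method consists of three steps. First, we project all data points onto two-dimensional spaces and generate many frequent subspaces. Next, we recursively combine these frequent subspaces to form larger frequent subspaces. Finally, we apply a greedy algorithm as a summarization step, selecting the significant subspaces from the frequent subspaces generated. The experimental results show that the proposed method outperforms FIRES in quality and outperforms DUSC in both coverage and quality.
Keywords (translated from the Chinese): subspace mining; subspace clustering; frequent subspace; data mining; greedy algorithm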

Data Mining of Significant Subspaces (重要子空間之資料探勘)


Abstract: As the number of dimensions increases, existing clustering methods that operate in the full feature space become unsuitable for clustering data in databases. Subspace clustering has therefore attracted growing attention. In this paper, we propose a novel method for mining significant subspaces from all frequent subspaces, where a subspace is frequent if it contains enough data points. The proposed method consists of three phases. First, we generate all frequent 2-dimensional subspaces. Second, we recursively combine frequent k-dimensional subspaces to generate frequent (k+1)-dimensional subspaces, for k≥2. Finally, we adopt a greedy algorithm to summarize the frequent subspaces generated and select the significant ones. The experimental results show that the proposed method achieves better quality and coverage than DUSC, and better quality than FIRES.
Keywords: subspace mining; subspace clustering; frequent subspace; data mining; greedy algorithm
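
The three phases described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how subspaces are represented or how significance is measured, so everything concrete here is an assumption made for illustration: an equal-width grid over values in [0, 1), a subspace modeled as a set of (dimension, bin) pairs, a `min_points` threshold for "enough data points", and a greedy step that simply picks the subspace covering the most not-yet-covered points. The function names (`frequent_2d`, `grow`, `mine_frequent`, `greedy_summary`) are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def cell_of(value, bins):
    """Map a value in [0, 1) to a bin index (equal-width grid, an assumption)."""
    return min(int(value * bins), bins - 1)


def frequent_2d(points, bins, min_points):
    """Phase 1: project points onto every pair of dimensions, keep dense cells."""
    support = {}
    dims = range(len(points[0]))
    for idx, p in enumerate(points):
        for a, b in combinations(dims, 2):
            key = frozenset({(a, cell_of(p[a], bins)), (b, cell_of(p[b], bins))})
            support.setdefault(key, set()).add(idx)
    return {k: s for k, s in support.items() if len(s) >= min_points}


def grow(frequent_k, min_points):
    """Phase 2 step: join k-dim subspaces that differ in exactly one pair."""
    out = {}
    for (u, su), (v, sv) in combinations(frequent_k.items(), 2):
        merged = u | v
        if len(merged) != len(u) + 1:
            continue  # the two subspaces do not share k-1 pairs
        if len({d for d, _ in merged}) != len(merged):
            continue  # two different bins on the same dimension
        if merged in out:
            continue  # already generated from another pair
        common = su & sv  # points lying in every cell of the merged subspace
        if len(common) >= min_points:
            out[merged] = common
    return out


def mine_frequent(points, bins, min_points):
    """Phases 1 and 2: start from 2-dim subspaces and grow until nothing is left."""
    level = frequent_2d(points, bins, min_points)
    all_frequent = {}
    while level:
        all_frequent.update(level)
        level = grow(level, min_points)
    return all_frequent


def greedy_summary(frequent, max_subspaces):
    """Phase 3 (assumed criterion): pick subspaces by largest new coverage."""
    covered, chosen = set(), []
    candidates = dict(frequent)
    while candidates and len(chosen) < max_subspaces:
        # Prefer more newly covered points; break ties by higher dimensionality.
        best = max(candidates, key=lambda s: (len(candidates[s] - covered), len(s)))
        gain = candidates[best] - covered
        if not gain:
            break  # no remaining subspace adds coverage
        chosen.append(best)
        covered |= gain
        del candidates[best]
    return chosen, covered
```

A call such as `greedy_summary(mine_frequent(points, bins=2, min_points=3), max_subspaces=2)` returns the selected subspaces together with the indices of the points they cover. The coverage-based selection is only one plausible reading of "summarize"; the paper's actual significance measure may differ.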
