資訊管理學報 (Journal of Information Management)

黃燕萍;許中川;
Pages: 219-237
Date: 2007/10
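Abstract (Chinese version): Data mining is an analytical method for extracting hidden, previously unknown, and potentially useful information from large volumes of data. Within the data mining field, research on knowledge discovery has advanced considerably. Time series data contain a great deal of unknown and latent information, and financial databases typically hold large amounts of such data. Past time series research has relied mainly on regression analysis, and the statistical properties of traditional regression models are mostly built on linear models. A linear model can still estimate a nonlinear process with fairly high accuracy when the range of variation is small, but once the variation exceeds a certain limit, estimation accuracy drops and the model's practical value diminishes. The self-organizing map (SOM) neural network is currently one of the analysis methods frequently used in time series research. However, the SOM is a highly nonlinear model that can handle only numeric data and cannot effectively process mixed-type data. This study therefore proposes a concept-hierarchy-oriented pattern mining model. Based on the principle that similar items group together, the model learns pattern recognition from past sample data for time series in financial databases and uses the tendency of like items to cluster to achieve grouping. It further identifies unknown, latent, but useful pattern information, along with concise and representative rules, to help estimate changes in financial data.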
Keywords (Chinese version): Data mining; Clustering algorithm; Pattern mining; Time series analysis

Concept-Hierarchy-Oriented Time Series Pattern Data Mining: The Case of Financial Databases


Abstract: Data mining is the process of automatically searching large volumes of data for patterns, and it is a fairly recent topic in computing. Pattern discovery is one field within data mining. Financial databases generally contain large volumes of time series data, which hold useful patterns that are not easy to find, and many financial studies of time series use linear regression models to estimate the variation and trend of the data. However, traditional time series methods describe the data with special-purpose or linear models. Linear models can achieve high accuracy when the variation in the data is small, but if the variation exceeds a certain limit, their estimation accuracy declines. The self-organizing map (SOM) is a well-known nonlinear model and a traditional method for extracting patterns from numeric data. Much research extracts patterns from numeric attributes rather than from categorical or mixed data, and does not extract the major values from pattern rules either. The purpose of this study is to provide a novel architecture for mining patterns from mixed data, taking a systematic approach to mining financial database information, and to find patterns for estimating trends or the occurrence of special events. The study uses the ESA algorithm to discover patterns within the Concept Hierarchy based Pattern Discovery (CHPD) architecture. Specifically, this architecture facilitates the direct handling of mixed data, including categorical and numeric values. The mining architecture can simulate human intelligence and discover patterns automatically, and it also demonstrates knowledge pattern discovery and rule extraction.
Keywords: Data mining; Cluster analysis; Pattern discovery; Time series analysis
