
Pages: 391-415
Date: 2014/10
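Abstract: The web-page summaries (snippets) produced by current search engines mostly fail to give users enough information to judge a page's content, and may even mislead them. This study proposes that, when a search engine returns query results, it should provide a more helpful summary rather than fragmentary snippets, so that users can grasp the gist of the full text and then decide whether to read the whole page. The study applies term-weighting techniques to mine the text of web pages. The pages are segmented with the Chinese word segmentation system (CKIP) developed by Academia Sinica, and summaries are produced separately with TF-ISF and with similarity-based weighting. The union and the intersection of these two results yield a "Rough Summary" and an "Accurate Summary", respectively, improving the quality of automatic summarization. Experimental results confirm that the proposed method effectively improves the accuracy of automatic document summarization.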
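The abstract describes scoring sentences with TF-ISF and with a similarity weight, then merging the two extracts by union (Rough Summary) and by intersection (Accurate Summary). The following is a minimal Python sketch of that pipeline under stated assumptions. The input sentences are assumed to be already segmented into tokens, standing in for CKIP output. TF-ISF uses one common formulation: a sentence's score is the mean, over its distinct terms w, of tf(w, s) * log(N / sf(w)), where N is the number of sentences and sf(w) is the number of sentences containing w. The similarity weight is assumed to be the cosine similarity between a sentence and the whole document, because the abstract does not say which measure was used. Each method keeps its top k sentences. All function names are hypothetical and do not represent the authors' implementation.

```python
import math
from collections import Counter


def tf_isf_scores(sentences):
    """Mean TF-ISF weight of each sentence's distinct terms (one common variant)."""
    n = len(sentences)
    # Sentence frequency: in how many sentences each term occurs.
    sf = Counter(term for sent in sentences for term in set(sent))
    scores = []
    for sent in sentences:
        tf = Counter(sent)  # term frequency within this sentence
        if not tf:
            scores.append(0.0)
            continue
        total = sum(c * math.log(n / sf[w]) for w, c in tf.items())
        scores.append(total / len(tf))
    return scores


def similarity_scores(sentences):
    """ASSUMPTION: cosine similarity between each sentence and the whole document."""
    doc = Counter(t for sent in sentences for t in sent)
    doc_norm = math.sqrt(sum(v * v for v in doc.values()))
    scores = []
    for sent in sentences:
        vec = Counter(sent)
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm == 0 or doc_norm == 0:
            scores.append(0.0)
            continue
        dot = sum(c * doc[t] for t, c in vec.items())
        scores.append(dot / (norm * doc_norm))
    return scores


def top_k(scores, k):
    """Indices of the k highest scores; ties are broken by sentence position."""
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(ranked[:k])


def rough_and_accurate(sentences, k):
    """Union -> rough summary, intersection -> accurate summary (indices in document order)."""
    a = top_k(tf_isf_scores(sentences), k)
    b = top_k(similarity_scores(sentences), k)
    return sorted(a | b), sorted(a & b)


if __name__ == "__main__":
    # Hypothetical, pre-segmented toy document.
    doc = [
        ["search", "engine", "snippet", "misleads", "users"],
        ["summary", "helps", "users", "judge", "page"],
        ["search", "engine", "returns", "summary"],
        ["weather", "is", "nice"],
    ]
    rough, accurate = rough_and_accurate(doc, k=2)
    print("rough:", rough, "accurate:", accurate)
```

Returning indices rather than sentence text keeps the summaries in document order and makes the union and intersection plain set operations.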
Keywords: automatic document summarization; text mining; Web mining; information retrieval; TF-IDF algorithm

Automatic Text Summarization Based on Weights of Words

Abstract:
Purpose: The objective of text document summarization is to extract the essential sentences that cover most of a document's concepts, so that users can grasp the ideas the document addresses simply by reading the corresponding summary. This study develops an automatic text summarization technique that produces summaries of web pages by extracting the sentences that cover most of each page's concepts.
Design/methodology/approach: The research framework was built on the CKIP (Chinese Knowledge Information Processing) system and automatic text summarization techniques. Two studies were designed to elicit and evaluate the accuracy and applicability of five automatic text summarization techniques, using 10 samples drawn from 184 web articles.
Findings: Our results show that TF-ISF (Term Frequency-Inverse Sentence Frequency) outperforms the other techniques on the F-measure. Further, the "Rough Summary" and the "Accurate Summary" achieve the best performance on RECALL and PRECISION, respectively.
Research limitations/implications: This paper focuses on Chinese web articles. Future research is therefore recommended to develop an automatic text summarization system built on an ontology-based architecture.
Practical implications: This paper provides several automatic text summarization techniques that produce summaries of web pages by extracting the sentences that cover most of each page's concepts. The experimental results indicate that the proposed approach significantly improves the accuracy of automatic text summarization.
Originality/value: This paper is the first to use the union and the intersection of summaries to produce a "Rough Summary" and an "Accurate Summary", respectively, in order to improve the quality of automatic text summarization.
Keywords: automatic text summarization; text mining; Web mining; TF-IDF
