Journal of Information Management (資訊管理學報)

陳林志;葉國暉;陳大仁;陳冠瑜;
Pages: 155-183
Date: 2017/04
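Abstract: Blog search engines resemble general search engines such as Google: they automatically collect large amounts of information from the web and offer a free interface through which the public can search their databases. The difference is that blog search engines mainly index blogs and filter out ordinary web pages, which gives them some distinctive features. First, every blog post has a publication date, and a blog search engine can display it, whereas a general search engine can show only a page's last-updated date, which is sometimes unreliable. Second, blog search engines can capture the publication date of blog posts, whereas general search engines, even with advanced search options that display dates, are limited to a page's last-modified date. In this paper, we use four semantic models to analyze the Google blog search engine: Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and the Relational Topic Model (RTM). We also propose a variant of RTM that incorporates a time parameter. According to the experimental results, the modified RTM combined with the time parameter improves the performance of the Google blog search engine.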
Keywords: latent semantic analysis; probabilistic latent semantic model; latent Dirichlet allocation; relational topic model; Google blog search

Improving the Performance of Google Blog Search Based on the Time Parameter


Abstract: Purpose: Blog search engines are similar to web search engines like Google in that they automatically gather large quantities of information from the web and provide a free interface that allows the public to search their databases. Design/methodology/approach: In this paper, we use four semantic models to analyze the Google blog search engine: Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and the Relational Topic Model (RTM). Findings: According to the experimental results, our modified RTM, which incorporates the time parameter, can improve the performance of the Google blog search engine. Research limitations/implications: The main difference between the two is that blog search engines mainly index blogs and ignore the rest of the web. The special features of blogs give blog search engines some specific and unique attributes. Practical implications: First, since each blog posting is dated, blog search engines can report the date on which the posting was created. For normal web pages, search engines can report only the last-updated date, which is often not very reliable. Second, many blog search engines have a date-specific search capability. Some general search engines offer this as an advanced search option, but only for the last-modified date of pages. Originality/value: In this paper, we propose a variant of RTM that mainly focuses on the time parameter.
Keywords: latent semantic analysis; probabilistic latent semantic analysis; latent Dirichlet allocation; relational topic model; Google blog search
