Searching the Web by Discovering and Clustering Related Terms |
|
Course URL: | http://videolectures.net/solomon_dias_swdcr/ |
Lecturer: | Gaël Dias |
Institution: | University of Beira Interior |
Date: | 2007-02-25 |
Language: | English |
Course description: | The amount of information on the web is growing so fast that it is becoming more and more difficult for classical search engines to find relevant information. Indeed, due to the frenetic increase of webpages written in different languages, and sometimes in misinterpreted languages, the degree of ambiguity of human language has been rising to levels unseen so far. However, people still query these systems with no more than 2 words on average. As a consequence, new information retrieval systems need to be proposed to decrease the level of ambiguity of the queries. Such systems usually make use of query expansion techniques to solve this problem. In this talk, I will present a system based on the automatic discovery of terms that are related to the query as a means of helping the user to search for relevant information. This technique can be classified within Interactive Query Expansion systems. However, unlike other systems, we use Web Mining techniques to discover related terms based on different features such as association measures, document similarity, document relevance, etc. In the second part of my talk, I will present the future extensions of our retrieval systems, based on the automatic discovery of relations between related terms. By using agglomerative clustering techniques and an auto-fed WebWarehouse, we hope to be able to propose less ambiguous query expansion terms than present systems, where the user needs to sort out the terms he is interested in. Web Spider is a system that returns all related terms and links from a given URL and a given query. The Spider has been developed using the C5.0 machine learning algorithm. |
Keywords: | classical search engines; query expansion; agglomerative clustering |
Source: | VideoLectures.NET |
Last reviewed: | 2021-12-21:liyy |
Views: | 47 |
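
The course description names association measures and agglomerative clustering as building blocks for query expansion, but gives no implementation details. The following is a minimal sketch of how those two ideas might fit together, not the speaker's system. The choice of document-level pointwise mutual information (PMI) as the association measure, Jaccard overlap with average linkage for clustering, the function names, the threshold, and the toy corpus `DOCS` are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Hypothetical toy corpus: "jaguar" is ambiguous between a car and an animal.
DOCS = [
    "jaguar car engine speed",
    "jaguar car dealer engine",
    "jaguar cat jungle prey",
    "jaguar cat jungle habitat",
    "python snake jungle",
]


def doc_sets(docs):
    """Map each term to the set of document indices that contain it."""
    index = {}
    for i, text in enumerate(docs):
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(i)
    return index


def related_terms(docs, query, top_k=5):
    """Rank terms by document-level PMI with the query term (assumed measure)."""
    index = doc_sets(docs)
    n = len(docs)
    query = query.lower()
    q_docs = index.get(query, set())
    scores = {}
    for term, t_docs in index.items():
        joint = len(q_docs & t_docs)
        if term == query or joint == 0:
            continue
        # PMI = log(P(q, t) / (P(q) * P(t))), with probabilities taken
        # as fractions of documents.
        scores[term] = math.log((joint * n) / (len(q_docs) * len(t_docs)))
    # Highest PMI first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def jaccard(a, b):
    """Jaccard overlap between two non-empty sets of document indices."""
    return len(a & b) / len(a | b)


def cluster_terms(terms, index, threshold=0.5):
    """Average-link agglomerative clustering of terms.

    Starts with one cluster per term and repeatedly merges the most
    similar pair until the best similarity falls below `threshold`.
    """
    clusters = [[t] for t in terms]

    def sim(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(jaccard(index[a], index[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    index = doc_sets(DOCS)
    print(related_terms(DOCS, "jaguar"))
    print(cluster_terms(["car", "engine", "cat", "jungle"], index))
```

On this toy corpus the clustering step groups `car` with `engine` and `cat` with `jungle`, which is the kind of sense-separated expansion list the description hopes to offer instead of a flat list the user must sort through.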