
XML Information Retrieval
Course URL: http://videolectures.net/russir2010_lalmas_xmlir/  
Lecturer: Mounia Lalmas
Institution: University of Glasgow
Date: 2011-03-18
Language: English
Course description: Documents usually have content and structure. The content refers to the text of the document, whereas the structure refers to how the document is logically organized. An increasingly common way to encode structure is with a markup language. Today, the most widely used markup language for representing structure is the eXtensible Markup Language (XML). XML can be used to provide focused access to documents, i.e. returning XML elements, such as sections and paragraphs, instead of whole documents in response to a query. Such focused strategies are of particular benefit for information repositories containing long documents, or documents covering a wide variety of topics, because users are directed to the most relevant content within a document. The increased adoption of XML to represent document structure requires the development of tools to effectively access documents marked up in XML. This course provides a detailed description of the query languages, indexing strategies, ranking algorithms, and presentation scenarios developed to access XML documents. Major advances in XML information retrieval have been made since 2002 as a result of INEX, the Initiative for the Evaluation of XML Retrieval. INEX, also described in this course, provided test sets for evaluating XML information retrieval effectiveness. Many of the developments and results described in this course were investigated within INEX.
Keywords: Computer Science; Information Retrieval; XML
Source: VideoLectures.NET
Last reviewed: 2020-06-28:yumf
Views: 50