

Measuring the Similarity between Implicit Semantic Relations from the Web
Course URL: http://videolectures.net/www09_bollegala_mtsisr/
Lecturers: Danushka Bollegala; Yutaka Matsuo; Mitsuru Ishizuka
Institution: The University of Tokyo
Date: 2009-05-20
Course language: Japanese
Course description: Measuring the similarity between semantic relations that hold among entities is an important and necessary step in various Web-related tasks such as relation extraction, information retrieval and analogy detection. For example, consider the case in which a person knows a pair of entities (e.g. Google, YouTube), between which a particular relation holds (e.g. acquisition). The person is interested in retrieving other such pairs with similar relations (e.g. Microsoft, Powerset). Existing keyword-based search engines cannot be applied directly in this case because, in keyword-based search, the goal is to retrieve documents that are relevant to the words used in a query, not necessarily to the relations implied by a pair of words. We propose a relational similarity measure, using a Web search engine, to compute the similarity between semantic relations implied by two pairs of words. Our method has three components: representing the various semantic relations that exist between a pair of words using automatically extracted lexical patterns, clustering the extracted lexical patterns to identify the different patterns that express a particular semantic relation, and measuring the similarity between semantic relations using a metric learning approach. We evaluate the proposed method in two tasks: classifying semantic relations between named entities, and solving word-analogy questions. The proposed method outperforms all baselines in a relation classification task with a statistically significant average precision score of 0.74. Moreover, it reduces the time taken by Latent Relational Analysis to process 374 word-analogy questions from 9 days to less than 6 hours, with a SAT score of 51%.
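As a rough illustration of the first component only (representing the relation between a word pair by lexical patterns), the following Python sketch may help. It is not the authors' implementation: the function names, the `max_gap` parameter and the toy snippets are all hypothetical, the pattern-clustering step is omitted, and plain cosine similarity stands in for the learned metric described in the abstract.

```python
import re
from collections import Counter
from math import sqrt


def extract_patterns(snippet, a, b, max_gap=5):
    """Return patterns of the form 'X <words> Y' where a precedes b in the snippet.

    Simplification (assumption): only the order a ... b is considered, with at
    most max_gap words in between.
    """
    tokens = re.findall(r"\w+", snippet.lower())
    a, b = a.lower(), b.lower()
    patterns = []
    for i, tok in enumerate(tokens):
        if tok != a:
            continue
        # Look ahead for b within the allowed gap.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if tokens[j] == b:
                patterns.append(" ".join(["X"] + tokens[i + 1:j] + ["Y"]))
                break
    return patterns


def pattern_vector(snippets, a, b):
    """Count how often each lexical pattern occurs for the pair (a, b)."""
    counts = Counter()
    for snippet in snippets:
        counts.update(extract_patterns(snippet, a, b))
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy snippets standing in for search-engine results (made up for illustration).
google = pattern_vector(
    ["Google acquired YouTube", "Google bought YouTube"], "Google", "YouTube"
)
microsoft = pattern_vector(
    ["Microsoft acquired Powerset", "Microsoft bought Powerset"],
    "Microsoft", "Powerset",
)
ostrich = pattern_vector(["An ostrich is a large bird"], "ostrich", "bird")

print(cosine(google, microsoft))  # pairs sharing the same patterns score high
print(cosine(google, ostrich))    # pairs with no shared pattern score 0.0
```

In this toy setting, "X acquired Y" and "X bought Y" are counted as unrelated dimensions; the clustering component described in the abstract is what would group such patterns as expressions of the same relation.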
Keywords: Web mining; Semantic Web; computer science; machine learning
Course source: VideoLectures.NET
Last reviewed: 2019-10-29:lxf
Views: 60