
Linking Named Entities in Tweets with Knowledge Base via User Interest Modeling
Course URL: http://videolectures.net/kdd2013_shen_interest_modeling/
Lecturer: Wei Shen
Institution: Tsinghua University
Date: 2013-09-27
Language: English

Course description: Twitter has become an increasingly important source of information, with more than 400 million tweets posted per day. The task of linking named entity mentions detected in tweets to the corresponding real-world entities in a knowledge base is called tweet entity linking. This task is of practical importance and can facilitate many other tasks, such as personalized recommendation and user interest discovery. Tweet entity linking is challenging because tweets are noisy, short, and informal. Previous methods focus on linking entities in Web documents and rely largely on the context around the entity mention and the topical coherence between entities in the document. However, these methods cannot be applied effectively to tweet entity linking because a single tweet contains insufficient context. In this paper, we propose KAURI, a graph-based framework that collectively links all the named entity mentions in all tweets posted by a user by modeling the user's topics of interest. Our assumption is that each user has an underlying topic interest distribution over various named entities. KAURI integrates intra-tweet local information with inter-tweet user interest information in a unified graph-based framework. We extensively evaluated KAURI on a manually annotated tweet corpus; the experimental results show that KAURI significantly outperforms the baseline methods in accuracy, is efficient, and scales well to tweet streams.
Keywords: Web linking; Twitter; knowledge base
