Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop
Course URL: http://videolectures.net/kdd2018_zhang_AMiner/
Lecturer: Yutao Zhang
Institution: Tsinghua University
Date: 2018-11-23
Language: English
Course description: AMiner is a free online academic search and mining system that has collected more than 130 million researcher profiles and over 200 million papers from multiple publication databases [25]. In this paper, we present the implementation and deployment of name disambiguation, a core component of AMiner. The problem has been studied for decades but remains largely unsolved. In AMiner, we conducted a systematic investigation of the problem and propose a comprehensive framework to address it. We propose a novel representation learning method that incorporates both global and local information, and present an end-to-end cluster size estimation method that is significantly better than traditional BIC-based methods. To improve accuracy, we involve human annotators in the disambiguation process. We carefully evaluate the proposed framework on large real-world data, and experimental results show that the proposed solution achieves clearly better performance (+7-35% in terms of F1-score) than several state-of-the-art methods, including GHOST [5], Zhang et al. [33], and Louppe et al. [17]. Finally, the algorithm has been deployed in AMiner to handle the disambiguation problem at the billion scale, which further demonstrates both the effectiveness and the efficiency of the proposed framework.
Keywords: online academic search; mining system; learning methods
Source: VideoLectures.NET
Data collected: 2022-12-16:chenjy
Last reviewed: 2022-12-16:chenjy
Views: 17