Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop
Course URL: http://videolectures.net/kdd2018_zhang_AMiner/
Lecturer: Yutao Zhang
Institution: Tsinghua University
Date: 2018-11-23
Language: English
Course description: AMiner is a free online academic search and mining system that has collected more than 130,000,000 researcher profiles and over 200,000,000 papers from multiple publication databases [25]. In this paper, we present the implementation and deployment of name disambiguation, a core component of AMiner. The problem has been studied for decades but remains largely unsolved. In AMiner, we conducted a systematic investigation of the problem and propose a comprehensive framework to address it. We propose a novel representation learning method that incorporates both global and local information, and we present an end-to-end cluster size estimation method that is significantly better than traditional BIC-based methods. To improve accuracy, we involve human annotators in the disambiguation process. We carefully evaluate the proposed framework on large real-world data, and the experimental results show that the proposed solution achieves clearly better performance (+7-35% in F1-score) than several state-of-the-art methods, including GHOST [5], Zhang et al. [33], and Louppe et al. [17]. Finally, the algorithm has been deployed in AMiner to handle the disambiguation problem at the billion scale, which further demonstrates both the effectiveness and the efficiency of the proposed framework.
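Keywords: online academic search and mining system; novel representation learning method; deployment in AMiner; cluster size estimation method
Source: VideoLectures.NET
Data collected: 2023-01-28:cyh
Last reviewed: 2023-01-28:cyh
Views: 38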