Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop (MOOC overseas open course)




Course URL:	http://videolectures.net/kdd2018_zhang_AMiner/
Lecturer:	Yutao Zhang
Institution:	Tsinghua University
Course date:	2018-11-23
Course language:	English
Course description:	AMiner is a free online academic search and mining system that has collected more than 130 million researcher profiles and over 200 million papers from multiple publication databases [25]. In this paper, we present the implementation and deployment of name disambiguation, a core component of AMiner. The problem has been studied for decades but remains largely unsolved. In AMiner, we conducted a systematic investigation of the problem and propose a comprehensive framework to address it. We propose a novel representation learning method that incorporates both global and local information, and we present an end-to-end cluster size estimation method that is significantly better than traditional BIC-based methods. To improve accuracy, we involve human annotators in the disambiguation process. We carefully evaluate the proposed framework on large real-world data, and the experimental results show that the proposed solution performs clearly better (+7-35% in terms of F1-score) than several state-of-the-art methods, including GHOST [5], Zhang et al. [33], and Louppe et al. [17]. Finally, the algorithm has been deployed in AMiner to handle the disambiguation problem at the billion scale, which further demonstrates both the effectiveness and the efficiency of the proposed framework.
Keywords:	online academic search; mining system; learning methods
Course source:	VideoLectures.NET
Data collected:	2022-12-16: chenjy
Last reviewed:	2022-12-16: chenjy
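
The course description reports gains in F1-score but does not say how F1 is computed for a clustering of papers. One common convention in author name disambiguation is pairwise F1: every pair of papers placed in the same cluster counts as a predicted "same author" pair, and these pairs are compared with the pairs implied by the ground truth. The sketch below illustrates that convention only. It is an assumption, not necessarily the metric used in the lecture, and the function names `same_author_pairs` and `pairwise_f1` are hypothetical.

```python
from itertools import combinations


def same_author_pairs(labels):
    """Return the set of index pairs (i, j), i < j, whose labels are equal."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(true_labels, pred_labels):
    """Pairwise precision, recall and F1 of a predicted clustering.

    Assumed convention (not confirmed by the source): a pair of papers is
    "positive" when both papers are assigned to the same author.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")

    true_pairs = same_author_pairs(true_labels)
    pred_pairs = same_author_pairs(pred_labels)
    correct = len(true_pairs & pred_pairs)

    # Guard against empty pair sets (e.g. every paper in its own cluster).
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Five papers under one ambiguous name: three by author "a", two by author "b".
truth = ["a", "a", "a", "b", "b"]
# A predicted clustering that wrongly moves the third paper to the second cluster.
predicted = [0, 0, 1, 1, 1]
print(pairwise_f1(truth, predicted))
```

Cluster identifiers only need to be comparable within each list, so the ground truth can use author names while the prediction uses arbitrary integers.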
