Identifying Temporal Patterns and Key Players in Document Collections
Course URL: http://videolectures.net/mlss06tw_shaparenko_itpkp/
Lecturer: Benyah Shaparenko
Institution: Cornell University
Date: 2007-02-25
Language: English
Course description: We consider the problem of analyzing the development of a document collection over time without requiring meaningful citation data. Given a collection of timestamped documents, we formulate and explore two questions. First, what are the main topics, and how do these topics develop over time? Second, to gain insight into the dynamics driving this development, which documents and which authors are most influential in this process? Unlike prior work in citation analysis, we propose methods that address these questions without requiring citation data. The methods use only the text of the documents as input. Consequently, they are applicable to a much wider range of document collections (email, blogs, etc.), most of which lack meaningful citation data. We evaluate our methods on the proceedings of the Neural Information Processing Systems (NIPS) conference. Even with the preliminary methods that we implemented, the results show that the methods are effective and that addressing these questions based on text alone is feasible. In fact, the text-based methods sometimes even identify influential papers that are missed by citation analysis.
Keywords: document collection development; citation data; Neural Information Processing Systems
Source: VideoLectures.NET
Last reviewed: 2019-07-16:cjy