Identifying Personal Stories in Millions of Weblog Entries - MOOC Overseas Open Courses



Course URL:	http://videolectures.net/icwsm09_gordon_ipsmwe/
Lecturer:	Andrew S. Gordon
Institution:	University of Southern California
Date:	2009-06-24
Language:	English
Course description:	Stories of people's everyday experiences have long been the focus of psychology and sociology research, and are increasingly being used in innovative knowledge-based technologies. However, continued research in this area is hindered by the lack of standard corpora of sufficient size and by the costs of creating one from scratch. In this paper, we describe our efforts to develop a standard corpus for researchers in this area by identifying personal stories in the tens of millions of blog posts in the ICWSM 2009 Spinn3r Dataset. Our approach was to employ statistical text classification technology on the content of blog entries, which required the creation of a sufficiently large set of annotated training examples. We describe the development and evaluation of this classification technology and how it was applied to the dataset in order to identify nearly a million personal stories.
Keywords:	psychology; sociology; standard corpus
Source:	VideoLectures.NET
Last reviewed:	2019-04-26 (lxf)
Views:	183
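
The course description mentions statistical text classification trained on annotated examples, but this page does not say which classifier or features the authors used. As a rough illustration only, the sketch below swaps in a multinomial naive Bayes model over lowercase word counts with add-one smoothing. The class name `NaiveBayesStoryClassifier`, the labels `story` and `other`, and the six training sentences are all invented for this example and do not come from the ICWSM 2009 Spinn3r Dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, not the authors' preprocessing)."""
    return text.lower().split()


class NaiveBayesStoryClassifier:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every token seen during training

    def fit(self, examples):
        """Train on an iterable of (text, label) pairs."""
        for text, label in examples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        """Return the label with the highest log posterior score."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy annotated examples, invented purely for illustration.
TRAIN = [
    ("yesterday i went to the store and i met my old friend", "story"),
    ("last night we drove home and my car broke down", "story"),
    ("i woke up late and i missed my bus this morning", "story"),
    ("the new phone has a faster processor and a better screen", "other"),
    ("ten tips for improving your website search ranking", "other"),
    ("the senate will vote on the budget bill next week", "other"),
]

model = NaiveBayesStoryClassifier().fit(TRAIN)
print(model.predict("i went home and my friend met me"))
print(model.predict("the budget vote on the new phone"))
```

With only six training sentences the model merely picks up on first-person, past-tense vocabulary. A usable classifier would need the much larger annotated training set that the description says the authors created.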
