

Identifying Personal Stories in Millions of Weblog Entries
Course URL: http://videolectures.net/icwsm09_gordon_ipsmwe/
Lecturer: Andrew S. Gordon
Institution: University of Southern California
Date: 2009-06-24
Language: English
Course description: Stories of people's everyday experiences have long been the focus of psychology and sociology research, and are increasingly being used in innovative knowledge-based technologies. However, continued research in this area is hindered by the lack of standard corpora of sufficient size and by the costs of creating one from scratch. In this paper, we describe our efforts to develop a standard corpus for researchers in this area by identifying personal stories in the tens of millions of blog posts in the ICWSM 2009 Spinn3r Dataset. Our approach was to employ statistical text classification technology on the content of blog entries, which required the creation of a sufficiently large set of annotated training examples. We describe the development and evaluation of this classification technology and how it was applied to the dataset in order to identify nearly a million personal stories.
Keywords: psychology; sociology; standard corpus
Source: VideoLectures.NET
Last reviewed: 2019-04-26 (lxf)
Views: 62