Bringing Structure to Text: Mining Phrases, Entity Concepts, Topics, and Hierarchies
Course URL: http://videolectures.net/kdd2014_han_wang_el_kishky_structure_tex...
Instructors: Jiawei Han; Chi Wang; Ahmed El-Kishky
Institution: University of Illinois
Date: 2014-10-09
Language: English
Abstract: Mining phrases, entity concepts, topics, and hierarchies from massive text corpora is an essential problem in the age of big data. Text data in electronic form is ubiquitous, ranging from scientific articles to social networks, enterprise logs, news articles, social media, and general web pages. It is highly desirable, but challenging, to bring structure to unstructured text data, uncover its underlying hierarchies, relationships, patterns, and trends, and gain knowledge from it. In this tutorial, we provide a comprehensive survey of the state of the art in data-driven methods that automatically mine phrases, extract and infer latent structures from text corpora, and construct multi-granularity topical groupings and hierarchies of the underlying themes. We study their principles, methodologies, algorithms, and applications using several real datasets, including research papers and news articles, and demonstrate how these methods work and how the uncovered latent entity structures may help text understanding, knowledge discovery, and management.
Keywords: social networks; large text corpora; data-driven
Source: VideoLectures.NET
Data collected: 2020-11-01:zyk
Last edited: 2020-11-01:zyk