TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering
Course URL: http://videolectures.net/kdd2018_zhang_taxogen_construction/
Lecturer: Chao Zhang
Institution: University of Illinois
Date: 2018-11-23
Language: English
Abstract: Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by treating each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.
Keywords: taxonomy construction; concept node; embedding module
Source: VideoLectures.NET
Data collected: 2022-12-12:chenjy
Last reviewed: 2022-12-12:chenjy
Views: 29
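
Illustrative sketch: the recursive procedure outlined in the abstract (cluster the terms of a topic on the unit sphere, keep overly general terms at the parent, recurse into each subtopic) might look roughly like the Python below. This is a simplified sketch under stated assumptions, not the authors' implementation. The function names, the toy embeddings, and the `min_sim` threshold are all hypothetical. Cosine similarity to the cluster centroid stands in for the paper's own criterion for deciding which terms stay at the parent level, the split is done in a single pass, and the local embedding module is omitted entirely.

```python
import math

def normalize(v):
    # Scale a vector to unit length (zero vectors are returned unchanged).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def cosine(a, b):
    # Inputs are unit vectors, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def assign(vectors, centroids):
    # Label each vector with the index of its most similar centroid.
    return [max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))
            for v in vectors]

def spherical_kmeans(vectors, k, iters=20):
    # Deterministic farthest-first initialisation (an assumption of this sketch).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(
            vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(vectors, centroids)
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                # New centroid: mean direction, re-projected onto the sphere.
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return assign(vectors, centroids), centroids

def build_taxonomy(terms, emb, k=2, depth=2, min_sim=0.9):
    # Each node holds all terms of its topic plus the "general" terms
    # that were not pushed down into any child topic.
    node = {"terms": list(terms), "general": [], "children": []}
    if depth == 0 or len(terms) <= k:
        return node
    vecs = [normalize(emb[t]) for t in terms]
    labels, centroids = spherical_kmeans(vecs, k)
    groups = [[] for _ in range(k)]
    for term, vec, label in zip(terms, vecs, labels):
        if cosine(vec, centroids[label]) >= min_sim:
            groups[label].append(term)      # specific enough: push down
        else:
            node["general"].append(term)    # too general: keep at parent
    node["children"] = [build_taxonomy(g, emb, k, depth - 1, min_sim)
                        for g in groups if g]
    return node

# Toy 2-D embeddings, invented purely for illustration.
emb = {
    "svm": [1.0, 0.1],
    "neural_network": [0.9, 0.2],
    "sql": [0.1, 1.0],
    "query_processing": [0.2, 0.9],
    "algorithm": [0.8, 0.6],
}
tree = build_taxonomy(list(emb), emb, k=2, depth=1, min_sim=0.96)
```

In this toy run the term lying between the two clusters is expected to remain at the root while the other terms fall into two child topics. A fuller version would re-cluster after removing the general terms and re-train embeddings on each subtopic's documents, as the abstract's two modules suggest.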