A Social Network Approach to Unsupervised Induction of Syntactic Clusters for Bengali
Course URL: http://videolectures.net/eccs07_choudhury_sna/
Lecturer: Monojit Choudhury
Institution: Microsoft
Date: 2007-12-14
Language: English
Abstract: In this paper we describe experiments on the fully unsupervised induction of part-of-speech tags for Bengali words from a raw text corpus. To this end, we construct a network of the 5000 most frequent Bengali words, in which nodes are word types and the weight of the edge between two types indicates their distributional similarity, and we cluster this network using the Chinese Whispers algorithm [1]. We also propose the concept of tag entropy, which measures the cohesiveness of a word cluster in terms of the lexical categories of its constituent words.
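The abstract outlines a pipeline: build a similarity graph over frequent word types, cluster it with Chinese Whispers, and score clusters by tag entropy. The following is a minimal, stdlib-only Python sketch of that pipeline under stated assumptions, not the authors' implementation. The context features (immediate left/right neighbours drawn from the most frequent words), cosine edge weights, the zero edge threshold, and the tag-entropy formula (Shannon entropy of one gold tag per word) are all assumptions; the talk's definition of tag entropy may, for example, account for words with several possible tags. The helper names `context_vectors`, `build_graph`, `induce_clusters` and `tag_entropy` are hypothetical. Only the Chinese Whispers update rule (each node adopts the class with the largest summed edge weight among its neighbours) follows the published algorithm.

```python
import math
import random
from collections import Counter, defaultdict


def context_vectors(sentences, vocab, features):
    """ASSUMED feature scheme: count immediate left/right neighbours
    that belong to the (hypothetical) feature-word set."""
    vecs = {w: Counter() for w in vocab}
    for sent in sentences:
        for i, word in enumerate(sent):
            if word not in vecs:
                continue
            if i > 0 and sent[i - 1] in features:
                vecs[word][("L", sent[i - 1])] += 1
            if i + 1 < len(sent) and sent[i + 1] in features:
                vecs[word][("R", sent[i + 1])] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if no overlap)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def build_graph(vecs, threshold=0.0):
    """Undirected weighted word graph; keep edges with similarity > threshold."""
    graph = defaultdict(dict)
    words = list(vecs)
    for i, u in enumerate(words):
        for v in words[i + 1:]:
            sim = cosine(vecs[u], vecs[v])
            if sim > threshold:
                graph[u][v] = sim
                graph[v][u] = sim
    return graph


def chinese_whispers(graph, nodes, max_iter=20, seed=0):
    """Chinese Whispers: every node starts in its own class, then repeatedly
    adopts the class with the largest summed edge weight among its neighbours."""
    rng = random.Random(seed)
    label = {n: i for i, n in enumerate(nodes)}
    order = list(nodes)
    for _ in range(max_iter):
        rng.shuffle(order)  # visit nodes in a random order on each pass
        changed = False
        for n in order:
            scores = Counter()
            for nb, weight in graph.get(n, {}).items():
                scores[label[nb]] += weight
            if not scores:
                continue  # isolated node keeps its own class
            best = max(scores, key=lambda c: (scores[c], -c))  # ties -> lower id
            if best != label[n]:
                label[n] = best
                changed = True
        if not changed:
            break  # converged: no node switched class in this pass
    clusters = defaultdict(list)
    for n, c in label.items():
        clusters[c].append(n)
    return list(clusters.values())


def induce_clusters(sentences, n_words=5000, n_features=200, seed=0):
    """Hypothetical driver: cluster the n_words most frequent word types."""
    freq = Counter(w for sent in sentences for w in sent)
    vocab = [w for w, _ in freq.most_common(n_words)]
    features = {w for w, _ in freq.most_common(n_features)}
    graph = build_graph(context_vectors(sentences, vocab, features))
    return chinese_whispers(graph, vocab, seed=seed)


def tag_entropy(cluster, gold_tag):
    """ASSUMED definition: Shannon entropy (bits) of the gold-tag distribution
    in one cluster; 0.0 means every word carries the same tag."""
    counts = Counter(gold_tag[w] for w in cluster)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # Toy English corpus, purely illustrative (not Bengali data from the talk).
    corpus = ["the cat runs", "a dog runs", "the dog sleeps", "a cat sleeps"]
    sents = [s.split() for s in corpus]
    gold = {"the": "DET", "a": "DET", "cat": "N", "dog": "N",
            "runs": "V", "sleeps": "V"}
    for c in induce_clusters(sents, n_words=6):
        print(sorted(c), tag_entropy(c, gold))
```

On this toy corpus the determiners, nouns and verbs share no context features across groups, so each pair ends up in its own cluster with tag entropy 0.0; a cluster mixing one noun and one verb would score 1.0 bit. Lower tag entropy therefore indicates a more cohesive cluster under this assumed definition.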
Keywords: Bengali; Chinese Whispers algorithm; tag entropy
Source: VideoLectures.NET
Last reviewed: 2020-06-19 (cxin)