Name-Ethnicity Classification from Open Sources
Course URL: http://videolectures.net/kdd09_male_nec/
Lecturer: Swapna Male
Institution: State University of New York
Date: 2009-09-14
Language: English
Course description: The problem of identifying ethnicity from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on the development of an ethnicity classifier for which all training data is extracted from public, non-confidential (and hence somewhat unreliable) sources. Our classifier uses hidden Markov models (HMMs) and decision trees to classify names into 13 cultural/ethnic groups, with individual group accuracy comparable to that of earlier binary (e.g., Spanish/non-Spanish) classifiers. We have applied this classifier to over 20 million names from a large-scale news corpus, identifying interesting temporal and spatial trends in the representation of particular cultural/ethnic groups.
Keywords: ethnicity identification; hidden Markov model (HMM); ethnic representation
Source: VideoLectures.NET
Last reviewed: 2021-09-15:zyk
Views: 44