Name-Ethnicity Classification from Open Sources |
|
Course URL: | http://videolectures.net/kdd09_male_nec/ |
Lecturer: | Swapna Male |
Institution: | State University of New York |
Course date: | 2009-09-14 |
Language: | English |
Abstract: | The problem of ethnicity identification from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on the development of an ethnicity classifier where all training data is extracted from public, non-confidential (and hence somewhat unreliable) sources. Our classifier uses hidden Markov models (HMMs) and decision trees to classify names into 13 cultural/ethnic groups, with individual group accuracy comparable to that of earlier binary (e.g., Spanish/non-Spanish) classifiers. We have applied this classifier to over 20 million names from a large-scale news corpus, identifying interesting temporal and spatial trends in the representation of particular cultural/ethnic groups. |
Keywords: | ethnicity identification; hidden Markov model (HMM); ethnic representation |
Source: | VideoLectures.NET |
Last reviewed: | 2021-09-15:zyk |
Views: | 50 |