


Poster: Knowledge as a Constraint on Uncertainty for Unsupervised Classification: A Study in Part-of-Speech Tagging
Course URL: http://videolectures.net/icml08_murray_kcu/
Lecturer: Thomas J. Murray
Institution: University of Southern California
Date: 2008-08-11
Language: English
Abstract: This paper evaluates the use of prior knowledge to limit or bias the choices of a classifier during otherwise unsupervised training and classification. Focusing on effects in the uncertainty of the model's decisions, we quantify the contribution of the knowledge source as a reduction in the conditional entropy of the label distribution given the input corpus. This allows us to compare different sets of knowledge without annotated data, and we find that label entropy is highly predictive of final performance for a standard Hidden Markov Model (HMM) on the task of part-of-speech tagging. Our results also show that even basic levels of knowledge, integrated as labeling constraints, have a considerable effect on classification accuracy, in addition to yielding more stable and efficient training convergence. Finally, for cases where the model's internal classes need to be interpreted and mapped to a desired label set, we find that, for constrained models, the requirements for annotated data to make quality assignments are greatly reduced.
Keywords: unsupervised classification; knowledge-source quantification; label entropy
Source: VideoLectures.NET
Last reviewed: 2019-04-19: lxf
Views: 74
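The abstract's central quantity, the reduction in conditional label entropy contributed by a knowledge source, can be illustrated with a minimal sketch. This is a toy illustration only: the word list, tag dictionary, and the uniform-distribution assumption below are invented for demonstration and are not taken from the paper, whose constraints come from actual lexical resources and whose entropies are computed under the HMM's posterior.

```python
import math

# Hypothetical tagset and tag dictionary (assumptions for illustration):
# each word type is constrained to a subset of allowed POS tags.
TAGSET = ["DET", "NOUN", "VERB", "ADJ"]
TAG_DICT = {
    "the": ["DET"],
    "dog": ["NOUN"],
    "runs": ["VERB", "NOUN"],
    "fast": ["ADJ", "VERB", "NOUN"],
}

def label_entropy(corpus, tag_dict, tagset):
    """Average per-token entropy (bits) of the label distribution,
    assuming a uniform distribution over the tags allowed for each word.
    With no constraint, every tag in the tagset is allowed."""
    total = 0.0
    for word in corpus:
        allowed = tag_dict.get(word, tagset)
        total += math.log2(len(allowed))  # uniform => H = log2 |allowed|
    return total / len(corpus)

corpus = ["the", "dog", "runs", "fast"]
unconstrained = label_entropy(corpus, {}, TAGSET)       # log2(4) = 2.0 bits/token
constrained = label_entropy(corpus, TAG_DICT, TAGSET)   # ~0.65 bits/token
reduction = unconstrained - constrained                 # knowledge contribution
```

Under this toy setup, the tag dictionary cuts the per-token label entropy from 2.0 bits to roughly 0.65 bits; the paper's finding is that reductions of this kind, measurable without any annotated data, predict the final tagging accuracy of the constrained HMM.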