
A Preliminary Evaluation of Word Representations for Named-Entity Recognition
Abstract: We use different word representations as word features for a named-entity recognition (NER) system based on a linear model. This work is part of a larger empirical survey that evaluates different word representations on different NLP tasks. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words. All three representations improve NER accuracy: the Brown clusters provide a larger improvement than either embedding, and the HLBL embeddings provide a larger improvement than the Collobert and Weston (2008) embeddings. We also discuss some practical issues in using embeddings as features. Brown clusters are simpler to use than embeddings because they require less hyperparameter tuning.
Keywords: linear model; Brown clusters; hyperparameter tuning
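To make "word representations as word features for a linear model" concrete, here is a minimal sketch, not the authors' feature set. It assumes a common encoding: a Brown cluster is a bit-string path in the cluster hierarchy, and its prefixes become binary indicator features, while each embedding dimension becomes one real-valued feature. The lookup tables `BROWN_PATHS` and `EMBEDDINGS`, the prefix lengths in `PREFIX_LENGTHS`, and the helper names `word_features` and `linear_score` are all hypothetical and used only for illustration.

```python
from typing import Dict, List

# Hypothetical lookup tables; real ones would be induced from unlabeled text
# by Brown clustering and by training an embedding model.
BROWN_PATHS: Dict[str, str] = {
    "paris": "0010110",
    "london": "0010111",
    "acquired": "1101",
}
EMBEDDINGS: Dict[str, List[float]] = {
    "paris": [0.12, -0.40, 0.05],
    "london": [0.10, -0.38, 0.07],
}

# Assumed prefix lengths; shorter prefixes correspond to coarser clusters.
PREFIX_LENGTHS = (2, 4, 6)


def word_features(word: str) -> Dict[str, float]:
    """Map one token to a sparse feature dict (name -> value)."""
    w = word.lower()
    feats: Dict[str, float] = {"word=" + w: 1.0}
    path = BROWN_PATHS.get(w)
    if path is not None:
        # One binary indicator per path prefix (coarse-to-fine clusters).
        for k in PREFIX_LENGTHS:
            feats[f"brown[:{k}]={path[:k]}"] = 1.0
    vec = EMBEDDINGS.get(w)
    if vec is not None:
        # Each embedding dimension is a real-valued feature.
        for i, v in enumerate(vec):
            feats[f"emb[{i}]"] = v
    return feats


def linear_score(feats: Dict[str, float], weights: Dict[str, float]) -> float:
    """Score of a linear model: dot product of weights and feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


if __name__ == "__main__":
    print(word_features("Paris"))
    print(word_features("London"))
```

In this toy setup "Paris" and "London" share the indicator `brown[:6]=001011`, so a weight learned for one word in labeled data also applies to the other. This sharing is the intuition behind adding cluster features. Out-of-table words fall back to the plain `word=` feature.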
