A Joint Segmenting and Labeling Approach for Chinese Lexical Analysis
Course URL: http://videolectures.net/ecmlpkdd08_wang_ajsa/
Lecturers: Xihong Wu; Jiazhong Nie; Xinhao Wang; Dingsheng Luo
Institution: Peking University
Date: Not available.
Language: English
Course description: This paper introduces an approach that jointly performs a cascade of segmentation and labeling subtasks for Chinese lexical analysis: word segmentation, named entity recognition, and part-of-speech tagging. Unlike the traditional pipeline approach, the cascaded subtasks are carried out simultaneously in a single step, so error propagation can be avoided and information can be shared across the subtask levels. The approach adopts Weighted Finite State Transducers (WFSTs). Within the unified WFST framework, the model for each subtask is represented as a transducer, and the models are then combined into a single one. One-pass decoding over the combined model thus yields the jointly optimal output for all levels. Experimental results show that the joint approach significantly outperforms the traditional pipeline method.
Keywords: machine learning; lexical analysis; transducers
Source: VideoLectures.NET
Last reviewed: 2019-12-05:cwx
Views: 38