Distant Supervision with Transductive Learning for Adverse Drug Reaction Identification from Electronic Medical Records |
|
Course URL: | http://videolectures.net/kdd2017_taewijit_electronic_medical_reco... |
Lecturer: | Siriwon Taewijit |
Institution: | Japan Advanced Institute of Science and Technology |
Date: | 2017-12-01 |
Language: | English |
Abstract: | Information extraction and knowledge discovery regarding adverse drug reactions (ADRs) from large-scale clinical texts are highly useful and much-needed processes. Two major difficulties of this task are the lack of domain experts to label examples and the intractable processing of unstructured clinical text. Most previous work has addressed these issues by applying semi-supervised learning to the former and a word-based approach to the latter, but such approaches still struggle to acquire initial labeled data and ignore the structured sequence of natural language. In this study, we propose automatic data labeling by distant supervision, in which knowledge bases are exploited to assign an entity-level relation label to each drug-event pair in the texts; we then use patterns to characterize the ADR relation. Model parameters are estimated with multiple-instance learning using expectation-maximization (EM). The method applies transductive learning to iteratively reassign the probabilities of unknown drug-event pairs during training. In experiments on 50,998 discharge summaries, we evaluate the method while varying a large number of parameters, namely pattern types, pattern-weighting models, and the initial and iterative weightings of relations for unlabeled data. Our proposed method outperforms word-based features with NB-EM (iEM), MILR, and TSVM, improving the F1 score by 11.3%, 9.3%, and 6.5%, respectively. |
Keywords: | adverse reactions; clinical text; supervised learning |
Source: | VideoLectures.NET |
Data collection: | 2023-04-16: chenxin01 |
Last reviewed: | 2023-05-21: chenxin01 |
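The distant-supervision step in the abstract assigns each drug-event pair an entity-level label from knowledge bases and treats all textual mentions of that pair as one bag of pattern instances. The sketch below illustrates that idea under stated assumptions: the knowledge bases, the drug/event names, the pattern strings, and the rule that a "non-ADR" knowledge base yields negative labels are all hypothetical toy choices, not the authors' resources or labeling rules.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (drug, event)

# Hypothetical toy knowledge bases; a real system would load curated resources.
ADR_KB: Set[Pair] = {("warfarin", "bleeding"), ("metformin", "nausea")}
NON_ADR_KB: Set[Pair] = {("metformin", "diabetes")}  # e.g. known indications (assumption)


def distant_label(pair: Pair) -> Optional[int]:
    """Entity-level label: 1 = ADR, 0 = non-ADR, None = unknown (unlabeled)."""
    if pair in ADR_KB:
        return 1
    if pair in NON_ADR_KB:
        return 0
    return None


def build_bags(mentions: List[Tuple[str, str, str]]) -> Dict[Pair, List[str]]:
    """Group every pattern mention of a drug-event pair into one bag per pair."""
    bags: Dict[Pair, List[str]] = {}
    for drug, event, pattern in mentions:
        bags.setdefault((drug, event), []).append(pattern)
    return bags


# Toy mentions: (drug, event, hypothetical pattern extracted from the sentence)
mentions = [
    ("warfarin", "bleeding", "DRUG caused EVENT"),
    ("warfarin", "bleeding", "EVENT after DRUG"),
    ("metformin", "diabetes", "DRUG for EVENT"),
    ("aspirin", "headache", "DRUG for EVENT"),
]
bags = build_bags(mentions)
labels = {pair: distant_label(pair) for pair in bags}
print(labels)
```

Because the label is attached to the pair rather than to each sentence, both warfarin/bleeding mentions inherit the same label, which is what makes a multiple-instance formulation natural here.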
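The abstract also describes EM-based learning in which the probabilities of unknown drug-event pairs are reassigned at each iteration during training (the transductive step). The sketch below is a simplified stand-in, not the authors' model: it collapses each pair's bag into a single pattern-count vector and runs semi-supervised multinomial naive Bayes EM, keeping distant-supervision labels fixed and re-estimating only unknown pairs. The parameters `init_prior` (starting probability for unknown pairs) and `unlabeled_weight` (how much unknown pairs count in the M-step) are hypothetical and only loosely mirror the "initial and iterative weightings of relations for unlabeled data" mentioned in the abstract.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def transductive_em(bags: Dict[str, List[str]], labels: Dict[str, Optional[int]],
                    init_prior: float = 0.5, unlabeled_weight: float = 0.1,
                    iters: int = 10, alpha: float = 1.0) -> Dict[str, float]:
    """Return P(ADR) for every pair; distant labels stay fixed, unknown ones are re-estimated."""
    # Simplification (assumption): each bag is reduced to the counts of its patterns,
    # so this is bag-level naive Bayes EM rather than a full multiple-instance model.
    counts = {k: Counter(pats) for k, pats in bags.items()}
    vocab = sorted({p for pats in bags.values() for p in pats})
    q = {k: float(labels[k]) if labels[k] is not None else init_prior for k in bags}
    w = {k: 1.0 if labels[k] is not None else unlabeled_weight for k in bags}
    for _ in range(iters):
        # M-step: class priors and per-class pattern distributions from soft counts.
        mass = {0: 0.0, 1: 0.0}
        pat = {0: Counter(), 1: Counter()}
        for k in bags:
            for c, r in ((1, q[k]), (0, 1.0 - q[k])):
                mass[c] += w[k] * r
                for p, n in counts[k].items():
                    pat[c][p] += w[k] * r * n
        total = mass[0] + mass[1]
        log_prior = {c: math.log((mass[c] + alpha) / (total + 2 * alpha)) for c in (0, 1)}
        log_lik = {}
        for c in (0, 1):
            denom = sum(pat[c].values()) + alpha * len(vocab)  # Laplace smoothing
            log_lik[c] = {p: math.log((pat[c][p] + alpha) / denom) for p in vocab}
        # E-step (transductive): reassign probabilities of unknown pairs only.
        for k in bags:
            if labels[k] is not None:
                continue
            s = {c: log_prior[c] + sum(n * log_lik[c][p] for p, n in counts[k].items())
                 for c in (0, 1)}
            m = max(s.values())  # log-sum-exp trick for numerical stability
            q[k] = math.exp(s[1] - m) / sum(math.exp(v - m) for v in s.values())
    return q


# Hypothetical toy bags keyed by "drug|event"; None marks pairs absent from the KBs.
bags = {
    "warfarin|bleeding": ["DRUG caused EVENT", "EVENT after DRUG"],
    "metformin|diabetes": ["DRUG for EVENT"],
    "heparin|bleeding": ["DRUG caused EVENT"],
    "aspirin|headache": ["DRUG for EVENT"],
}
labels = {"warfarin|bleeding": 1, "metformin|diabetes": 0,
          "heparin|bleeding": None, "aspirin|headache": None}
probs = transductive_em(bags, labels)
print({k: round(v, 3) for k, v in probs.items()})
```

With these toy bags, the unknown pair that shares warfarin's causal pattern drifts toward ADR and the one sharing metformin's indication pattern drifts toward non-ADR. A full multiple-instance formulation would instead score individual instances and combine them at the bag level.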