Ontology Driven Extraction of Research Processes
Lecture URL: http://videolectures.net/iswc2018_pertsas_ontology_extraction_res...
Lecturer: Vayianos Pertsas
Institution: University of Economics and Business
Date: 2018-11-22
Language: English
Abstract: We address the automatic extraction, from publications, of two key concepts for representing research processes: the concept of research activity and the sequence relation between successive activities. These representations are driven by the Scholarly Ontology (SO), which was specifically conceived for documenting research processes. Unlike typical named entity extraction tasks, we face textual descriptions of activities whose length varies widely, and pairs of successive activities often span different sentences. We developed and experimented with several sliding-window classifiers using Logistic Regression, SVMs, and Random Forests, as well as a two-stage pipeline classifier. Our classifiers employ task-specific features, as well as word, part-of-speech, and dependency embeddings, engineered to exploit distinctive traits of research publications written in English. The extracted activities and sequences are associated with other relevant information from publication metadata and stored as RDF triples in a knowledge base. Evaluation on datasets from three disciplines (Digital Humanities, Bioinformatics, and Medicine) shows very promising performance.
Keywords: automatic extraction of research processes; Scholarly Ontology; RDF triple storage; dataset evaluation
Source: VideoLectures.NET
Data collection: 2023-01-07:cyh
Last edited: 2023-01-07:cyh
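The abstract describes labeling variable-length activity spans with sliding-window classifiers and storing the extracted activities and their sequence relation as RDF triples. The following is a minimal, illustrative sketch of that flow in plain Python, under loud assumptions. `toy_classifier` is a hypothetical keyword rule that merely stands in for the trained Logistic Regression, SVM, or Random Forest models. The `http://example.org/so#` namespace and the `Activity`, `hasText`, and `follows` names are placeholders, not the Scholarly Ontology's actual IRIs. Successive activities are naively linked in text order, whereas the abstract treats the sequence relation as something to be extracted rather than assumed.

```python
from typing import Callable, List, Tuple

# Hypothetical namespace and property names; the real Scholarly Ontology IRIs may differ.
SO = "http://example.org/so#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PAD = "<PAD>"


def windows(tokens: List[str], size: int) -> List[List[str]]:
    """For each token, return the 2*size+1 tokens centred on it (padded at the edges)."""
    padded = [PAD] * size + tokens + [PAD] * size
    return [padded[i:i + 2 * size + 1] for i in range(len(tokens))]


def toy_classifier(window: List[str]) -> bool:
    """Hypothetical stand-in for a trained classifier: marks the centre token as part of
    an activity if an activity verb occurs at or before it in the window."""
    verbs = {"collected", "sequenced"}  # illustrative lexicon only
    center = len(window) // 2
    if window[center] in {".", ","}:
        return False
    return any(tok in verbs for tok in window[:center + 1])


def label_tokens(tokens: List[str], classify: Callable[[List[str]], bool],
                 size: int = 2) -> List[bool]:
    """Apply the per-window classifier, yielding one boolean label per token."""
    return [classify(w) for w in windows(tokens, size)]


def decode_spans(labels: List[bool]) -> List[Tuple[int, int]]:
    """Merge maximal runs of positive labels into (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, is_activity in enumerate(labels):
        if is_activity and start is None:
            start = i
        elif not is_activity and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def to_ntriples(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Emit N-Triples lines: one Activity node per span, linked to its predecessor."""
    lines = []
    for n, (s, e) in enumerate(spans):
        node = f"<{SO}activity{n}>"
        text = " ".join(tokens[s:e]).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"{node} <{RDF_TYPE}> <{SO}Activity> .")
        lines.append(f'{node} <{SO}hasText> "{text}" .')
        if n > 0:  # naive text-order sequencing (assumption, see lead-in)
            lines.append(f"{node} <{SO}follows> <{SO}activity{n - 1}> .")
    return lines


if __name__ == "__main__":
    toks = "We collected 120 samples . Then we sequenced the DNA .".split()
    spans = decode_spans(label_tokens(toks, toy_classifier))
    for line in to_ntriples(toks, spans):
        print(line)
```

On the example sentence the two spans decoded are "collected 120 samples" and "sequenced the DNA", and the second is linked to the first. Swapping `toy_classifier` for a real model only requires a function with the same window-to-label signature.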