


A Language Model based Framework for New Concept Placement in Ontologies
Lecture URL: https://videolectures.net/eswc2024_dong_concept_placement/
Lecturer: Hang Dong
Venue: ESWC 2024 (Extended Semantic Web Conference)
Date: 2024-06-14
Language: English
Abstract: We investigate the task of inserting new concepts extracted from texts into an ontology using language models. We explore an approach with three steps: edge search, which finds a set of candidate locations to insert into (i.e., subsumptions between concepts); edge formation and enrichment, which leverages the ontological structure to produce and enhance the edge candidates; and edge selection, which finally locates the edge at which the concept is placed. In all steps, we propose to leverage neural methods: we apply embedding-based methods and contrastive learning with Pre-trained Language Models (PLMs) such as BERT for edge search, and adapt a BERT fine-tuning-based multi-label Edge-Cross-encoder as well as Large Language Models (LLMs) such as the GPT series, FLAN-T5, and Llama 2 for edge selection. We evaluate the methods on recent datasets created using the SNOMED CT ontology and the MedMentions entity linking benchmark. The best settings in our framework use a fine-tuned PLM for search and a multi-label Cross-encoder for selection. Zero-shot prompting of LLMs is still not adequate for the task, and we propose explainable instruction tuning of LLMs for improved performance. Our study shows the advantages of PLMs and highlights the encouraging performance of LLMs, motivating future studies.
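The three-step pipeline described in the abstract can be sketched as a minimal Python program. This is an illustrative toy, not the paper's implementation: the `embed` function is a bag-of-characters stand-in for a real PLM encoder such as BERT, the `ONTOLOGY` fragment is invented for demonstration (the paper uses SNOMED CT), and edge selection uses a simple similarity heuristic where the paper uses a fine-tuned cross-encoder or an LLM.

```python
import math

# Toy child -> parent subsumptions; a hypothetical stand-in for a SNOMED CT fragment.
ONTOLOGY = {
    "viral infection": "infection",
    "bacterial infection": "infection",
    "infection": "disease",
    "disease": "clinical finding",
}

def embed(text):
    # Stand-in for a PLM embedding: normalized bag-of-characters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(u, v):
    # Vectors are unit-normalized, so the dot product is the cosine similarity.
    return sum(a * b for a, b in zip(u, v))

def edge_search(mention, k=2):
    # Step 1 (edge search): retrieve the k concepts most similar to the mention.
    q = embed(mention)
    return sorted(ONTOLOGY, key=lambda c: -cosine(q, embed(c)))[:k]

def edge_formation(candidates):
    # Step 2 (edge formation): use the ontology structure around each retrieved
    # concept to form candidate <parent, child> edges for placement.
    edges = []
    for c in candidates:
        edges.append((ONTOLOGY[c], c))  # insert between c's parent and c
        edges.append((c, None))         # or attach as a new leaf child of c
    return edges

def edge_selection(mention, edges):
    # Step 3 (edge selection): score each candidate edge against the mention
    # (a cross-encoder or LLM in the paper) and return the best placement.
    q = embed(mention)
    def score(edge):
        parent, child = edge
        return cosine(q, embed(parent)) + (cosine(q, embed(child)) if child else 0.0)
    return max(edges, key=score)
```

Usage follows the pipeline order: `edge_selection("fungal infection", edge_formation(edge_search("fungal infection")))` returns one `(parent, child)` pair, where `child is None` means the new concept is placed as a leaf under `parent`.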
Keywords: language models; new ontology concepts; placement framework
Source: VideoLectures.NET
Data collected: 2024-08-05: liyq
Last reviewed: 2024-08-05: liyq
Views: 15