OpenTag: Open Attribute Value Extraction from Product Profiles |
|
Course URL: | http://videolectures.net/kdd2018_mukherjee_opentag_extraction/ |
Lecturer: | Subhabrata Mukherjee |
Institution: | Max Planck Institute for Informatics |
Course date: | 2018-11-23 |
Language: | English |
Course description: | Missing attribute value extraction aims to find values that describe an attribute of interest in free-text input. Most prior work on this task either makes a closed-world assumption, in which the set of possible values is known beforehand, or relies on dictionaries of values and hand-crafted features. How can we discover new attribute values that we have never seen before? Can we do this with limited human annotation or supervision? We study this problem in the context of product catalogs, which often have missing values for many attributes of interest. In this work, we leverage product profile information such as titles and descriptions to discover missing values of product attributes. We develop a novel deep tagging model, OpenTag, for this extraction problem with the following contributions: (1) we formalize the problem as a sequence tagging task and propose a joint model that uses recurrent neural networks (specifically, a bidirectional LSTM) to capture context and semantics, and Conditional Random Fields (CRF) to enforce tagging consistency; (2) we develop a novel attention mechanism to provide interpretable explanations for our model’s decisions; (3) we propose a novel sampling strategy exploring active learning to reduce the burden of human annotation. Unlike prior work, OpenTag does not use any dictionaries or hand-crafted features. Extensive experiments on real-life datasets in different domains show that OpenTag with our active learning strategy discovers new attribute values from as few as 150 annotated samples (a 3.3x reduction in annotation effort) with a high F-score of 83%, outperforming state-of-the-art models. |
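Keywords: | missing attribute value extraction; values describing an attribute of interest; product profile information; recurrent neural networks; OpenTag deep tagging model |
Source: | VideoLectures.NET |
Data collected: | 2023-03-15:cyh |
Last reviewed: | 2023-03-15:cyh |
Views: | 31 |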
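
The course description frames attribute value extraction as a sequence tagging task. As a rough illustration of what that framing means, the sketch below turns per-token tags into extracted values. It assumes a simple B/I/O tag scheme (begin, inside, outside), which the description does not specify; the function name `extract_values`, the example product title, and the hand-written tags are all hypothetical and are not output from OpenTag.

```python
from typing import List


def extract_values(tokens: List[str], tags: List[str]) -> List[str]:
    """Join tokens tagged B (begin) / I (inside) into attribute values.

    Assumes a B/I/O scheme chosen for illustration only.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    values: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new value starts; close any span that is still open.
            if current:
                values.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the open span.
            current.append(token)
        else:
            # "O", or a stray "I" with no open span: close and reset.
            if current:
                values.append(" ".join(current))
            current = []
    if current:
        values.append(" ".join(current))
    return values


# Hypothetical product title with hand-written tags for a "flavor" attribute.
tokens = ["dog", "food", "with", "ranch", "raised", "lamb", "flavor"]
tags = ["O", "O", "O", "B", "I", "I", "O"]
print(extract_values(tokens, tags))
```

In a tagging model of the kind described, the tags would be predicted by the network rather than written by hand; because values are read off the tag sequence instead of looked up in a dictionary, a value never seen during training can still be extracted.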