Visual Relationship Detection with Language Priors
Course URL: http://videolectures.net/eccv2016_krishna_relationship_detection/
Lecturer: Ranjay Krishna
Institution: Stanford University
Date: 2016-10-24
Language: English
Course description: Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g., "man riding bicycle" and "man pushing bicycle"). Consequently, the set of possible relationships is extremely large, and it is difficult to obtain sufficient training examples for all of them. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g., "man" and "bicycle") and predicates (e.g., "riding" and "pushing") occur more frequently on their own. We propose a model that uses this insight to train visual models for objects and predicates individually and then combines them to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to fine-tune the likelihood of a predicted relationship. Our model can scale to predict thousands of types of relationships from a few examples. Additionally, we localize the objects in the predicted relationships as bounding boxes in the image. We further demonstrate that understanding relationships can improve content-based image retrieval.
Keywords: visual relationships; language priors; image retrieval; visual models
Source: VideoLectures.NET
Data collected: 2023-03-22:chenxin01
Last reviewed: 2023-05-22:chenxin01
Views: 31