
TextTruth: An Unsupervised Approach to Discover Trustworthy Information from Multi‑Sourced Text Data
Course URL: http://videolectures.net/kdd2018_ma_texttruth_data/
Lecturer: Fenglong Ma
Institution: University at Buffalo, Department of Computer Science and Engineering
Course date: 2018-11-23
Course language: English
Course description: Truth discovery has attracted increasing attention due to its ability to distill trustworthy information from noisy multi-sourced data without any supervision. However, most existing truth discovery methods are designed for structured data and cannot meet the strong need to extract trustworthy information from raw text data, which has its own unique characteristics. The major challenges of inferring true information from text data stem from the multifactorial property of text answers (i.e., an answer may contain multiple key factors) and the diversity of word usage (i.e., different words may have the same semantic meaning). To tackle these challenges, in this paper, we propose a novel truth discovery method, named “TextTruth”, which jointly groups the keywords extracted from the answers to a specific question into multiple interpretable factors and infers the trustworthiness of both answer factors and answer providers. The answers to each question can then be ranked based on the estimated trustworthiness of their factors. The proposed method works in an unsupervised manner and can thus be applied to various application scenarios that involve text data. Experiments on three real-world datasets show that the proposed TextTruth model can accurately select trustworthy answers, even when these answers are formed by multiple factors.
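Keywords: extraction from multi-sourced data; structured data; diversity of word usage; real-world datasets
Course source: VideoLectures.NET
Data collected: 2023-01-27:cyh
Last reviewed: 2023-01-27:cyh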
View count: 28