Potential and limitations of minimally supervised bootstrapping
Course URL: http://videolectures.net/solomon_uszkoreit_pal/
Lecturer: Hans Uszkoreit
Institution: Saarland University
Date: 2007-11-12
Language: English
Course description: The detection of relation instances is a central functionality for extracting structured information from unstructured textual data and for gradually turning texts into semi-structured information. With respect to how the classifiers or detection grammars are acquired, the existing approaches fall into three broad categories:
* detection by classifiers/grammars acquired through intellectual human labor
* detection by classifiers/grammars acquired through supervised learning
* detection by classifiers/grammars acquired through unsupervised or minimally supervised learning
In the talk we will provide examples for each class of approaches and summarize their respective advantages and disadvantages. We will argue that different relation detection tasks require different methods or even different combinations of methods. One empirically promising and theoretically attractive line of research is the learning of extraction rules from seeds. Several minimally supervised approaches have been investigated that achieved quite decent results with a minimum of effort. The learning algorithms are not domain dependent. The seed-based bootstrapping approaches are theoretically pleasing because the learned patterns and rules are modular and transparent. They can be reused in new applications and they can be a valuable resource for (computational) linguistic investigation. We will explain several bootstrapping methods, most of them starting with patterns as seeds and some with event seeds. We will also describe our own bootstrapping approach (Xu et al. 2007), a radical extension of Xu et al. (2006). In this approach, learning starts from a small set of n-ary relation instances as "seeds" in order to automatically learn pattern rules from parsed data, which can then extract new instances of the n-ary relation and its projections.
After a fruitful period of skillful trial and error, the time now seems right for a more systematic investigation of the alternative approaches to relation detection. In addition to tables of recall and precision values for competing methods, we urgently need explanations, i.e. causal theories that explain the virtues and shortcomings of alternative techniques with respect to properties of domains and text data. We describe one theory of this kind based on experimental evidence and explanatory insight. The advocated scientific methodology will enable optimal choices for specific tasks, effectively reduce the number of promising combinations of methods for future investigation, and guide the search for completely new approaches.
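
The seed-based bootstrapping loop mentioned in the description can be illustrated with a minimal sketch. It is not the method of Xu et al. (2007): it assumes a binary relation instead of an n-ary one, and it uses the surface string between two arguments as a "pattern" in place of rules learned from parsed data. The toy corpus, the NAME argument shape, and all function names are hypothetical and chosen only for illustration.

```python
import re

# Assumption: an argument is a run of capitalized tokens, e.g. "Marie Curie".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Toy corpus (hypothetical); real systems work on large parsed collections.
CORPUS = [
    "Marie Curie won the Nobel Prize.",
    "Albert Einstein won the Nobel Prize.",
    "Albert Einstein received the Nobel Prize.",
    "Niels Bohr received the Nobel Prize.",
    "Niels Bohr was awarded the Nobel Prize.",
    "Paul Dirac was awarded the Nobel Prize.",
]
SEEDS = {("Marie Curie", "Nobel Prize")}


def learn_patterns(corpus, instances):
    """Collect the text between the two arguments of every known instance."""
    patterns = set()
    for sentence in corpus:
        for left, right in instances:
            m = re.search(re.escape(left) + r"(.+?)" + re.escape(right), sentence)
            if m:
                patterns.add(m.group(1))
    return patterns


def apply_patterns(corpus, patterns):
    """Extract every (left, right) pair that is joined by a learned pattern."""
    found = set()
    for sentence in corpus:
        for middle in patterns:
            regex = f"({NAME}){re.escape(middle)}({NAME})"
            for m in re.finditer(regex, sentence):
                found.add((m.group(1), m.group(2)))
    return found


def bootstrap(corpus, seeds, max_rounds=5):
    """Alternate pattern learning and instance extraction until nothing new appears."""
    instances, patterns = set(seeds), set()
    for _ in range(max_rounds):
        patterns |= learn_patterns(corpus, instances)
        new_instances = apply_patterns(corpus, patterns) - instances
        if not new_instances:
            break
        instances |= new_instances
    return instances, patterns


instances, patterns = bootstrap(CORPUS, SEEDS)
print(sorted(instances))
print(sorted(patterns))
```

Starting from a single seed, each round learns new contexts from the instances found so far and then uses those contexts to find further instances. The sketch accepts every pattern it sees; a practical system would need some way of scoring or filtering patterns, since one bad pattern can pull in unrelated instances in later rounds.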
Keywords: computer science; machine learning; supervised learning
Source: VideoLectures.NET
Last reviewed: 2020-06-20:zyk
Views: 43