

Characteristic Relational Patterns
Course URL: http://videolectures.net/kdd09_koopman_crp/
Lecturer: Arne Koopman
Institution: Utrecht University, the Netherlands
Date: 2009-09-14
Language: English
Abstract: Research in relational data mining has two major directions: finding global models of a relational database, and discovering local relational patterns within a database. Relational patterns show in detail how attribute values co-occur, but their huge numbers hamper their use in data analysis. Global models, on the other hand, only summarize how the different tables and their attributes relate to each other, and lack detail about what happens at the local level. In this paper we introduce a new approach that combines the strengths of both directions: it describes the complete database in detail using a small set of patterns. More specifically, we use a rich pattern language and show how a database can be encoded with such patterns. Then, based on the MDL principle, the novel RDB-KRIMP algorithm selects the set of patterns that yields the most succinct encoding of the database. This set, the code table, is a compact description of the database in terms of local relational patterns. We show that the resulting set is very small, both relative to the database size and relative to the number of local relational patterns: reductions of up to 4 orders of magnitude are attained.
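The MDL-based selection described above can be illustrated on a much simpler case. The sketch below is a minimal, single-table version in the style of the itemset algorithm KRIMP, whose name RDB-KRIMP extends. It works on plain transactions of items. It does not reproduce the relational pattern language or the relational encoding from the talk.

The sketch does the following:
- It covers each transaction greedily with code-table itemsets.
- It measures the total size L(CT) + L(D | CT) in bits using Shannon-optimal code lengths.
- It keeps a candidate only if adding it strictly lowers that total.

All function names (`support`, `cover`, `total_length`, `candidates`, `krimp`) are hypothetical. Candidate enumeration is brute force, and there is no pruning step.

```python
from collections import Counter
from itertools import combinations
from math import log2


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def cover(transaction, code_table):
    """Greedily cover a transaction with disjoint itemsets, scanning the code table in order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
            if not remaining:
                break
    return used


def total_length(db, code_table):
    """Total encoded size L(CT) + L(D | CT) in bits, using Shannon-optimal code lengths."""
    usage = Counter()
    for t in db:
        usage.update(cover(t, code_table))
    total_usage = sum(usage.values())
    # Singleton frequencies define the "standard" code used to spell out itemsets in the table.
    item_freq = Counter(i for t in db for i in t)
    n_items = sum(item_freq.values())

    data_bits = table_bits = 0.0
    for itemset, u in usage.items():  # itemsets with zero usage cost nothing
        code_len = -log2(u / total_usage)
        data_bits += u * code_len
        table_bits += code_len + sum(-log2(item_freq[i] / n_items) for i in itemset)
    return table_bits + data_bits


def candidates(db, min_support):
    """Naively enumerate frequent itemsets of size >= 2 (only viable for tiny data)."""
    items = sorted({i for t in db for i in t})
    found = []
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup = support(s, db)
            if sup >= min_support:
                found.append((s, sup))
    # Candidate order: support desc, then length desc, then lexicographic.
    found.sort(key=lambda p: (-p[1], -len(p[0]), sorted(p[0])))
    return [s for s, _ in found]


def krimp(raw_db, min_support=2):
    """Keep each candidate only if adding it to the code table shrinks the total size."""
    db = [frozenset(t) for t in raw_db]

    def order(ct):
        # Cover order: length desc, then support desc, then lexicographic.
        return sorted(ct, key=lambda s: (-len(s), -support(s, db), sorted(s)))

    # Start from singletons only, so every transaction can always be covered.
    singletons = [frozenset([i]) for i in {i for t in db for i in t}]
    code_table = order(singletons)
    best = total_length(db, code_table)
    for cand in candidates(db, min_support):
        trial = order(code_table + [cand])
        size = total_length(db, trial)
        if size < best:
            code_table, best = trial, size
    return code_table, best


db = [{"a", "b", "c"}] * 20 + [{"a", "b"}, {"c", "d"}]
code_table, bits = krimp(db)
print([sorted(s) for s in code_table], round(bits, 1))
```

On this toy data the frequently co-occurring itemset {a, b, c} earns a place in the code table. Encoding it as a single code is cheaper than spelling out its three items in every transaction, even after paying for its table entry. The strict `<` test encodes the core MDL trade-off: a pattern is kept only if it pays for itself.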
Keywords: relational databases; data analysis; pattern language; pattern encoding
Source: VideoLectures.NET
Last edited: 2019-12-21 (lxf)