Discovering Reliable Approximate Functional Dependencies
Lecture URL: http://videolectures.net/kdd2017_mandros_functional_dependencies/
Lecturer: Panagiotis Mandros
Institution: Max Planck Institute for Informatics
Date: 2017-10-09
Language: English
Abstract: Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional, dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And how can we efficiently discover the optimal or α-approximate top-k dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach and construct a reliable, bias-correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which, for the first time, we can mine approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias-for-variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most importantly, it remains reliable in the face of data sparsity.
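The abstract does not spell out the score itself. As a hedged illustration of why bias correction matters, the sketch below assumes a plug-in "fraction of information" F(X;Y) = (H(Y) - H(Y|X)) / H(Y), which equals 1.0 when X determines Y exactly, and corrects it by subtracting its average value when the target is randomly permuted. This Monte Carlo permutation baseline is a swapped-in stand-in, not the authors' estimator, and the names `corrected_score` and `n_permutations` are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """Empirical H(Y | X) = H(X, Y) - H(X). Use tuples in xs for attribute sets."""
    return entropy(list(zip(xs, ys))) - entropy(xs)


def fraction_of_information(xs, ys):
    """Plug-in F(X; Y) = (H(Y) - H(Y | X)) / H(Y); 1.0 means X determines Y."""
    hy = entropy(ys)
    if hy == 0:
        return 0.0  # assumption: a constant target gets score 0
    return (hy - conditional_entropy(xs, ys)) / hy


def corrected_score(xs, ys, n_permutations=200, seed=0):
    """Hypothetical bias-corrected score: observed F minus the mean F obtained
    after shuffling Y, where any apparent dependence is due to chance alone."""
    rng = random.Random(seed)
    observed = fraction_of_information(xs, ys)
    shuffled = list(ys)
    total = 0.0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        total += fraction_of_information(xs, shuffled)
    return observed - total / n_permutations


if __name__ == "__main__":
    ids = list(range(50))           # an ID-like attribute: every value unique
    target = [i % 3 for i in ids]
    print(fraction_of_information(ids, target), corrected_score(ids, target))

    xs = [0, 1, 2, 3] * 25          # a genuine dependency: Y = X mod 2
    ys = [x % 2 for x in xs]
    print(fraction_of_information(xs, ys), corrected_score(xs, ys))
```

The ID-like attribute trivially "determines" any target, so its plug-in score is 1.0, yet its corrected score is 0 because every shuffled target is determined just as well. The genuine dependency also scores 1.0 before correction but keeps a corrected score close to 1, which is the kind of separation the abstract's reliability claim is about.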
Keywords: functional dependencies; data mining; data science
Source: VideoLectures.NET
Data collected: 2023-12-26 (wujk)
Last edited: 2023-12-26 (wujk)