Probably the Best Itemsets
Course URL: http://videolectures.net/kdd2010_tatti_pbi/
Lecturer: Nikolaj Tatti
Institution: University of Antwerp
Date: 2010-10-01
Language: English
Course description: One of the main current challenges in itemset mining is to discover a small set of high-quality itemsets. In this paper we propose a new and general approach for measuring the quality of itemsets. The measure is solidly founded in Bayesian statistics and decreases monotonically, allowing for efficient discovery of all interesting itemsets. It is defined by connecting statistical models and collections of itemsets, which allows us to score individual itemsets by the probability of their occurring in random models built on the data. As a concrete example of this framework we use exponential models. This class of models possesses many desirable properties. Most importantly, Occam's razor in Bayesian model selection provides a defence against the pattern explosion. As general exponential models are infeasible in practice, we use decomposable models, a large subclass for which the measure is solvable. For the actual computation of the score we sample models from the posterior distribution using an MCMC approach. Experiments with our method demonstrate that the measure works in practice and results in interpretable and insightful itemsets for both synthetic and real-world data.
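The description notes that a monotonically decreasing measure allows efficient discovery of all interesting itemsets. The following is a minimal sketch of why, assuming a level-wise (Apriori-style) search. It uses plain support as a stand-in score, which is not the Bayesian measure from the talk. The names `support` and `levelwise_search` and the toy transactions are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    # Stand-in score (an assumption, not the talk's Bayesian measure):
    # fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)


def levelwise_search(transactions, score, threshold):
    # Assumes score is monotonically decreasing: adding items to an
    # itemset never increases its score. Under that assumption, an
    # itemset can pass only if all of its subsets pass, so candidates
    # are built solely from itemsets accepted at the previous level.
    items = sorted(set().union(*transactions))
    current = {frozenset([i]) for i in items}
    accepted = {}
    size = 1
    while current:
        # Score this level's candidates and keep those above the threshold.
        passed = {}
        for candidate in current:
            s = score(candidate, transactions)
            if s >= threshold:
                passed[candidate] = s
        accepted.update(passed)

        # Build candidates one item larger; prune any candidate that has
        # a subset of the previous size which did not pass.
        size += 1
        current = set()
        for a, b in combinations(passed, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in passed
                for sub in combinations(union, size - 1)
            ):
                current.add(union)
    return accepted


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = levelwise_search(transactions, support, 0.5)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

Any other score with the same monotonicity property could be passed in place of `support` without changing the search itself.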
Keywords: high-quality itemsets; Bayesian statistics; monotonically decreasing; datasets
Source: VideoLectures.NET
Last reviewed: 2020-10-01:yumf