Probably the Best Itemsets |
|
Course URL: | http://videolectures.net/kdd2010_tatti_pbi/ |
Lecturer: | Nikolaj Tatti |
Institution: | University of Antwerp |
Date: | 2010-10-01 |
Language: | English |
Abstract: | One of the main current challenges in itemset mining is to discover a small set of high-quality itemsets. In this paper we propose a new and general approach for measuring the quality of itemsets. The method is solidly founded in Bayesian statistics, and the measure is monotonically decreasing (a superset never scores higher than its subsets), which allows efficient discovery of all interesting itemsets. The measure is defined by connecting statistical models and collections of itemsets. This allows us to score each individual itemset by the probability of it occurring in random models built on the data. As a concrete example of this framework we use exponential models. This class of models possesses many desirable properties. Most importantly, Occam's razor in Bayesian model selection provides a defence against pattern explosion. As general exponential models are infeasible in practice, we use decomposable models, a large subclass for which the measure is solvable. For the actual computation of the score we sample models from the posterior distribution using an MCMC approach. Experiments show that the measure works in practice and yields interpretable and insightful itemsets for both synthetic and real-world data. |
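The abstract scores an itemset by the probability that it occurs in a random model drawn from the posterior, estimated by sampling models with MCMC. The sketch below illustrates only that last step under stated assumptions. It assumes posterior samples are already available and that each sampled model can be represented as a downward-closed family of itemsets. The score of an itemset is then the fraction of samples containing it, and downward closure makes that score monotonically decreasing, which permits Apriori-style pruning. This is not the author's implementation. The MCMC sampler over decomposable models is not shown; the four "samples" are hand-made stand-ins, and the names `downward_closure`, `score` and `mine` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def downward_closure(generators: Iterable[Itemset]) -> Set[Itemset]:
    """Return every non-empty subset of each generator itemset.

    Assumption: a sampled model is represented as a downward-closed
    family of itemsets, so closing a few generators builds one.
    """
    closed: Set[Itemset] = set()
    for gen in generators:
        items = sorted(gen)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                closed.add(frozenset(combo))
    return closed


def score(itemset: Itemset, sampled_models: List[Set[Itemset]]) -> float:
    """Monte Carlo estimate of P(itemset is in the model | data):
    the fraction of posterior samples whose model contains the itemset."""
    if not sampled_models:
        raise ValueError("need at least one sampled model")
    hits = sum(1 for model in sampled_models if itemset in model)
    return hits / len(sampled_models)


def mine(items: Iterable[str], sampled_models: List[Set[Itemset]],
         threshold: float) -> Dict[Itemset, float]:
    """Level-wise (Apriori-style) search for itemsets scoring >= threshold.

    Because every sampled model is downward closed, a superset can never
    score higher than its subsets, so candidates with a failing subset
    are pruned without being scored.
    """
    result: Dict[Itemset, float] = {}
    level = [frozenset([i]) for i in sorted(set(items))]
    while level:
        kept = []
        for x in level:
            s = score(x, sampled_models)
            if s >= threshold:
                result[x] = s
                kept.append(x)
        kept_set = set(kept)
        candidates = set()
        # Join pairs of surviving k-itemsets into (k+1)-itemsets and keep
        # only those whose k-subsets all survived.
        for a, b in combinations(kept, 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                union - {i} in kept_set for i in union
            ):
                candidates.add(union)
        level = sorted(candidates, key=sorted)
    return result


if __name__ == "__main__":
    # Hand-made stand-ins for MCMC samples from the posterior.
    samples = [
        downward_closure([frozenset("ab"), frozenset("c")]),
        downward_closure([frozenset("abc")]),
        downward_closure([frozenset("a"), frozenset("bc")]),
        downward_closure([frozenset("ab")]),
    ]
    for itemset, s in mine("abc", samples, threshold=0.5).items():
        print(sorted(itemset), s)
```

With these four stand-in samples, {a, b} scores 0.75 and {a, c} scores 0.25. Therefore {a, b, c} is pruned at threshold 0.5 without being scored, because one of its subsets already failed.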
Keywords: | itemset mining; Bayesian statistics; MCMC methods |
Source: | VideoLectures.NET |
Data collected: | 2021-08-11: nkq |
Last reviewed: | 2021-08-28: nkq |