Efficient Mining of Top Correlated Patterns Based on Null-Invariant Measures
Course URL: http://videolectures.net/ecmlpkdd2011_han_measures/
Lecturer: Jiawei Han
Institution: University of Illinois
Date: 2011-11-29
Language: English
Course description: Mining strong correlations from transactional databases often leads to more meaningful results than mining association rules. In such mining, null (transaction)-invariance is an important property of the correlation measures. Unfortunately, some useful null-invariant measures such as Kulczynski and Cosine, which can discover correlations even in very unbalanced cases, lack the (anti)-monotonicity property. Thus, they could only be applied to frequent itemsets as a post-evaluation step. For large datasets and low supports, this approach is computationally prohibitive. This paper presents new properties for all known null-invariant measures. Based on these properties, we develop efficient pruning techniques and design an Apriori-like algorithm, NICOMINER, for mining strongly correlated patterns directly. We develop both threshold-bounded and top-k variations of the algorithm, where top-k is used when the optimal correlation threshold is not known in advance and to give the user control over the output size. We test NICOMINER on real-life datasets from different application domains, using Cosine as an example of a null-invariant correlation measure. We show that NICOMINER outperforms support-based approaches by more than an order of magnitude, and that it is very useful for discovering top correlations in itemsets with low support.
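To make the null-invariance property mentioned in the description concrete, here is a minimal Python sketch using the standard textbook definitions of Cosine, Kulczynski, and lift for a pair of items. It is not the NICOMINER algorithm; the function names and the support counts are hypothetical, chosen only for illustration. Cosine and Kulczynski depend only on the supports of A, B, and AB, so adding transactions that contain neither item leaves them unchanged, whereas lift changes with the total transaction count.

```python
from math import sqrt

def cosine(sup_a, sup_b, sup_ab):
    # Cosine: sup(AB) / sqrt(sup(A) * sup(B)); no dependence on total count.
    return sup_ab / sqrt(sup_a * sup_b)

def kulczynski(sup_a, sup_b, sup_ab):
    # Kulczynski: average of the two conditional probabilities P(B|A), P(A|B).
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)

def lift(sup_a, sup_b, sup_ab, n_transactions):
    # Lift: P(AB) / (P(A) * P(B)); depends on the total number of transactions.
    return sup_ab * n_transactions / (sup_a * sup_b)

# Hypothetical absolute support counts (assumed for illustration only).
sup_a, sup_b, sup_ab = 1000, 1000, 500

# Grow the database by adding null transactions (containing neither A nor B).
for n in (2_000, 200_000):
    print(n,
          cosine(sup_a, sup_b, sup_ab),
          kulczynski(sup_a, sup_b, sup_ab),
          lift(sup_a, sup_b, sup_ab, n))
```

In this sketch the first two values stay the same across both database sizes while lift grows with the number of null transactions, which is why null-invariant measures are preferred when most transactions contain none of the items of interest.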
Keywords: computer science; data mining; frequent patterns
Source: VideoLectures.NET
Last reviewed: 2020-07-13:yumf
Views: 51