MiSoSouP: Mining Interesting Subgroups with Sampling and Pseudodimension
Course URL: http://videolectures.net/kdd2018_riondato_mining_pseudodimension/
Lecturer: Matteo Riondato
Institution: Two Sigma Investments, LP
Date: 2018-11-23
Language: English
Abstract: “Miso makes a soup loaded with flavour that saves you the hassle of making stock.” – Y. Ottolenghi [19]
We present MiSoSouP, a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different interestingness measures, from a random sample of a transactional dataset. We describe a new formulation of these measures that makes it possible to approximate them using sampling. We then discuss how pseudodimension, a key concept from statistical learning theory, relates to the sample size needed to obtain a high-quality approximation of the most interesting subgroups. We prove an upper bound on the pseudodimension of the problem at hand, which results in small sample sizes. Our evaluation on real datasets shows that MiSoSouP outperforms state-of-the-art algorithms offering the same guarantees, and that it vastly speeds up subgroup discovery compared with analyzing the whole dataset.
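Keywords: different interestingness measures; sample size needed for the approximation; MiSoSouP; analyzing the whole dataset
Source: VideoLectures.NET
Data collected: 2023-01-16 (cyh)
Last edited: 2023-01-16 (cyh)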
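The abstract ties the number of transactions to sample to the pseudodimension of the problem. Below is a minimal sketch of that idea, assuming the standard form of the uniform-convergence bound for functions with range [0, 1], m = ⌈(c/ε²)(d + ln(1/δ))⌉, where the universal constant c = 0.5 is an illustrative assumption. As the example interestingness measure it uses weighted relative accuracy (WRAcc), a common subgroup-discovery measure, estimated by plain frequencies on a uniform random sample. This is not MiSoSouP's reformulation of the measures, nor its pseudodimension bound. The helpers `sample_size` and `wracc`, the value `pseudodim=3`, and the toy dataset are all hypothetical.

```python
import math
import random


def sample_size(eps: float, delta: float, pseudodim: int, c: float = 0.5) -> int:
    """Sample size from the standard uniform-convergence bound for a function
    class with range [0, 1] and pseudodimension <= pseudodim:
        m = ceil((c / eps^2) * (d + ln(1 / delta)))
    ASSUMPTION: c = 0.5 is an illustrative value for the universal constant;
    the (eps, delta) guarantee only holds for the correct constant.
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil((c / eps**2) * (pseudodim + math.log(1 / delta)))


def wracc(transactions, subgroup, target):
    """Plug-in estimate of weighted relative accuracy on `transactions`:
    P(S) * (P(target | S) - P(target)), where S is the set of transactions
    containing every item of `subgroup`. Returns 0.0 if S is empty.
    """
    n = len(transactions)
    if n == 0:
        raise ValueError("empty dataset")
    p_target = sum(target in t for t in transactions) / n
    covered = [t for t in transactions if subgroup <= t]
    if not covered:
        return 0.0
    p_s = len(covered) / n
    p_target_given_s = sum(target in t for t in covered) / len(covered)
    return p_s * (p_target_given_s - p_target)


if __name__ == "__main__":
    # Hypothetical toy dataset: each transaction is a frozenset of items and
    # "y" plays the role of the binary target attribute.
    random.seed(0)
    dataset = []
    for _ in range(100_000):
        t = {i for i in ("a", "b", "c") if random.random() < 0.5}
        # "y" is more frequent when "a" is present, so {"a"} is interesting.
        if random.random() < (0.7 if "a" in t else 0.3):
            t.add("y")
        dataset.append(frozenset(t))

    # pseudodim=3 is an arbitrary illustrative value, not the talk's bound.
    m = sample_size(eps=0.02, delta=0.05, pseudodim=3)  # 7495 with c = 0.5
    sample = random.choices(dataset, k=m)  # uniform, with replacement
    print(m, wracc(dataset, {"a"}, "y"), wracc(sample, {"a"}, "y"))
```

Sampling with replacement keeps the sampled transactions independent and identically distributed, which is what bounds of this form assume. In the toy data the true WRAcc of {"a"} is 0.5 × (0.7 − 0.5) = 0.1, so both printed estimates should land near that value, with the sample using only a small fraction of the transactions.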