Non-Redundant Subgroup Discovery in Large and Complex Data
Lecture URL: http://videolectures.net/ecmlpkdd2011_van_leeuwen_data/
Lecturer: Matthijs van Leeuwen
Institution: Utrecht University, the Netherlands
Date: Not available.
Language: English
Abstract: Large and complex data is challenging for most existing discovery algorithms, for several reasons. First of all, such data leads to enormous hypothesis spaces, making exhaustive search infeasible. Second, many variants of essentially the same pattern exist, due to (numeric) attributes of high cardinality, correlated attributes, and so on. This causes top-k mining algorithms to return highly redundant result sets, while ignoring many potentially interesting results. These problems are particularly apparent with Subgroup Discovery and its generalisation, Exceptional Model Mining. To address this, we introduce subgroup set mining: one should not consider individual subgroups, but sets of subgroups. We consider three degrees of redundancy, and propose corresponding heuristic selection strategies in order to eliminate redundancy. By incorporating these strategies in a beam search, the balance between exploration and exploitation is improved. Experiments clearly show that the proposed methods result in much more diverse subgroup sets than traditional Subgroup Discovery methods.
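
The abstract does not spell out the three selection strategies, so the following is only a minimal sketch of the general idea of plugging a redundancy-aware selection step into beam search, not the authors' implementation. It assumes binary attributes, a binary target, weighted relative accuracy (WRAcc) as the quality measure, and a cover-based heuristic in which rows already covered by previously selected subgroups are down-weighted by a factor alpha. The function names (cover, wracc, select_cover_based, beam_search), the parameters and the TOY_DATA rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: a subgroup description is a frozenset of
# (attribute, value) conditions that are combined conjunctively.


def cover(data, desc):
    """Indices of the rows that satisfy every condition in desc."""
    return frozenset(i for i, row in enumerate(data)
                     if all(row[a] == v for a, v in desc))


def wracc(data, rows, target):
    """Weighted relative accuracy of a subgroup for a binary target."""
    if not rows:
        return 0.0
    p_all = sum(row[target] for row in data) / len(data)
    p_sub = sum(data[i][target] for i in rows) / len(rows)
    return len(rows) / len(data) * (p_sub - p_all)


def select_cover_based(cands, k, alpha):
    """Greedily pick k (desc, rows, quality) candidates, down-weighting
    rows that earlier picks already cover (assumed heuristic)."""
    pool, chosen = list(cands), []
    counts = defaultdict(int)  # how many chosen subgroups cover each row

    def score(cand):
        _, rows, q = cand
        weight = sum(alpha ** counts[i] for i in rows) / len(rows)
        return q * weight

    while pool and len(chosen) < k:
        best = pool.pop(max(range(len(pool)), key=lambda j: score(pool[j])))
        chosen.append(best)
        for i in best[1]:
            counts[i] += 1
    return chosen


def beam_search(data, attrs, target, width=2, depth=2, k=2, alpha=0.5):
    """Level-wise beam search whose beam is chosen by select_cover_based."""
    conditions = [(a, v) for a in attrs for v in (0, 1)]
    beam, found = [frozenset()], {}
    for _ in range(depth):
        cands = {}
        for desc in beam:
            used = {a for a, _ in desc}
            for a, v in conditions:
                new = desc | {(a, v)}
                if a in used or new in cands:
                    continue
                rows = cover(data, new)
                q = wracc(data, rows, target)
                if q > 0:  # keep only subgroups better than the baseline
                    cands[new] = (new, rows, q)
        if not cands:
            break
        # Diversity-aware selection instead of plain top-`width` by quality.
        selected = select_cover_based(cands.values(), width, alpha)
        found.update((c[0], c) for c in selected)
        beam = [c[0] for c in selected]
    # The final result set is selected with the same heuristic.
    return select_cover_based(found.values(), k, alpha)


# Hypothetical toy data: "a2" is a redundant copy of "a".
TOY_DATA = [
    {"a": 1, "a2": 1, "b": 0, "y": 1},
    {"a": 1, "a2": 1, "b": 0, "y": 1},
    {"a": 1, "a2": 1, "b": 0, "y": 1},
    {"a": 0, "a2": 0, "b": 1, "y": 1},
    {"a": 0, "a2": 0, "b": 1, "y": 1},
    {"a": 0, "a2": 0, "b": 0, "y": 0},
    {"a": 0, "a2": 0, "b": 0, "y": 0},
    {"a": 0, "a2": 0, "b": 0, "y": 0},
]

for desc, rows, q in beam_search(TOY_DATA, ["a", "a2", "b"], "y"):
    print(sorted(desc), sorted(rows), q)
```

On this toy data, a plain top-2 selection would return a = 1 and a2 = 1, two descriptions of the same three rows. With alpha = 0.5 the cover-based step halves the score of a2 = 1 once a = 1 is chosen, so the second slot goes to b = 1, which covers different rows. Multiplicative down-weighting of covered rows is one simple way to trade quality against diversity. The lecture's actual strategies may differ.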
Keywords: pattern mining; heuristic selection strategies; diverse subgroup sets
Source: VideoLectures.NET
Last edited: 2019-11-17 (cwx)