

On Subgroup Discovery in Numerical Domains
Course URL: http://videolectures.net/ecmlpkdd09_grosskreutz_sdnd/
Lecturer: Henrik Grosskreutz
Institution: Fraunhofer-Gesellschaft
Date: 2009-10-20
Language: English
Course description: Subgroup discovery is a knowledge discovery task that aims to find subgroups of a population that combine high generality with distributional unusualness. Several subgroup discovery algorithms have been presented in the past, but they focus on databases with nominal attributes or rely on discretization to eliminate numerical attributes. In this paper, we illustrate why replacing numerical attributes with nominal ones can lead to suboptimal results. We then present a new subgroup discovery algorithm that prunes large parts of the search space by exploiting bounds between related numerical subgroup descriptions. The same algorithm can also be applied to ordinal attributes. In an experimental section, we show that the new pruning scheme yields a large performance gain when more than just a few split points are considered for the numerical attributes.
Keywords: nominal attributes; numerical attributes; pruning algorithms
Source: VideoLectures.NET
Last reviewed: 2020-11-13 (yumf)
Views: 43
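
Illustrative note: the description's claim that discretization can give suboptimal subgroups can be sketched on toy data. The snippet below is a minimal sketch, not the algorithm from the talk: it uses plain brute force with no pruning, and the quality function (weighted relative accuracy), the data, the bin boundaries, and all function names are assumptions chosen for illustration.

```python
from itertools import combinations


def wracc(rows, lo, hi):
    """Weighted relative accuracy of the subgroup lo <= x <= hi.

    Assumed quality function: coverage * (positive rate in subgroup
    minus positive rate in the whole population).
    """
    n_total = len(rows)
    base_rate = sum(label for _, label in rows) / n_total
    covered = [label for x, label in rows if lo <= x <= hi]
    if not covered:
        return 0.0
    return len(covered) / n_total * (sum(covered) / len(covered) - base_rate)


def best_interval(rows, cut_points):
    """Score every interval whose end points are cut points (no pruning)."""
    points = sorted(set(cut_points))
    candidates = list(combinations(points, 2)) + [(p, p) for p in points]
    return max(candidates, key=lambda iv: wracc(rows, iv[0], iv[1]))


def best_nominal_bin(rows, bins):
    """Score only the fixed bins, as if the attribute had been made nominal."""
    return max(bins, key=lambda b: wracc(rows, b[0], b[1]))


# Hypothetical toy data: (numeric value, binary target); positives sit in 4..6.
rows = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1),
        (7, 0), (8, 0), (9, 0), (10, 0)]

# Hypothetical two-bin discretization that cuts through the positive region.
bins = [(1, 5), (6, 10)]

numeric_best = best_interval(rows, [x for x, _ in rows])
nominal_best = best_nominal_bin(rows, bins)

print("best numeric interval:", numeric_best, wracc(rows, *numeric_best))
print("best discretized bin:", nominal_best, wracc(rows, *nominal_best))
```

On this data the interval search can isolate the positive region exactly, while either fixed bin mixes positives with negatives and scores lower. The exhaustive search grows quadratically with the number of split points, which is the cost that a pruning scheme based on bounds between related interval descriptions is meant to reduce.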