CoCo: Coding Cost for Parameter-free Outlier Detection
Course URL: http://videolectures.net/kdd09_muller_cococcpfod/
Lecturer: Nikola Müller
Institution: Max Planck Institute
Date: 2009-09-14
Language: English
Course description: How can we automatically spot all outstanding observations in a data set? This question arises in a large variety of applications, e.g. in economics, biology and medicine. Existing approaches to outlier detection suffer from one or more of the following drawbacks: the results of many methods depend strongly on suitable parameter settings that are very difficult to estimate without background knowledge of the data, e.g. the minimum cluster size or the number of desired outliers. Many methods implicitly assume Gaussian or uniformly distributed data, and/or their results are difficult to interpret. To cope with these problems, we propose CoCo, a technique for parameter-free outlier detection. The basic idea of our technique relates outlier detection to data compression: outliers are objects which cannot be effectively compressed given the data set. To avoid assuming a particular data distribution, CoCo relies on a very general data model combining the Exponential Power Distribution with Independent Components. We define an intuitive outlier factor based on the principle of Minimum Description Length, together with a novel algorithm for outlier detection. An extensive experimental evaluation on synthetic and real-world data demonstrates the benefits of our technique. Disclaimer: VideoLectures.Net emphasizes that the length of this video is not standard; it was cut because of the poor sound quality in the lecture auditorium.
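The link between compression and outlier detection described above can be illustrated with a minimal sketch. It is not the CoCo algorithm: it covers a single dimension only, omits the Independent Components step, and uses hand-picked parameters (`mu`, `alpha`, `beta`) and made-up data, all of which are assumptions for illustration, whereas CoCo determines its model without user-set parameters. The function name `epd_coding_cost_bits` is hypothetical. The sketch only shows that a value lying far from the bulk of the data receives a much higher coding cost, measured as the negative base-2 logarithm of an exponential power density.

```python
import math


def epd_coding_cost_bits(x, mu, alpha, beta):
    """Relative coding cost of x, in bits, under an exponential power density.

    p(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)

    beta = 2 gives a Gaussian shape, beta = 1 a Laplacian shape.
    Because p is a density, the cost is defined only up to a constant that
    depends on the quantization resolution, so it can be negative; only
    differences between costs are meaningful.
    """
    log_norm = math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
    log_density = log_norm - (abs(x - mu) / alpha) ** beta
    return -log_density / math.log(2.0)


# Made-up one-dimensional data containing one unusual value (6.0).
data = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 6.0]

# Hand-picked model parameters (assumed for illustration, not fitted).
mu, alpha, beta = 1.0, 0.2, 2.0

costs = [epd_coding_cost_bits(x, mu, alpha, beta) for x in data]

# The value that is hardest to compress under this model.
most_costly = data[costs.index(max(costs))]
```

In this sketch `most_costly` is the value 6.0, since its distance from `mu` dominates the exponent of the density. A fuller treatment would also charge for encoding the model parameters themselves, as the Minimum Description Length principle requires.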
Keywords: data sets; parameter settings; outlier detection; data compression
Source: VideoLectures.NET
Last reviewed: 2021-12-24:liyy
Views: 66