Detection of Unique Temporal Segments by Information Theoretic Meta-clustering
Course URL: http://videolectures.net/kdd09_ando_dutsitmc/
Lecturer: Shin Ando
Institution: Gunma University
Date: 2009-09-14
Language: English
Abstract: The central challenge in temporal data analysis is to obtain knowledge about its underlying dynamics. In this paper, we address the observation of noisy, stochastic processes and attempt to detect temporal segments that are related to inconsistencies and irregularities in their dynamics. Many conventional anomaly detection approaches detect anomalies based on the distance between patterns, and often provide only limited intuition about the generative process of the anomalies. Meanwhile, model-based approaches have difficulty in identifying a small, clustered set of anomalies. We propose Information-theoretic Meta-clustering (ITMC), a formalization of model-based clustering principled by the theory of lossy data compression. ITMC identifies a "unique" cluster whose distribution diverges significantly from the entire dataset. Furthermore, ITMC employs a regularization term derived from the preference for a high compression rate, which is critical to the precision of detection. For empirical evaluation, we apply ITMC to two temporal anomaly detection tasks. Datasets are taken from generative processes involving heterogeneous and inconsistent dynamics. A comparison to baseline methods shows that the proposed algorithm detects segments from irregular states with significantly high precision and recall.
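The abstract's idea of a segment whose distribution "diverges significantly from the entire dataset" can be illustrated, very loosely, with a divergence score. The sketch below is NOT ITMC: it assumes univariate Gaussian models, uses Kullback-Leibler divergence as the divergence measure, and the helper names (gaussian_kl, segment_divergence) and the toy series are hypothetical. It omits the lossy-compression formulation, the model-based meta-clustering and the regularization term that the abstract describes.

```python
import math
import statistics


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) in nats for univariate Gaussians P = N(mu_p, var_p), Q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def segment_divergence(segment, data, eps=1e-9):
    """Hypothetical helper: fit a Gaussian to one segment and one to the whole
    dataset, then score the segment by KL(segment || dataset).
    Variances are floored at eps so constant segments do not divide by zero."""
    mu_s, var_s = statistics.fmean(segment), max(statistics.pvariance(segment), eps)
    mu_d, var_d = statistics.fmean(data), max(statistics.pvariance(data), eps)
    return gaussian_kl(mu_s, var_s, mu_d, var_d)


# Toy series (illustrative only): four windows of regular behaviour,
# followed by one window from an irregular, shifted state.
regular = [0.0, 0.2, -0.2, 0.1, -0.1] * 4
irregular = [3.0, 3.5, 2.5, 3.2, 2.8]
series = regular + irregular

# Split into non-overlapping windows of length 5 and score each one.
windows = [series[i:i + 5] for i in range(0, len(series), 5)]
scores = [segment_divergence(w, series) for w in windows]
most_unusual = max(range(len(scores)), key=scores.__getitem__)
print(most_unusual)  # index of the highest-scoring window (the shifted burst)
```

This per-window score is a crude proxy: narrow-variance windows also receive a sizeable score, which is one reason a principled model-based formulation such as the one proposed in the paper is needed for small, clustered anomalies.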
Keywords: Computer Science; Data Mining; Temporal Data
Source: VideoLectures.NET
Last edited: 2020-06-24 (yumf)