Online Thinning for High Volume Streaming Data
Course URL: http://videolectures.net/kdd2017_hunt_streaming_data/
Lecturer: Xin J. Hunt
Institution: SAS Institute Inc.
Date: 2017-12-01
Language: English
Abstract: In an era of ubiquitous large-scale streaming data, the availability of data far exceeds the capacity of expert human analysts. In many settings, such data is either discarded or stored unprocessed in data centers. This paper proposes a method of online data thinning, in which large-scale streaming datasets are winnowed to preserve unique, anomalous, or salient elements for timely expert analysis. At the heart of the proposed approach is an online anomaly detection method based on dynamic, low-rank Gaussian mixture models. Specifically, the high-dimensional covariance matrices of the Gaussian components are modeled as low-rank. Under this model, most observations lie near a union of subspaces. The low-rank modeling mitigates the curse of dimensionality associated with anomaly detection for high-dimensional data, and recent advances in subspace clustering and subspace tracking allow the proposed method to adapt to dynamic environments. The resulting algorithms are scalable, efficient, and capable of operating in real time. Experiments on wide-area motion imagery illustrate the efficacy of the proposed approach.
Keywords: large-scale data; data thinning; Gaussian components
Source: VideoLectures.NET
Data collected: 2023-03-20:chenxin01
Last reviewed: 2023-05-17:chenxin01
Views: 14