Common Substructure Learning of Multiple Graphical Gaussian Models
Course URL: http://videolectures.net/ecmlpkdd2011_hara_common/
Lecturer: Satoshi Hara
Institution: Osaka University
Date: 2011-10-03
Language: English
Abstract: Learning the underlying mechanisms of data generation is of great interest in scientific and engineering fields, among others. Finding dependency structures among variables in the data is one possible approach to this end, and is an important task in data mining. In this paper, we focus on learning dependency substructures shared by multiple datasets. In many scenarios, the nature of the data varies across the multiple datasets because of changes in the surrounding conditions or non-stationary mechanisms. However, we can also assume that the change occurs only partially and that some relations between variables remain unchanged. Moreover, we can expect such commonness over the multiple datasets to be closely related to the invariance of the underlying mechanism. For example, errors in engineering systems are usually caused by faults in some sub-systems while the other parts remain healthy. In such situations, although anomalies are observed in sensor values, the underlying invariance of the healthy sub-systems is still captured by steady dependency structures before and after the onset of the error. We propose a structure learning algorithm to find such invariances in the case of Graphical Gaussian Models (GGM). The proposed method is based on block coordinate descent optimization, where the subproblems can be solved efficiently by existing algorithms for the Lasso and the continuous quadratic knapsack problem. We confirm the validity of our approach through numerical simulations and through applications to real-world datasets drawn from the analysis of city-cycle fuel consumption and from anomaly detection in car sensors.
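The abstract states that the block-coordinate-descent subproblems are solved with existing algorithms for the Lasso and for the continuous quadratic knapsack problem. As a rough illustration of the second, the sketch below solves a generic continuous quadratic knapsack problem, minimising Σ_i (½ d_i x_i² − a_i x_i) subject to Σ_i b_i x_i = c and l_i ≤ x_i ≤ u_i with d_i > 0, by bisection on the Lagrange multiplier of the equality constraint. This is a standard textbook approach, not the authors' implementation: the exact form of the subproblem in the paper is not given in this abstract, and the function name `solve_cqk` and its parameters are hypothetical.

```python
from typing import List


def solve_cqk(d: List[float], a: List[float], b: List[float], c: float,
              lo: List[float], hi: List[float],
              tol: float = 1e-10, max_iter: int = 200) -> List[float]:
    """Minimise sum_i (0.5*d[i]*x[i]**2 - a[i]*x[i])
    s.t. sum_i b[i]*x[i] == c and lo[i] <= x[i] <= hi[i].

    Hypothetical helper; assumes d[i] > 0 and lo[i] <= hi[i].
    """
    def x_of(lam: float) -> List[float]:
        # Per-coordinate minimiser of the Lagrangian for multiplier lam,
        # clipped to the box [lo[i], hi[i]].
        return [min(max((ai - lam * bi) / di, l), h)
                for di, ai, bi, l, h in zip(d, a, b, lo, hi)]

    def g(lam: float) -> float:
        # Value of the equality constraint at x(lam); non-increasing in lam.
        return sum(bi * xi for bi, xi in zip(b, x_of(lam)))

    # Grow a bracket [lam_lo, lam_hi] with g(lam_lo) >= c >= g(lam_hi).
    lam_lo, lam_hi = -1.0, 1.0
    for _ in range(max_iter):
        if g(lam_lo) >= c:
            break
        lam_lo *= 2.0
    else:
        raise ValueError("c is not attainable within the box constraints")
    for _ in range(max_iter):
        if g(lam_hi) <= c:
            break
        lam_hi *= 2.0
    else:
        raise ValueError("c is not attainable within the box constraints")

    # Bisection on the multiplier preserves the bracket invariant.
    for _ in range(max_iter):
        if lam_hi - lam_lo < tol:
            break
        mid = 0.5 * (lam_lo + lam_hi)
        if g(mid) > c:
            lam_lo = mid
        else:
            lam_hi = mid
    return x_of(0.5 * (lam_lo + lam_hi))
```

For example, `solve_cqk([1, 1, 1], [1, 2, 3], [1, 1, 1], 3.0, [0, 0, 0], [10, 10, 1])` computes the Euclidean projection of (1, 2, 3) onto the set where x_1 + x_2 + x_3 = 3, 0 ≤ x_1, x_2 ≤ 10 and 0 ≤ x_3 ≤ 1, returning approximately [0.5, 1.5, 1.0]. Bisection is used here only for simplicity; because the constraint value is monotone in the multiplier, exact breakpoint-search methods also exist.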
Keywords: learning of data-generation mechanisms; commonality of datasets; structure learning algorithm
Source: VideoLectures.NET