On the Effect of Endpoints on Dynamic Time Warping
Course URL: | https://videolectures.net/videos/kdd2016_silva_time_warping
Lecturer: | Diego Furtado Silva
Venue: | KDD 2016
Date: | 2025-02-04
Language: | English
Abstract: | While there exist a plethora of classification algorithms for most data types, there is an increasing acceptance that the unique properties of time series mean that the combination of nearest neighbor classifiers and Dynamic Time Warping (DTW) is very competitive across a host of domains, from medicine to astronomy to environmental sensors. While there has been significant progress in improving the efficiency and effectiveness of DTW in recent years, in this work we demonstrate that an underappreciated issue can significantly degrade the accuracy of DTW in real-world deployments. This issue has probably escaped the attention of the very active time series research community because of the community's reliance on static, highly contrived benchmark datasets, rather than the real-world dynamic datasets where the problem tends to manifest itself. In essence, the issue is that DTW's eponymous invariance to warping is only true for the main "body" of the two time series being compared. However, for the "head" and "tail" of the time series, the DTW algorithm affords no warping invariance. The effect of this is that tiny differences at the beginning or end of the time series (which may be either consequential or simply the result of poor "cropping") will tend to contribute disproportionately to the estimated similarity, producing incorrect classifications. In this work, we show that this effect is real, and reduces the performance of the algorithm. We further show that we can fix the issue with a subtle redesign of the DTW algorithm, and that we can learn an appropriate setting for the extra parameter we introduce. We further demonstrate that our generalization is amenable to all the optimizations that make DTW tractable for large datasets. (An illustrative sketch of such an endpoint relaxation follows this record.)
Keywords: | Dynamic Time Warping; classification algorithms; benchmark datasets
Source: | 视频讲座网
Data collected: | 2025-04-06:liyq
Last reviewed: | 2025-04-06:liyq
Views: | 11
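
Note: The abstract describes relaxing DTW's boundary constraint with one extra parameter, but does not spell out the exact formulation. The sketch below is a minimal, assumed realization in Python: the parameter name `psi`, the squared-difference cost, and the "skip up to psi head/tail samples at no cost" relaxation are illustrative assumptions, not the authors' published method.

```python
import numpy as np

def dtw(x, y, psi=0):
    """DTW distance between 1-D sequences x and y.

    psi=0 is classic DTW: the warping path must begin at (x[0], y[0]) and
    end at (x[-1], y[-1]).  psi>0 is an assumed endpoint relaxation: up to
    psi samples at the head and tail of either series may be left
    unmatched at no cost.  psi should be small relative to the series lengths.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    # Relaxed head: the path may start anywhere in the first psi+1 border cells.
    D[0, 0:psi + 1] = 0.0
    D[0:psi + 1, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Relaxed tail: the path may end anywhere in the last psi+1 border cells.
    best_end = min(D[n, m - psi:m + 1].min(), D[n - psi:n + 1, m].min())
    return np.sqrt(best_end)

# Example: two series that are identical except for a spurious sample at the
# head of b, i.e. a poorly "cropped" copy of a.
a = np.array([0.0, 1, 2, 3, 2, 1, 0])
b = np.array([5.0, 0, 1, 2, 3, 2, 1, 0])
print(dtw(a, b, psi=0))   # the single bad endpoint dominates the distance
print(dtw(a, b, psi=2))   # head/tail mismatch is forgiven; distance is ~0
```

Per the abstract, a suitable value for the extra parameter can be learned rather than hand-tuned; the fixed psi=2 above is purely for illustration.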