MASK: Robust Local Features for Audio Fingerprinting

Course URL:	http://videolectures.net/icme2012_anguera_mask/
Lecturer:	Xavier Anguera Miro
Institution:	ELSA Corp
Date:	2012-09-18
Language:	English
Abstract:	This paper presents a novel local audio fingerprint called MASK (Masked Audio Spectral Keypoints) that can effectively encode the acoustic information present in audio documents and discriminate between transformed versions of the same acoustic document and other, unrelated documents. The fingerprint is designed to be resilient to strong transformations of the original signal and to be usable for generic audio, including music and speech. Its main characteristics are locality, binary encoding, robustness and compactness. The proposed fingerprint encodes the local spectral energies around salient points selected among the main spectral peaks of a given signal. To do so, a carefully designed mask is centered on each salient point; the mask defines regions of the spectrogram whose average energies are compared with each other. Each comparison yields a single bit, set according to which region has more energy, and all bits are grouped into the final binary fingerprint. In addition, the fingerprint stores the frequency of each peak, quantized using a Mel filterbank. The length of the fingerprint is determined solely by the number of region comparisons used (one bit each), and can be adapted to the requirements of any particular application. The number of salient points encoded per second can also be easily modified. In the experimental section, the fingerprint is evaluated on finding matching segments in the NIST-TRECVID benchmarking evaluation datasets and compared against a well-known fingerprint, obtaining up to 26% relative improvement in NDCR (Normalized Detection Cost Rate) score.
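The abstract describes the encoding only at a high level. The Python sketch below illustrates the general idea on a precomputed energy spectrogram: pick local spectral maxima as salient points, compare the average energies of mask regions around each point to obtain one bit per region pair, and store the peak frequency as a Mel-band index. The region layout (`REGIONS`), the comparison pairs (`PAIRS`), the 3x3 peak picker, the 18 equal-width Mel bands up to 4 kHz and all function names are hypothetical choices for illustration; they are not the mask, filterbank or parameters used in the paper.

```python
import math
from typing import List, Tuple

# Energy spectrogram indexed as S[t][f] (time frame, frequency bin); assumed precomputed.
Spectrogram = List[List[float]]

# Hypothetical mask: each region is a half-open box (dt0, dt1, df0, df1) of
# time/frequency offsets relative to the salient point. Not the paper's layout.
REGIONS = [
    (-2, 0, -2, 0),  # 0: earlier frames, lower bins
    (-2, 0, 1, 3),   # 1: earlier frames, higher bins
    (1, 3, -2, 0),   # 2: later frames, lower bins
    (1, 3, 1, 3),    # 3: later frames, higher bins
    (-2, 3, -2, 0),  # 4: full time span, lower bins
    (-2, 3, 1, 3),   # 5: full time span, higher bins
]
# Hypothetical region pairs; one bit per pair, so each code has len(PAIRS) bits.
PAIRS = [(0, 1), (2, 3), (0, 2), (1, 3), (4, 5), (0, 3), (1, 2)]
MARGIN = 2  # largest |offset| used in REGIONS, so every mask fits inside S


def find_peaks(S: Spectrogram, margin: int = MARGIN) -> List[Tuple[int, int]]:
    """Salient points: cells strictly greater than all 8 neighbours."""
    T, F = len(S), len(S[0])
    peaks = []
    for t in range(margin, T - margin):
        for f in range(margin, F - margin):
            neighbours = [S[t + a][f + b] for a in (-1, 0, 1) for b in (-1, 0, 1)
                          if (a, b) != (0, 0)]
            if S[t][f] > max(neighbours):
                peaks.append((t, f))
    return peaks


def region_mean(S: Spectrogram, t: int, f: int, region) -> float:
    """Average energy of one mask region centered on (t, f)."""
    dt0, dt1, df0, df1 = region
    cells = [S[t + dt][f + df] for dt in range(dt0, dt1) for df in range(df0, df1)]
    return sum(cells) / len(cells)


def mask_bits(S: Spectrogram, t: int, f: int) -> int:
    """One bit per region pair: 1 if the first region has more average energy."""
    means = [region_mean(S, t, f, r) for r in REGIONS]
    code = 0
    for a, b in PAIRS:
        code = (code << 1) | (1 if means[a] > means[b] else 0)
    return code


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_band(freq_hz: float, n_bands: int = 18, f_max: float = 4000.0) -> int:
    """Quantize a frequency into one of n_bands equal-width bands on the Mel scale."""
    ratio = hz_to_mel(min(freq_hz, f_max)) / hz_to_mel(f_max)
    return min(int(ratio * n_bands), n_bands - 1)


def fingerprints(S: Spectrogram, bin_hz: float) -> List[Tuple[int, int, int]]:
    """(frame, Mel band of the peak, binary mask code) for every salient point."""
    return [(t, mel_band(f * bin_hz), mask_bits(S, t, f)) for t, f in find_peaks(S)]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")
```

For a 9x9 spectrogram of zeros with `S[4][4] = 10.0` and `S[3][3] = 4.0`, the only salient point is `(4, 4)`, and with `bin_hz=500.0` the sketch returns `[(4, 12, 86)]`. Because every bit comes from one comparison, the code length is simply `len(PAIRS)`, mirroring the abstract's point that fingerprint length depends only on how many comparisons are used. Comparing codes with the same Mel band by Hamming distance is an assumption of this sketch; the abstract does not describe the matching procedure.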
Keywords:	local audio fingerprint; mask; binary fingerprint; benchmarking evaluation datasets
Source:	VideoLectures.NET
Last edited:	2020-06-23 (liqy)