Web-Scale Image Clustering Revisited
Course URL: http://videolectures.net/iccv2015_kalantidis_image_clustering/
Lecturer: Yannis Kalantidis
Institution: Yahoo
Date: 2016-02-10
Language: English
Course description: Large-scale duplicate detection, clustering and mining of documents or images have conventionally been treated with seed detection via hashing, followed by seed-growing heuristics using fast search. Principled clustering methods, especially kernelized and spectral ones, have higher complexity and are difficult to scale beyond millions. Under the assumption of documents or images embedded in Euclidean space, we revisit recent advances in approximate k-means variants, and borrow their best ingredients to introduce a new one, inverted-quantized k-means (IQ-means). Key underlying concepts are quantization of data points and multi-index based inverted search from centroids to cells. Its quantization is a form of hashing and analogous to seed detection, while its updates are analogous to seed growing, yet principled in the sense of distortion minimization. We further design a dynamic variant that is able to determine the number of clusters k in a single run at nearly zero additional cost. Combined with powerful deep learned representations, we achieve clustering of a 100 million image collection on a single machine in less than one hour.
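
The quantize-then-cluster idea in the description can be illustrated loosely as follows. This is a minimal sketch under stated assumptions, not the IQ-means algorithm from the talk: a uniform 2-D grid stands in for the learned quantizer, a brute-force nearest-centroid loop stands in for the multi-index inverted search, and the names `quantize` and `weighted_kmeans` are hypothetical.

```python
import random

def quantize(points, cell_size):
    """Map each 2-D point to a grid cell; keep per-cell count and mean.

    Assumption: a uniform grid replaces the learned quantizer.
    """
    cells = {}
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        n, sx, sy = cells.get(key, (0, 0.0, 0.0))
        cells[key] = (n + 1, sx + x, sy + y)
    # Each non-empty cell is summarized as (count, mean_x, mean_y).
    return [(n, sx / n, sy / n) for n, sx, sy in cells.values()]

def weighted_kmeans(cells, k, iters=20, seed=0):
    """Lloyd iterations over cell summaries, each weighted by its count.

    Assumption: brute-force assignment replaces the inverted search.
    """
    rng = random.Random(seed)
    centroids = [(mx, my) for _, mx, my in rng.sample(cells, k)]
    for _ in range(iters):
        sums = [[0, 0.0, 0.0] for _ in range(k)]
        for n, mx, my in cells:
            # Assign the whole cell to its nearest centroid.
            j = min(
                range(k),
                key=lambda c: (mx - centroids[c][0]) ** 2
                + (my - centroids[c][1]) ** 2,
            )
            sums[j][0] += n
            sums[j][1] += n * mx
            sums[j][2] += n * my
        # Weighted mean per cluster; keep the old centroid if a cluster is empty.
        centroids = [
            (s[1] / s[0], s[2] / s[0]) if s[0] else centroids[j]
            for j, s in enumerate(sums)
        ]
    return centroids
```

Because each cell carries its point count and mean, the update step operates on far fewer items than the raw data while still computing the exact mean of the points assigned to each cluster. In this sketch the saving comes only from grouping points into cells.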
Keywords: distortion minimization; Euclidean space; deep learning
Source: VideoLectures.NET
Data collected: 2021-06-23:zyk
Last reviewed: 2021-06-23:zyk
Views: 61