B位minwise散列估计三的相似性b-Bit Minwise Hashing for Estimating Three-Way Similarities |
|
课程网址: | http://videolectures.net/nips2010_li_bmh/ |
主讲教师: | Ping Li |
开课单位: | 康奈尔大学 |
开课时间: | 2011-03-25 |
课程语种: | 英语 |
中文简介: | 计算双向和多向集相似性是一个基本问题。本研究的重点是使用b位minwise散列估计3向相似性(Jaccard相似性)。虽然传统的minwise散列方法使用64位存储每个散列值,但b位minwise散列仅存储最低b位(其中b> = 2用于3路)。从先前关于双向相似性的工作中对3向相似性的扩展在技术上是非平凡的。我们开发出精确且非常复杂的精确估计器;我们建议使用适用于稀疏数据的简化估算器。我们的分析表明,$ b $ -bit minwise散列通常可以实现3路相似度的给定估计精度所需的存储空间的10到25倍的改进。 |
课程简介: | Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b>= 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that $b$-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance. |
关 键 词: | 计算双向; 散列估计; 相似性 |
课程来源: | 视频讲座网 |
最后编审: | 2020-06-28:yumf |
阅读次数: | 62 |