0


B位minwise散列估计三的相似性

b-Bit Minwise Hashing for Estimating Three-Way Similarities
课程网址: http://videolectures.net/nips2010_li_bmh/  
主讲教师: Ping Li
开课单位: 康奈尔大学
开课时间: 2011-03-25
课程语种: 英语
中文简介:
计算双向和多向集相似性是一个基本问题。本研究的重点是使用b位minwise散列估计3向相似性(Jaccard相似性)。虽然传统的minwise散列方法使用64位存储每个散列值,但b位minwise散列仅存储最低b位(其中b> = 2用于3路)。从先前关于双向相似性的工作中对3向相似性的扩展在技术上是非平凡的。我们开发出精确且非常复杂的精确估计器;我们建议使用适用于稀疏数据的简化估算器。我们的分析表明,$ b $ -bit minwise散列通常可以实现3路相似度的给定估计精度所需的存储空间的10到25倍的改进。
课程简介: Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b>= 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that $b$-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.
关 键 词: 计算双向; 散列估计; 相似性
课程来源: 视频讲座网
最后编审: 2020-06-28:yumf
阅读次数: 62