A Minimal Variance Estimator for the Cardinality of Big Data Set Intersection
Course URL: http://videolectures.net/kdd2017_yehezkel_big_data_set/
Lecturer: Aviv Yehezkel
Institution: Technion (Israel Institute of Technology)
Date: 2017-10-09
Language: English
Course description: In recent years there has been a growing interest in developing "streaming algorithms" for efficient processing and querying of continuous data streams. These algorithms seek to provide accurate results while minimizing the required storage and processing time, at the price of a small inaccuracy in their output. A fundamental query of interest is the intersection size of two big data streams. This problem arises in many different application areas, such as network monitoring, database systems, data integration and information retrieval. In this paper we develop a new algorithm for this problem, based on the Maximum Likelihood (ML) method. We show that this algorithm outperforms all known schemes and that it asymptotically achieves the optimal variance.
Keywords: big data; continuous data streams; database systems
Source: VideoLectures.NET
Data collected: 2023-12-25:wujk
Last reviewed: 2023-12-25:wujk
Views: 13