
Challenges in Building Large-Scale Information Retrieval Systems
Course URL: http://videolectures.net/wsdm09_dean_cblirs/
Lecturer: Jeffrey Dean
Institution: Google
Date: 2009-03-12
Language: English
Course description: Building and operating large-scale information retrieval systems used by hundreds of millions of people around the world provides a number of interesting challenges. Designing such systems requires making complex design tradeoffs in a number of dimensions, including (a) the number of user queries that must be handled per second and the response latency to these requests, (b) the number and size of various corpora that are searched, (c) the latency and frequency with which documents are updated or added to the corpora, and (d) the quality and cost of the ranking algorithms that are used for retrieval. In this talk I'll discuss the evolution of Google's hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions. I'll also describe how we use various pieces of distributed systems infrastructure when building these retrieval systems. Finally, I'll describe some future challenges and open research problems in this area.
Keywords: text mining; information retrieval; search engines
Source: VideoLectures.NET
Last reviewed: 2020-07-06: wuyq
Views: 52