
Who Uses Web Search for What? And How?
Course URL: http://videolectures.net/wsdm2011_weber_wuw/
Lecturer: Ingmar Weber
Institution: Qatar Computing Research Institute
Date: 2010-08-09
Language: English
Course description: We analyze a large query log of 2.3 million anonymous registered users from a web-scale U.S. search engine in order to jointly study their online behavior in terms of who they might be (demographics), what they search for (query topics), and how they search (session analysis). We examine basic demographics from registration information provided by the users, augmented with U.S. census data; analyze basic session statistics; classify queries into types (navigational, informational, transactional) based on click entropy; classify queries into topic categories; and cluster users based on the queries they issued. We then examine the resulting clusters in terms of demographics and search behavior. Our analysis of the data suggests that there are important differences across demographic groups in the topics they search for and in how they search (e.g., white conservatives, i.e., mostly white males who are likely to have voted Republican, search for business, home, and gardening related topics; Baby Boomers tend to be primarily interested in finance, and a large fraction of their sessions consist of simple navigational queries related to online banking). Finally, we examine regional search differences, which seem to correlate with differences in local industries (e.g., gambling-related queries are highest in Las Vegas and lowest in Salt Lake City; searches related to actors are about three times higher in L.A. than in any other region).
Keywords: query logs; demographics; click entropy
Source: VideoLectures.NET
Last reviewed: 2019-10-10:cwx
Views: 110
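
The course description mentions classifying queries by click entropy without giving details. As a rough illustration only, the sketch below computes the Shannon entropy of the clicked-URL distribution for a single query and flags low-entropy queries as likely navigational. The function names, the 1.0-bit threshold, and the example URLs are assumptions made for this sketch, not values from the talk, and the separation of informational from transactional queries is not attempted here.

```python
import math
from collections import Counter

# Hypothetical cut-off (in bits); the talk's actual threshold is not given here.
NAVIGATIONAL_MAX_ENTROPY = 1.0


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the clicked-URL distribution for one query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_navigational(clicked_urls, threshold=NAVIGATIONAL_MAX_ENTROPY):
    """Heuristic: clicks concentrated on very few URLs suggest a navigational query."""
    return len(clicked_urls) > 0 and click_entropy(clicked_urls) <= threshold


if __name__ == "__main__":
    # Nine of ten clicks go to the same (made-up) site: low entropy.
    bank_clicks = ["bank.example"] * 9 + ["other.example"]
    # Clicks spread evenly over four (made-up) sites: higher entropy.
    recipe_clicks = ["a.example", "b.example", "c.example", "d.example"]
    print(looks_navigational(bank_clicks), looks_navigational(recipe_clicks))
```

A query whose clicks almost always land on one site, such as an online-banking login, gets an entropy near zero, which fits the description's remark about simple navigational queries.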