
Estimating Rates of Rare Events at Multiple Resolutions
Course URL: http://videolectures.net/kdd07_chakrabarti_erore/  
Lecturer: Deepayan Chakrabarti
Institution: Carnegie Mellon University
Date: 2007-08-15
Language: English
Description: We consider the problem of estimating occurrence rates of rare events for extremely sparse data, using pre-existing hierarchies to perform inference at multiple resolutions. In particular, we focus on the problem of estimating click rates for (webpage, advertisement) pairs (called impressions), where both the pages and the ads are classified into hierarchies that capture broad contextual information at different levels of granularity. Typically the click rates are low and the coverage of the hierarchies is sparse. To overcome these difficulties, we devise a sampling method whereby we analyze a specially chosen sample of pages in the training set, and then estimate click rates using a two-stage model. The first stage imputes the number of (webpage, ad) pairs at all resolutions of the hierarchy to adjust for the sampling bias. The second stage estimates click rates at all resolutions after incorporating correlations among sibling nodes through a tree-structured Markov model. Both models are scalable and suited to large-scale data mining applications. On a real-world dataset consisting of half a billion impressions, we demonstrate that even with 95% negative (non-clicked) events in the training set, our method can effectively discriminate extremely rare events in terms of their click propensity.
Keywords: hierarchies; extremely sparse data; Markov models
Source: VideoLectures.NET
Last reviewed: 2019-05-08:lxf
Views: 28
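
The description above estimates click rates at several resolutions of a hierarchy so that sparse nodes can borrow information from better-covered ones. The sketch below illustrates only that general idea with a simple stand-in: each node's rate is shrunk toward its parent's rate. It is not the tree-structured Markov model from the talk, and it does not include the imputation stage. The `Node` class, the `aggregate` and `smoothed_rates` functions, the `strength` parameter, and all counts are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a (hypothetical) page or ad hierarchy."""
    name: str
    clicks: int = 0
    impressions: int = 0
    children: List["Node"] = field(default_factory=list)


def aggregate(node: Node) -> None:
    """Roll leaf counts up the tree.

    Assumes internal nodes start with zero counts, so after this call
    each internal node holds the totals of its subtree.
    """
    for child in node.children:
        aggregate(child)
        node.clicks += child.clicks
        node.impressions += child.impressions


def smoothed_rates(node: Node, parent_rate: float, strength: float = 100.0,
                   out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Shrink each node's raw click rate toward its parent's estimate.

    `strength` acts like a count of pseudo-impressions at the parent rate:
    nodes with few impressions stay close to the parent, while nodes with
    many impressions stay close to their own raw rate.
    """
    out = {} if out is None else out
    rate = (node.clicks + strength * parent_rate) / (node.impressions + strength)
    out[node.name] = rate
    for child in node.children:
        smoothed_rates(child, rate, strength, out)
    return out


# Made-up counts: one well-covered leaf and one very sparse leaf.
football = Node("sports/football", clicks=30, impressions=10000)
curling = Node("sports/curling", clicks=1, impressions=5)
sports = Node("sports", children=[football, curling])
root = Node("root", children=[sports])

aggregate(root)
root_rate = root.clicks / root.impressions
rates = smoothed_rates(root, root_rate)
```

With these made-up counts, the sparse leaf `sports/curling` has a raw rate of 1/5, but its smoothed estimate lands much closer to the rate of its parent, while the well-covered leaf `sports/football` barely moves from its raw rate.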