Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Course URL: http://videolectures.net/wsdm2011_li_uoe/
Lecturer: Lihong Li
Institution: Microsoft
Date: 2011-08-09
Language: English
Course description: Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences, but it is very challenging due to their "partial-label" nature. Common practice is to create a simulator that simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult, and modeling bias is usually unavoidably introduced. In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method provides provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluations of several contextual bandit algorithms show the accuracy and effectiveness of our offline evaluation method.
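The replay idea the abstract describes can be sketched as follows: step through logged (context, arm, reward) events and keep only those where the candidate policy picks the same arm the logging policy actually showed. This is a minimal illustration, not the paper's code; the `choose` interface and `GreedyPolicy` are hypothetical names introduced here for the example, and the unbiasedness guarantee assumes the logged arms were chosen uniformly at random.

```python
def replay_evaluate(policy, logged_events):
    """Estimate a policy's average per-trial reward from logged
    (context, arm, reward) triples. An event is "accepted" only when
    the policy's choice matches the logged arm; accepted events are
    fed back to the policy as its interaction history."""
    history = []        # accepted (context, arm, reward) triples
    total_reward = 0.0
    matches = 0
    for context, logged_arm, reward in logged_events:
        if policy.choose(context, history) == logged_arm:
            history.append((context, logged_arm, reward))
            total_reward += reward
            matches += 1
    return total_reward / matches if matches else 0.0


class GreedyPolicy:
    """Toy context-free policy for illustration: pick the arm with the
    highest observed mean reward (ties broken by lowest arm index)."""
    def __init__(self, n_arms):
        self.n_arms = n_arms

    def choose(self, context, history):
        means = []
        for arm in range(self.n_arms):
            rewards = [r for _, a, r in history if a == arm]
            means.append(sum(rewards) / len(rewards) if rewards else 0.0)
        return max(range(self.n_arms), key=means.__getitem__)
```

Because only matching events are counted, roughly a 1/K fraction of a K-arm log is used, but no simulator of user behavior is needed: the estimate comes entirely from the recorded data.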
Keywords: simulator; algorithms; computer science
Source: VideoLectures.NET
Last reviewed: 2020-06-08 by Wu Yuqiu (volunteer course editor)
Views: 283