Stumping along a summary
Course URL: http://videolectures.net/explorationexploitation2011_salperwyck_u...
Lecturers: Christophe Salperwyck; Tanguy Urvoy
Institution: France Telecom Research
Date: Not available.
Language: English
Abstract: The methods we used to compete in the "Exploration & Exploitation" challenge are based on three layers. The first layer provides an online summary of the data stream for continuous and nominal data. Continuous data are handled with the Greenwald and Khanna online quantile summary, which provides error guarantees for a fixed memory size. Nominal data are summarized with a hash-based counting structure. With these techniques we managed to build an accurate stream summary with a small memory footprint. The second layer uses the summary to build predictors. We explored several kinds of trees, from simple decision stumps to deep multivariate ones. The stumps proved to be remarkably stable and efficient. On the other hand, a progressive unfolding of the trees seemed to improve the model in the long run. For the last layer, we explored several combination strategies: online bagging, exponential weighting, linear ranker, etc. We observed a tradeoff between the expressiveness of the predictors and the power of the combination strategy, but since most strategies were difficult to tune, we went back to simple averaging. It seems, from our experiments, that both the need for exploration and the click scarcity sharpen the need for very stable models.
Keywords: predictors; decision stumps; exponential weighting; combination strategies
Source: VideoLectures.NET
Last reviewed: 2019-12-01 (cwx)
Views: 29
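
The three layers named in the abstract can be illustrated with a toy sketch for nominal features only. This is not the lecturers' implementation: the class name HashedClickCounter, the helpers update_all and predict, the bucket count, the CRC32 hash, and the Laplace smoothing are all assumptions made for illustration. Each per-feature counter acts as a one-level, stump-like predictor, and the per-feature estimates are combined by simple averaging.

```python
import zlib


class HashedClickCounter:
    """Hypothetical hash-based summary: display and click counts per bucket."""

    def __init__(self, n_buckets=1024):
        # Fixed-size tables keep memory bounded however many values appear.
        self.n_buckets = n_buckets
        self.displays = [0] * n_buckets
        self.clicks = [0] * n_buckets

    def _bucket(self, value):
        # CRC32 is an assumed hash choice; colliding values share a bucket.
        return zlib.crc32(str(value).encode("utf-8")) % self.n_buckets

    def update(self, value, clicked):
        b = self._bucket(value)
        self.displays[b] += 1
        self.clicks[b] += 1 if clicked else 0

    def click_rate(self, value):
        # Laplace-smoothed click rate, so unseen values score 0.5 (assumption).
        b = self._bucket(value)
        return (self.clicks[b] + 1) / (self.displays[b] + 2)


def update_all(stumps, example, clicked):
    """Feed one observed example to every per-feature counter."""
    for feature, counter in stumps.items():
        counter.update(example[feature], clicked)


def predict(stumps, example):
    """Combine the per-feature estimates by simple averaging."""
    rates = [counter.click_rate(example[feature])
             for feature, counter in stumps.items()]
    return sum(rates) / len(rates)


# Made-up stream: two nominal features, six displays, three clicks.
stumps = {"country": HashedClickCounter(), "browser": HashedClickCounter()}
for _ in range(3):
    update_all(stumps, {"country": "fr", "browser": "ff"}, True)
    update_all(stumps, {"country": "us", "browser": "ff"}, False)

score_fr = predict(stumps, {"country": "fr", "browser": "ff"})
score_us = predict(stumps, {"country": "us", "browser": "ff"})
```

Continuous features would need a quantile summary such as the Greenwald and Khanna structure mentioned in the abstract, which this sketch does not attempt.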