A Dirty Dozen: Twelve Common Metric Interpretation Pitfalls in Online Controlled Experiments
Course URL: http://videolectures.net/kdd2017_gupta_dirty_dozen/
Lecturer: Somit Gupta
Institution: Microsoft
Date: 2017-10-09
Language: English
Abstract: Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses and tested in websites, mobile applications, desktop applications, services, and operating systems. One of the key challenges for organizations that run controlled experiments is to come up with the right set of metrics. Having good metrics, however, is not enough. In our experience of running thousands of experiments with many teams across Microsoft, we observed again and again how incorrect interpretations of metric movements may lead to wrong conclusions about the experiment’s outcome, which, if deployed, could hurt the business by millions of dollars. Inspired by Steven Goodman’s twelve p-value misconceptions [4], in this paper we share twelve common metric interpretation pitfalls which we observed repeatedly in our experiments. We illustrate each pitfall with a puzzling example from a real experiment, and describe processes, metric design principles, and guidelines that can be used to detect and avoid the pitfall. With this paper, we aim to increase experimenters’ awareness of metric interpretation issues, leading to improved quality and trustworthiness of experiment results and better data-driven decisions.
Keywords: online controlled experiments; product ideas; scientific hypotheses
Source: VideoLectures.NET
Data collected: 2023-03-27 (chenxin01)
Last edited: 2023-05-22 (chenxin01)