The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables
Course URL: http://videolectures.net/kdd2017_lakkaraju_selective_labels_probl...
Lecturer: Himabindu Lakkaraju
Provider: VideoLectures.NET
Date: 2017-10-09
Language: English
Abstract: Evaluating whether machines improve on human performance is one of the central questions of machine learning. However, in many domains the data is selectively labeled: the observed outcomes are themselves a consequence of the existing choices of the human decision-makers. For instance, in the context of judicial bail decisions, we observe whether a defendant fails to return for their court appearance only if the human judge decides to release the defendant on bail. Comparing the performance of humans and machines on data with this type of bias can lead to erroneous estimates and wrong conclusions. Here we propose a novel framework for evaluating the performance of predictive models on selectively labeled data. We develop an evaluation methodology that is robust to the presence of unmeasured confounders (unobservables). We propose a metric that allows us to evaluate the effectiveness of any given black-box predictive model and benchmark it against the performance of human decision-makers. We also develop an approach called contraction, which allows us to compute this metric without resorting to counterfactual inference by exploiting the heterogeneity of human decision-makers. Experimental results on real-world datasets spanning diverse domains such as health care, insurance, and criminal justice demonstrate the utility of our evaluation metric in comparing human decisions and machine predictions. Experiments on synthetic data also show that our contraction technique produces accurate estimates of our evaluation metric.
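The abstract describes contraction only at a high level. The following is a minimal sketch of one reading of the idea, not the authors' implementation. It rests on several assumptions that the abstract does not state. First, the metric is taken to be the failure rate at a fixed acceptance rate: the fraction of all cases that are released and then have a bad outcome. Second, cases are assumed to be assigned to decision-makers at random. Third, the procedure starts from the single most lenient decision-maker. Among the cases that decision-maker released, whose labels are therefore observed, the sketch keeps only the ones the model scores as lowest-risk, until the target acceptance rate is reached. The function `contraction_failure_rate` and all of its parameters are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def contraction_failure_rate(
    judge_ids: Sequence[str],
    released: Sequence[bool],
    outcomes: Sequence[Optional[bool]],
    risk_scores: Sequence[float],
    target_rate: float,
) -> float:
    """Hypothetical sketch: estimate a model's failure rate at target_rate.

    outcomes[i] is True if case i had the bad outcome (e.g. failure to appear)
    and None when case i was not released, i.e. its label is unobserved.
    """
    # Group case indices by decision-maker.
    cases_by_judge: Dict[str, List[int]] = defaultdict(list)
    for i, j in enumerate(judge_ids):
        cases_by_judge[j].append(i)

    # Acceptance (release) rate of one decision-maker.
    def acceptance_rate(j: str) -> float:
        idx = cases_by_judge[j]
        return sum(released[i] for i in idx) / len(idx)

    # Assumption: use the most lenient decision-maker, whose released set
    # gives observed labels for the largest share of their cases.
    lenient = max(cases_by_judge, key=acceptance_rate)
    lenient_cases = cases_by_judge[lenient]
    if target_rate > acceptance_rate(lenient):
        raise ValueError("target_rate exceeds the most lenient acceptance rate")

    # Contract: among the lenient judge's released cases, keep only the
    # lowest-risk ones so that the model releases target_rate of ALL of
    # that judge's cases (small epsilon guards against float round-off).
    n_total = len(lenient_cases)
    n_keep = math.floor(target_rate * n_total + 1e-9)
    released_idx = [i for i in lenient_cases if released[i]]
    kept = sorted(released_idx, key=lambda i: risk_scores[i])[:n_keep]

    # Failure rate: bad outcomes among kept cases over all of the judge's cases.
    failures = sum(1 for i in kept if outcomes[i])
    return failures / n_total


if __name__ == "__main__":
    # Toy data: judge A releases 2 of 4 cases, judge B releases all 4.
    judges = ["A"] * 4 + ["B"] * 4
    released = [True, True, False, False] + [True] * 4
    outcomes = [False, True, None, None] + [False, True, False, True]
    risk = [0.3, 0.7, 0.5, 0.6] + [0.1, 0.9, 0.2, 0.8]
    print(contraction_failure_rate(judges, released, outcomes, risk, 0.5))   # 0.0
    print(contraction_failure_rate(judges, released, outcomes, risk, 0.75))  # 0.25
```

With a target acceptance rate of 0.75, the sketch keeps judge B's three lowest-risk releases. One of those three fails, which gives 1/4 = 0.25. Ranking only within the lenient decision-maker's observed cases is what lets the sketch avoid imputing outcomes for unreleased cases.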
Keywords: evaluating machines; machine learning; labeled data
Source: VideoLectures.NET