Empirical Comparisons of Learning Methods & Case Studies |
|
Course URL: | http://videolectures.net/mlss05us_caruana_eclmc/ |
Lecturer: | Rich Caruana |
Institution: | Microsoft |
Date: | 2007-02-25 |
Language: | English |
Course description: | Decision trees may be intelligible, but can they cut the mustard? Have SVMs replaced neural nets, or are neural nets still best for regression, and SVMs best for classification? Boosting maximizes a margin much like SVMs, but can boosting compete with SVMs? And is it better to boost weak models, as theory suggests, or to boost stronger models? Bagging is much easier than boosting, so how well does bagging stack up against boosting? Bagging is supposed to be best with low bias, high variance methods like decision trees, so if we bag lower variance models like neural nets, are they as good as bagged trees? What happens if we do bagging on steroids, i.e. switch to random forests? And what about old friends like k-nearest neighbor — should they just be put out to pasture? In this lecture I'll compare the performance of a variety of popular machine learning methods on nine performance criteria: Accuracy, F-score, Lift, Precision/Recall Break-Even Point, Area under the ROC, Average Precision, Squared Error, Cross-Entropy, and Probabilistic Calibration. I'll show that while no one learning method does it all, it is possible to "repair" some of them so that they do well on all metrics. I'll then describe NACHOS, a new ensemble method that does even better by building on top of these other learning methods. Finally, I'll discuss how the nine performance metrics relate to each other, and look at a few case studies to show why it is important to use the right metric for each problem. |
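To make the distinction among the metrics concrete, here is a minimal sketch (not from the lecture; the function name and toy data are invented for illustration) computing three of the nine criteria for a probabilistic binary classifier. It shows how a model can be perfect on accuracy while still scoring poorly on squared error and cross-entropy when its probabilities are weakly calibrated — exactly the kind of gap the lecture's comparison surfaces.

```python
import math

def evaluate(probs, labels):
    """Compute three of the nine criteria from the abstract for a binary
    classifier that outputs probabilities: accuracy, mean squared error
    (the Brier score), and cross-entropy (log loss)."""
    n = len(probs)
    # Accuracy: threshold the probabilities at 0.5.
    acc = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels)) / n
    # Squared error: mean of (p - y)^2 over examples.
    sq_err = sum((p - y) ** 2 for p, y in zip(probs, labels)) / n
    # Cross-entropy: mean negative log-likelihood of the true labels.
    ce = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
              for p, y in zip(probs, labels)) / n
    return acc, sq_err, ce

# Toy predictions: perfectly ranked but timid probabilities near 0.5.
probs = [0.55, 0.60, 0.45, 0.40]
labels = [1, 1, 0, 0]
acc, sq, ce = evaluate(probs, labels)
# Accuracy is perfect, yet squared error and cross-entropy stay high
# because the probabilities are far from the true 0/1 labels.
```

A calibration step (e.g. Platt scaling or isotonic regression, the kind of "repair" the abstract alludes to) would improve the latter two metrics without changing the ranking-based ones.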
Keywords: | Decision trees; Neural networks; Probability calibration |
Source: | VideoLectures.NET |
Last revised: | 2020-04-30: chenxin |
Views: | 62 |