Spooky Stuff in Metric Space
Course URL: http://videolectures.net/solomon_caruana_ssms/
Lecturer: Rich Caruana
Institution: Microsoft
Date: 2007-02-25
Language: English
Abstract: Decision trees are intelligible, but do they perform well enough that you should use them? Have SVMs replaced neural nets, or are neural nets still best for regression and SVMs best for classification? Boosting maximizes margins, much as SVMs do, but can boosting compete with SVMs? And if it does compete, is it better to boost weak models, as theory might suggest, or to boost stronger models? Bagging is simpler than boosting -- how well does bagging stack up against boosting? Breiman said Random Forests are better than bagging and as good as boosting. Was he right? And what about old friends like logistic regression, KNN, and naive Bayes? Should they be relegated to the history books, or do they still fill important niches?

In this talk we compare the performance of ten supervised learning methods on nine criteria: Accuracy, F-score, Lift, Precision/Recall Break-Even Point, Area under the ROC, Average Precision, Squared Error, Cross-Entropy, and Probability Calibration. The results show that no one learning method does it all, but some methods can be "repaired" so that they do very well across all performance metrics. In particular, we show how to obtain the best probabilities from max-margin methods such as SVMs and boosting via Platt's method and isotonic regression. We then describe a new ensemble method that combines select models from these ten learning methods to yield much better performance. Although these ensembles perform extremely well, they are too complex for many applications. We'll describe what we're doing to try to fix that. Finally, if time permits, we'll discuss how the nine performance metrics relate to each other, and which of them you probably should (or shouldn't) use.

During this talk I'll briefly describe the learning methods and performance metrics to help make the lecture accessible to non-specialists in machine learning.
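The calibration repair the abstract describes is easy to sketch in code. The following is a minimal, illustrative example and not the talk's experimental code: it assumes scikit-learn, uses a synthetic dataset and a linear SVM as the max-margin model, maps the SVM's margin scores to probabilities with Platt's method (a sigmoid fit) and with isotonic regression, and scores each calibrated model on several of the nine metrics named above.

    # A minimal, illustrative sketch (not the talk's experiment code),
    # assuming scikit-learn and a synthetic dataset. A max-margin model
    # (a linear SVM) is calibrated with Platt's method and with isotonic
    # regression, then evaluated on several of the nine metrics.
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.metrics import (accuracy_score, average_precision_score,
                                 brier_score_loss, f1_score, log_loss,
                                 roc_auc_score)
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                              random_state=0)

    # LinearSVC outputs margin scores, not probabilities; each calibrator
    # learns a mapping from scores to probabilities on held-out folds.
    base = LinearSVC()
    platt = CalibratedClassifierCV(base, method="sigmoid", cv=3)   # Platt's method
    iso = CalibratedClassifierCV(base, method="isotonic", cv=3)    # isotonic regression

    for name, model in [("Platt", platt), ("isotonic", iso)]:
        model.fit(X_tr, y_tr)
        p = model.predict_proba(X_te)[:, 1]      # calibrated P(y = 1)
        yhat = (p >= 0.5).astype(int)
        print(name,
              "acc=%.3f" % accuracy_score(y_te, yhat),
              "F1=%.3f" % f1_score(y_te, yhat),
              "AUC=%.3f" % roc_auc_score(y_te, p),
              "AP=%.3f" % average_precision_score(y_te, p),
              "sq.err=%.3f" % brier_score_loss(y_te, p),  # squared error
              "x-ent=%.3f" % log_loss(y_te, p))           # cross-entropy

As a rule of thumb, isotonic regression is the more flexible of the two mappings but can overfit when the calibration set is small, so the sigmoid fit of Platt's method is the safer default with scarce data.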
Keywords: decision trees; neural networks; logistic regression
Source: VideoLectures.NET (视频讲座网)
Last reviewed: 2020-06-08 by 吴雨秋 (volunteer course editor)
Views: 77