
MultiBoost
Course URL: http://videolectures.net/icml2010_kegl_mubu/
Lecturer: Balázs Kégl
Institution: Université Paris-Sud
Date: 2010-07-20
Language: English
Course description: AdaBoost [Freund-Schapire, 1997] is one of the best off-the-shelf supervised classification methods developed in the last fifteen years. Despite (or perhaps due to?) its simplicity and versatility, it is surprisingly under-represented in the family of open-source software. The goal of this submission is to fill this gap. Our implementation is based on the AdaBoost.MH algorithm [Schapire-Singer, 1999]. It is an intrinsically multi-class classification method (unlike SVM, for example), and it was easy to extend to multi-label or multi-task classification (when one item can belong to several classes). The program package can be divided into four modules that can be changed more or less independently depending on the application.

The strong learner. It tells you how to boost. The main boosting engine is AdaBoost.MH (its weight update is recalled below), but we have also implemented FilterBoost for a research project. Other possible strong learners could be LogitBoost and ADTrees.

The base (or weak) learner. It tells you which features to boost. Right now we have two basic (feature-wise) base learners: decision stumps for real-valued features (a training sketch follows below) and indicators for nominal features. We have two meta base learners: trees and products. They can use any base learner and construct a generic complex base learner using a "classic" tree structure (decision trees) or using the product of simple base learners (self-advertisement: boosting products of stumps is the best reported no-domain-knowledge algorithm on MNIST after Hinton and Salakhutdinov's deep belief nets). We have also implemented Haar filters [Viola-Jones, 2004] for image classification, a meta base learner that uses stumps over a high-dimensional feature space computed "on the fly". It is a nice example of a domain-dependent base learner that works hand in hand with its appropriate data structure (the integral image, sketched below).

The data representation. The basic data structure is a matrix of observations with a vector of labels. We also have multi-label classification when the label data is also a full matrix. In addition, we have sparse data representations for both the observation matrix and the label matrix. In general, base learners are implemented to work with their own data representation (for example, sparse stumps work on sparse observation matrices, and Haar filters work on an integral image data representation).

The data parser. We can read data in arff and svmlight formats (a minimal svmlight reader is sketched below).

The base learner/data structure combinations cover a large spectrum of possible applications, but the main advantage of the package is that it is easy (for the advanced user) to adapt MultiBoost to a specific (non-standard) application by implementing the base learner and data structure interfaces that work together.
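
For reference, the weight update that the AdaBoost.MH engine iterates, as defined in [Schapire-Singer, 1999]: the algorithm maintains a distribution D_t over (example, class) pairs, where Y_i[l] is +1 if example i belongs to class l and -1 otherwise. In LaTeX:

\[
  D_{t+1}(i,\ell) \;=\; \frac{D_t(i,\ell)\,\exp\bigl(-\alpha_t\, Y_i[\ell]\, h_t(x_i,\ell)\bigr)}{Z_t}
\]

where h_t is the vector-valued base classifier chosen in round t, alpha_t its coefficient, and Z_t the normalizer; each round picks h_t and alpha_t to (approximately) minimize Z_t, which upper-bounds the Hamming loss over all (example, class) pairs.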
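
The decision stump mentioned above is the simplest base learner: it thresholds a single real-valued feature. Below is a minimal, self-contained C++ sketch of exhaustive weighted stump training; the names (Stump, trainStump) are illustrative assumptions and do not reproduce MultiBoost's actual base-learner interface.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

// A decision stump over one real-valued feature: predicts
// sign * (+1) if x[feature] >= threshold, and sign * (-1) otherwise.
struct Stump {
    int feature = 0;
    double threshold = 0.0;
    int sign = 1;
    double edge = 0.0;  // weighted edge: sum_i w_i * y_i * h(x_i)

    int predict(const std::vector<double>& x) const {
        return sign * (x[feature] >= threshold ? 1 : -1);
    }
};

// For each feature, sweep the threshold across the sorted feature values,
// updating the edge incrementally, and keep the stump (with the better of
// the two possible signs) that has the largest absolute edge.
Stump trainStump(const std::vector<std::vector<double>>& X,
                 const std::vector<int>& y,       // labels in {-1, +1}
                 const std::vector<double>& w) {  // positive example weights
    const std::size_t n = X.size(), d = X[0].size();
    Stump best;
    for (std::size_t j = 0; j < d; ++j) {
        std::vector<std::size_t> idx(n);
        std::iota(idx.begin(), idx.end(), std::size_t{0});
        std::sort(idx.begin(), idx.end(),
                  [&](std::size_t a, std::size_t b) { return X[a][j] < X[b][j]; });
        double edge = 0.0;  // threshold below all values: everything predicts +1
        for (std::size_t i = 0; i < n; ++i) edge += w[i] * y[i];
        for (std::size_t k = 0; k < n; ++k) {
            edge -= 2.0 * w[idx[k]] * y[idx[k]];  // example idx[k] flips to -1
            if (k + 1 < n && X[idx[k]][j] == X[idx[k + 1]][j]) continue;
            if (std::fabs(edge) > std::fabs(best.edge)) {
                double next = (k + 1 < n) ? X[idx[k + 1]][j] : X[idx[k]][j] + 1.0;
                best.feature = static_cast<int>(j);
                best.threshold = 0.5 * (X[idx[k]][j] + next);
                best.sign = (edge >= 0.0) ? 1 : -1;
                best.edge = edge;
            }
        }
    }
    return best;
}

int main() {
    // Toy data: feature 1 separates the two classes at 0.5.
    std::vector<std::vector<double>> X = {
        {0.1, 0.0}, {0.4, 0.2}, {0.3, 0.8}, {0.9, 0.9}};
    std::vector<int> y = {-1, -1, +1, +1};
    std::vector<double> w(4, 0.25);  // uniform boosting weights
    Stump s = trainStump(X, y, w);
    std::printf("feature=%d threshold=%.2f edge=%.2f\n",
                s.feature, s.threshold, s.edge);  // feature=1 threshold=0.50
}

A boosting engine would call such a routine once per round on the current weight vector, then re-weight the examples so that the ones this stump got wrong count more in the next round.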
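
The "appropriate data structure" behind the Haar-filter base learner is the integral image of [Viola-Jones, 2004]: after one linear pass over the pixels, the sum of any rectangle, and hence any Haar feature (a signed combination of rectangles), costs O(1), which is what makes computing the high-dimensional feature space "on the fly" feasible. A minimal sketch of the idea, not the package's actual class:

#include <cstdio>
#include <vector>

// Integral image: s[(y)*(w+1)+x] holds the sum of all pixels in the
// rectangle [0, x) x [0, y), so any box sum takes four lookups.
struct IntegralImage {
    int w, h;
    std::vector<long long> s;

    explicit IntegralImage(const std::vector<std::vector<int>>& img)
        : w(static_cast<int>(img[0].size())),
          h(static_cast<int>(img.size())),
          s((w + 1) * (h + 1), 0) {
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                s[(y + 1) * (w + 1) + (x + 1)] =
                    img[y][x] + s[y * (w + 1) + (x + 1)] +
                    s[(y + 1) * (w + 1) + x] - s[y * (w + 1) + x];
    }

    // Sum of pixels in the rectangle [x0, x1) x [y0, y1), in O(1).
    long long boxSum(int x0, int y0, int x1, int y1) const {
        return s[y1 * (w + 1) + x1] - s[y0 * (w + 1) + x1] -
               s[y1 * (w + 1) + x0] + s[y0 * (w + 1) + x0];
    }

    // A two-rectangle Haar feature: left half minus right half of a window.
    long long haarLeftRight(int x0, int y0, int x1, int y1) const {
        int xm = (x0 + x1) / 2;
        return boxSum(x0, y0, xm, y1) - boxSum(xm, y0, x1, y1);
    }
};

int main() {
    // 4x4 image whose left half is dark (0) and right half bright (9).
    std::vector<std::vector<int>> img = {
        {0, 0, 9, 9}, {0, 0, 9, 9}, {0, 0, 9, 9}, {0, 0, 9, 9}};
    IntegralImage ii(img);
    std::printf("window sum = %lld, left-right Haar = %lld\n",
                ii.boxSum(0, 0, 4, 4), ii.haarLeftRight(0, 0, 4, 4));
    // Expected: window sum = 72, Haar = -72 (a strong vertical edge).
}

The boosted stumps then threshold such Haar feature values; since each value is four (or eight) lookups, the feature space never has to be materialized.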
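
The svmlight format pairs naturally with the sparse observation matrix: each line is "label index:value index:value ...", with 1-based feature indices and optional trailing comments after '#'. A minimal reader sketch under that assumption, illustrative only and not MultiBoost's parser:

#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One sparse row: (feature index, value) pairs for the nonzero entries.
using SparseRow = std::vector<std::pair<int, double>>;

struct Dataset {
    std::vector<SparseRow> rows;  // sparse observation matrix
    std::vector<int> labels;      // label vector
};

Dataset parseSvmlight(std::istream& in) {
    Dataset ds;
    std::string line;
    while (std::getline(in, line)) {
        // Strip trailing comments and skip blank or malformed lines.
        std::size_t hash = line.find('#');
        if (hash != std::string::npos) line.resize(hash);
        std::istringstream ls(line);
        int label;
        if (!(ls >> label)) continue;
        SparseRow row;
        std::string tok;
        while (ls >> tok) {
            std::size_t colon = tok.find(':');
            row.emplace_back(std::stoi(tok.substr(0, colon)) - 1,  // to 0-based
                             std::stod(tok.substr(colon + 1)));
        }
        ds.labels.push_back(label);
        ds.rows.push_back(std::move(row));
    }
    return ds;
}

int main() {
    std::istringstream file("+1 3:0.5 7:1.0 # positive example\n"
                            "-1 1:2.0\n");
    Dataset ds = parseSvmlight(file);
    for (std::size_t i = 0; i < ds.rows.size(); ++i) {
        std::printf("y=%+d:", ds.labels[i]);
        for (const auto& fv : ds.rows[i])
            std::printf(" x[%d]=%.1f", fv.first, fv.second);
        std::printf("\n");
    }
}

A sparse base learner such as the sparse stump mentioned above can then iterate only over the stored (index, value) pairs of each feature column instead of a dense matrix.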
Keywords: data structures; base learners; Haar filters
Source: VideoLectures.NET
Data collected: 2022-11-02: chenjy
Last reviewed: 2022-11-02: chenjy
Views: 32