Boosting Active Learning to Optimality: a Tractable Monte-Carlo, Billiard-Based Algorithm |
|
Course URL: | http://videolectures.net/ecmlpkdd09_rolet_balo/
Lecturer: | Philippe Rolet
Institution: | University of Paris-Sud (Paris XI)
Date: | 2009-10-20
Language: | English
Description: | This paper focuses on Active Learning with a limited number of queries; in application domains such as Numerical Engineering, the size of the training set might be limited to a few dozen or hundred examples due to computational constraints. Active Learning under bounded resources is formalized as a finite horizon Reinforcement Learning problem, where the sampling strategy aims at minimizing the expectation of the generalization error. A tractable approximation of the optimal (intractable) policy is presented, the Bandit-based Active Learner (Baal) algorithm. Viewing Active Learning as a single-player game, Baal combines UCT, the tree structured multi-armed bandit algorithm proposed by Kocsis and Szepesvari (2006), and billiard algorithms. A proof of principle of the approach demonstrates its good empirical convergence toward an optimal policy and its ability to incorporate prior AL criteria. Its hybridization with the Query-by-Committee approach is found to improve on both stand-alone Baal and stand-alone QbC.
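To make the search component concrete, below is a minimal Python sketch of UCT applied to active learning as a single-player game, on a toy 1-D threshold-learning problem. This is an illustration under stated assumptions, not the paper's implementation: hypotheses are thresholds on [0, 1], the version space after labeling is an interval, and uniformly sampling a candidate "true" threshold from that interval stands in for the paper's billiard-based version-space sampling. All names (Node, uct_query, shrink, CANDIDATES) are hypothetical.

import math
import random

# Toy sketch (NOT the paper's implementation): hypotheses are 1-D
# thresholds on [0, 1], h_t(x) = 1 iff x >= t.  After some labeled points
# the version space is an interval (lo, hi); uniform sampling of a "true"
# threshold from it stands in for billiard-based version-space sampling.

CANDIDATES = [i / 10 for i in range(1, 10)]   # fixed pool of query points

class Node:
    def __init__(self, lo, hi, budget):
        self.lo, self.hi, self.budget = lo, hi, budget
        self.visits = 0
        self.stats = {}      # query point -> [pulls, mean reward]
        self.children = {}   # (query point, observed label) -> Node

def shrink(lo, hi, x, true_t):
    """Label x under threshold true_t and shrink the version space."""
    return (x, hi) if x < true_t else (lo, x)

def rollout(lo, hi, budget, true_t):
    """Default policy: query uniformly at random until the budget is spent."""
    while budget > 0:
        acts = [x for x in CANDIDATES if lo < x < hi]
        if not acts:
            break
        lo, hi = shrink(lo, hi, random.choice(acts), true_t)
        budget -= 1
    return 1.0 - (hi - lo)   # reward: narrow interval ~ low generalization error

def simulate(node, true_t):
    """One UCT episode through the tree for a sampled hypothesis true_t."""
    node.visits += 1
    acts = [x for x in CANDIDATES if node.lo < x < node.hi]
    if node.budget == 0 or not acts:
        return 1.0 - (node.hi - node.lo)
    untried = [a for a in acts if a not in node.stats]
    if untried:                       # expansion + random rollout
        a = random.choice(untried)
        node.stats[a] = [0, 0.0]
        lo, hi = shrink(node.lo, node.hi, a, true_t)
        r = rollout(lo, hi, node.budget - 1, true_t)
    else:                             # UCB1 selection over query points
        a = max(acts, key=lambda q: node.stats[q][1]
                + math.sqrt(2 * math.log(node.visits) / node.stats[q][0]))
        key = (a, 1 if a >= true_t else 0)   # action plus stochastic label
        if key not in node.children:
            lo, hi = shrink(node.lo, node.hi, a, true_t)
            node.children[key] = Node(lo, hi, node.budget - 1)
        r = simulate(node.children[key], true_t)
    n, m = node.stats[a]
    node.stats[a] = [n + 1, m + (r - m) / (n + 1)]
    return r

def uct_query(lo, hi, budget, n_sims=3000):
    """Return the most-visited first query after n_sims UCT simulations."""
    root = Node(lo, hi, budget)
    for _ in range(n_sims):
        true_t = random.uniform(lo, hi)  # stand-in for billiard sampling
        simulate(root, true_t)
    return max(root.stats, key=lambda q: root.stats[q][0])

if __name__ == "__main__":
    random.seed(0)
    # With 3 queries left, UCT should recover the binary-search-like choice.
    print("first query:", uct_query(0.0, 1.0, budget=3))

Resampling true_t at every simulation is a determinization device: each episode plays out against one hypothesis drawn from the current version space, so the averaged rewards estimate the expected final error over that space, which is the quantity the finite-horizon formulation asks the policy to minimize.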
Keywords: | Numerical Engineering; computational constraints; finite-horizon reinforcement learning
Source: | VideoLectures.NET
Last reviewed: | 2019-03-27:lxf
Views: | 61