Boosting Active Learning to Optimality: A Tractable Monte-Carlo, Billiard-Based Algorithm
Course URL: http://videolectures.net/ecmlpkdd09_rolet_balo/
Lecturer: Philippe Rolet
Institution: Université Paris-Sud (Paris XI)
Date: 2009-10-20
Language: English
Description: This paper focuses on Active Learning with a limited number of queries; in application domains such as Numerical Engineering, the size of the training set may be limited to a few dozen or a few hundred examples due to computational constraints. Active Learning under bounded resources is formalized as a finite-horizon Reinforcement Learning problem, where the sampling strategy aims at minimizing the expected generalization error. A tractable approximation of the optimal (intractable) policy is presented: the Bandit-based Active Learner (Baal) algorithm. Viewing Active Learning as a single-player game, Baal combines UCT, the tree-structured multi-armed bandit algorithm proposed by Kocsis and Szepesvari (2006), with billiard algorithms. A proof of principle of the approach demonstrates its good empirical convergence toward an optimal policy and its ability to incorporate prior AL criteria. Hybridizing Baal with the Query-by-Committee (QbC) approach is found to improve on both stand-alone Baal and stand-alone QbC.
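For readers unfamiliar with UCT, the sketch below illustrates the kind of tree-structured bandit search that Baal builds on: treating active learning as a single-player game, each simulated game picks a sequence of queries up to the query budget (the horizon) and receives a reward. This is a minimal illustration under stated assumptions, not the authors' implementation: the names (Node, ucb1, uct_search), the callbacks (actions, step, rollout), and the exploration constant C are all hypothetical, and the billiard algorithms Baal uses to sample the version space are omitted here.

import math
import random

C = math.sqrt(2)  # UCB1 exploration constant (a common default, not from the paper)

class Node:
    """One node of the search tree: a partial sequence of labeled queries."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.reward_sum = 0.0

def ucb1(child, parent_visits):
    """Mean reward plus an optimistic exploration bonus (UCB1)."""
    if child.visits == 0:
        return float("inf")
    mean = child.reward_sum / child.visits
    return mean + C * math.sqrt(math.log(parent_visits) / child.visits)

def uct_search(root, actions, step, rollout, horizon, n_sims=1000):
    """Run n_sims simulated games and return the most-visited root action.

    actions(state)         -> list of candidate queries in `state`
    step(state, a)         -> successor state after querying `a`
    rollout(state, budget) -> reward in [0, 1] of a random completion,
                              e.g. one minus an estimated generalization error
    """
    for _ in range(n_sims):
        node, depth = root, 0
        # 1. Selection: follow UCB1 while every action has been tried once.
        while node.children and len(node.children) == len(actions(node.state)) \
                and depth < horizon:
            node = max(node.children.values(),
                       key=lambda c, p=node: ucb1(c, p.visits))
            depth += 1
        # 2. Expansion: add one untried action, if any remain.
        untried = [a for a in actions(node.state) if a not in node.children]
        if untried and depth < horizon:
            a = random.choice(untried)
            child = Node(step(node.state, a), parent=node)
            node.children[a] = child
            node, depth = child, depth + 1
        # 3. Simulation: random play-out until the query budget is spent.
        reward = rollout(node.state, horizon - depth)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.reward_sum += reward
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)

In Baal, the reward of a simulated game is tied to the generalization error of hypotheses sampled from the version space by billiard algorithms; the uniform random rollout above is only a placeholder for that step.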
Keywords: Numerical Engineering; computational constraints; finite-horizon Reinforcement Learning
Source: VideoLectures.NET
Last reviewed: 2019-03-27 (lxf)
Views: 54