

Coherent inference on optimal play in game trees


Course URL:	http://videolectures.net/aistats2010_henning_cioop/
Lecturer:	Philipp Hennig
Institution:	Max Planck Institute
Date:	2010-06-03
Language:	English
Abstract:	Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
Keywords:	games; inference; discrete planning
Source:	VideoLectures.NET
Last reviewed:	2020-07-29 (yumf)
Views:	63
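
The abstract distinguishes the value of a node under the random roll-out policy (on-policy) from its value under optimal play (off-policy). The following is a minimal sketch of that distinction on a toy two-level tree. It computes both values exactly by recursion and is not the Expectation Propagation scheme presented in the lecture. The `Node` class, the function names, and the payoffs are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical game-tree node: a leaf holds a payoff for the MAX player,
    an inner node holds its children."""
    payoff: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def optimal_value(node: Node, maximizing: bool = True) -> float:
    """Value under optimal play by both sides (plain minimax)."""
    if not node.children:
        return node.payoff
    values = [optimal_value(child, not maximizing) for child in node.children]
    return max(values) if maximizing else min(values)


def rollout_value(node: Node) -> float:
    """Expected payoff when both players choose moves uniformly at random,
    i.e. the quantity that averaging many random roll-outs estimates."""
    if not node.children:
        return node.payoff
    return sum(rollout_value(child) for child in node.children) / len(node.children)


# Toy tree (assumed payoffs): MAX moves at the root, MIN moves at the second level.
tree = Node(children=[
    Node(children=[Node(payoff=1.0), Node(payoff=0.0)]),
    Node(children=[Node(payoff=0.5), Node(payoff=0.25)]),
])

left, right = tree.children
print("roll-out values:", rollout_value(left), rollout_value(right))
print("optimal values: ", optimal_value(left, maximizing=False),
      optimal_value(right, maximizing=False))
```

In this toy tree the left move looks better under random roll-outs (0.5 against 0.375), while optimal play prefers the right move (0.25 against 0.0). That gap is why inferring optimal values from roll-out data takes more than averaging the roll-outs.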
