

Coherent inference on optimal play in game trees
Course URL: http://videolectures.net/aistats2010_henning_cioop/
Lecturer: Philipp Hennig
Institution: Max Planck Institute
Date: 2010-06-03
Language: English
Abstract: Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
Keywords: games; inference; discrete planning
Source: VideoLectures.NET
Last edited: 2020-07-29:yumf
Views: 47
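
Illustration: the abstract distinguishes the on-policy value of a node (its expected outcome under random roll-outs) from its optimal, off-policy value (its outcome under best play by both sides). The sketch below only shows that distinction on a toy two-level tree. It is not the Expectation Propagation scheme from the talk; the tree, the 0/1 leaf outcomes, and all function names are assumptions made for this example.

```python
import random

# Hypothetical toy tree: nested lists are internal nodes, integers are leaf
# outcomes (1 = win for the maximizing player, 0 = loss). The root is a
# maximizing node and its children are minimizing nodes.
TREE = [[1, 0], [0, 0], [1, 1]]


def minimax_value(node, maximizing=True):
    """Optimal (off-policy) value: both players choose their best move."""
    if not isinstance(node, list):
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def random_policy_value(node):
    """Exact on-policy value under a uniformly random roll-out policy."""
    if not isinstance(node, list):
        return float(node)
    return sum(random_policy_value(child) for child in node) / len(node)


def rollout(node, rng):
    """Play uniformly random moves from node until a leaf is reached."""
    while isinstance(node, list):
        node = rng.choice(node)
    return node


def rollout_estimate(node, n, rng):
    """Monte Carlo estimate of the on-policy value from n random roll-outs."""
    return sum(rollout(node, rng) for _ in range(n)) / n
```

On this toy tree the two values differ: random play wins half the time from the root, while best play wins by choosing the third move. That gap between roll-out data and the optimal value is what the inference scheme described in the abstract aims to bridge.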