Reinforcement Learning in Decentralized Stochastic Control Systems with Partial History Sharing
Course URL: | http://videolectures.net/rldm2015_arabneydi_history_sharing/
Lecturer: | Jalal Arabneydi
Institution: | McGill University
Date: | 2015-07-28
Language: | English
Course description: | In this paper, we are interested in systems with multiple agents that wish to cooperate in order to accomplish a common task while a) the agents have different information (decentralized information) and b) the agents do not know the complete model of the system, i.e., they may know only a partial model or may not know the model at all. The agents must learn optimal strategies by interacting with their environment, i.e., by multi-agent Reinforcement Learning (RL). The presence of multiple agents with different information makes multi-agent (decentralized) reinforcement learning conceptually more difficult than single-agent (centralized) reinforcement learning. We propose a novel multi-agent reinforcement learning algorithm that learns an epsilon-team-optimal solution for systems with a partial history sharing information structure, which encompasses a large class of multi-agent systems including delayed sharing, control sharing, mean-field sharing, etc. Our approach consists of two main steps: 1) the multi-agent (decentralized) system is converted to an equivalent single-agent (centralized) POMDP (Partially Observable Markov Decision Process) using the common information approach of Nayyar et al. (TAC 2013), and 2) based on the obtained POMDP, an approximate RL algorithm is constructed using a novel methodology. We show that the performance of the RL strategy converges to the optimal performance exponentially fast. We illustrate the proposed approach and verify it numerically by obtaining a multi-agent Q-learning algorithm for the two-user Multi Access Broadcast Channel (MABC), which is a benchmark example for multi-agent systems.
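To make the two-step recipe above concrete, here is a minimal, hedged sketch of tabular Q-learning over common information for a two-user MABC. The arrival probabilities, the reward values, and the use of the last channel feedback as a stand-in for the common-information state are illustrative assumptions introduced for this sketch, not details taken from the lecture or the underlying paper.

```python
import random
from collections import defaultdict

# Hedged sketch: tabular Q-learning over common information for a two-user
# Multi Access Broadcast Channel (MABC). Arrival rates, rewards, and using the
# last channel feedback as an approximate common-information state are
# illustrative assumptions, not the paper's exact construction.

ARRIVAL_PROB = (0.3, 0.6)   # assumed packet-arrival probabilities per user
FEEDBACK = ("idle", "success", "collision")
# A prescription tells a user what to do given its local buffer state; here it
# reduces to one bit per user: 1 = "transmit if the buffer is non-empty".
PRESCRIPTIONS = [(a, b) for a in (0, 1) for b in (0, 1)]

def step(buffers, prescription):
    """Apply a joint prescription; return (new_buffers, feedback, reward)."""
    transmit = [prescription[i] == 1 and buffers[i] == 1 for i in range(2)]
    if sum(transmit) == 1:                      # exactly one user transmits
        feedback, reward = "success", 1.0
        buffers = [0 if transmit[i] else buffers[i] for i in range(2)]
    elif sum(transmit) == 2:                    # collision: packets remain
        feedback, reward = "collision", 0.0
    else:                                       # nobody transmits
        feedback, reward = "idle", 0.0
    # New packets arrive into empty buffers.
    buffers = [1 if buffers[i] == 1 or random.random() < ARRIVAL_PROB[i] else 0
               for i in range(2)]
    return buffers, feedback, reward

def q_learning(steps=20000, alpha=0.1, gamma=0.95, eps=0.1):
    """Learn Q-values over (common feedback, joint prescription) pairs."""
    Q = defaultdict(lambda: {p: 0.0 for p in PRESCRIPTIONS})
    buffers, info_state = [0, 0], "idle"        # common info = last feedback
    for _ in range(steps):
        if random.random() < eps:               # epsilon-greedy exploration
            prescription = random.choice(PRESCRIPTIONS)
        else:
            prescription = max(Q[info_state], key=Q[info_state].get)
        buffers, feedback, reward = step(buffers, prescription)
        best_next = max(Q[feedback].values())
        Q[info_state][prescription] += alpha * (
            reward + gamma * best_next - Q[info_state][prescription])
        info_state = feedback
    return Q

if __name__ == "__main__":
    Q = q_learning()
    for s in FEEDBACK:
        print(s, "->", max(Q[s], key=Q[s].get))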
Keywords: | algorithms; reinforcement learning; stochastic systems
Source: | VideoLectures.NET (视频讲座网)
Data collected: | 2020-12-14: yxd
Last reviewed: | 2020-12-14: yxd
Views: | 59