
Reinforcement Learning in Decentralized Stochastic Control Systems with Partial History Sharing
Course URL: http://videolectures.net/rldm2015_arabneydi_history_sharing/
Lecturer: Jalal Arabneydi
Institution: McGill University
Date: 2015-07-28
Language: English
Abstract: In this paper, we consider systems in which multiple agents wish to cooperate to accomplish a common task while a) the agents have different information (decentralized information) and b) the agents do not know the complete model of the system, i.e., they may know only a partial model or no model at all. The agents must learn the optimal strategies by interacting with their environment, i.e., by multi-agent Reinforcement Learning (RL). The presence of multiple agents with different information makes multi-agent (decentralized) reinforcement learning conceptually more difficult than single-agent (centralized) reinforcement learning. We propose a novel multi-agent reinforcement learning algorithm that learns an epsilon-team-optimal solution for systems with a partial history sharing information structure, which encompasses a large class of multi-agent systems, including delayed sharing, control sharing, and mean field sharing. Our approach consists of two main steps: 1) the multi-agent (decentralized) system is converted to an equivalent single-agent (centralized) POMDP (Partially Observable Markov Decision Process) using the common information approach of Nayyar et al. (TAC 2013), and 2) based on the obtained POMDP, an approximate RL algorithm is constructed using a novel methodology. We show that the performance of the RL strategy converges to the optimal performance exponentially fast. We illustrate the proposed approach and verify it numerically by obtaining a multi-agent Q-learning algorithm for the two-user Multi Access Broadcast Channel (MABC), which is a benchmark example for multi-agent systems.
Keywords: algorithms; reinforcement learning; stochastic systems
Source: VideoLectures.NET
Data collected: 2020-12-14:yxd
Last reviewed: 2020-12-14:yxd
Views: 56