

Evaluating Deterministic Policies in Two-player Iterated Games
Course URL: http://videolectures.net/eccs07_dilao_edp/
Lecturer: Rui Dilão
Institution: University of Lisbon
Date: 2007-11-29
Language: English
Abstract: We construct a statistical ensemble of games in which each independent subensemble consists of two players playing the same game. We derive the mean payoffs per move of the representative players of the game and evaluate all deterministic policies with finite memory. In particular, we show that if one of the players has a generalized tit-for-tat policy, the mean payoffs per move of the two players are forced to be equal. For symmetric, non-cooperative and dilemmatic games, we show that generalized tit-for-tat or imitation policies, together with the condition of not being the first to defect, lead to the highest mean payoffs per move for the players. Within this approach, it can be decided which policies perform better than others. The Prisoner's Dilemma and the Hawk-Dove games have been analyzed, and the equilibrium states of the infinitely iterated games have been determined. The infinitely iterated Prisoner's Dilemma game can have Nash solutions only if players have deterministic policies.
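The equalization claim for tit-for-tat can be illustrated with a small simulation. The following is a minimal sketch, not the lecture's ensemble construction: it plays a single deterministic match between two memory-one policies in the iterated Prisoner's Dilemma and reports each player's mean payoff per move. The payoff values (T=5, R=3, P=1, S=0) and the names `PAYOFF`, `tit_for_tat`, `always_defect` and `mean_payoffs` are assumptions chosen for illustration.

```python
# Assumed conventional Prisoner's Dilemma payoffs (not taken from the lecture):
# T=5 (temptation), R=3 (reward), P=1 (punishment), S=0 (sucker).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_last):
    """Cooperate on the first move, then copy the opponent's previous move."""
    return "C" if opponent_last is None else opponent_last


def always_defect(opponent_last):
    """Defect on every move, regardless of history."""
    return "D"


def mean_payoffs(policy_a, policy_b, rounds):
    """Play `rounds` moves and return the mean payoff per move of each player."""
    last_a = last_b = None  # no history before the first move
    total_a = total_b = 0
    for _ in range(rounds):
        # Each policy sees only the opponent's previous move (memory one).
        move_a = policy_a(last_b)
        move_b = policy_b(last_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        total_a += pay_a
        total_b += pay_b
        last_a, last_b = move_a, move_b
    return total_a / rounds, total_b / rounds


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, mean_payoffs(tit_for_tat, always_defect, n))
```

In this sketch the tit-for-tat player loses only on the first move, so the gap between the two mean payoffs is (T - S)/n = 5/n and vanishes as the number of moves n grows, which is consistent with the equalization stated in the abstract for the infinitely iterated game.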
Keywords: generalized policies; mean payoff per move; infinitely iterated games; dilemmatic games
Source: VideoLectures.NET
Last reviewed: 2019-11-30:lxf
Views: 45