Prisoner's Dilemma: Introducing a player that follows a Reinforcement Learning algorithm
The Prisoner's Dilemma, formulated in 1950 by Albert W. Tucker at Princeton, characterises a situation in game theory where two players have an interest in cooperating but, in the absence of communication between them, each chooses to betray the other. In 1984, Robert Axelrod published ‘The Evolution of Cooperation’, in which he examined the biological and sociological foundations of cooperation.
Axelrod suggested that reciprocity, i.e. responding to positive actions with positive actions and to negative actions with negative actions, was an essential element of cooperation.
This principle, embodied in the tit-for-tat strategy, proved remarkably effective in Axelrod's computer simulations of the iterated Prisoner's Dilemma. The aim of our project is to introduce an agent playing with a Deep Q-Network (DQN) algorithm against different types of player (a random player, a player following the ‘Tit For Tat’ strategy).
This will enable us to study the DQN player's ability to learn and adapt to different player profiles. Finally, we will be able to see whether a reinforcement learning algorithm can do better than the ‘Tit For Tat’ strategy that won Axelrod's tournaments.
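To make the setting concrete, here is a minimal sketch of the iterated game and the two baseline opponents mentioned above. It assumes the standard payoff values (reward 3, temptation 5, sucker 0, punishment 1); the project's actual payoffs and class names may differ.

```python
import random

# Standard payoff matrix (an assumption; the project may use other values).
# Entry (my_move, opponent_move) -> (my_payoff, opponent_payoff),
# where "C" = cooperate and "D" = defect.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation: reward R = 3
    ("C", "D"): (0, 5),  # sucker's payoff S = 0 vs temptation T = 5
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: punishment P = 1
}

class TitForTat:
    """Cooperates on the first round, then mirrors the opponent's last move."""
    def __init__(self):
        self.opponent_last = None

    def play(self):
        return "C" if self.opponent_last is None else self.opponent_last

    def observe(self, opponent_move):
        self.opponent_last = opponent_move

class RandomPlayer:
    """Cooperates or defects with equal probability."""
    def play(self):
        return random.choice(["C", "D"])

    def observe(self, opponent_move):
        pass  # a random player ignores history

def play_rounds(p1, p2, n_rounds):
    """Play n_rounds of the iterated dilemma; return cumulative scores."""
    s1 = s2 = 0
    for _ in range(n_rounds):
        m1, m2 = p1.play(), p2.play()
        r1, r2 = PAYOFFS[(m1, m2)]
        s1, s2 = s1 + r1, s2 + r2
        p1.observe(m2)
        p2.observe(m1)
    return s1, s2
```

A DQN agent would slot into this loop in place of one player, taking the recent history of moves as its state and choosing "C" or "D" as its action. Note that two Tit For Tat players cooperate forever, so over 10 rounds each scores 30.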