Two men are arrested, but the police do not possess enough information for a conviction. After separating the two men, the police offer each the same deal: if one testifies against his partner (defects/betrays) and the other remains silent (cooperates/assists), the betrayer goes free and the one who remains silent receives the full one-year sentence. If both remain silent, both are sentenced to only one month in jail for a minor charge. If each 'rats out' the other, each receives a three-month sentence. Each prisoner must choose either to betray or to remain silent, and neither is told what the other decides. What should they do?
Prisoner A reasons as follows: "If B testifies against me, I will get a one-year sentence if I stay silent, but only three months if I testify against him. If he stays silent, I will get a one-month sentence if I stay silent, but I will walk free if I testify against him. So, in either case, I get the lighter sentence by testifying against B."
Of course B reasons in the same way, and so both prisoners testify against each other and go to jail for three months, even though they could (from their point of view) have achieved a better outcome by cooperating and both staying silent. In the language of game theory, the Nash equilibrium is suboptimal.
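Prisoner A's case-by-case argument can be checked mechanically. The following is a minimal sketch that uses only the sentence lengths from the story above (in months); the names `SENTENCE`, `best_reply_for_a`, `best_reply_for_b` and `nash_equilibria` are my own labels for illustration, not standard terminology.

```python
from itertools import product

# Sentences in months, from the story: (A's sentence, B's sentence).
# "S" = stay silent (cooperate), "T" = testify (defect).
SENTENCE = {
    ("S", "S"): (1, 1),
    ("S", "T"): (12, 0),
    ("T", "S"): (0, 12),
    ("T", "T"): (3, 3),
}
MOVES = ("S", "T")


def best_reply_for_a(b_move):
    """Move that minimizes A's sentence, given B's move."""
    return min(MOVES, key=lambda a_move: SENTENCE[(a_move, b_move)][0])


def best_reply_for_b(a_move):
    """Move that minimizes B's sentence, given A's move."""
    return min(MOVES, key=lambda b_move: SENTENCE[(a_move, b_move)][1])


def nash_equilibria():
    """Pairs of moves in which each move is a best reply to the other."""
    return [
        (a_move, b_move)
        for a_move, b_move in product(MOVES, MOVES)
        if best_reply_for_a(b_move) == a_move
        and best_reply_for_b(a_move) == b_move
    ]


print({b_move: best_reply_for_a(b_move) for b_move in MOVES})  # {'S': 'T', 'T': 'T'}
print(nash_equilibria())  # [('T', 'T')]
```

Testifying is A's best reply to either move by B, and mutual testimony is the only pair of moves from which neither prisoner gains by switching alone, even though mutual silence would give both a shorter sentence.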
This very simple model has a logical structure that is present in many other dilemmas of choice. For instance, we could think of A and B as nations in a world menaced by carbon emissions. Each nation has the choice whether to adhere to some "Kyoto-style" emissions protocol (cooperate/assist), or to disregard it and go full steam ahead with growth as usual in the hope that the other will take up the slack (defect/betray). Just as in the prisoner's dilemma, both parties may decide that it is in their interests to defect, thus pushing their world onto a suboptimal trajectory.
The Prisoner's Dilemma (PD) and similar games have been analyzed since the 1950s, when they were originally devised by RAND Corporation strategists thinking about nuclear war. An important development in the 1980s arose from observing that the paradoxical nature of PD is tied to the fact that the "game" is played only once. That may be appropriate when we're talking about thermonuclear war, but in lower-stakes contexts the "players" may have to maintain a continuing relationship. So a new study arose of the iterated Prisoner's Dilemma, in which the same two people play a long series of PD games and adjust their strategies according to what they learn of the other player's character from his or her game-playing behavior. The results of these investigations were popularized in a book whose title, The Evolution of Cooperation, summarizes its main theme. On the basis of extensive computer experiments the author, Robert Axelrod, contended that, in the long run, "cooperative" strategies would win out over others (and indeed that the simplest cooperative strategy would do best of all).
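To make the iterated game concrete, here is a small sketch of a repeated match. It is not Axelrod's tournament code. The payoff numbers (5, 3, 1, 0, where higher is better) are the values conventionally used in this literature and are an assumption here, since the story above only gives jail sentences. The strategy "tit for tat" (cooperate first, then copy the opponent's previous move) is the simple cooperative strategy usually associated with Axelrod's experiments; the function names are mine.

```python
# Assumed conventional payoffs, (A's payoff, B's payoff); higher is better.
# "C" = cooperate (stay silent), "D" = defect (testify).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_history, their_history):
    """Cooperate in the first round, then copy the opponent's last move."""
    return "C" if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    """Defect in every round, regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play the repeated game and return the total payoffs (A, B)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 10))      # (30, 30)
print(play(tit_for_tat, always_defect, 10))    # (9, 14)
print(play(always_defect, always_defect, 10))  # (10, 10)
```

In this toy match the defector beats tit for tat head to head (14 against 9), yet two tit-for-tat players earn far more from each other (30 each) than two defectors do (10 each). That gap is the kind of effect the claim about cooperative strategies rests on.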
[Image: Freeman Dyson. Source: Wikimedia Commons]
Unfortunately there was no mathematical proof of this sanguine conclusion, and the reason has become clear this year with the publication of an extraordinary paper, Iterated Prisoner's Dilemma contains strategies that dominate any evolutionary opponent, by William Press and Freeman Dyson. In this short article the authors show, using nothing more than undergraduate-level linear algebra and probability theory, that "extortionate" strategies exist in iterated PD: if A plays an extortionate strategy, then s/he can guarantee a better long-term payoff than B, whatever strategy B plays. [Technical footnote: there is one exception to this, where both players' payoffs are equal, but in that case the payoffs are equal to those of the suboptimal Nash equilibrium, the "both defect" strategy. Any excess payoff over this minimum goes disproportionately to A.]
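For readers who want to see the effect numerically, the sketch below builds one extortionate strategy and checks the claim against a single opponent. Several things here are assumptions rather than anything stated above: the conventional payoffs T, R, P, S = 5, 3, 1, 0; the restriction to "memory-one" players who look only at the previous round; the formulas in `extortionate_strategy`, which are my transcription of the construction in the Press and Dyson paper; and the particular values of the extortion factor `chi`, the scale `phi` and B's strategy, which are purely illustrative. Checking one opponent is of course not a proof.

```python
# Assumed conventional payoffs: T = temptation, R = reward,
# P = punishment (both defect), S = sucker's payoff.
T, R, P, S = 5.0, 3.0, 1.0, 0.0

# States are last round's moves, written (A's move, B's move): CC, CD, DC, DD.
PAYOFF_A = (R, S, T, P)
PAYOFF_B = (R, T, S, P)


def extortionate_strategy(chi, phi):
    """A's probabilities of cooperating after CC, CD, DC, DD.

    chi is the extortion factor; phi must be small enough that every
    entry is a valid probability.
    """
    return (
        1 - phi * (chi - 1) * (R - P) / (P - S),
        1 - phi * (1 + chi * (T - P) / (P - S)),
        phi * (chi + (T - P) / (P - S)),
        0.0,
    )


def long_run_scores(p, q, steps=10_000):
    """Approximate long-run per-round payoffs for strategies p (A) and q (B).

    q is written from B's own point of view, so its two middle entries
    are swapped when indexing states from A's point of view.
    """
    q_seen_by_a = (q[0], q[2], q[1], q[3])
    # Each row holds the probabilities of moving to CC, CD, DC, DD.
    rows = [
        (pa * qb, pa * (1 - qb), (1 - pa) * qb, (1 - pa) * (1 - qb))
        for pa, qb in zip(p, q_seen_by_a)
    ]
    # Power iteration towards the stationary distribution of the chain.
    dist = [0.25] * 4
    for _ in range(steps):
        dist = [sum(dist[i] * rows[i][j] for i in range(4)) for j in range(4)]
    score_a = sum(d * x for d, x in zip(dist, PAYOFF_A))
    score_b = sum(d * x for d, x in zip(dist, PAYOFF_B))
    return score_a, score_b


p = extortionate_strategy(chi=3.0, phi=1 / 26)  # (11/13, 1/2, 7/26, 0)
q = (0.9, 0.1, 0.8, 0.3)                        # an arbitrary opponent
score_a, score_b = long_run_scores(p, q)
print(score_a, score_b)
print(score_a - P, 3.0 * (score_b - P))  # the two values should agree
```

If the transcription is right, A's surplus over the mutual-defection payoff P comes out as three times B's surplus, and trying other values for `q` should leave that ratio unchanged. This is the sense in which any excess payoff goes disproportionately to A.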
It's not clear (to me) what practical implications this will have, but it is a striking example of the power of mathematical insight.