Algorithms

How to Build a World-Class Rock Paper Scissors Bot

How to Build a World-Class Rock Paper Scissors Bot

by Dorothy Jane Wingrove

In this engaging talk titled "How to Build a World-Class Rock Paper Scissors Bot," presented by Dorothy Jane Wingrove at RubyConf 2017, the speaker explores the complexities behind the seemingly simple game of rock-paper-scissors and its applications in creating AI bots. The talk begins with an introduction to the game itself, outlining basic rules and exploring various amusing variations, such as rock-paper-scissors-lizard-Spock and even more extravagant versions like rock-paper-scissors-fire-water-air-sponge-gun-zoom-devil-wolf.

Key Points Discussed:
- Game Theory and Mind Games: Wingrove touches on the strategic elements of the game, emphasizing that it's more than just random chance. She highlights techniques like the 'crystal ball' and 'tattoo play' that seasoned players use to mislead opponents.
- Bot Strategies: The talk emphasizes that bots cannot use human tactics. Instead, attendees learn about the implementation of various strategies and the development of competitive bots that analyze their opponents’ moves.
- Historical Context: Wingrove shares insights from the International Rochambeau Programming Competition where teams created bots to compete in thousands of rounds of rock-paper-scissors.
- Types of Bots: She discusses different types of bots, starting from a random bot (which performs decently against others) to more advanced strategies like frequency counting bots and fixed string bots, which aim to track and counter opponents' behavior.
- De Bruijn Sequences: The speaker introduces De Bruijn sequences, a method for optimizing bots' move strategies, ensuring all potential combinations are efficiently utilized.
- Direct History Analysis: Wingrove explains how analyzing opponent's move history can lead to better counter strategies, emphasizing the importance of tracking patterns to anticipate future moves.
- Meta-Strategies: She concludes with the concept of developing a meta-strategy, where bots can adapt their playstyle based on historical performance, ensuring they remain competitive against evolving tactics.

In addition to the technical aspects, the talk encourages humor and engagement, making the session both educational and enjoyable. Wingrove's practical application is showcased through the introduction of a Slack bot that utilizes these strategies, inviting viewers to interact and explore the bot’s capabilities.

Overall, the main takeaway from this talk is the intricate balancing of luck and strategy in rock-paper-scissors and the potential for developing sophisticated bots that leverage human-like strategic thinking within a programming context. Wingrove's unique approach, along with practical demonstrations, underscores the intricacies of game theory and AI development, making this session valuable for Ruby programmers of all levels.

00:00:10.460 Hi everyone, thanks for joining me here. Let's begin.
00:00:13.139 So, Ruby paper scissors, or how to build a world-class rock paper scissors bot. A little bit about me first.
00:00:19.830 My name is Dot, short for Dorothy. I'm based in Brighton in the UK on the south coast, hence the accent you may have noticed.
00:00:24.689 I'm a bit of a maths nerd; I studied it in my spare time for a few years with the Open University.
00:00:31.109 I previously helped to organize the Brighton branch of Codebar. If you don't know what Codebar is, you should definitely check them out—they do some great work for under-representation in tech.
00:00:41.879 I currently work for Happy Bear Software, a fabulous Rails consultancy that helped get me here, so thanks to Happy Bear!
00:00:48.870 Now, today I'm going to take you down a bit of a rabbit hole I recently explored, which is completely unrelated to web development and possibly even a little unrelated to Ruby—but trust me, it's going to be fascinating.
00:00:57.329 We're going to be looking at rock-paper-scissors, a simple game played between two people. But just as a formality, let's briefly explain the rules.
00:01:09.030 Each player simultaneously chooses one of the three moves: rock, paper, and scissors. If both players choose the same move, it's a draw. Otherwise, we have rock beats scissors, scissors beats paper, and paper beats rock.
00:01:19.890 But I've often wondered how paper manages to beat rock. In Malaysia, they have a much more sensible version where they have water instead of paper and a bird instead of scissors.
00:01:33.570 Speaking of other versions, here's my personal favorite: bear, hunter, ninja. The earliest Japanese version of the game, which was imported directly from China, instead has a frog, a slug, and a snake.
00:01:41.640 There's also rock-paper-scissors-lizard-Spock, which I'm sure you've seen Sheldon play on The Big Bang Theory. But he never mentions this version: rock-paper-scissors-fire-water-air-sponge, or rock-paper-scissors-fire-water-air-sponge-gun-human.
00:01:54.420 Or even rock-paper-scissors-fire-water-air-sponge-gun-zoom-devil-wolf. And you can add dragon, snake, lightning, and tree. However, that's all nothing compared to rock-paper-scissors 101.
00:02:10.050 Pretty ridiculous, but there are some great moves in it. One of my favorites is the monkey move, which beats a lot of other moves by flinging poop, climbing on them, scaring them, or ripping them up.
00:02:24.580 But of course, monkey doesn’t always win. Monkey is enraged by math, which I’m rather fond of—not just because I like maths, but it makes me look kind of street-smart.
00:02:40.250 And there are other versions involving movement as money, but you should remember that these versions of the game are outlawed within the official Rock-Paper-Scissors Society due to their unnecessary complexity.
00:02:58.160 Yes, there's a World Rock-Paper-Scissors Society, and this is their shiny new website. However, I'm not sure how seriously you should take them.
00:03:02.799 Even lizards are using rock-paper-scissors, but for them, it's personal. This is the male side-blot lizard.
00:03:06.680 Now, these lizards don't actually play rock-paper-scissors—that would be ridiculous—but rock-paper-scissors does model their mating behavior. The male lizard comes in one of three colors: orange, blue, and yellow, and these colors correspond to the moves they play.
00:03:26.680 The orange lizard is greedy and ultra-dominant; it's the largest of the three, establishing large territories with many females. The blue male is smaller and content with a little territory and a single female.
00:03:42.940 However, the orange lizard can beat the blue and take over his territory and female. As for the yellow, he's sly, mimicking the color of sexually mature females without having territory, sneaking into the territories of oranges to mate.
00:03:59.990 This gives us the play of orange beats blue, blue beats yellow, and yellow beats orange since the yellow sneaks into the territory. However, returning to rock-paper-scissors, you may wonder how many strategies there are.
00:04:24.020 There's actually more to it than just random play. There are a couple of great techniques and ways to win provided to us by the lovely World Rock-Paper-Scissors Society in their guide.
00:04:39.130 One such move is called the 'crystal ball'—the aim is to confuse your opponent by declaring what move you will play next, hoping they get confused and won’t play it.
00:04:45.150 For instance, you could say you're going to play paper, anticipating that they'll avoid paper, resulting in playing either rock or scissors, making rock a safe bet.
00:04:51.919 Another difficult strategy for seasoned players is 'tattoo play.' The aim here is to distract your opponent with subtle clues, like wearing a t-shirt that says 'I'm going to play paper,' or a belt that says 'scissors.' Some may even go as far as making body modifications with 'rock' and 'scissors' tattooed across their knuckles.
00:05:09.550 Of course, bots can't use these strategies.
00:05:11.960 So, what techniques could our bot employ? Well, there's a bot that has a 100% winning rate, capable of recognizing the position and move of the opponent’s hand in one millisecond and immediately playing a counter move.
00:05:22.550 However, I would say this is clearly cheating. Instead, we should look toward techniques that can help our bot decide what the opponent will play before they play it.
00:05:38.320 Let's frame the problem in terms of gameplay. Rock-paper-scissors can be thought of as a scaled-down version of poker. In poker, you're unaware of your opponent's move at the time you're making yours, which means you have to analyze your opponent's tendencies, such as whether they frequently bluff.
00:05:55.590 This situation, where you do not know your opponent's next move when making your own, is known as imperfect information. This characteristic also applies to rock-paper-scissors, which makes it interesting to model.
00:06:18.060 In contrast, games such as chess are known as perfect information games where each player can see all of the pieces on the board at all times.
00:06:27.270 Furthermore, in rock-paper-scissors, there is only one winner and one loser, known as zero-sum. Zero-sum games are termed as such because the net outcome is zero.
00:06:43.920 For example, if we have a win as plus one and a loss as minus one, the draw is zero. Thus, one winner and one loser yield zero, and two draws result in zero.
00:06:56.340 This stands in contrast to games where participants can all gain or suffer together, referred to as non-zero-sum games.
00:07:10.250 Let's delve into competition and strategies to crush our opponents. In 1999, and again a year later, a rock-paper-scissors bot competition entitled the International Rochambeau Programming Competition was run by Dice Billings.
00:07:29.510 He invited people to create bots to compete in a round-robin competition. Each match between two bots consisted of 1,000 rounds of rock-paper-scissors.
00:07:48.360 They included an open round where every match result counted, as well as a 'best of the best' bout where the highest number of wins determined the winner.
00:07:58.260 To encourage participants to create bots that actively tried to beat each other, they even added dummy bots into the mix.
00:08:06.540 Not all players were optimal, and this dynamic changed everything. This meant to beat these dummy bots, you had to detect patterns in their moves and employ appropriate counter-strategies.
00:08:22.470 To summarize: if we have some not-so-great bots, we can create some really impressive ones to beat them!
00:08:29.210 So, known as the winner of this competition, we're going to step through the evolution of that bot. First, let's look at the most well-known strategy: the random bot.
00:08:44.220 This is what all our examples will look like. Each bot has a state containing our past moves and our opponent's past moves. We have a next turn method, which will return our next chosen move.
00:09:01.640 Here we see that we are randomly returning one of our P for rock, P for paper, or S for scissors. So how does this bot fare against others? The always rock bot is an example; I think any one of us could defeat this bot.
00:09:21.860 However, the random bot would still only win around 50% of the time, meaning random play is often considered optimal—because it's impossible to gain an advantage over a truly random opponent.
00:09:43.050 However, random play can’t exploit opponents who utilize strategies. Clearly, random play is far from optimal because not losing is not the same as winning.
00:10:00.490 Let's take a look at how we can beat random play using a frequency counting bot. This bot actively tries to beat other bots by using frequency analysis to determine which move to play.
00:10:16.410 So here, we increment our counter with the opponent's last move. We take the move with the highest count and return our counter move.
00:10:32.230 Against the always rock bot, the rock count in our hash would increase, meaning this bot would win every time. In fact, this bot would beat any opponent that chose rock, paper, or scissors with different statistical frequencies.
00:10:49.040 Additionally, we have our fixed string bot which cycles through a balanced string of moves. This means we have the same number of rocks, papers, and scissors.
00:11:06.830 Now, let’s discuss how we might generate our balanced strings of P's, S's, and R's. If we did it manually, it would take ages or be full of patterns. This is where de Bruijn sequences come in, as used in the official tournament.
00:11:32.750 De Bruijn is a cyclic sequence of order n on a size K alphabet in which every possible string of length n occurs exactly once as a substring.
00:11:55.670 Simply put, it consists of a set number of elements in which every possible n-length combination occurs. Looking at the string we used, we have three different elements: rock, paper, and scissors.
00:12:12.580 Now, say we wanted all substrings of length three. We can represent this like so, where A is our set, and B represents both the length of our set and the length of our substring.
00:12:29.400 In this string, there’s every possible three-length combination of our elements, including wrapping around at the end.
00:12:43.550 De Bruijn strings can be created using any set you'd like. For instance, if we had the set {0, 1} and sought all substrings of length three, we'd get two distinct strings—one being the reverse of the other.
00:13:01.510 Now, let's say we have a set of {1, 2, 3, 4} and aim for all substrings of length four. This would yield a specific string length as well.
00:13:24.490 Therefore, observe that these strings are optimally short—long enough to contain all substrings without any repetitions.
00:13:36.040 Returning to our RPS set, if we wanted all substrings of length six, we would see how rapidly the string's length increases.
00:13:54.560 And you can generate De Bruijn strings quite easily, even the super long one, making them quite hard to detect.
00:14:05.830 This is the code I used to generate De Bruijn strings, which I translated from the Python example on the Wikipedia page into Ruby code.
00:14:20.450 Now, using the fixed string is a pretty good starting strategy, but how do we tackle beating the frequency counting bot?
00:14:35.290 We need a new method. One great option is to employ direct history analysis. This is a solid strategy for dismantling an opponent's play.
00:14:57.310 We can store their move history and look for patterns. Here is our opponent's last 30 moves where the latest move is rightmost.
00:15:14.590 Let’s analyze the last 10 moves for patterns. For instance, if we find a match, we can record the next move they'd likely play.
00:15:30.170 Of course, we could decrease our window size to find other matches and possible following moves, refining the potential counter.
00:15:44.720 This can be compiled into a table to visualize several strategies. We can take the matches of the longest length and find the most effective counter.
00:15:58.650 We might also consider those matches with the highest count and implement counter moves based on that data.
00:16:18.940 Additionally, we can aggregate data by window sizes and determine the best counter move using the highest yields across different scenarios.
00:16:35.520 Let’s look at a basic example for direct history implementation. This will analyze the opponents' last moves and seek out matching sequences.
00:16:50.140 For example, here's a simple strategy where we append their latest move to our history string. We look for matches using regex and record the following character.
00:17:01.550 The regex patterns can thoroughly analyze sequences. The dollar sign indicates the end of the line, while backslash-one references previously captured groups.
00:17:18.800 The regex will capture sequences and identify subsequent moves that could be countered effectively. There are also match conditions that govern the number of moves tracked.
00:17:36.600 By exploring these conditional strategies, we derive optimal countering strategies based on observed behavior.
00:17:57.010 In summary, we develop a variety of methods looking forward to an opponent's likely moves, then executing optimal play based on historical patterns.
00:18:14.810 Combining strategies will lead to more complex interactions, making it harder for opponents to predict our moves.
00:18:31.100 To adapt further, we might develop a meta-strategy. This keeps a tally of how each strategic approach has performed over time.
00:18:50.860 For instance, if our bot often plays rock, we can check if the last few opponents used paper or scissors against that play.
00:19:05.390 Therein lies the potential to change tactics based on winning streaks, ensuring the bot evolves into a more competitive player.
00:19:23.120 As a final example, let's discuss how to reflect on past performance. Suppose our bot has maintained a winning streak—its historical data should encourage more of that successful strategy.
00:19:41.270 But if the opponent unexpectedly alters their strategy, we'll need a mechanism in place to allow our bot to adapt seamlessly—such as gradually diminishing the success of previous strategies.
00:20:00.320 This decay mechanism will ensure that over time, newer patterns are allotted more significance, helping our bot to choose an effective strategy against challengers.
00:20:16.480 With these various approaches combined, the potential for our rock-paper-scissors bot becomes quite interesting.
00:20:29.780 Ultimately, we've looked at multiple strategies, meta-strategies, and all the roadmaps that apply to them.
00:20:41.490 The culmination of these ideas has led me to build a Slack bot that leverages these concepts. You can play with the bot and see how it reacts to your moves.
00:20:58.480 It learns form your inputs and adapts accordingly, building on practical applications of the strategies we've discussed.
00:21:16.380 You can find an 'Add to Slack' button at the link provided, which allows you to integrate this bot into your workspace.
00:21:35.530 The second link is my GitHub for the project—contributions and pull requests are welcome! I'll also share these slides later on Twitter.
00:21:50.450 Thank you!