00:00:10.460
Hi everyone, thanks for joining me here. Let's begin.
00:00:13.139
So, Ruby paper scissors, or how to build a world-class rock paper scissors bot. A little bit about me first.
00:00:19.830
My name is Dot, short for Dorothy. I'm based in Brighton in the UK on the south coast, hence the accent you may have noticed.
00:00:24.689
I'm a bit of a maths nerd; I studied it in my spare time for a few years with the Open University.
00:00:31.109
I previously helped to organize the Brighton branch of Codebar. If you don't know what Codebar is, you should definitely check them out—they do some great work for under-representation in tech.
00:00:41.879
I currently work for Happy Bear Software, a fabulous Rails consultancy that helped get me here, so thanks to Happy Bear!
00:00:48.870
Now, today I'm going to take you down a bit of a rabbit hole I recently explored, which is completely unrelated to web development and possibly even a little unrelated to Ruby—but trust me, it's going to be fascinating.
00:00:57.329
We're going to be looking at rock-paper-scissors, a simple game played between two people. But just as a formality, let's briefly explain the rules.
00:01:09.030
Each player simultaneously chooses one of the three moves: rock, paper, and scissors. If both players choose the same move, it's a draw. Otherwise, we have rock beats scissors, scissors beats paper, and paper beats rock.
00:01:19.890
But I've often wondered how paper manages to beat rock. In Malaysia, they have a much more sensible version where they have water instead of paper and a bird instead of scissors.
00:01:33.570
Speaking of other versions, here's my personal favorite: bear, hunter, ninja. The earliest Japanese version of the game, which was imported directly from China, instead has a frog, a slug, and a snake.
00:01:41.640
There's also rock-paper-scissors-lizard-Spock, which I'm sure you've seen Sheldon play on The Big Bang Theory. But he never mentions this version: rock-paper-scissors-fire-water-air-sponge, or rock-paper-scissors-fire-water-air-sponge-gun-human.
00:01:54.420
Or even rock-paper-scissors-fire-water-air-sponge-gun-zoom-devil-wolf. And you can add dragon, snake, lightning, and tree. However, that's all nothing compared to rock-paper-scissors 101.
00:02:10.050
Pretty ridiculous, but there are some great moves in it. One of my favorites is the monkey move, which beats a lot of other moves by flinging poop, climbing on them, scaring them, or ripping them up.
00:02:24.580
But of course, monkey doesn’t always win. Monkey is enraged by math, which I’m rather fond of—not just because I like maths, but it makes me look kind of street-smart.
00:02:40.250
And there are other versions involving movement as money, but you should remember that these versions of the game are outlawed within the official Rock-Paper-Scissors Society due to their unnecessary complexity.
00:02:58.160
Yes, there's a World Rock-Paper-Scissors Society, and this is their shiny new website. However, I'm not sure how seriously you should take them.
00:03:02.799
Even lizards are using rock-paper-scissors, but for them, it's personal. This is the male side-blot lizard.
00:03:06.680
Now, these lizards don't actually play rock-paper-scissors—that would be ridiculous—but rock-paper-scissors does model their mating behavior. The male lizard comes in one of three colors: orange, blue, and yellow, and these colors correspond to the moves they play.
00:03:26.680
The orange lizard is greedy and ultra-dominant; it's the largest of the three, establishing large territories with many females. The blue male is smaller and content with a little territory and a single female.
00:03:42.940
However, the orange lizard can beat the blue and take over his territory and female. As for the yellow, he's sly, mimicking the color of sexually mature females without having territory, sneaking into the territories of oranges to mate.
00:03:59.990
This gives us the play of orange beats blue, blue beats yellow, and yellow beats orange since the yellow sneaks into the territory. However, returning to rock-paper-scissors, you may wonder how many strategies there are.
00:04:24.020
There's actually more to it than just random play. There are a couple of great techniques and ways to win provided to us by the lovely World Rock-Paper-Scissors Society in their guide.
00:04:39.130
One such move is called the 'crystal ball'—the aim is to confuse your opponent by declaring what move you will play next, hoping they get confused and won’t play it.
00:04:45.150
For instance, you could say you're going to play paper, anticipating that they'll avoid paper, resulting in playing either rock or scissors, making rock a safe bet.
00:04:51.919
Another difficult strategy for seasoned players is 'tattoo play.' The aim here is to distract your opponent with subtle clues, like wearing a t-shirt that says 'I'm going to play paper,' or a belt that says 'scissors.' Some may even go as far as making body modifications with 'rock' and 'scissors' tattooed across their knuckles.
00:05:09.550
Of course, bots can't use these strategies.
00:05:11.960
So, what techniques could our bot employ? Well, there's a bot that has a 100% winning rate, capable of recognizing the position and move of the opponent’s hand in one millisecond and immediately playing a counter move.
00:05:22.550
However, I would say this is clearly cheating. Instead, we should look toward techniques that can help our bot decide what the opponent will play before they play it.
00:05:38.320
Let's frame the problem in terms of gameplay. Rock-paper-scissors can be thought of as a scaled-down version of poker. In poker, you're unaware of your opponent's move at the time you're making yours, which means you have to analyze your opponent's tendencies, such as whether they frequently bluff.
00:05:55.590
This situation, where you do not know your opponent's next move when making your own, is known as imperfect information. This characteristic also applies to rock-paper-scissors, which makes it interesting to model.
00:06:18.060
In contrast, games such as chess are known as perfect information games where each player can see all of the pieces on the board at all times.
00:06:27.270
Furthermore, in rock-paper-scissors, there is only one winner and one loser, known as zero-sum. Zero-sum games are termed as such because the net outcome is zero.
00:06:43.920
For example, if we have a win as plus one and a loss as minus one, the draw is zero. Thus, one winner and one loser yield zero, and two draws result in zero.
00:06:56.340
This stands in contrast to games where participants can all gain or suffer together, referred to as non-zero-sum games.
00:07:10.250
Let's delve into competition and strategies to crush our opponents. In 1999, and again a year later, a rock-paper-scissors bot competition entitled the International Rochambeau Programming Competition was run by Dice Billings.
00:07:29.510
He invited people to create bots to compete in a round-robin competition. Each match between two bots consisted of 1,000 rounds of rock-paper-scissors.
00:07:48.360
They included an open round where every match result counted, as well as a 'best of the best' bout where the highest number of wins determined the winner.
00:07:58.260
To encourage participants to create bots that actively tried to beat each other, they even added dummy bots into the mix.
00:08:06.540
Not all players were optimal, and this dynamic changed everything. This meant to beat these dummy bots, you had to detect patterns in their moves and employ appropriate counter-strategies.
00:08:22.470
To summarize: if we have some not-so-great bots, we can create some really impressive ones to beat them!
00:08:29.210
So, known as the winner of this competition, we're going to step through the evolution of that bot. First, let's look at the most well-known strategy: the random bot.
00:08:44.220
This is what all our examples will look like. Each bot has a state containing our past moves and our opponent's past moves. We have a next turn method, which will return our next chosen move.
00:09:01.640
Here we see that we are randomly returning one of our P for rock, P for paper, or S for scissors. So how does this bot fare against others? The always rock bot is an example; I think any one of us could defeat this bot.
00:09:21.860
However, the random bot would still only win around 50% of the time, meaning random play is often considered optimal—because it's impossible to gain an advantage over a truly random opponent.
00:09:43.050
However, random play can’t exploit opponents who utilize strategies. Clearly, random play is far from optimal because not losing is not the same as winning.
00:10:00.490
Let's take a look at how we can beat random play using a frequency counting bot. This bot actively tries to beat other bots by using frequency analysis to determine which move to play.
00:10:16.410
So here, we increment our counter with the opponent's last move. We take the move with the highest count and return our counter move.
00:10:32.230
Against the always rock bot, the rock count in our hash would increase, meaning this bot would win every time. In fact, this bot would beat any opponent that chose rock, paper, or scissors with different statistical frequencies.
00:10:49.040
Additionally, we have our fixed string bot which cycles through a balanced string of moves. This means we have the same number of rocks, papers, and scissors.
00:11:06.830
Now, let’s discuss how we might generate our balanced strings of P's, S's, and R's. If we did it manually, it would take ages or be full of patterns. This is where de Bruijn sequences come in, as used in the official tournament.
00:11:32.750
De Bruijn is a cyclic sequence of order n on a size K alphabet in which every possible string of length n occurs exactly once as a substring.
00:11:55.670
Simply put, it consists of a set number of elements in which every possible n-length combination occurs. Looking at the string we used, we have three different elements: rock, paper, and scissors.
00:12:12.580
Now, say we wanted all substrings of length three. We can represent this like so, where A is our set, and B represents both the length of our set and the length of our substring.
00:12:29.400
In this string, there’s every possible three-length combination of our elements, including wrapping around at the end.
00:12:43.550
De Bruijn strings can be created using any set you'd like. For instance, if we had the set {0, 1} and sought all substrings of length three, we'd get two distinct strings—one being the reverse of the other.
00:13:01.510
Now, let's say we have a set of {1, 2, 3, 4} and aim for all substrings of length four. This would yield a specific string length as well.
00:13:24.490
Therefore, observe that these strings are optimally short—long enough to contain all substrings without any repetitions.
00:13:36.040
Returning to our RPS set, if we wanted all substrings of length six, we would see how rapidly the string's length increases.
00:13:54.560
And you can generate De Bruijn strings quite easily, even the super long one, making them quite hard to detect.
00:14:05.830
This is the code I used to generate De Bruijn strings, which I translated from the Python example on the Wikipedia page into Ruby code.
00:14:20.450
Now, using the fixed string is a pretty good starting strategy, but how do we tackle beating the frequency counting bot?
00:14:35.290
We need a new method. One great option is to employ direct history analysis. This is a solid strategy for dismantling an opponent's play.
00:14:57.310
We can store their move history and look for patterns. Here is our opponent's last 30 moves where the latest move is rightmost.
00:15:14.590
Let’s analyze the last 10 moves for patterns. For instance, if we find a match, we can record the next move they'd likely play.
00:15:30.170
Of course, we could decrease our window size to find other matches and possible following moves, refining the potential counter.
00:15:44.720
This can be compiled into a table to visualize several strategies. We can take the matches of the longest length and find the most effective counter.
00:15:58.650
We might also consider those matches with the highest count and implement counter moves based on that data.
00:16:18.940
Additionally, we can aggregate data by window sizes and determine the best counter move using the highest yields across different scenarios.
00:16:35.520
Let’s look at a basic example for direct history implementation. This will analyze the opponents' last moves and seek out matching sequences.
00:16:50.140
For example, here's a simple strategy where we append their latest move to our history string. We look for matches using regex and record the following character.
00:17:01.550
The regex patterns can thoroughly analyze sequences. The dollar sign indicates the end of the line, while backslash-one references previously captured groups.
00:17:18.800
The regex will capture sequences and identify subsequent moves that could be countered effectively. There are also match conditions that govern the number of moves tracked.
00:17:36.600
By exploring these conditional strategies, we derive optimal countering strategies based on observed behavior.
00:17:57.010
In summary, we develop a variety of methods looking forward to an opponent's likely moves, then executing optimal play based on historical patterns.
00:18:14.810
Combining strategies will lead to more complex interactions, making it harder for opponents to predict our moves.
00:18:31.100
To adapt further, we might develop a meta-strategy. This keeps a tally of how each strategic approach has performed over time.
00:18:50.860
For instance, if our bot often plays rock, we can check if the last few opponents used paper or scissors against that play.
00:19:05.390
Therein lies the potential to change tactics based on winning streaks, ensuring the bot evolves into a more competitive player.
00:19:23.120
As a final example, let's discuss how to reflect on past performance. Suppose our bot has maintained a winning streak—its historical data should encourage more of that successful strategy.
00:19:41.270
But if the opponent unexpectedly alters their strategy, we'll need a mechanism in place to allow our bot to adapt seamlessly—such as gradually diminishing the success of previous strategies.
00:20:00.320
This decay mechanism will ensure that over time, newer patterns are allotted more significance, helping our bot to choose an effective strategy against challengers.
00:20:16.480
With these various approaches combined, the potential for our rock-paper-scissors bot becomes quite interesting.
00:20:29.780
Ultimately, we've looked at multiple strategies, meta-strategies, and all the roadmaps that apply to them.
00:20:41.490
The culmination of these ideas has led me to build a Slack bot that leverages these concepts. You can play with the bot and see how it reacts to your moves.
00:20:58.480
It learns form your inputs and adapts accordingly, building on practical applications of the strategies we've discussed.
00:21:16.380
You can find an 'Add to Slack' button at the link provided, which allows you to integrate this bot into your workspace.
00:21:35.530
The second link is my GitHub for the project—contributions and pull requests are welcome! I'll also share these slides later on Twitter.
00:21:50.450
Thank you!