00:00:04.339
Good morning, everybody. Welcome to this first session today.
00:00:13.019
The title is "On Ruby and ꝩduЯ, or How Scary are Trojan Source Attacks." This is the overview.
00:00:21.840
I’ll have a little bit that may scare you at some point, so if you don't like to get scared, maybe you should leave.
00:00:34.739
This is my introduction, just to show that I have done related work before.
00:00:40.020
My students and I have also contributed to Ruby in many different ways, mostly on internationalization.
00:00:46.620
The starting point for this talk is a paper titled 'Trojan Source: Invisible Vulnerabilities.'
00:01:02.399
There is a spelling mistake here; the word 'vulnerabilities' is very interesting in terms of English orthography.
00:01:13.619
The spelling of the 'J' here is actually changed to fit the adjective 'Trojan.' Anyway, this paper came out about two and a half years ago.
00:01:25.560
It was revised and will be presented later this year. Who wrote this paper?
00:01:36.840
Well, Nicholas Boucher is probably a student at Cambridge, and Ross Anderson is a well-known security expert.
00:01:51.540
This book I referenced is actually over 1200 pages and weighs 1.8 kilograms. If it was a little bit lighter, I would have brought it here to show you, but it’s a bit too heavy.
00:02:11.400
Now, imagine that you are out at sea at night, and there's fog. Suddenly, you get attacked!
00:02:23.400
I’m not sure whether it's by zombies; I gave a lightning talk on zombies once at RubyKaigi.
00:02:28.500
That's actually the website for this paper, so I would at least give this paper a prize for scariest website associated with it. Anyway, let's go back to the talk and see how good you are with Ruby.
00:03:12.840
Here is a small Ruby program, and the question here is: who wins? What will the program print out?
00:03:39.360
So can I please ask for your hands? If you think, after executing this program, it prints out 'white,' please raise your left hand. If you think it will print out 'black,' then please raise your right hand. Everybody give it a try.
00:04:05.640
Okay, we have a lot of left hands, but we also have some right hands, which are kind of contrarians for whatever reason.
00:04:15.060
Let’s go to the next example.
00:04:20.699
This example looks exactly the same, but it’s actually different. I can show you here.
00:05:14.280
Again, sorry for the confusion in this program. Who thinks that 'white' wins? Please raise your left hand. Who thinks that 'black' wins? Please raise your right hand.
00:05:42.960
Okay, we have almost the same distribution. Let’s go on to the next example.
00:06:17.100
This is a bit simpler. In this example, we don't even do any calculations; we simply compensate for the fact that black plays first.
00:07:06.240
What do you think? Who wins? Again, please use your hands.
00:07:09.060
Who thinks that the program prints out 'white'? Who thinks it prints 'black'? Okay, you're not completely wrong. Let’s move on.
00:07:38.820
Now it gets interesting. What will this program do? This is a legal Ruby program, in most versions at least.
00:08:15.420
Again, please raise your hands if you want to take a guess at its output.
00:08:41.760
We have made some guesses. Now, let's see my guesses compared to yours.
00:09:49.320
I think I wasn’t too far off, but let's explain what just happened.
00:10:44.700
In example one, there are some special characters. The assignment goes to a different variable, and 'white' and 'black' are at 1445.
00:11:05.080
The score becomes negative, and thus 'black' wins.
00:11:32.339
To give you more details, one of the 'i's is actually the Cyrillic small 'i' while the other is the usual Latin 'i.' You see, we even have some political implications here.
00:12:02.999
It's interesting to note that in the character's name the Belarusians and the Ukrainians are together, whereas in politics they are separate.
00:12:34.560
Anyway, this one is called a homoglyph attack, using identifiers that look the same but contain different characters.
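To make the homoglyph idea concrete, here is a minimal Ruby sketch. The names `LATIN_NAME`, `CYRILLIC_NAME`, `code_points`, and `mixed_script?` are hypothetical and not taken from the slides, and the mixed-script check is only a simple heuristic, assuming the look-alike is U+0456 (CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I).

```ruby
# Hypothetical names for illustration; the program shown in the talk
# is not part of this transcript.
LATIN_NAME    = "white"
CYRILLIC_NAME = "wh\u0456te" # U+0456 looks like a Latin "i"

# Format each code point as U+XXXX so the hidden difference becomes visible.
def code_points(name)
  name.codepoints.map { |cp| format("U+%04X", cp) }
end

# Heuristic: flag a name that mixes Latin and Cyrillic letters.
def mixed_script?(name)
  name.match?(/\p{Latin}/) && name.match?(/\p{Cyrillic}/)
end

puts LATIN_NAME == CYRILLIC_NAME   # false: two different identifiers
p code_points(CYRILLIC_NAME)       # the third entry is "U+0456"
puts mixed_script?(CYRILLIC_NAME)  # true
```

A check like this would also flag legitimate names that mix scripts on purpose, so it is better suited to a warning than to a hard error.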
00:12:51.060
This is well known from internationalized domain names, although it happens rarely there.
00:13:57.600
Next, what's going on in example two? In example one, we can search for the word 'white' throughout the text, but we cannot do that in the second example.
00:14:34.780
However, there is an invisible zero-width space that you cannot see.
00:15:12.180
Why is this possible? Ruby allows all non-ASCII characters in identifiers, including spaces. So I'll call this an invisible space attack, which is not well known.
00:15:35.220
This may be the only exception in Ruby, and we should close this loophole.
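As a rough illustration of why the invisible space is hard to spot, the following sketch compares two names as plain strings. `VISIBLE_NAME`, `PADDED_NAME`, and `strip_invisible` are made-up names, and the sketch assumes the hidden character is U+200B (ZERO WIDTH SPACE), which belongs to the Unicode general category Cf (format characters).

```ruby
# Hypothetical names; assumes the invisible character is U+200B.
VISIBLE_NAME = "white"
PADDED_NAME  = "white\u200B" # trailing ZERO WIDTH SPACE, renders identically

puts VISIBLE_NAME == PADDED_NAME # false: these would be distinct identifiers
puts VISIBLE_NAME.length         # 5
puts PADDED_NAME.length          # 6

# Remove format characters (general category Cf, which includes U+200B).
def strip_invisible(name)
  name.gsub(/\p{Cf}/, "")
end

puts strip_invisible(PADDED_NAME) == VISIBLE_NAME # true
```

Comparing lengths or code points is usually the quickest way to confirm that two names that look the same on screen are in fact different.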
00:16:00.500
Example four builds on this. There are many different spaces in Unicode, including the zero-width space and the ideographic space, better known as full-width space.
00:16:45.200
This allows us to write programs using invisible characters, but we cannot conceal critical elements such as 'if' and 'else'.
00:17:23.100
This shows that a program structured like this is most likely going to be discovered very rapidly.
00:18:10.740
Next example: Here is the actual code, and I will concentrate on this example, as it's a bit more intricate.
00:18:59.160
In this example, I added various characters. We will look at these in detail; they are known as Unicode bi-directional formatting characters.
00:19:37.740
Ruby's syntax allows these characters as part of variable names; almost everything is allowed.
00:20:18.420
What is the calculation involved here? The calculation is a subtraction, 'black minus white.'
00:20:46.020
But then these special characters control how this is displayed, which explains why many of you guessed incorrectly.
00:21:30.960
That's how the bi-directional ordering attack works.
00:21:59.760
Here is a list of all bi-directional control characters for your reference.
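The slide itself is not in the transcript, so as a stand-in here is a small sketch listing the twelve characters that carry the Unicode `Bidi_Control` property, together with a hypothetical helper, `bidi_controls_in`, that reports where they occur in a piece of source text. The sample line is invented for illustration.

```ruby
# The twelve characters with the Unicode Bidi_Control property.
BIDI_CONTROLS = {
  "\u061C" => "ARABIC LETTER MARK",
  "\u200E" => "LEFT-TO-RIGHT MARK",
  "\u200F" => "RIGHT-TO-LEFT MARK",
  "\u202A" => "LEFT-TO-RIGHT EMBEDDING",
  "\u202B" => "RIGHT-TO-LEFT EMBEDDING",
  "\u202C" => "POP DIRECTIONAL FORMATTING",
  "\u202D" => "LEFT-TO-RIGHT OVERRIDE",
  "\u202E" => "RIGHT-TO-LEFT OVERRIDE",
  "\u2066" => "LEFT-TO-RIGHT ISOLATE",
  "\u2067" => "RIGHT-TO-LEFT ISOLATE",
  "\u2068" => "FIRST STRONG ISOLATE",
  "\u2069" => "POP DIRECTIONAL ISOLATE",
}.freeze

# Hypothetical helper: returns [line, column, name] for each bidi control found.
def bidi_controls_in(source)
  findings = []
  source.each_line.with_index(1) do |line, line_no|
    line.each_char.with_index(1) do |ch, col|
      name = BIDI_CONTROLS[ch]
      findings << [line_no, col, name] if name
    end
  end
  findings
end

# Invented sample: an override and its terminator hidden inside one line.
sample = "score = black \u202E- white\u202C\n"
bidi_controls_in(sample).each { |line_no, col, name| puts "#{line_no}:#{col} #{name}" }
```

Reporting line and column matters because these characters have no visible glyph in most editors, so a reviewer needs to be told exactly where to look.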
00:22:39.540
In summary, these invisible characters can change the order of display, and they’re needed for special situations with bi-directional text.
00:23:15.960
Each line of code represents a separate paragraph in this algorithm.
00:23:51.420
For an attack to be successful, it needs to fit in a single line.
00:24:28.020
Most text is displayed left to right, but certain languages like Arabic or Hebrew are right to left.
00:25:02.040
However, in programming, symbols such as operators define structure, and the words should ideally follow that structure.
00:25:48.300
That was a detailed explanation of example three, and I hope you grasp the significance of these vulnerabilities.
00:26:30.960
We need to implement defenses in depth; it’s about more than just blaming one aspect of the ecosystem.
00:27:17.880
In addition, a healthy ecosystem is necessary to help prevent potential attacks.
00:28:00.320
I’d like to highlight how editors display information and advise against forcing team members to use the same editor.
00:29:05.640
In summary, we should ensure that any programming language tokens are displayed as words that can be read correctly by those familiar with Arabic or Hebrew.
00:30:03.740
Next steps for Ruby must address these vulnerabilities and aim to eliminate non-ASCII spaces and control characters.
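Until something like that lands in the language, a standalone check can approximate it. The sketch below is not the change proposed for Ruby's parser; it is a hypothetical pre-commit style scan (`suspicious_characters` is a made-up name) that flags non-ASCII space separators (category Zs) and format characters (category Cf, which covers the zero-width space and the bi-directional controls) anywhere in a source text.

```ruby
# Space separators (Zs, e.g. U+3000) and format characters (Cf, e.g. U+200B
# and the bidi controls). The ASCII space is also Zs and is skipped below.
SUSPICIOUS = /[\p{Zs}\p{Cf}]/

# Hypothetical scan: returns one message per suspicious character.
def suspicious_characters(source)
  findings = []
  source.each_line.with_index(1) do |line, line_no|
    line.each_char.with_index(1) do |ch, col|
      next if ch == " "                 # ordinary ASCII space is fine
      next unless ch.match?(SUSPICIOUS)
      findings << format("line %d, column %d: U+%04X", line_no, col, ch.ord)
    end
  end
  findings
end

# Invented sample with a zero-width space and an ideographic space.
dirty = "white\u200B = 1\nblack\u3000= 2\n"
puts suspicious_characters(dirty)
```

Because it scans raw text, this sketch also flags such characters inside string literals and comments, where they can be legitimate; a check inside the lexer could be more selective.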
00:30:37.520
In summary, our approach should be comprehensive, addressing multiple factors instead of simply pointing fingers.
00:31:15.720
Thank you for your attention. For questions, feel free to contact me directly, and I will provide a link to my presentation as well.