On Ruby and ꝩduЯ, or How Scary are Trojan Source Attacks

RubyKaigi 2023

00:00:04.339 Good morning, everybody. Welcome to this first session today.
00:00:13.019 The title is "On Ruby and ꝩduЯ, or How Scary are Trojan Source Attacks." This is the overview.
00:00:21.840 I’ll have a little bit that may scare you at some point, so if you don't like to get scared, maybe you should leave.
00:00:34.739 This is my introduction, just to show that I have done related work before.
00:00:40.020 My students and I have also contributed to Ruby in many different ways, mostly on internationalization.
00:00:46.620 The start for this talk is based on a paper titled 'Trojan Source: Invisible Vulnerabilities.'
00:01:02.399 There is a spelling mistake here; the word 'vulnerabilities' is very interesting in terms of English orthography.
00:01:13.619 The spelling of the 'J' here is actually changed to fit the adjective 'Trojan.' Anyway, this paper came out about one and a half years ago.
00:01:25.560 It was revised and will be presented later this year. Who wrote this paper?
00:01:36.840 Well, Nicholas Boucher is probably a student at Cambridge, and Ross Anderson is a well-known security expert.
00:01:51.540 The book I referenced is actually over 1200 pages and weighs 1.8 kilograms. If it were a little bit lighter, I would have brought it here to show you, but it’s a bit too heavy.
00:02:11.400 Now, imagine that you are out at sea at night, and there's fog. Suddenly, you get attacked!
00:02:23.400 I’m not sure whether by zombies; I gave a lightning talk on zombies once at RubyKaigi.
00:02:28.500 That's actually the website for this paper, so I would at least give this paper a prize for the scariest associated website. Anyway, let's go back to the talk and see how good you are with Ruby.
00:03:12.840 Here is a small Ruby program, and the question here is: who wins? What will the program print out?
00:03:39.360 So can I please ask for your hands? If you think, after executing this program, it prints out 'white,' please raise your left hand. If you think it will print out 'black,' then please raise your right hand. Everybody give it a try.
00:04:05.640 Okay, we have a lot of left hands, but we also have some right hands, which are kind of contrarians for whatever reason.
00:04:15.060 Let’s go to the next example.
00:04:20.699 This example looks exactly the same, but it’s actually different. I can show you here.
00:05:14.280 Again, sorry for the confusion in this program. Who thinks that 'white' wins? Please raise your left hand. Who thinks that 'black' wins? Please raise your right hand.
00:05:42.960 Okay, we have almost the same distribution. Let’s go on to the next example.
00:06:17.100 This is a bit simpler. In this example, we don't even do any calculations; we simply compensate for the fact that black plays first.
00:07:06.240 What do you think? Who wins? Again, please use your hands.
00:07:09.060 Who thinks that the program prints out 'white'? Who thinks it prints 'black'? Okay, you're not completely wrong. Let’s move on.
00:07:38.820 Now it gets interesting. What will this program do? This is a legal Ruby program, in most versions at least.
00:08:15.420 Again, please raise your hands if you want to take a guess at its output.
00:08:41.760 We have made some guesses. Now, let's see my guesses compared to yours.
00:09:49.320 I think I wasn’t too far off, but let's explain what just happened.
00:10:44.700 In example one, there are some special characters. The assignment goes to a different variable, and 'white' and 'black' are at 1445.
00:11:05.080 The score becomes negative, and thus 'black' wins.
00:11:32.339 To give you more details, one of the 'i's is actually the Cyrillic small 'i' while the other is the usual Latin 'i.' You see, we even have some political implications here.
00:12:02.999 It's interesting to note that the Belarusians and the Russians are together, whereas the Ukrainians are separate in politics.
00:12:34.560 Anyway, this one is called a homoglyph attack, using identifiers that look the same but contain different characters.
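The slide code is not part of this transcript, so the following is only a sketch of how such a homoglyph pair behaves. It assumes the look-alike letter is U+0456 (a Cyrillic "і"), written as an escape so that it stays visible, and `mixed_script?` is a hypothetical helper name rather than anything shown in the talk.

```ruby
# Sketch only: the talk's slide code is not in the transcript.
# Assumption: the look-alike letter is U+0456, written as an escape.
latin     = "white"        # five Latin letters
lookalike = "wh\u0456te"   # same shape on screen, different code point

puts latin == lookalike    # prints false: Ruby sees two different names

# Two local variables that look identical, evaluated from a string.
values = eval("white = 1; wh\u0456te = 2; [white, wh\u0456te]")
p values

# Hypothetical check: flag identifiers that mix Latin and Cyrillic letters.
def mixed_script?(identifier)
  identifier.match?(/\p{Latin}/) && identifier.match?(/\p{Cyrillic}/)
end

p mixed_script?(latin)
p mixed_script?(lookalike)
```

A check like this is deliberately narrow: it only covers the Latin/Cyrillic pair, and a real project would likely need a broader confusables policy.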
00:12:51.060 This is well known from internationalized domain names, although it happens rarely there.
00:13:57.600 Next, what's going on in example two? Here we cannot find a look-alike letter in the word 'white,' as we could in the first example.
00:14:34.780 However, there is an invisible zero-width space that you cannot see.
00:15:12.180 Why is this possible? Ruby allows all non-ASCII characters in identifiers, and that includes non-ASCII spaces. So I'll call this an invisible space attack, which is not well known.
00:15:35.220 This may be the only exception in Ruby, and we should close this loophole.
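As a rough illustration of the invisible space attack, the sketch below assumes the hidden character is U+200B ZERO WIDTH SPACE and places it inside a name by escape. Whether the parser accepts such a name may depend on the Ruby version, so the sketch handles a `SyntaxError` instead of assuming one outcome.

```ruby
# Sketch only. Assumption: the hidden character is U+200B ZERO WIDTH SPACE.
plain  = "white"
hidden = "wh\u200Bite"   # looks like "white" in many fonts and editors

puts plain == hidden     # prints false
puts hidden.length       # prints 6, one more character than is visible

# Acceptance by the parser may vary between Ruby versions, so a
# SyntaxError is handled rather than assumed away.
result = begin
  eval("white = 1; wh\u200Bite = 2; [white, wh\u200Bite]")
rescue SyntaxError
  :rejected
end
p result
```

If `result` is a two-element array, the parser treated the two spellings as separate variables; `:rejected` means this Ruby refused the name.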
00:16:00.500 Example four builds on this. There are many different spaces in Unicode, including the zero-width space and the ideographic space, better known as full-width space.
00:16:45.200 This allows us to write programs using invisible characters, but we cannot conceal critical elements such as 'if' and 'else'.
00:17:23.100 This shows that a program structured like this is most likely going to be discovered very rapidly.
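To make the variety of spaces a little more concrete, here is a small sketch that checks how three of Ruby's regular expression classes treat a handful of them. The selection of code points is illustrative and is not the list used in the talk.

```ruby
# Sketch: how Ruby's regex classes see a few Unicode "spaces".
# The selection of code points is illustrative, not the talk's list.
spaces = {
  "U+0020 SPACE"             => "\u0020",
  "U+00A0 NO-BREAK SPACE"    => "\u00A0",
  "U+200B ZERO WIDTH SPACE"  => "\u200B",
  "U+3000 IDEOGRAPHIC SPACE" => "\u3000",
}

spaces.each do |name, ch|
  ascii_ws   = ch.match?(/\s/)           # ASCII-only whitespace class
  unicode_ws = ch.match?(/[[:space:]]/)  # Unicode-aware whitespace class
  separator  = ch.match?(/\p{Zs}/)       # general category Space_Separator
  puts format("%-26s \\s=%-5s [[:space:]]=%-5s Zs=%s",
              name, ascii_ws, unicode_ws, separator)
end
```

The point of the comparison is that no single class covers every case: the zero-width space is a format character (category Cf) rather than a space separator, so a check based only on whitespace classes would miss it.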
00:18:10.740 Next example: Here is the actual code, and I will concentrate on this example, as it's a bit more intricate.
00:18:59.160 In this example, I added various characters. We will look at these in detail; they are known as Unicode bidirectional formatting characters.
00:19:37.740 Ruby's syntax allows these characters as part of variable names; almost everything is allowed.
00:20:18.420 What is the operation involved here? The operation is a subtraction, 'black minus white.'
00:20:46.020 But then these special characters control how this is displayed, which explains why many of you guessed incorrectly.
00:21:30.960 That's how the bi-directional ordering attack works.
00:21:59.760 Here is a list of all bi-directional control characters for your reference.
00:22:39.540 In summary, these invisible characters can change the order of display, and they’re needed for special situations with bi-directional text.
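One way to see what is really stored in such a line is to replace each control character with a visible tag. The sketch below lists the twelve Unicode bidi control characters by code point and abbreviation; `reveal_bidi` is a hypothetical helper, and the sample line is invented rather than taken from example three.

```ruby
# Sketch only. The keys are the Unicode bidi control characters;
# `reveal_bidi` is a hypothetical helper, not part of Ruby.
BIDI_CONTROLS = {
  "\u061C" => "ALM", "\u200E" => "LRM", "\u200F" => "RLM",
  "\u202A" => "LRE", "\u202B" => "RLE", "\u202C" => "PDF",
  "\u202D" => "LRO", "\u202E" => "RLO",
  "\u2066" => "LRI", "\u2067" => "RLI", "\u2068" => "FSI", "\u2069" => "PDI",
}.freeze

BIDI_PATTERN = Regexp.union(BIDI_CONTROLS.keys)

# Replace each invisible control with a visible tag such as "<RLO>".
def reveal_bidi(line)
  line.gsub(BIDI_PATTERN) { |ch| "<#{BIDI_CONTROLS[ch]}>" }
end

# Invented sample: stored order is "black - white", wrapped in controls
# that a bidi-aware display may use to show the words in the other order.
sample = "\u202E\u2066black\u2069 - \u2066white\u2069\u202C"
puts reveal_bidi(sample)
# prints <RLO><LRI>black<PDI> - <LRI>white<PDI><PDF>
```

Printing the tags shows the logical order that the parser works with, independent of how a particular editor or terminal chooses to render the line.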
00:23:15.960 For the bidirectional algorithm, each line of code is a separate paragraph.
00:23:51.420 For an attack to be successful, it needs to fit in a single line.
00:24:28.020 Most text is displayed left to right, but languages such as Arabic and Hebrew are written right to left.
00:25:02.040 However, in programming, symbols such as operators define structure, and the words should ideally follow that structure.
00:25:48.300 That was a detailed explanation of example three, and I hope you grasp the significance of these vulnerabilities.
00:26:30.960 We need to implement defenses in depth; it’s about more than just blaming one aspect of the ecosystem.
00:27:17.880 In addition, a healthy ecosystem is necessary to help prevent potential attacks.
00:28:00.320 I’d like to highlight how editors display information and advise against forcing team members to use the same editor.
00:29:05.640 In summary, we should ensure that any programming language tokens are displayed as words that can be read correctly by those familiar with Arabic or Hebrew.
00:30:03.740 Next steps for Ruby must address these vulnerabilities and aim to eliminate non-ASCII spaces and control characters.
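Until the language itself rejects such characters, a project could run a check of its own. The following is a minimal sketch of that idea, not an existing Ruby feature or tool: `audit_source` and the choice of categories (space separators, format characters, and control characters, apart from the ordinary space and tab) are assumptions made for illustration.

```ruby
# Sketch of a pre-commit style check. `audit_source` and its categories
# are assumptions for illustration, not an existing Ruby feature or tool.
SUSPECT = /(?![\t ])[\p{Zs}\p{Cf}\p{Cc}]/

# Report every suspicious character with its line, column, and code point.
def audit_source(text)
  findings = []
  text.each_line.with_index(1) do |line, lineno|
    line.chomp.each_char.with_index(1) do |ch, col|
      next unless ch.match?(SUSPECT)
      findings << format("line %d, col %d: U+%04X", lineno, col, ch.ord)
    end
  end
  findings
end

source = "white = 1\nwh\u200Bite = 2\nputs\u3000white\n"
puts audit_source(source)
# prints:
#   line 2, col 3: U+200B
#   line 3, col 5: U+3000
```

Because the bidi control characters are format characters, the same pattern flags them as well. A blanket rule like this would also flag legitimate uses inside string literals and comments, so a real check would probably need an allow list.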
00:30:37.520 In summary, our approach should be comprehensive, addressing multiple factors instead of simply pointing fingers.
00:31:15.720 Thank you for your attention. For questions, feel free to contact me directly, and I will provide a link to my presentation as well.