Ruby and the World Record Pi Calculation

RubyKaigi 2024

00:00:02.760 All right, I guess we are starting. Hello!
00:00:04.920 Thank you for coming to my talk. It will be about the world record calculation of pi.
00:00:06.600 My name is Emma Haruka Iwao, and I'm a software engineer at Google Cloud.
00:00:09.639 So why am I here talking about pi and Ruby? Well, I'm a two-time world record holder for the most accurate pi ever calculated.
00:00:13.480 I achieved this with 31.4 trillion digits in 2019 and 100 trillion digits in 2022.
00:00:15.279 So I know a little bit about pi and pi calculations.
00:00:19.039 Before talking about how we do this, let me discuss why we do this.
00:00:21.600 Why do we need more digits of pi? After all, we only need about 30 to 40 decimals for most scientific computations.
00:00:24.640 As Donald Knuth mentions in his book, 'The Art of Computer Programming,' human progress in calculation has traditionally been measured by the number of decimal digits of pi.
00:00:28.000 By calculating more digits, we are somehow showcasing that human civilization has been advancing and improving in mathematics.
00:00:31.200 So I'm preparing for a potential extraterrestrial intelligence invasion or something.
00:00:34.080 In fact, we've been doing this for thousands of years. The earliest known records of pi date back to around 2000 BC, nearly 4,000 years ago.
00:00:39.079 Since we started using computers, the number of calculated digits has exploded.
00:00:42.200 This is a logarithmic graph showing exponential growth.
00:00:45.680 Pi is also a popular benchmark for PCs. In 1995, Super Pi was released by researchers at the University of Tokyo, and it can compute up to 16 million digits.
00:00:49.840 Two years later, PiFast was released, which supports calculations up to 16 billion digits, and in 2009, y-cruncher was introduced, which now supports up to 108 quadrillion digits, although nobody has actually tested that.
00:00:57.719 There is another category: pi is also a popular benchmark among PC overclockers, who calculate fewer digits—like one billion digits—very quickly.
00:01:05.239 The current world record is 4 seconds and 48 milliseconds.
00:01:07.760 The second place is at 4 seconds and 442 milliseconds, so they are basically competing by the millisecond.
00:01:11.040 It's fairly competitive, but today's topic is the massive calculation, involving trillions of digits.
00:01:14.000 We need to work within certain constraints, such as resources and environments available to us.
00:01:17.000 Let me go over each step. We use y-cruncher, which was developed by Alexander Yee, who started this project as a high school project.
00:01:22.680 He has been working on it for a few decades, and it is currently the fastest available program to calculate pi on a single-node computer.
00:01:26.000 A single-node computer means it's not a supercomputer. y-cruncher is written in C++ and optimized for modern CPUs.
00:01:30.580 It supports some of the latest instructions like AVX-512.
00:01:34.699 To calculate 100 trillion digits of pi, you need a fairly fast computer with a lot of memory and storage.
00:01:40.360 In this case, that means you need 468 terabytes of storage or memory in your workspace.
00:01:45.500 Unfortunately, we don't have that much memory available.
00:01:48.000 You can't fit everything into memory, so you need to attach storage to your workspace.
00:01:53.120 The storage system is usually several orders of magnitude slower than the CPU.
00:01:56.000 In other words, CPU speed is not the most important; the important part is the I/O and disk throughput.
00:02:00.200 The calculation of 100 trillion digits took 157 days.
00:02:03.000 During this time, the calculation moved 62 petabytes of data, and the average CPU utilization during the calculation was only 35%.
00:02:07.400 Basically, about two-thirds of the time was spent on I/O and waiting for the disks to feed data.
00:02:11.000 So even with an infinitely fast CPU, the calculation would still take over 100 days.
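The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes all time not spent on computation was spent waiting on I/O, which is a simplification.

```ruby
# Figures quoted in the talk.
total_days      = 157
cpu_utilization = 0.35

# Assumption: any time the CPU was not busy was spent waiting on I/O.
io_fraction = 1.0 - cpu_utilization
io_days     = total_days * io_fraction
io_hours    = io_days * 24

puts format("I/O-bound share: %.0f%%", io_fraction * 100)
puts format("I/O-bound time: %.0f days (about %.0f hours)", io_days, io_hours)
```

Under that assumption the I/O-bound portion alone comes to a little over 100 days, which matches the claim that a faster CPU would not help much.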
00:02:15.000 Now, let’s talk about some of the moving pieces.
00:02:17.840 We are using Google Compute Engine as our environment because I happen to work for this cloud provider.
00:02:20.960 I had access to the resources, and the maximum disk size you can attach to a single VM is 257 terabytes.
00:02:23.400 That's fairly big, but not enough for our case. We needed more than 500 terabytes, including some buffers.
00:02:27.200 We can't precisely predict the exact disk requirements in advance.
00:02:30.120 So, you need to mount additional disks somehow. In our case, we used iSCSI.
00:02:33.000 iSCSI is an industry standard protocol to mount block devices over TCP/IP.
00:02:36.000 It's implemented in the Linux operating system, and since the traffic goes over the network, the bandwidth limit is 100 Gbps.
00:02:38.560 So, that's the throughput we want to maximize.
00:02:43.000 The overall architecture looks like this: we have the main VM running y-cruncher, which has some big CPUs.
00:02:48.000 There are additional nodes providing I/O targets and the necessary workspace for y-cruncher.
00:02:51.680 In the top right corner, there are small 50 terabyte disks attached directly to the main VM, but these are only used for the final results.
00:02:56.560 They are not used during the calculations.
00:03:00.640 We want to maximize the throughput between the main VM and the array of storage nodes.
00:03:04.560 If you look closer inside the data paths, there are multiple layers of complexity, especially with the use of network storage.
00:03:09.400 Y-cruncher writes to the file system, and then the file system issues I/O requests to the block device.
00:03:13.320 The I/O scheduler reorders and dispatches those requests, which then go to the iSCSI initiator.
00:03:17.600 The packets travel through the TCP/IP stack and then over the cloud network.
00:03:21.600 We don't have much control over the cloud network.
00:03:25.920 The reverse process occurs on the storage node, except that there is no file system on that side, because it exports a block device.
00:03:30.080 But there are many layers and parameters that you can configure.
00:03:34.320 For instance, regarding file systems, should we use ext4, xfs, or something else?
00:03:37.200 Do we want to use a scheduler like the deadline scheduler or none at all?
00:03:39.440 There are TCP/IP parameters that can affect performance, like buffer sizes and the congestion algorithm.
00:03:43.800 And there are I/O parameters like queue depth and the number of outstanding requests.
00:03:46.640 Do we want to allow simultaneous multithreading for the CPU, or just one thread per core?
00:03:49.760 There is also a very important parameter called bytes per seek.
00:03:51.920 This refers to the block access size.
00:03:54.960 There are cloud-specific configurations like instance type and persistent disk type.
00:03:58.760 For example, do we want to use SSD disks, balanced disks, or more throughput-oriented disks?
00:04:03.200 There are a lot of parameters to consider, and you want to make sure that you are choosing the best options.
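To get a feel for how quickly these choices multiply, here is a minimal sketch that enumerates a small parameter grid. The dimensions and values are illustrative placeholders, not the set actually tested for the record.

```ruby
# Hypothetical tuning dimensions; the values are examples only.
grid = {
  filesystem:   %w[ext4 xfs],
  io_scheduler: %w[deadline none],
  smt:          [true, false],
}

keys   = grid.keys
values = grid.values

# Cartesian product of all value lists, turned back into hashes.
combinations = values.first.product(*values.drop(1)).map do |combo|
  keys.zip(combo).to_h
end

puts "#{combinations.size} configurations to benchmark"
combinations.first(2).each { |config| puts config.inspect }
```

Even three two-valued settings give eight runs, and each added dimension multiplies that count again.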
00:04:08.000 Every tuning choice can significantly affect the overall performance.
00:04:12.080 If the calculation is massive, it could take multiple months; therefore, if something is even 1% faster, it could save a day in total computation time.
00:04:15.120 Y-cruncher has a benchmark mode where you can test different parameters.
00:04:18.240 However, each run takes around 30 to 60 minutes, so you definitely don't want to wait in front of the terminal.
00:04:22.560 Instead, you might want to automate some parts of the process.
00:04:27.840 Let's discuss automation and how to automate most of the tasks, though not all of them.
00:04:30.920 First, this is what the y-cruncher configuration file looks like.
00:04:35.360 It looks like JSON, but it is not. It is a custom format, meaning you can't use standard libraries to manipulate the file.
00:04:39.680 There is a lot of both love and hate towards YAML and JSON, but you come to appreciate some of the available libraries.
00:04:43.440 So how do we deal with that? Here comes ERB to the rescue.
00:04:46.080 If you're using Ruby, you're probably familiar with ERB.
00:04:49.920 ERB is a template engine that allows you to embed Ruby code within a document.
00:04:53.760 You place Ruby code between <% and %> tags, and it is executed as code.
00:04:56.440 If you add an equals sign (<%=), the tag is replaced with the output of the code.
00:05:01.479 For example, if you put a string literal in the tag, the output will include that string.
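A minimal sketch of the two tag styles, using a made-up template. The `trim_mode: "-"` option and the `-%>` closer are only there to drop the blank line that the statement tag would otherwise leave behind.

```ruby
require "erb"

template = <<~TEMPLATE
  <% disks = 3 -%>
  Disks: <%= disks %>
  Name: <%= "y-cruncher" %>
TEMPLATE

# <% ... %> runs code silently; <%= ... %> inserts the value of the expression.
output = ERB.new(template, trim_mode: "-").result
puts output
```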
00:05:04.760 There is an ERB talk happening right now in the main hall, so thank you for being here.
00:05:08.880 I’m sure you can check the recording later.
00:05:11.360 There are two ways to use ERB.
00:05:13.840 For example, if you have a file, you can pass the location as a parameter to the ERB command to generate the output.
00:05:17.440 Alternatively, you can use the ERB class from your code.
00:05:20.560 You can read the file, pass the content to the ERB class, create the object, and print the result.
00:05:23.960 Both methods will yield the same result.
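Both routes can be sketched as follows. The file name `example.erb` is hypothetical, and the command-line route appears only as a comment so that the block stays self-contained.

```ruby
require "erb"
require "tmpdir"

rendered = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "example.erb") # hypothetical template file
  File.write(path, "Threads: <%= 4 * 2 %>\n")

  # Way 1, from a shell: erb example.erb
  # Way 2, from Ruby: read the file, wrap it in ERB, and render it.
  rendered = ERB.new(File.read(path)).result
end

puts rendered
```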
00:05:26.680 I created a custom script that has parameters like bytes per seek and the number of disks you want to attach to the virtual machine.
00:05:31.920 You retrieve the benchmark template config file, set the parameters, and write the config file.
00:05:35.040 Finally, call the system method to launch y-cruncher.
00:05:37.760 So instead of writing a separate shell script, I decided to execute everything in the Ruby script.
00:05:42.240 This compact Ruby script runs y-cruncher with different numbers of disks, ranging from 32 to 72.
00:05:45.560 It's automated, so if you start this script before leaving the office, by the next morning, you should have the final results ready.
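The overall shape of such a driver script might look like the sketch below. It is not the script from the talk: the template keys, the `io_block_size` parameter, the step of 8 between disk counts, and the echo-only runner are all assumptions made for illustration.

```ruby
require "erb"
require "tmpdir"

# Hypothetical template: these keys are placeholders, not y-cruncher's real format.
def render_config(disks:, io_block_size:)
  template = "IoBlockSize : <%= io_block_size %>\nDisks : <%= disks %>\n"
  ERB.new(template).result_with_hash(disks: disks, io_block_size: io_block_size)
end

# Placeholder launcher: it only echoes. Substitute the real benchmark command.
def echo_runner
  ->(path) { system("echo", "would launch the benchmark with", path) }
end

def run_benchmarks(dir, disk_counts, io_block_size:, runner: echo_runner)
  disk_counts.map do |disks|
    path = File.join(dir, "bench-#{disks}.cfg")
    File.write(path, render_config(disks: disks, io_block_size: io_block_size))
    runner.call(path) # system blocks until the run finishes
    path
  end
end

Dir.mktmpdir do |dir|
  # 32 to 72 disks as in the talk; the step of 8 is an assumption.
  paths = run_benchmarks(dir, (32..72).step(8), io_block_size: 1_048_576)
  puts "wrote #{paths.size} config files"
end
```

Because `system` waits for each run to finish, the loop works through the configurations one after another without supervision.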
00:05:49.720 This is the ERB file, which is just a portion of the overall config.
00:05:51.760 You need to configure each storage volume as a line in the y-cruncher configuration.
00:05:54.200 You can write loops in ERB; here, you simply define one line template and let it expand to multiple lines.
00:05:58.000 Y-cruncher recognizes and uses all these disks.
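A sketch of the loop idea, with an invented line format and invented mount paths. The real y-cruncher syntax is not reproduced here.

```ruby
require "erb"

# Hypothetical line format and paths, for illustration only.
template = ERB.new(<<~CFG, trim_mode: "-")
  <%- disk_count.times do |i| -%>
  Volume <%= i %> : "/mnt/disk<%= i %>"
  <%- end -%>
CFG

# One template line expands to one output line per disk.
config = template.result_with_hash(disk_count: 3)
puts config
```

Changing `disk_count` is then the only edit needed to generate a configuration for a different number of volumes.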
00:06:02.000 You might ask why I didn't use your favorite template engine.
00:06:05.520 I think that any tool could have worked, but I chose one I was already familiar with.
00:06:08.400 The goal was to finish benchmarking as quickly as possible, rather than learning a new tool.
00:06:12.360 This is my comfortable path.
00:06:15.120 When you run y-cruncher in the terminal, this is what the result looks like.
00:06:18.919 The results are color-coded, and as you run the program, you can see the progress animated.
00:06:22.240 The most important numbers are displayed at the bottom right, along with y-cruncher specific parameters.
00:06:26.919 We need to extract these numbers from this output and save them to a text file.
00:06:30.000 After processing the output, the last command may warn about a binary file.
00:06:33.120 This is because the file contains many escape sequences.
00:06:35.000 I don’t know of any options to disable these escape sequences, so you need to work with them to pull the data you want.
00:06:39.440 This is the final result I arrived at, using a shell one-liner to extract the numbers.
00:06:42.640 Let me explain my thought process.
00:06:44.080 First, we need to filter the lines we care about because the overall output is fairly large.
00:06:47.200 The first regular expression picks out specific keywords like speed and computation.
00:06:49.840 The result will contain some relevant lines, even though there may be extra ones.
00:06:54.920 However, we can further process this text file later.
00:06:58.080 We can now filter out the escape sequences to see the important numbers.
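The talk does this with shell tools. For comparison, the same two steps (drop the escape sequences, keep the lines that mention the keywords) can be sketched in Ruby. The sample text is made up and does not reproduce real y-cruncher output.

```ruby
# Remove CSI escape sequences such as "\e[01;36m" (color) or "\e[2K" (erase line).
def strip_ansi(text)
  text.gsub(/\e\[[0-9;?]*[A-Za-z]/, "")
end

# Keep only the lines that mention one of the keywords.
def relevant_lines(raw, keywords: /speed|computation/i)
  raw.each_line.map { |line| strip_ansi(line).strip }.grep(keywords)
end

# Invented sample input with color codes around the labels and values.
sample = "\e[01;36mSpeed:\e[0m  \e[01;33m1.23\e[0m\n" \
         "Banner line\n" \
         "\e[32mComputation:\e[0m 4.56\n"

p relevant_lines(sample)
```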
00:07:01.640 I specifically used the GB/s (gigabytes per second) units next to the numbers as markers to help extract values.
00:07:04.560 To ease understanding, I chose to split the extraction into several commands.
00:07:09.920 However, some lines repeat, as y-cruncher writes the final result on separate lines for different outputs.
00:07:13.200 The first group appears multiple times while the last two lines concerning computation appear only once.
00:07:19.360 To manage this, we can utilize the 'sed' command, which allows text substitution.
00:07:23.280 This command can also extract lines by applying specific conditions.
00:07:26.080 Using the '-n' option to suppress the default output, we specify exactly which line numbers to print.
00:07:30.320 Finally, we can filter for exact lines based on the requirements.
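For readers more at home in Ruby than sed, picking lines by number looks roughly like this. The line numbers and sample text are illustrative, not the ones used in the talk.

```ruby
# Similar in spirit to `sed -n '1p;4p'`: keep only the chosen 1-based lines.
def pick_lines(text, line_numbers)
  lines = text.lines(chomp: true)
  line_numbers.map { |n| lines.fetch(n - 1) }
end

# Invented sample: a repeated group followed by a line that appears once.
sample = "read 1.0\nread 1.1\nread 1.2\ncompute 9.9\n"
p pick_lines(sample, [1, 4])
```

`fetch` raises if a requested line does not exist, which makes a change in the output layout fail loudly rather than silently.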
00:07:33.920 To clean up, we can remove unnecessary markers we used earlier to help keep things organized.
00:07:36.960 When handling large amounts of data, it’s often best to use structured files such as CSV or TSV.
00:07:40.000 Although some argue that CSV isn't truly structured, it still serves a purpose.
00:07:42.880 In this case, we want to convert our separate entries into a single comma-separated line.
00:07:45.440 You can still use Ruby or any programming language, but there's a handy command in the Linux toolkit for this.
00:07:49.600 The 'paste' command reads input from the file and replaces newline characters with a delimiter of your choice.
00:07:52.880 So, if you use a comma as the delimiter, all lines will be joined on one line, separated by commas.
00:07:55.760 This might not work for all cases, but it’s suitable for our data.
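The joining step is small enough to sketch in Ruby as well. Like the shell version, this assumes that no value contains a comma or a quote, which is why it would not work for all cases.

```ruby
# Join one-value-per-line text into a single comma-separated row.
# No quoting is done, so values must not contain commas themselves.
def join_as_csv_row(text)
  text.lines(chomp: true).join(",")
end

puts join_as_csv_row("1.23\n4.56\n7.89\n")
```

For data that may contain commas or quotes, Ruby's csv library handles the quoting.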
00:08:00.320 Now we have our CSV file ready.
00:08:03.210 Next, we want to process the benchmark results, which are stored across multiple files.
00:08:06.560 Using the 'find' command, we can gather all results into a single CSV file.
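A Ruby sketch of the gathering step, assuming each run left a one-line CSV file somewhere under a results directory. The file names and layout here are hypothetical.

```ruby
require "tmpdir"

# Collect every per-run CSV row found under root, in a stable order.
def gather_rows(root, pattern: "**/*.csv")
  Dir.glob(File.join(root, pattern)).sort.map { |path| File.read(path).strip }
end

combined = nil
Dir.mktmpdir do |dir|
  # Two invented result files standing in for real benchmark runs.
  File.write(File.join(dir, "run-32.csv"), "32,1.0\n")
  File.write(File.join(dir, "run-72.csv"), "72,2.1\n")
  combined = gather_rows(dir).join("\n")
end

puts combined
```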
00:08:09.440 With our CSV output, we can crunch the numbers—what’s the best tool for that?
00:08:12.320 There is Jupyter, or you can utilize Google Sheets.
00:08:14.400 I chose Google Sheets for its popularity at my workplace.
00:08:17.520 During the benchmarking, I set columns for the parameters I planned to analyze.
00:08:20.480 However, I ended up adding extra parameters as I saw their potential during the process.
00:08:24.960 I even created a notes section in the last column to keep random observations.
00:08:28.000 While it wasn’t structured in a professional manner, it helped to capture observations while my memory was fresh.
00:08:31.120 There’s another spreadsheet showing the throughput for different numbers of disks.
00:08:34.960 This confirmed that y-cruncher scales quite well up to 72 storage volumes.
00:08:39.000 The difference between my initial configuration and the best one was substantial.
00:08:42.560 The most crucial parameter was bytes per seek, which is a y-cruncher setting.
00:08:45.520 Tuning it with this benchmark workflow roughly doubled performance, to about 200% of the initial configuration.
00:08:48.720 If the overall calculation took 157 days, relying on the initial configuration could have taken more than 300 days.
00:08:52.240 By running those benchmarks, I saved about five months and, consequently, a lot of resources for the company.
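The savings estimate works out as below. The 2x factor is an assumption chosen to match the "more than 300 days" figure, not a number measured here.

```ruby
actual_days = 157  # from the talk
speedup     = 2.0  # assumption: tuned configuration is about twice as fast

initial_days = actual_days * speedup
saved_days   = initial_days - actual_days

puts format("Untuned estimate: %.0f days", initial_days)
puts format("Saved: %.0f days (about %.1f months)", saved_days, saved_days / 30.4)
```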
00:08:56.640 At this point, you may wonder why I am sharing all this at RubyKaigi.
00:09:01.040 I’m more comfortable with Linux commands for text processing, even though everything can be written in Ruby.
00:09:04.200 This method allows me to experiment more effectively, especially with unknown input formats.
00:09:09.000 It’s a trial-and-error process; after all, you don’t regularly calculate pi every month.
00:09:13.720 Unlike a production system, there are no maintenance concerns.
00:09:16.800 If y-cruncher changes its format in the future, then yes, your program might break.
00:09:20.160 That's why there’s no point in over-optimizing.
00:09:22.560 Also, using Google Sheets is efficient for sharing and collaboration with co-workers.
00:09:25.760 So that became my tool of choice.
00:09:28.160 Finally, at the end of the assessment, I proceeded with the final configurations.
00:09:31.760 A few months later, I had the final results ready.
00:09:34.560 This is the terminal view when y-cruncher finishes the calculation.
00:09:37.600 Now, before we wrap up the conversation, I’d like to share a story about my journey to the world records.
00:09:42.320 This is RubyKaigi, after all, and I believe a story about Ruby will be interesting.
00:09:46.840 My first RubyKaigi was in 2009 in Tokyo when I was still a university student.
00:09:51.200 I tweeted about everything even though I might not have understood everything.
00:09:55.600 My senpai noticed my interest in Ruby from the conference.
00:10:00.000 A few years later, he suggested I attend Rails Girls Tokyo in 2013 to learn more about Rails and Ruby.
00:10:04.800 I thought it was cool; I really enjoyed Rails, so I started contributing as a coach.
00:10:08.680 A year later, I decided to try speaking at RubyKaigi 2014 and submitted a proposal about Rails and diversity in tech.
00:10:13.760 It was accepted, and I had the opportunity to speak for the first time.
00:10:17.600 I appreciate the organizers for maintaining the historical archives of the event.
00:10:21.080 This experience opened more doors for me.
00:10:25.760 I got my second job through a referral from someone I met at RubyKaigi.
00:10:29.280 His impression from my tweets was that I might be a good programmer.
00:10:31.680 I still find that rather risky, but it did bring me success.
00:10:35.120 When I interviewed at Google in 2017, I submitted my RubyKaigi video link as a demonstration of my speaking abilities.
00:10:38.920 The hiring committee appreciated it, and I got an offer.
00:10:43.760 Moreover, they valued my contributions to the Ruby community; such contributions are essential for developers.
00:10:49.760 As it turns out, Google Cloud is one of the best environments for calculating pi.
00:10:54.720 I wanted to calculate pi, but I didn't have access to resources.
00:10:57.440 I don’t have a supercomputer just lying around, unfortunately.
00:11:01.920 Google Cloud has Pi Day celebrations, where we experiment with new cloud technologies.
00:11:06.560 This provided the perfect setting for my idea.
00:11:08.560 With all the right skills, I proposed the idea to my managers and directors.
00:11:12.240 They were intrigued and allowed me to try.
00:11:16.520 You can see from this site that by 2015, 250 billion digits had been calculated.
00:11:20.640 By 2018, the number reached 500 billion.
00:11:25.280 But in 2019, the world record was achieved.
00:11:29.920 That was quite a leap!
00:11:32.000 Greg Wilson was a director at the time, and everyone enjoyed the concept.
00:11:36.080 It was an absolute pleasure working with fellow pi enthusiasts.
00:11:41.000 And that's the story.
00:11:44.080 I think this is a good RubyKaigi talk, as Ruby made these pi records possible.
00:11:47.560 Without Ruby, my career path could have looked very different.
00:11:52.600 I might not have worked at Google, nor broken the world record for pi twice.
00:11:56.000 Ultimately, without these resources, I wouldn't be standing here today, sharing this story.
00:12:00.560 Thank you very much!