Summarized using AI

Ruby and the World Record Pi Calculation

Emma Haruka Iwao • May 17, 2024 • Naha, Okinawa, Japan

In this talk titled "Ruby and the World Record Pi Calculation," Emma Haruka Iwao, a software engineer at Google Cloud, shares her journey in calculating pi to record-breaking lengths: 31.4 trillion digits in 2019 and 100 trillion digits in 2022. The presentation explains why people calculate more digits of pi, emphasizing pi's historical role as a measure of human progress in mathematics. Emma elaborates on the technical aspects of using y-cruncher, a pi calculation program that she describes as optimized for modern CPUs and designed to run on a single-node computer.

Key Points Discussed:
- Historical Context: The calculation of pi has been a significant endeavor for thousands of years and serves as an impressive benchmark in computing.
- The Importance of Digits: While most scientific tasks require only 30 to 40 digits of pi, calculating far more digits reflects advances in human computing capabilities.
- Technical Challenges: The calculation of 100 trillion digits took 157 days. The bottleneck was I/O and disk throughput rather than CPU speed: average CPU utilization during the run was only 35%.
- Using Google Cloud: Access to Google Compute Engine was crucial for managing the vast storage and computing needs, utilizing protocols like iSCSI to maximize throughput.
- Optimization Techniques: Emma explored a multitude of parameters, including file system choices, TCP/IP configurations, and I/O scheduling to enhance performance, ultimately achieving significant efficiency improvements, saving months of computational time.
- Automation with Ruby: Emma demonstrated how she utilized ERB, a Ruby template engine, to automate the y-cruncher configuration process, showcasing the intersection of Ruby programming with high-performance computing.
- Personal Journey: Emma shared her background with Ruby, from attending RubyKaigi to her career development facilitated by her work within the Ruby community.

Conclusions and Takeaways:
Emma concluded that the advancements in pi calculations and her record achievements were made possible through her technical expertise, Ruby programming skills, and the supportive environment of Google Cloud. Her experiences highlight the integral role that community and collaboration play in technological progress and career development. Ultimately, the ability to not only calculate pi but to do it efficiently and innovatively exemplifies the potential of programming skills in solving complex challenges.

RubyKaigi 2024

00:00:02.760 All right, I guess we are starting. Hello!
00:00:04.920 Thank you for coming to my talk. It will be about the world record calculation of pi.
00:00:06.600 My name is Emma Haruka Iwao, and I'm a software engineer at Google Cloud.
00:00:09.639 So why am I here talking about pi and Ruby? Well, I'm a two-time world record holder for the most accurate pi ever calculated.
00:00:13.480 I've achieved this with 31.4 trillion digits in 2019 and 100 trillion digits in 2022.
00:00:15.279 So I know a little bit about pi and pi calculations.
00:00:19.039 Before talking about how we do this, let me discuss why we do this.
00:00:21.600 Why do we need more digits of pi? Because we only need about 30 to 40 decimals for most scientific computations.
00:00:24.640 As Donald Knuth mentions in his book, 'The Art of Computer Programming,' human progress in calculation has traditionally been measured by the number of decimal digits of pi.
00:00:28.000 By calculating more digits, we are somehow showcasing that human civilization has been advancing and improving in mathematics.
00:00:31.200 So I'm preparing for a potential extraterrestrial intelligence invasion or something.
00:00:34.080 In fact, we've been doing this for thousands of years. The earliest known records of pi date back to around 2000 BC, nearly 4,000 years ago.
00:00:39.079 Since we started using computers, the number of calculated digits has exploded.
00:00:42.200 This is a logarithmic graph showing exponential growth.
00:00:45.680 Pi is also a popular benchmark for PCs. In 1995, Super Pi was released by researchers at the University of Tokyo, and it can compute up to 16 million digits.
00:00:49.840 Two years later, PiFast was released, which supports calculations up to 16 billion digits, and in 2009, y-cruncher was introduced, which now supports up to 108 quadrillion digits, although nobody has actually tested that.
00:00:57.719 There is another category: pi is also a popular benchmark among PC overclockers, who calculate fewer digits—like one billion digits—very quickly.
00:01:05.239 The current world record is 4 seconds and 48 milliseconds.
00:01:07.760 The second place is at 4 seconds and 442 milliseconds, so they are basically competing by the millisecond.
00:01:11.040 It's fairly competitive, but today's topic is the other kind of calculation, involving trillions of digits.
00:01:14.000 We need to work within certain constraints, such as resources and environments available to us.
00:01:17.000 Let me go over each step. We use y-cruncher, which was developed by Alexander Yee, who started this project as a high school project.
00:01:22.680 He has been working on it for a few decades, and it is currently the fastest available program to calculate pi on a single-node computer.
00:01:26.000 A single-node computer means it's not a supercomputer, and y-cruncher is written in C++ and optimized for modern CPUs.
00:01:30.580 It supports some of the latest instruction sets, like AVX-512.
00:01:34.699 To calculate 100 trillion digits of pi, you need a fairly fast computer with a lot of memory and storage.
00:01:40.360 In this case, that means you need 468 terabytes of storage or memory in your workspace.
00:01:45.500 Unfortunately, we don't have that much memory available.
00:01:48.000 You can't fit everything into memory, so you need to attach storage to your workspace.
00:01:53.120 The storage system is usually several orders of magnitude slower than the CPU.
00:01:56.000 In other words, CPU speed is not the most important factor; what matters most is the I/O and disk throughput.
00:02:00.200 The calculation of 100 trillion digits took 157 days.
00:02:03.000 During this time, the calculation moved 62 petabytes of data, and the average CPU utilization during the calculation was only 35%.
00:02:07.400 Basically, about two-thirds of the time was spent on I/O and waiting for the disks to feed data.
00:02:11.000 So even with an infinitely fast CPU, the calculation would still take over 100 days.
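As a rough check of the figures above, here is a minimal Ruby sketch. It assumes, as a simplification, that all time not spent computing was spent waiting on I/O; the method name is made up for illustration.

```ruby
# Estimate days spent waiting on I/O, assuming every non-CPU
# portion of the run was I/O wait (a simplification).
def io_wait_days(total_days:, cpu_utilization:)
  total_days * (1.0 - cpu_utilization)
end

# Figures quoted in the talk: 157 days, 35% average CPU utilization.
days = io_wait_days(total_days: 157, cpu_utilization: 0.35)
puts format("About %.0f of 157 days spent waiting on I/O", days)
```

Under that assumption the I/O share alone comes to a little over 100 days, which is consistent with the claim that an infinitely fast CPU would not bring the run below that.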
00:02:15.000 Now, let’s talk about some of the moving pieces.
00:02:17.840 We are using Google Compute Engine as our environment because I happen to work for this cloud provider.
00:02:20.960 I had access to the resources, and the maximum disk size you can attach to a single VM is 257 terabytes.
00:02:23.400 That's fairly big, but not enough for our case. We needed more than 500 terabytes, including some buffers.
00:02:27.200 We can't precisely measure the exact disk requirements.
00:02:30.120 So, you need to mount additional disks somehow. In our case, we used iSCSI.
00:02:33.000 iSCSI is an industry standard protocol to mount block devices over TCP/IP.
00:02:36.000 It's implemented in the Linux operating system, and if you use network, the bandwidth limit is 100 Gbps.
00:02:38.560 So, that's the throughput we want to maximize.
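To put the 100 Gbps limit in context, the sketch below computes the average data rate implied by the figures quoted earlier (62 petabytes moved in 157 days). It assumes decimal petabytes and averages over the whole run, including CPU-bound phases, so peak I/O rates would have been higher.

```ruby
# Average network/disk rate in gigabits per second over the whole run.
def average_gbps(bytes_moved:, days:)
  seconds = days * 24 * 60 * 60
  bytes_moved.to_f / seconds * 8 / 1e9
end

# 62 PB (assumed decimal: 62 * 10^15 bytes) over 157 days.
gbps = average_gbps(bytes_moved: 62 * 10**15, days: 157)
puts format("Average: %.1f Gbps against a 100 Gbps limit", gbps)
```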
00:02:43.000 The overall architecture looks like this: we have the main VM running y-cruncher, which has some big CPUs.
00:02:48.000 There are additional nodes providing I/O targets and the necessary workspace for y-cruncher.
00:02:51.680 In the top right corner, there are small 50 terabyte disks attached directly to the main VM, but these are only used for the final results.
00:02:56.560 They are not used during the calculations.
00:03:00.640 We want to maximize the throughput between the large disks and the array of storage nodes.
00:03:04.560 If you look closer inside the data paths, there are multiple layers of complexity, especially with the use of network storage.
00:03:09.400 Y-cruncher writes to the file system, and then the file system issues I/O requests to the block device.
00:03:13.320 The I/O scheduler reorders and dispatches those requests, which then go to the iSCSI initiator.
00:03:17.600 The packets travel through the TCP/IP stack and then over the cloud network.
00:03:21.600 We don't have much control over the cloud network.
00:03:25.920 On the storage node, the reverse process occurs, except that there is no file system there, because the node only exposes a block device.
00:03:30.080 But there are many layers and parameters that you can configure.
00:03:34.320 For instance, regarding file systems, should we use ext4, xfs, or something else?
00:03:37.200 Do we want to use a scheduler like the deadline scheduler or none at all?
00:03:39.440 There are TCP/IP parameters that can affect performance, like buffer sizes and the congestion algorithm.
00:03:43.800 And there are I/O parameters like queue depth and the number of outstanding requests.
00:03:46.640 Do we want to allow simultaneous multithreading for the CPU, or just one thread per core?
00:03:49.760 There is also a very important y-cruncher parameter called bytes per seek.
00:03:51.920 This refers to the block access size.
00:03:54.960 There are cloud-specific configurations like instance type and persistent disk type.
00:03:58.760 For example, do we want to use SSD disks, balanced disks, or more throughput-oriented disks?
00:04:03.200 There are a lot of parameters to consider, and you want to make sure that you are choosing the best options.
00:04:08.000 Every tuning choice can significantly affect the overall performance.
00:04:12.080 If the calculation is massive, it could take multiple months; therefore, if something is even 1% faster, it could save a day in total computation time.
00:04:15.120 Y-cruncher has a benchmark mode where you can test different parameters.
00:04:18.240 However, each run takes around 30 to 60 minutes, so you definitely don't want to wait in front of the terminal.
00:04:22.560 Instead, you might want to automate some parts of the process.
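To see why waiting at the terminal does not scale, the following sketch enumerates a small parameter grid and estimates the total benchmark time. The candidate values are hypothetical; the talk names the kinds of parameters, not these exact lists.

```ruby
# Build every combination of the given parameter values as a hash.
def parameter_grid(parameters)
  names = parameters.keys
  first, *rest = parameters.values
  first.product(*rest).map { |values| names.zip(values).to_h }
end

# Hypothetical candidate values, for illustration only.
candidates = {
  file_system: %w[ext4 xfs],
  io_scheduler: %w[deadline none],
  smt: [true, false],
  disk_count: [32, 48, 64, 72]
}

grid = parameter_grid(candidates)
puts "#{grid.size} combinations"
puts "Roughly #{grid.size * 30 / 60.0} to #{grid.size * 60 / 60.0} hours at 30 to 60 minutes per run"
```

Even this small grid takes most of a day or longer to run, which is the motivation for scripting the runs.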
00:04:27.840 Let's discuss how to automate most, though not all, of these tasks.
00:04:30.920 Importantly, this is what the y-cruncher configuration file looks like.
00:04:35.360 It looks like JSON, but it is not. It is a custom format, meaning you can't use standard libraries to manipulate the file.
00:04:39.680 There is a lot of both love and hate towards YAML and JSON, but you come to appreciate some of the available libraries.
00:04:43.440 So how do we deal with that? Here comes ERB to the rescue.
00:04:46.080 If you're using Ruby, you're probably familiar with ERB.
00:04:49.920 ERB is a template engine that allows you to embed Ruby code within a document.
00:04:53.760 You place Ruby code between <% and %> tags, and it is executed as code.
00:04:56.440 If you add an equal sign, as in <%= %>, the tag is replaced with the value of the expression.
00:05:01.479 For example, if you put a string in the tag, the output will include that string.
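A minimal sketch of the two tag styles using Ruby's standard ERB library. The template text is made up for illustration, and the "-" trim mode is used only so that the code-only line does not leave a blank line in the output.

```ruby
require "erb"

# Render an ERB source string. With trim_mode "-", a tag closed
# with -%> also swallows the newline that follows it.
def render_erb(source)
  ERB.new(source, trim_mode: "-").result
end

template = <<~TEMPLATE
  <% digits = 100 -%>
  Target: <%= digits %> trillion digits
  <%= "a plain string" %>
TEMPLATE

puts render_erb(template)
```

The first tag only runs code, the second inserts the value of a variable, and the third inserts a string as-is.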
00:05:04.760 There is an ERB talk happening right now in the main hall, so thank you for being here.
00:05:08.880 I’m sure you can check the recording later.
00:05:11.360 There are two ways to use ERB.
00:05:13.840 For example, if you have a file, you can pass the location as a parameter to the ERB command to generate the output.
00:05:17.440 Alternatively, you can use the ERB class from your code.
00:05:20.560 You can read the file, pass the content to the ERB class, create the object, and print the result.
00:05:23.960 Both methods will yield the same result.
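The second approach can be sketched as follows, assuming a template stored in a file. The file name and the `digits` variable are placeholders; for a template that needs no variables, running `erb` with the template path on the command line would print the same text.

```ruby
require "erb"
require "tempfile"

# Read a template file and render it with the given variables.
def render_file(path, variables = {})
  ERB.new(File.read(path)).result_with_hash(variables)
end

# A temporary file stands in for a real template on disk.
Tempfile.create(["example", ".erb"]) do |file|
  file.write("Digits: <%= digits %>\n")
  file.flush
  puts render_file(file.path, digits: 100)
end
```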
00:05:26.680 I created a custom script that has parameters like bytes per seek and the number of disks you want to attach to the virtual machine.
00:05:31.920 You retrieve the benchmark template config file, set the parameters, and write the config file.
00:05:35.040 Finally, call the system method to launch y-cruncher.
00:05:37.760 So instead of writing a separate shell script, I decided to execute everything in the Ruby script.
00:05:42.240 This compact Ruby script runs y-cruncher with different numbers of disks, ranging from 32 to 72.
00:05:45.560 It's automated, so if you start this script before leaving the office, by the next morning, you should have the final results ready.
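The overall shape of such a driver script might look like the sketch below. It is not the speaker's script: the one-line template is a placeholder rather than the real y-cruncher format, the step of 8 between disk counts is an assumption, and the runner only prints what it would do. A real run would pass a lambda that calls Kernel#system with the actual y-cruncher command line.

```ruby
require "erb"
require "tmpdir"

# Render the config for one disk count, write it to dir,
# and hand the resulting path to the runner.
def run_benchmark(template, disk_count, dir, runner)
  config = ERB.new(template).result_with_hash(disk_count: disk_count)
  path = File.join(dir, "bench-#{disk_count}.cfg")
  File.write(path, config)
  runner.call(path)
  path
end

template = "disks = <%= disk_count %>\n" # placeholder, not the real format
dry_run = ->(path) { puts "would benchmark with #{File.basename(path)}" }

Dir.mktmpdir do |dir|
  # 32 to 72 disks as in the talk; the step size is assumed.
  (32..72).step(8) { |n| run_benchmark(template, n, dir, dry_run) }
end
```

Keeping the runner as a parameter makes it easy to test the loop without launching a long benchmark.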
00:05:49.720 This is the ERB file, which is just a portion of the overall config.
00:05:51.760 You need to configure each storage volume as a line in the y-cruncher configuration.
00:05:54.200 You can write loops in ERB; here, you simply define one line template and let it expand to multiple lines.
00:05:58.000 Y-cruncher recognizes and uses all these disks.
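The loop expansion can be illustrated with a small sketch. The `volume "/mnt/diskN"` line format is hypothetical and is not the actual y-cruncher syntax; only the ERB looping technique is the point here.

```ruby
require "erb"

# Expand a one-line template into one line per disk.
def volume_lines(disk_count)
  template = <<~TEMPLATE
    <%- disk_count.times do |i| -%>
    volume "/mnt/disk<%= i %>"
    <%- end -%>
  TEMPLATE
  ERB.new(template, trim_mode: "-").result_with_hash(disk_count: disk_count)
end

puts volume_lines(3)
```

Changing the single `disk_count` argument is then enough to generate a config for any number of volumes.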
00:06:02.000 You might ask why I didn't use your favorite template engine.
00:06:05.520 I think that any tool could have worked, but I chose one I was already familiar with.
00:06:08.400 The goal was to finish benchmarking as quickly as possible, rather than learning a new tool.
00:06:12.360 This is my comfortable path.
00:06:15.120 When you run y-cruncher in the terminal, this is what the result looks like.
00:06:18.919 The results are color-coded, and as you run the program, you can see the progress animated.
00:06:22.240 The most important numbers are displayed at the bottom right, along with y-cruncher specific parameters.
00:06:26.919 We need to extract these numbers from this output and save them to a text file.
00:06:30.000 After processing the output, the last command may warn about a binary file.
00:06:33.120 This is because the file contains many escape sequences.
00:06:35.000 I don’t know of any options to disable these escape sequences, so you need to work with them to pull the data you want.
00:06:39.440 This is the final result I arrived at, using a shell one-liner to extract the numbers.
00:06:42.640 Let me explain my thought process.
00:06:44.080 First, we need to filter the lines we care about because the overall output is fairly large.
00:06:47.200 The first regular expression picks out specific keywords like speed and computation.
00:06:49.840 The result will contain some relevant lines, even though there may be extra ones.
00:06:54.920 However, we can further process this text file later.
00:06:58.080 We can now filter out the escape sequences to see the important numbers.
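The talk does this with a shell one-liner. As an alternative illustration, the same two steps (keep lines with certain keywords, strip terminal escape sequences) can be sketched in Ruby. The sample text is made up and is not real y-cruncher output, and the keyword list is an assumption based on the words mentioned above.

```ruby
# Remove ANSI CSI escape sequences such as color codes.
def strip_ansi(text)
  text.gsub(/\e\[[0-?]*[ -\/]*[@-~]/, "")
end

# Keep only lines that mention one of the keywords, with escapes removed.
def relevant_lines(output, keywords: /speed|computation/i)
  output.each_line.map { |line| strip_ansi(line).strip }.grep(keywords)
end

# Made-up sample, not actual program output.
sample = "\e[1;36mSpeed:\e[0m 1.5\nsome other line\n\e[1;33mComputation:\e[0m 42.0\n"
puts relevant_lines(sample)
```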
00:07:01.640 I specifically used the GTE (giga translations per second) numbers as markers to help extract values.
00:07:04.560 To ease understanding, I chose to split the extraction into several commands.
00:07:09.920 However, some lines repeat, as y-cruncher writes the final result on separate lines for different outputs.
00:07:13.200 The first group appears multiple times while the last two lines concerning computation appear only once.
00:07:19.360 To manage this, we can utilize the 'sed' command, which allows text substitution.
00:07:23.280 This command can also extract lines by applying specific conditions.
00:07:26.080 With the '-n' option, sed stops printing every line by default, so we can specify exactly which line numbers to print.
00:07:30.320 Finally, we can filter for exact lines based on the requirements.
00:07:33.920 To clean up, we can remove unnecessary markers we used earlier to help keep things organized.
00:07:36.960 When handling large amounts of data, it’s often best to use structured files such as CSV or TSV.
00:07:40.000 Although some argue that CSV isn't truly structured, it still serves a purpose.
00:07:42.880 In this case, we want to convert our separate entries into a single comma-separated line.
00:07:45.440 You can still use Ruby or any programming language, but there's a handy command in the Linux toolkit for this.
00:07:49.600 The 'paste' command reads input from the file and replaces newline characters with a delimiter of your choice.
00:07:52.880 So, if you use a comma as the delimiter, all lines will be joined on one line, separated by commas.
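For comparison, here is what that joining step does, written in Ruby. Like the simple shell approach, this sketch does no quoting, so it only suits values that contain no commas. The sample values are made up.

```ruby
# Join the lines of a text into a single delimited line.
def join_lines(text, delimiter = ",")
  text.lines.map(&:chomp).join(delimiter)
end

puts join_lines("32\n1.5\n2.5\n") # prints 32,1.5,2.5
```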
00:07:55.760 This might not work for all cases, but it’s suitable for our data.
00:08:00.320 Now we have our CSV file ready.
00:08:03.210 Next, we want to process the benchmark results, which are stored across multiple files.
00:08:06.560 Using the 'find' command, we can gather all results into a single CSV file.
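A Ruby sketch of the gathering step, assuming each benchmark run left one small text file with one value per line. The directory layout, file names, and values are all hypothetical.

```ruby
require "tmpdir"

# Turn every matching result file into one CSV row (one row per file).
def combine_results(dir, pattern: "**/*.txt")
  Dir.glob(File.join(dir, pattern)).sort.map do |path|
    File.readlines(path, chomp: true).join(",")
  end
end

Dir.mktmpdir do |dir|
  # Made-up result files standing in for real benchmark output.
  File.write(File.join(dir, "run-32.txt"), "32\n1.5\n")
  File.write(File.join(dir, "run-72.txt"), "72\n3.0\n")
  rows = combine_results(dir)
  File.write(File.join(dir, "all.csv"), rows.join("\n") + "\n")
  puts rows
end
```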
00:08:09.440 With our CSV output, we can crunch the numbers—what’s the best tool for that?
00:08:12.320 There is Jupyter, or you can utilize Google Sheets.
00:08:14.400 I chose Google Sheets for its popularity at my workplace.
00:08:17.520 During the benchmarking, I set columns for the parameters I planned to analyze.
00:08:20.480 However, I ended up adding extra parameters as I saw their potential during the process.
00:08:24.960 I even created a notes section in the last column to keep random observations.
00:08:28.000 It wasn't structured in a professional manner, but it was helpful to write things down while my memory was fresh.
00:08:31.120 There’s another spreadsheet showing the throughput for different numbers of disks.
00:08:34.960 This confirmed that y-cruncher scales quite well up to 72 storage volumes.
00:08:39.000 The difference between my initial configuration and the best yield was substantial.
00:08:42.560 The most crucial parameter was bytes per seek, which is a y-cruncher setting.
00:08:45.520 This specific workflow simulation improved performance by over 200%.
00:08:48.720 If the overall calculation took 157 days, relying on the initial configuration could have taken more than 300 days.
00:08:52.240 By running those benchmarks, I saved more than five months and, consequently, a lot of resources for the company.
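The savings estimate can be reproduced with simple arithmetic. This sketch assumes the tuned configuration was exactly twice as fast as the initial one, which is a round-number reading of the figures above, and it uses 30-day months.

```ruby
# Days saved if the untuned run would have been `speedup` times slower.
def days_saved(actual_days:, speedup:)
  actual_days * speedup - actual_days
end

saved = days_saved(actual_days: 157, speedup: 2.0)
puts format("Untuned estimate: %.0f days, saved: %.0f days (about %.1f months)",
            157 * 2.0, saved, saved / 30.0)
```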
00:08:56.640 At this point, you may wonder why I am sharing all this at RubyKaigi.
00:09:01.040 I’m more comfortable with Linux commands for text processing, even though everything can be written in Ruby.
00:09:04.200 This method allows me to experiment more effectively, especially with unknown input formats.
00:09:09.000 It’s a trial-and-error process; after all, you don’t regularly calculate pi every month.
00:09:13.720 Unlike a production system, there are no maintenance concerns.
00:09:16.800 If y-cruncher changes its format in the future, then yes, your program might break.
00:09:20.160 That's why there’s no point in over-optimizing.
00:09:22.560 Also, using Google Sheets is efficient for sharing and collaboration with co-workers.
00:09:25.760 So that became my tool of choice.
00:09:28.160 Finally, at the end of the assessment, I proceeded with the final configurations.
00:09:31.760 A few months later, I had the final results ready.
00:09:34.560 This is the terminal view when y-cruncher finishes the calculation.
00:09:37.600 Now, before we wrap up the conversation, I’d like to share a story about my journey to the world records.
00:09:42.320 This is RubyKaigi, after all, and I believe a story about Ruby will be interesting.
00:09:46.840 My first RubyKaigi was in 2009 in Tokyo when I was still a university student.
00:09:51.200 I tweeted about everything even though I might not have understood everything.
00:09:55.600 My senpai noticed my interest in Ruby from the conference.
00:10:00.000 A few years later, he suggested I attend Rails Girls Tokyo in 2013 to learn more about Rails and Ruby.
00:10:04.800 I thought it was cool; I really enjoyed Rails, so I started contributing as a coach.
00:10:08.680 A year later, I decided to speak at RubyKaigi 2014 and submitted a proposal about Rails and diversity in tech.
00:10:13.760 It was accepted, and I had the opportunity to speak for the first time.
00:10:17.600 I appreciate the organizers for maintaining the historical archives of the event.
00:10:21.080 This experience opened more doors for me.
00:10:25.760 I got a second job through a referral from someone I met at RubyKaigi.
00:10:29.280 His impression from my tweets was that I might be a good programmer.
00:10:31.680 I still find that rather risky, but it did bring me success.
00:10:35.120 When I interviewed at Google in 2017, I submitted my RubyKaigi video link as a demonstration of my speaking abilities.
00:10:38.920 The hiring committee appreciated it, and I got an offer.
00:10:43.760 Moreover, they valued my contributions to the Ruby community; such contributions are essential for developers.
00:10:49.760 As it turns out, Google Cloud is one of the best environments for calculating pi.
00:10:54.720 I wanted to calculate pi, but I didn't have access to resources.
00:10:57.440 I don’t have a supercomputer just lying around, unfortunately.
00:11:01.920 Google Cloud has Pi Day celebrations, where we experiment with new cloud technologies.
00:11:06.560 This provided the perfect setting for my idea.
00:11:08.560 With all the right skills, I proposed the idea to my managers and directors.
00:11:12.240 They were intrigued and allowed me to try.
00:11:16.520 You can see from this site that by 2015, 250 billion digits had been calculated.
00:11:20.640 By 2018, the number reached 500 billion.
00:11:25.280 But in 2019, the world record was achieved.
00:11:29.920 That was quite a leap!
00:11:32.000 Greg Wilson was a director at the time, and everyone enjoyed the concept.
00:11:36.080 It was an absolute pleasure working with fellow pi enthusiasts.
00:11:41.000 And that's the story.
00:11:44.080 I think this is a good RubyKaigi talk, as Ruby made these pi records possible.
00:11:47.560 Without Ruby, my career path could have looked very different.
00:11:52.600 I might not have worked at Google, nor broken the world record for pi twice.
00:11:56.000 Ultimately, without these resources, I wouldn't be standing here today, sharing this story.
00:12:00.560 Thank you very much!