00:00:02.760
All right, I guess we are starting. Hello!
00:00:04.920
Thank you for coming to my talk. It will be about the world record calculation of pi.
00:00:06.600
My name is Emma Haruka Iwao, and I'm a software engineer at Google Cloud.
00:00:09.639
So why am I here talking about pi and Ruby? Well, I'm a two-time world record holder for the most accurate pi ever calculated.
00:00:13.480
I achieved this with 31.4 trillion digits in 2019 and 100 trillion digits in 2022.
00:00:15.279
So I know a little bit about pi and pi calculations.
00:00:19.039
Before talking about how we do this, let me discuss why we do this.
00:00:21.600
Why do we need more digits of pi, when about 30 to 40 decimals are enough for most scientific computations?
00:00:24.640
As Donald Knuth mentions in his book, 'The Art of Computer Programming,' human progress in calculation has traditionally been measured by the number of decimal digits of pi.
00:00:28.000
By calculating more digits, we are somehow showcasing that human civilization has been advancing and improving in mathematics.
00:00:31.200
So I'm preparing for a potential extraterrestrial intelligence invasion or something.
00:00:34.080
In fact, we've been doing this for thousands of years. The earliest known records of pi date back to around 2000 BC, nearly 4,000 years ago.
00:00:39.079
Since we started using computers, the number of calculated digits has exploded.
00:00:42.200
This is a logarithmic graph showing exponential growth.
00:00:45.680
Pi is also a popular benchmark for PCs. In 1995, Super Pi was released by researchers at the University of Tokyo, and it can compute up to 16 million digits.
00:00:49.840
Two years later, PiFast was released, which supports calculations up to 16 billion digits, and in 2009, y-cruncher was introduced, which now supports up to 108 quadrillion digits, although nobody has actually tested that.
00:00:57.719
There is another category: pi is also a popular benchmark among PC overclockers, who calculate fewer digits—like one billion digits—very quickly.
00:01:05.239
The current world record is 4 seconds and 48 milliseconds.
00:01:07.760
The second place is at 4 seconds and 442 milliseconds, so they are basically competing by the millisecond.
00:01:11.040
It's fairly competitive, but the calculation I'm talking about today involves trillions of digits.
00:01:14.000
We need to work within certain constraints, such as resources and environments available to us.
00:01:17.000
Let me go over each step. We use y-cruncher, developed by Alexander Yee, who started it as a high school project.
00:01:22.680
He has been working on it for a few decades, and it is currently the fastest available program for calculating pi on a single-node computer.
00:01:26.000
A single-node computer means it's not a supercomputer, and y-cruncher is written in C++ and optimized for modern CPUs.
00:01:30.580
It supports some of the latest instruction sets, like AVX-512.
00:01:34.699
To calculate 100 trillion digits of pi, you need a fairly fast computer with a lot of memory and storage.
00:01:40.360
In this case, that means you need 468 terabytes of memory or storage as the working space.
00:01:45.500
Unfortunately, we don't have that much memory available.
00:01:48.000
You can't fit everything into memory, so you need to attach storage to your workspace.
00:01:53.120
The storage system is usually several orders of magnitude slower than the CPU.
00:01:56.000
In other words, CPU speed is not the most important factor; what matters is I/O and disk throughput.
00:02:00.200
The calculation of 100 trillion digits took 157 days.
00:02:03.000
During this time, the calculation moved 62 petabytes of data, and the average CPU utilization during the calculation was only 35%.
00:02:07.400
Basically, 2,000 hours of the time was spent on I/O and waiting for the disks to feed data.
00:02:11.000
So even with an infinitely fast CPU, the calculation would still take over 100 days.
00:02:15.000
Now, let’s talk about some of the moving pieces.
00:02:17.840
We are using Google Compute Engine as our environment because I happen to work for this cloud provider.
00:02:20.960
I had access to the resources. The maximum disk capacity you can attach to a single VM is 257 terabytes.
00:02:23.400
That's fairly big, but not enough for our case. We needed more than 500 terabytes, including some buffers.
00:02:27.200
We can't predict the exact disk requirements precisely in advance.
00:02:30.120
So, you need to mount additional disks somehow. In our case, we used iSCSI.
00:02:33.000
iSCSI is an industry standard protocol to mount block devices over TCP/IP.
00:02:36.000
It's implemented in the Linux kernel, and when you go over the network, the VM's bandwidth limit is 100 Gbps.
00:02:38.560
So, that's the throughput we want to maximize.
00:02:43.000
The overall architecture looks like this: we have the main VM running y-cruncher, which has some big CPUs.
00:02:48.000
There are additional nodes providing I/O targets and the necessary workspace for y-cruncher.
00:02:51.680
In the top right corner, there are small 50 terabyte disks attached directly to the main VM, but these are only used for the final results.
00:02:56.560
They are not used during the calculations.
00:03:00.640
We want to maximize the throughput between the large disks and the array of storage nodes.
00:03:04.560
If you look closer inside the data paths, there are multiple layers of complexity, especially with the use of network storage.
00:03:09.400
Y-cruncher writes to the file system, and then the file system issues I/O requests to the block device.
00:03:13.320
The I/O scheduler reorders and dispatches those requests, which then go to the iSCSI initiator.
00:03:17.600
The packets travel through the TCP/IP stack and then over the cloud network.
00:03:21.600
We don't have much control over the cloud network.
00:03:25.920
The same layers exist in reverse on the storage-node side, except there is no file system there, because the targets serve raw block devices.
00:03:30.080
But there are many layers and parameters that you can configure.
00:03:34.320
For instance, regarding file systems, should we use ext4, xfs, or something else?
00:03:37.200
Do we want to use a scheduler like the deadline scheduler or none at all?
00:03:39.440
There are TCP/IP parameters that can affect performance, like buffer sizes and the congestion algorithm.
00:03:43.800
And there are I/O parameters like queue depth and the number of outstanding requests.
00:03:46.640
Do we want to allow simultaneous multithreading for the CPU, or just one thread per core?
00:03:49.760
There is also a very important parameter: the block access size.
00:03:51.920
This controls how much data is read or written per I/O request.
00:03:54.960
There are cloud-specific configurations like instance type and persistent disk type.
00:03:58.760
For example, do we want to use SSD disks, balanced disks, or more throughput-oriented disks?
00:04:03.200
There are a lot of parameters to consider, and you want to make sure that you are choosing the best options.
00:04:08.000
Each of these tuning choices can significantly affect the overall performance.
00:04:12.080
If the calculation is massive, it could take multiple months; therefore, if something is even 1% faster, it could save a day in total computation time.
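Just to make the scale of the problem concrete, here is a minimal Ruby sketch of the parameter space described above; the dimensions come from the talk, but the specific values are illustrative assumptions, not the exact set that was benchmarked.

```ruby
# Tuning dimensions mentioned above, written as a Ruby hash to show the size
# of the search space. Values are illustrative placeholders.
PARAMS = {
  filesystem:     %w[ext4 xfs],
  io_scheduler:   %w[none mq-deadline],
  tcp_congestion: %w[cubic bbr],
  queue_depth:    [32, 64, 128],
  smt:            [true, false],
  block_size:     [131_072, 262_144, 524_288],  # block access size in bytes
  disk_type:      %w[pd-ssd pd-balanced],
}.freeze

combinations = PARAMS.values.map(&:length).inject(:*)
puts "#{combinations} possible combinations"  # far too many to test by hand
```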
00:04:15.120
Y-cruncher has a benchmark mode where you can test different parameters.
00:04:18.240
However, each run takes around 30 to 60 minutes, so you definitely don't want to wait in front of the terminal.
00:04:22.560
Instead, you might want to automate some parts of the process.
00:04:27.840
Let's discuss automation and how to handle most of these tasks automatically, though not all of them.
00:04:30.920
Importantly, this is what the y-cruncher configuration file looks like.
00:04:35.360
It looks like JSON, but it is not. It is a custom format, meaning you can't use standard libraries to manipulate the file.
00:04:39.680
There is a lot of love and hate for YAML and JSON, but when you have to deal with a custom format, you come to appreciate the libraries that exist for them.
00:04:43.440
So how do we deal with that? Here comes ERB to the rescue.
00:04:46.080
If you're using Ruby, you're probably familiar with ERB.
00:04:49.920
ERB is a template engine that allows you to embed Ruby code within a document.
00:04:53.760
You place Ruby code between <% and %> tags, and it is executed as code.
00:04:56.440
If you add an equal sign (<%= %>), the tag is replaced with the value of the expression.
00:05:01.479
For example, if you put a string in the tag, the output will include that string.
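As a minimal illustration of both tag types (this snippet is just an example, not one of the actual templates):

```ruby
require "erb"

# <% ... %> executes Ruby code; <%= ... %> is replaced by the expression's value.
name = "pi"
template = ERB.new("Hello, <%= name %>! <% 3.times do %>ho <% end %>")
puts template.result(binding)
# => Hello, pi! ho ho ho
```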
00:05:04.760
There is an ERB talk happening right now in the main hall, so thank you for being here.
00:05:08.880
I’m sure you can check the recording later.
00:05:11.360
There are two ways to use ERB.
00:05:13.840
For example, if you have a template file, you can pass its path to the erb command to generate the output.
00:05:17.440
Alternatively, you can use the ERB class from your code.
00:05:20.560
You read the file, pass the content to ERB.new to create the object, and print the result.
00:05:23.960
Both methods will yield the same result.
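Here is a small sketch of both approaches; the template filename is just a placeholder.

```ruby
require "erb"

# Option 1: the erb command that ships with Ruby, run from a shell:
#   $ erb benchmark.cfg.erb > benchmark.cfg
#
# Option 2: the ERB class from your own code.
content  = File.read("benchmark.cfg.erb")   # placeholder filename
template = ERB.new(content)
puts template.result(binding)               # prints the rendered config
```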
00:05:26.680
I created a custom script with parameters like the block access size and the number of disks to attach to the virtual machine.
00:05:31.920
You retrieve the benchmark template config file, set the parameters, and write the config file.
00:05:35.040
Finally, call the system method to launch y-cruncher.
00:05:37.760
So instead of writing a separate shell script, I decided to execute everything in the Ruby script.
00:05:42.240
This compact Ruby script runs y-cruncher with different numbers of disks, ranging from 32 to 72.
00:05:45.560
It's automated, so if you start this script before leaving the office, by the next morning, you should have the final results ready.
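A minimal sketch of that kind of driver script is shown below. The template name, the variable names, and the exact y-cruncher invocation are assumptions for illustration, not the original script.

```ruby
require "erb"

TEMPLATE = File.read("benchmark.cfg.erb")   # placeholder template name

(32..72).step(8) do |num_disks|
  block_size = 262_144                      # hypothetical value under test

  config = ERB.new(TEMPLATE).result(binding)
  path   = "bench-#{num_disks}-disks.cfg"
  File.write(path, config)

  # Launch y-cruncher with the generated config; adjust the invocation to
  # however y-cruncher accepts a config file in your environment.
  system("./y-cruncher", "config", path)
end
```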
00:05:49.720
This is the ERB file, which is just a portion of the overall config.
00:05:51.760
You need to configure each storage volume as a line in the y-cruncher configuration.
00:05:54.200
You can write loops in ERB; here, you define a one-line template and let it expand into multiple lines.
00:05:58.000
Y-cruncher recognizes and uses all these disks.
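As an illustration of the looping idea only: an ERB fragment like the one below expands one template line into one line per disk. The key names and mount paths are placeholders, and it assumes a num_disks variable in the binding; the real y-cruncher config uses its own custom syntax.

```erb
<%# Placeholder keys and paths; not the real y-cruncher config format. %>
<% num_disks.times do |i| %>
Path : /mnt/disk<%= i %>
<% end %>
```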
00:06:02.000
You might ask why I didn't use some other template engine.
00:06:05.520
I think that any tool could have worked, but I chose one I was already familiar with.
00:06:08.400
The goal was to finish benchmarking as quickly as possible, rather than learning a new tool.
00:06:12.360
This was my comfort zone.
00:06:15.120
When you run y-cruncher in the terminal, this is what the result looks like.
00:06:18.919
The results are color-coded, and as you run the program, you can see the progress animated.
00:06:22.240
The most important numbers are displayed at the bottom right, along with y-cruncher specific parameters.
00:06:26.919
We need to extract these numbers from this output and save them to a text file.
00:06:30.000
If you save the output and run text-processing commands on it, they may warn that it looks like a binary file.
00:06:33.120
This is because the file contains many escape sequences.
00:06:35.000
I don’t know of any options to disable these escape sequences, so you need to work with them to pull the data you want.
00:06:39.440
This is the final result I arrived at, using a shell one-liner to extract the numbers.
00:06:42.640
Let me explain my thought process.
00:06:44.080
First, we need to filter the lines we care about because the overall output is fairly large.
00:06:47.200
The first regular expression picks out specific keywords like speed and computation.
00:06:49.840
The result will contain some relevant lines, even though there may be extra ones.
00:06:54.920
However, we can further process this text file later.
00:06:58.080
We can now filter out the escape sequences to see the important numbers.
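I did this with shell one-liners; a rough Ruby equivalent of these two steps (keyword filtering plus escape-sequence stripping) might look like the sketch below, with the file name and keyword list as placeholders.

```ruby
# Keep only lines mentioning the keywords we care about, then strip ANSI
# escape sequences so the numbers become plain text.
ANSI_ESCAPE = /\e\[[0-9;]*[A-Za-z]/

lines       = File.readlines("ycruncher-output.txt", chomp: true)
interesting = lines.grep(/speed|computation/i)
cleaned     = interesting.map { |line| line.gsub(ANSI_ESCAPE, "") }

puts cleaned
```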
00:07:01.640
I specifically used the GTE (giga translations per second) numbers as markers to help extract values.
00:07:04.560
To ease understanding, I chose to split the extraction into several commands.
00:07:09.920
However, some lines repeat, as y-cruncher writes the final result on separate lines for different outputs.
00:07:13.200
The first group appears multiple times while the last two lines concerning computation appear only once.
00:07:19.360
To manage this, we can utilize the 'sed' command, which allows text substitution.
00:07:23.280
This command can also extract lines by applying specific conditions.
00:07:26.080
Using the -n option, sed prints only the lines we explicitly select, so we can specify exactly which occurrences to print.
00:07:30.320
Finally, we can filter for exact lines based on the requirements.
00:07:33.920
To clean up, we can remove unnecessary markers we used earlier to help keep things organized.
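A rough Ruby equivalent of those sed steps, assuming the repeated lines should collapse to a single occurrence and that a marker string was prepended earlier (file names and the marker are placeholders):

```ruby
# Collapse repeated lines to one occurrence and strip the marker text that
# was only there to help with extraction.
lines   = File.readlines("filtered.txt", chomp: true)
cleaned = lines.uniq.map { |line| line.sub(/^MARKER\s*/, "") }

File.write("final.txt", cleaned.join("\n") + "\n")
```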
00:07:36.960
When handling large amounts of data, it’s often best to use structured files such as CSV or TSV.
00:07:40.000
Although some argue that CSV isn't truly structured, it still serves a purpose.
00:07:42.880
In this case, we want to convert our separate entries into a single comma-separated line.
00:07:45.440
You can still use Ruby or any programming language, but there's a handy command in the Linux toolkit for this.
00:07:49.600
The 'paste' command reads input from the file and replaces newline characters with a delimiter of your choice.
00:07:52.880
So, if you use a comma as the delimiter, all lines will be joined on one line, separated by commas.
00:07:55.760
This might not work for all cases, but it’s suitable for our data.
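For completeness, the same join can be done in a line of Ruby; the file name is a placeholder.

```ruby
# Equivalent of `paste -s -d, final.txt`: join all lines with commas.
row = File.readlines("final.txt", chomp: true).join(",")
puts row
```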
00:08:00.320
Now we have our CSV file ready.
00:08:03.210
Next, we want to process the benchmark results, which are stored across multiple files.
00:08:06.560
Using the 'find' command, we can gather all results into a single CSV file.
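A sketch of the same collection step in Ruby, assuming each benchmark run left its one-line CSV in its own directory (the directory layout and file names are placeholders):

```ruby
# Gather the one-line CSV result from every benchmark run into one file.
rows = Dir.glob("bench-results/**/*.csv").sort.map { |path| File.read(path).strip }
File.write("all-results.csv", rows.join("\n") + "\n")
```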
00:08:09.440
With our CSV output, we can crunch the numbers—what’s the best tool for that?
00:08:12.320
There is Jupyter, or you can utilize Google Sheets.
00:08:14.400
I chose Google Sheets for its popularity at my workplace.
00:08:17.520
During the benchmarking, I set columns for the parameters I planned to analyze.
00:08:20.480
However, I ended up adding extra parameters as I saw their potential during the process.
00:08:24.960
I even created a notes section in the last column to keep random observations.
00:08:28.000
It wasn't structured in a rigorous way, but it was helpful while my memory was still fresh.
00:08:31.120
There’s another spreadsheet showing the throughput for different numbers of disks.
00:08:34.960
This confirmed that y-cruncher scales quite well up to 72 storage volumes.
00:08:39.000
The difference between my initial configuration and the best-performing one was substantial.
00:08:42.560
The most crucial parameter was the block access size, which is a y-cruncher setting.
00:08:45.520
Getting it right improved performance by over 200%.
00:08:48.720
If the overall calculation took 157 days, relying on the initial configuration could have taken more than 300 days.
00:08:52.240
By running those benchmarks, I saved more than five months and, consequently, a lot of computing resources.
00:08:56.640
At this point, you may wonder why I am sharing all this at RubyKaigi.
00:09:01.040
I’m more comfortable with Linux commands for text processing, even though everything can be written in Ruby.
00:09:04.200
This method allows me to experiment more effectively, especially with unknown input formats.
00:09:09.000
It's a trial-and-error process; after all, you don't calculate pi every month.
00:09:13.720
Unlike a production system, there are no maintenance concerns.
00:09:16.800
If y-cruncher changes its format in the future, then yes, your program might break.
00:09:20.160
That's why there’s no point in over-optimizing.
00:09:22.560
Also, using Google Sheets is efficient for sharing and collaboration with co-workers.
00:09:25.760
So that became my tool of choice.
00:09:28.160
Finally, at the end of the benchmarking, I settled on the final configuration.
00:09:31.760
A few months later, I had the final results ready.
00:09:34.560
This is the terminal view when y-cruncher finishes the calculation.
00:09:37.600
Now, before we wrap up, I'd like to share the story of my journey to the world records.
00:09:42.320
This is RubyKaigi, after all, and I believe a story about Ruby will be interesting.
00:09:46.840
My first RubyKaigi was in 2009 in Tokyo when I was still a university student.
00:09:51.200
I tweeted about everything even though I might not have understood everything.
00:09:55.600
My senpai noticed my interest in Ruby from the conference.
00:10:00.000
A few years later, he suggested I attend Rails Girls Tokyo in 2013 to learn more about Rails and Ruby.
00:10:04.800
I thought it was cool; I really enjoyed Rails, so I started contributing as a coach.
00:10:08.680
Then I decided to speak at RubyKaigi 2014, submitting a proposal about Rails and diversity in tech.
00:10:13.760
It was accepted, and I had the opportunity to speak for the first time.
00:10:17.600
I'm grateful to the organizers for maintaining the historical archives of the event.
00:10:21.080
This experience opened more doors for me.
00:10:25.760
I got a second job through a referral from someone I met at RubyKaigi.
00:10:29.280
His impression from my tweets was that I might be a good programmer.
00:10:31.680
I still find that rather risky, but it worked out well for me.
00:10:35.120
When I interviewed at Google in 2017, I submitted my RubyKaigi video link as a demonstration of my speaking abilities.
00:10:38.920
The hiring committee appreciated it, and I got an offer.
00:10:43.760
Moreover, they valued my contributions to the Ruby community; community contributions really do matter for developers.
00:10:49.760
As it turns out, Google Cloud is one of the best environments for calculating pi.
00:10:54.720
I wanted to calculate pi, but I didn't have access to resources.
00:10:57.440
I don't have a supercomputer just lying around, unfortunately.
00:11:01.920
Google Cloud has Pi Day celebrations, where we experiment with new cloud technologies.
00:11:06.560
This provided the perfect setting for my idea.
00:11:08.560
With all the right skills, I proposed the idea to my managers and directors.
00:11:12.240
They were intrigued and allowed me to try.
00:11:16.520
You can see from this slide that by 2015, 250 billion digits had been calculated.
00:11:20.640
By 2018, the number reached 500 billion.
00:11:25.280
But in 2019, the world record was achieved.
00:11:29.920
That was quite a leap!
00:11:32.000
Greg Wilson was a director at the time, and everyone enjoyed the concept.
00:11:36.080
It was an absolute pleasure working with fellow pi enthusiasts.
00:11:41.000
And that's the story.
00:11:44.080
I think this is a good RubyKaigi talk, as Ruby made these pi records possible.
00:11:47.560
Without Ruby, my career path could have looked very different.
00:11:52.600
I might not have worked at Google, nor broken the world record for pi twice.
00:11:56.000
Ultimately, without all of that, I wouldn't be standing here today, sharing this story.
00:12:00.560
Thank you very much!