00:00:09.800
Alright, if I forgot you, but I thought I was the only one.
00:00:18.600
Oh, okay, okay.
00:00:26.039
So, the title of my talk is 'Dear God, What Am I Doing? Concurrency and Parallel Processing.' I came up with the idea for this talk because, once upon a time, I was very scared of my computer.
00:00:33.660
I had only written sequential programs, and I had to do some work that required me to make a new thread, start forking, and things that I had no clue about. It scared me.
00:00:47.820
So, if you find yourself in this situation, this talk is for you. If you have experience writing parallel programs and you're comfortable with that, and if you've written programs using Celluloid or your own web server like Puma or Unicorn, then this talk might be a bit boring for you.
00:01:07.100
However, it should still be educational for people like me. So, how many of you here have opened threads and forked stuff before? Okay, well, this might not apply to you, because this struggle was my experience when I first started.
00:01:32.340
For me, it went horribly wrong, but it was a great learning experience. Before we can dive into the different ways we can approach these problems, we need to take a brief detour into machine architecture.
00:01:56.299
Computers are probably among the most complex things ever built by humans. But when we simplify them, we are left with threads and processes.
00:02:09.080
From a high-level perspective, it's not so complicated, but behind the scenes, it is very complex. We can break this system down into a diagram.
00:02:24.000
We have a hamster, which represents a thread, and the wheel is the process.
00:02:45.060
So now, we have threads and processes. Processes are composed of threads, and you can scale them out to perform many tasks.
00:03:01.500
When you want your computer to go faster, what you really want to do is make copies and have all the hamsters running at the same time. But unfortunately, what usually happens is something like this.
00:03:18.099
And you can get back on and try again, and it just keeps happening. I had many things I wanted to discuss during this talk. The topic is vast, and I had to choose the key points I wanted to highlight.
00:03:39.540
So, if you have no experience, this should get you started, and if you do, it should reinforce what you already know.
00:03:58.379
So, let's move beyond memes and delve into the important stuff. The question we all face, especially with Ruby, is how can we make it faster?
00:04:15.659
Today, Charles talked about the JVM and how it's important to enhance performance. Ruby is generally fast enough, but it can be faster.
00:04:30.419
As a car enthusiast, I like working on cars to make them go faster because it feels great when you're in the driver's seat and hit the gas.
00:04:54.240
To make our computers faster, we need to enable them to perform multiple tasks simultaneously.
00:05:12.060
How many of you chat, browse the internet, and listen to music all at once? I do it constantly.
00:05:24.840
To accomplish concurrent jobs, we have three primitives: processes, threads, and fibers.
00:05:39.600
Processes are separate from everything else; they have their own memory space and are scheduled by the kernel.
00:05:50.039
Threads are inside processes, share the same memory space, and are also scheduled by the kernel. Fibers are similar to threads but give you control over when they start and stop.
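A minimal sketch of the fiber behaviour described here, using only Ruby's built-in Fiber class. The method name `run_fiber_demo` and the log messages are made up for illustration. The point is that the fiber runs only when the caller resumes it and pauses only where it yields.

```ruby
# Build a fiber that records its steps in `log` and pauses halfway.
def run_fiber_demo
  log = []
  fiber = Fiber.new do
    log << "fiber: step 1"
    Fiber.yield            # pause here and hand control back to the caller
    log << "fiber: step 2"
    :done                  # value returned by the final resume
  end

  fiber.resume             # runs the block up to Fiber.yield
  log << "caller: in between"
  result = fiber.resume    # continues after the yield until the block ends
  [log, result]
end

log, result = run_fiber_demo
puts log
puts result.inspect
```

Unlike a thread, nothing in the fiber runs between the two `resume` calls, so the order of the log entries is fixed.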
00:06:02.220
The kernel decides when your code gets executed, and processes or threads may block depending on what's happening, especially with I/O.
00:06:21.780
The most common blocking operation is I/O, which many in the Node.js community will praise, as non-blocking I/O is considered highly desirable.
00:06:35.580
The main concern when working with threads and processes is understanding the different semantics across various interpreters.
00:06:50.760
The significant distinction is MRI, which is Ruby's original interpreter, compared to others like JRuby and Rubinius that handle threads differently.
00:07:05.520
This handling changed noticeably between Ruby versions 1.8 and 1.9.
00:07:22.680
In Ruby 1.8, there were green threads, which were not backed by native threads and thus had limitations.
00:07:36.820
However, in Ruby 1.9, every thread you instantiate is backed by a native operating-system thread.
00:07:54.599
Now, let’s look at threads. They are an easy way to get started if you need to perform multiple tasks at once.
00:08:11.639
You can simply instantiate as many threads as you want, but you don't have control over when they run.
00:08:26.279
This leads to a common problem: if you create a thread and run your code, sometimes nothing seems to happen.
00:08:38.880
The issue is that you need to wait for the thread to finish. If you're delegating work to child threads, the parent thread must wait for them to complete.
00:08:52.080
You can do this by keeping the threads in an array and calling join on each one.
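A small sketch of waiting for child threads, assuming a hypothetical helper named `run_workers` and a short `sleep` as a stand-in for real work. It is an illustration of the idea rather than the code shown in the talk.

```ruby
require "thread"

# Start a few threads and wait for all of them before the parent continues.
def run_workers(count)
  results = Queue.new      # thread-safe collection for the workers' output
  threads = count.times.map do |i|
    Thread.new do
      sleep 0.01           # stand-in for real work
      results << i * i
    end
  end
  threads.each(&:join)     # without this, the parent may exit before the children finish
  Array.new(results.size) { results.pop }.sort
end

p run_workers(4)
```

The results are sorted before being returned because the order in which the threads finish is not guaranteed.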
00:09:01.380
Another thing to note about threads is that when you run the same piece of code repeatedly, the order of the output can change from run to run.
00:09:20.040
Execution order cannot be controlled, and this makes debugging a challenge.
00:09:34.740
Threads have shared memory, which is both an advantage and a source of complications.
00:09:46.200
For instance, if we have two threads, one calculating interest on a shared bank account balance and the other printing it out, we need to be cautious.
00:10:10.860
The issue arises because we need to protect shared variables using locks, so only one thread can access them at a time.
00:10:23.640
You can use a mutex, wrapping access to the shared variable in a synchronize block, which makes other threads wait until the lock is available.
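A sketch of protecting a shared balance with a mutex. The `Account` class and its method names are hypothetical and only loosely follow the bank account scenario mentioned above; integer deposits are used so the final value is easy to check.

```ruby
require "thread"

# Hypothetical shared account; the names are illustrative only.
class Account
  def initialize(balance)
    @balance = balance
    @lock = Mutex.new
  end

  # Only one thread at a time may run the block passed to synchronize.
  def deposit(amount)
    @lock.synchronize { @balance += amount }
  end

  # Reads take the same lock so they never observe a half-finished update.
  def balance
    @lock.synchronize { @balance }
  end
end

account = Account.new(0)
threads = 10.times.map do
  Thread.new { 1_000.times { account.deposit(1) } }
end
threads.each(&:join)
puts account.balance
```

Every read and write of `@balance` goes through the same lock, which is what keeps the total consistent regardless of how the threads are scheduled.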
00:10:41.040
Unfortunately, locks are required whenever threads share mutable state. We'll talk about strategies to avoid locks later.
00:10:54.420
Blocking I/O operations can lead to significant wait times in your application.
00:11:07.200
Ruby 1.9 made improvements for handling blocking operations, allowing for parallel I/O under specific circumstances.
00:11:22.140
When multiple threads wait for I/O, Ruby selects the one that is ready to proceed.
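A rough sketch of why threads still help with blocking operations. The `slow_io` method is a hypothetical stand-in that just sleeps, since a sleeping thread, like one waiting on a socket, lets other threads run in the meantime.

```ruby
require "benchmark"

# Hypothetical stand-in for a blocking call such as a network read.
def slow_io
  sleep 0.2
end

# Run `count` blocking calls in separate threads and return the elapsed seconds.
def timed_threads(count)
  Benchmark.realtime do
    count.times.map { Thread.new { slow_io } }.each(&:join)
  end
end

elapsed = timed_threads(4)
puts format("4 blocking calls in threads took %.2fs", elapsed)
```

Run one after another, the four calls would take about 0.8 seconds in total; in threads the waits overlap, so the elapsed time should be much closer to a single call.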
00:11:34.860
Now, let’s talk about processes.
00:11:46.860
Processes are powerful because they can do everything threads can do, and you can open as many of them as your system resources allow.
00:11:57.120
You instantiate new processes usually with the fork method, and they inherit a copy of the parent process.
00:12:08.880
Unlike threads, processes don't share memory, so inter-process communication (IPC) becomes necessary, which can be quite complex.
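A minimal sketch of fork combined with the simplest form of inter-process communication, a pipe. The helper name `compute_in_child` and the sum it computes are made up for illustration, and `fork` is only available on Unix-like systems.

```ruby
# Fork a child, let it compute something, and send the result back through a pipe.
def compute_in_child
  reader, writer = IO.pipe

  pid = fork do
    reader.close                      # the child only writes
    writer.puts((1..100).reduce(:+))  # the child's result, sent as text
    writer.close
  end

  writer.close                        # the parent only reads
  answer = reader.read.to_i           # blocks until the child closes its end
  reader.close
  Process.wait(pid)                   # reap the child so it does not linger as a zombie
  answer
end

puts compute_in_child
```

Because the child works on its own copy of memory, assigning to a variable inside the `fork` block would not be visible to the parent; the value has to travel through the pipe.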
00:12:20.700
Fibers can also be used to compose applications, though each fiber has a much smaller stack.
00:12:30.540
When you want to make things run faster, one common approach is parallelizing your operations.
00:12:44.660
For example, we may want to query the first hundred pages of Google searches for the keyword 'Ruby' and do so quickly.
00:13:00.780
However, this task involves I/O, which can block, showcasing issues faced in MRI.
00:13:12.300
We could approach this with either threads or processes.
00:13:28.800
For instance, I attempted a multi-threaded solution that initially took about 45 seconds over my Wi-Fi.
00:13:40.440
However, I knew I could optimize it using all the resources at my disposal.
00:13:53.520
I have four cores, so I thought to instantiate four threads to pull tasks from a queue.
00:14:06.720
Using Ruby's standard library, I used the Queue class, whose pop call blocks when the queue is empty.
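A sketch of a fixed pool of threads pulling jobs from a queue. It is not the code from the talk: `fetch_page` is a hypothetical stand-in that sleeps instead of making a network request, and the `:stop` markers are one possible way to shut the workers down.

```ruby
require "thread"

# Hypothetical stand-in for fetching one page of search results.
def fetch_page(number)
  sleep 0.01                      # simulates waiting on the network
  "page #{number}"
end

def fetch_all(page_count, thread_count)
  jobs = Queue.new
  (1..page_count).each { |n| jobs << n }
  thread_count.times { jobs << :stop }   # one stop marker per worker

  results = Queue.new
  workers = thread_count.times.map do
    Thread.new do
      # pop blocks while the queue is empty, so workers simply wait for work
      while (job = jobs.pop) != :stop
        results << fetch_page(job)
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

pages = fetch_all(100, 4)
puts pages.size
```

Changing the second argument to `fetch_all` is an easy way to experiment with different thread counts.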
00:14:20.880
This brought the running time down to eight seconds, which is quite a significant improvement.
00:14:33.480
With that success, I decided to experiment with increasing thread counts.
00:14:48.840
I adjusted my command-line arguments to test various thread counts.
00:15:04.200
My previous experiences led me to anticipate a linear performance gain, but that wasn't the case.
00:15:20.300
As seen with thread counts of ten and nine, the overhead from context switching negated performance increases.
00:15:34.800
This is primarily due to the constant context switching and extensive I/O.
00:15:46.620
Next, we discuss a math-intensive example focusing on Bitcoin hashing, which can be computationally intense.
00:16:00.240
You often need to perform thousands of hashes, especially if you're interested in cryptographic ventures.
00:16:15.240
Here's another example that doesn't involve I/O but focuses on computation.
00:16:30.180
In this case, I want to implement a queue and run multiple threads to maximize the frequency of operations.
00:16:45.480
However, as I went to run the code on my most powerful machine, it turned out to be slow.
00:16:59.940
Here, we need to address an unfortunate aspect known as the Global Interpreter Lock (GIL).
00:17:11.100
The GIL restricts MRI, ensuring that only one thread can execute Ruby code at a time.
00:17:27.720
In our context, this means that only one of my threads can do hashing at a time.
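A rough way to observe this with CPU-bound work, assuming repeated SHA-256 hashing as a stand-in for the Bitcoin example. The method names and iteration counts are arbitrary, and the timings depend on the machine and the interpreter, so none are shown.

```ruby
require "digest"
require "benchmark"

# Hash a string repeatedly; pure CPU work with no I/O.
def hash_many(times)
  value = "ruby"
  times.times { value = Digest::SHA256.hexdigest(value) }
  value
end

# Run the same hashing job in several threads and collect each thread's result.
def hash_in_threads(thread_count, times_each)
  thread_count.times.map { Thread.new { hash_many(times_each) } }.map(&:value)
end

sequential = Benchmark.realtime { 4.times { hash_many(50_000) } }
threaded   = Benchmark.realtime { hash_in_threads(4, 50_000) }
puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
```

On MRI the two timings are typically close to each other, since only one thread runs Ruby code at a time. On an interpreter without the lock, the threaded version has a chance to use several cores.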
00:17:39.180
This limitation complicates true parallel programming with MRI.
00:17:55.680
Fortunately, JRuby and Rubinius don't have this limitation.
00:18:09.360
This is excellent news for those working with these interpreters.
00:18:22.920
Threaded Ruby web servers like Puma can run on JRuby to take advantage of this.
00:18:35.880
Now, let's discuss multi-process setups.
00:18:48.960
Multi-process frameworks like Unicorn start your application then fork off multiple worker processes to handle requests.
00:19:04.680
There’s also the Zeus gem, which forks your Rails app during specific boot stages to avoid redundant boots.
00:19:20.400
With multiple processes, you can spawn threads inside each process for added concurrency.
00:19:35.280
This leads to exciting possibilities, but also invites complexity.
00:19:49.380
In this example, a process instantiates several threads.
00:20:01.380
However, note the absence of inter-process communication in this straightforward implementation.
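A sketch of several forked processes that each start their own threads, with no communication between the processes. The helper name `spawn_workers` and the `sleep` stand-in are assumptions for illustration, and `fork` requires a Unix-like system.

```ruby
# Each forked worker starts a few threads; nothing is sent back to the parent.
def spawn_workers(process_count, threads_per_process)
  pids = process_count.times.map do
    fork do
      threads = threads_per_process.times.map do |t|
        Thread.new do
          sleep 0.01                                   # stand-in for real work
          puts "process #{Process.pid}, thread #{t}"
        end
      end
      threads.each(&:join)
    end
  end
  # Process.wait2 returns [pid, status]; keep the exit status of every child
  pids.map { |pid| Process.wait2(pid).last }
end

statuses = spawn_workers(2, 3)
puts statuses.all?(&:success?)
```

The parent learns only whether each child exited cleanly. Getting actual results back would need a pipe, a socket, or some other IPC mechanism.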
00:20:19.920
Having a well-structured example is essential for understanding concurrent processes.
00:20:37.800
These simple examples illustrate the concepts but become complicated as real-world scenarios are introduced.
00:20:55.380
When a long-running process requires interaction with other processes, synchronization becomes crucial.
00:21:18.420
Global state management may necessitate locks or, preferably, immutable data structures.
00:21:35.220
An important quirk in the Ruby standard library shows up in its Queue class.
00:21:55.020
Using a queue object to manage the items for the threads can lead to deadlocks.
00:22:11.700
The empty? call is non-blocking, while pop is blocking, and combining the two can lead to synchronization issues.
00:22:27.600
In scenarios with multiple threads, this design exposes us to deadlock situations.
00:22:40.680
These deadlocks occur because one thread may see that the queue is not empty, another thread then pops the last item, and the first thread blocks forever on its own pop.
00:23:00.420
To fix this, we could override the queue class to prevent the empty checking problem.
00:23:17.040
Instead of checking empty? first, we use the pop method in a way that never waits on a queue another thread has just emptied.
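One way to avoid the check-then-pop race with the standard Queue is its non-blocking mode: `pop(true)` raises `ThreadError` instead of waiting when the queue is empty. This sketch does not override the Queue class as described in the talk; the `drain` helper is a hypothetical alternative that shows the same idea.

```ruby
require "thread"

# Drain a queue with several threads without ever calling empty? before pop.
def drain(queue, thread_count)
  processed = Queue.new
  threads = thread_count.times.map do
    Thread.new do
      loop do
        begin
          item = queue.pop(true)     # non-blocking: raises ThreadError if the queue is empty
        rescue ThreadError
          break                      # nothing left, so this worker is finished
        end
        processed << item
      end
    end
  end
  threads.each(&:join)
  processed.size
end

queue = Queue.new
50.times { |i| queue << i }
puts drain(queue, 4)
```

This works when all jobs are queued before the workers start. If producers keep adding work, an empty queue does not mean the work is finished, and a different shutdown signal is needed.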
00:23:28.920
Having covered simple examples, let's delve deeper into more robust concurrency tools.
00:23:41.640
How many here have used Celluloid? And how many people have used Sidekiq?
00:23:56.940
Sidekiq is built on top of Celluloid and utilizes the actor model, where each actor operates in its own thread.
00:24:19.800
Celluloid simplifies the process of writing concurrent code by handling things like pooling, supervision, and message passing.
00:24:40.020
This structure prevents deadlocks and allows for actions that feel intuitive, akin to writing sequential code.
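To make the actor idea concrete, here is a stdlib-only sketch of a single actor: one thread, one mailbox, and state that only that thread touches. This is not Celluloid's API. The `CounterActor` class and its message names are invented for illustration.

```ruby
require "thread"

# Minimal actor: one thread, one mailbox, state touched only by that thread.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count = 0
    @thread = Thread.new { run }
  end

  # Asynchronous message: returns immediately.
  def increment
    @mailbox << [:increment, nil]
  end

  # Synchronous message: waits for the actor's reply.
  def count
    reply = Queue.new
    @mailbox << [:count, reply]
    reply.pop
  end

  # Ask the actor to finish and wait for its thread to end.
  def stop
    @mailbox << [:stop, nil]
    @thread.join
  end

  private

  # Process messages one at a time, in the order they arrived.
  def run
    loop do
      message, reply = @mailbox.pop
      case message
      when :increment then @count += 1
      when :count     then reply << @count
      when :stop      then break
      end
    end
  end
end

actor = CounterActor.new
senders = 4.times.map { Thread.new { 100.times { actor.increment } } }
senders.each(&:join)
puts actor.count
actor.stop
```

No mutex appears in the class because `@count` is only ever read or written by the actor's own thread; callers interact with it purely by sending messages.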
00:24:55.680
Using Celluloid, one can easily instantiate a worker pool, much like how Sidekiq operates.
00:25:11.100
While it's not necessarily as fast as using multiple processes and threads, the ease of understanding is a huge advantage.
00:25:30.060
Of course, this still comes with the caveat of the global interpreter lock.
00:25:45.660
As stated in Celluloid's documentation, it functions best when working within JRuby or Rubinius.
00:26:03.580
The entire landscape of concurrency is vast, extending from threads to processes, while also involving libraries like EventMachine and Grape.
00:26:22.680
Exploring this path demands careful monitoring and overhead management.
00:26:39.660
How many attended the Immutable Ruby talk earlier?
00:26:56.160
Using immutable objects greatly helps in managing global states.
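A small sketch of sharing frozen data between threads. The `SETTINGS` hash, its keys, and the `with_retries` helper are hypothetical. The idea is that readers need no lock because nobody can change the object, and an "update" produces a new object instead.

```ruby
# Hypothetical shared settings; freezing makes accidental mutation raise an error.
SETTINGS = { "retries" => 3, "hosts" => ["a.example", "b.example"].freeze }.freeze

# Build a new frozen hash instead of modifying the shared one.
def with_retries(settings, retries)
  settings.merge("retries" => retries).freeze
end

# Any number of threads can read the frozen hash without a lock.
readers = 4.times.map { Thread.new { SETTINGS["retries"] } }
puts readers.map(&:value).inspect

begin
  SETTINGS["retries"] = 5
rescue => error
  puts error.class      # FrozenError on Ruby 2.5+, RuntimeError on older versions
end
```

Note that `freeze` is shallow, which is why the nested array is frozen separately.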
00:27:12.960
So, how can we make Ruby faster?
00:27:28.860
You could contribute to MRI, collaborate on Rubinius or JRuby, or work with project leads like Evan, Brian, and Charles.
00:27:46.760
I'm somewhat embarrassed because I went through my slide deck much faster than I practiced.
00:28:02.040
Unfortunately, that's it for now.
00:28:10.500
I'd be happy to answer any questions or engage in discussions about other topics.
00:28:51.179
In a Celluloid example, you mentioned it looks at the number of cores. How does that affect multi-threaded processes?
00:29:08.300
The question was regarding the benefit of multiple threads when constrained to a single core.
00:29:22.440
Celluloid sets the pool size to match the number of cores, but is that effective if everything is still bound to one core?
00:29:40.200
The operating system schedules those threads, and with support from the kernel, threads can run across multiple cores.
00:29:58.540
On my MacBook Air, even utilizing all available threads can max out performance.
00:30:14.900
Most examples I provided demonstrate this.
00:30:28.200
Anyone else? Yes, there's a question over there.
00:30:45.080
Ultimately, implementing a message bus for communication across long-lived processes might be beneficial.
00:31:09.240
Unicorn uses sockets for communication as shared memory is not feasible, allowing separate processes to communicate.
00:31:27.300
You might consider alternative libraries like DRb for handling communication within your system.
00:31:41.580
To my knowledge, Celluloid doesn't have extensive built-in options for distributing actors across machines.
00:31:59.760
However, there is DCell, which builds on Celluloid to provide distributed functionality.
00:32:13.680
I hope this clarifies the discussion; thank you for your attention.