00:00:18.400
So, I am going to be talking about parallel and concurrent programming. I titled this talk 'Dear God, What Am I Doing?' because that's what it felt like for me when I just started. If you've ever felt like that, this talk is for you.
00:00:23.680
If these concepts scare you when you hear them being discussed, don't worry; you're not alone. Many people have the same reaction. I actually gave this same talk last week at RubyConf Australia, and a lot of people there had experience with these concepts.
00:00:35.840
As a result, I've changed the talk a bit to go faster through the basics and get into some more advanced topics. I named the talk the way I did primarily to show this picture.
00:00:41.280
Yeah, because the first time I actually did some concurrent work, I totally messed it up, and it was just embarrassing. I felt like that dog in the picture during my programming efforts.
00:00:47.760
But I was getting paid for that, which made it a bit more bearable. So, here's a brief introduction to the concepts we'll be using.
00:00:53.360
Computers are the most complex things humans have ever made, but we can actually break them down into a very simple diagram: imagine a hamster on a wheel.
00:01:05.360
In this analogy, the hamster represents a thread, and the wheel represents a process. So, we have processes made up of threads, and the kernel is what's running our processes.
00:01:16.720
When we want our tasks to run faster, we might think we can just do multiple things at once—magically making our code run faster.
00:01:27.119
However, what often happens is we mess up, and we end up falling off the wheel. In one case, I kept a thread in an instance variable, and it caused chaos when I tried to kill it.
00:01:38.080
If you kill a thread inappropriately, you can really blow up your system. The key point of this talk is to explore how we can make our programs faster by having them do more things at once.
00:01:51.040
I'm a car guy, and this is a turbocharger. A turbo works by compressing more air and pushing it into the engine to create a more powerful explosion. That's how I envision speeding up my programs: how can I create more acceleration and power by doing multiple things at once?
00:02:06.320
We have three main primitives: processes, threads, and, in Ruby, fibers. As I mentioned, a process is composed of threads, and the kernel decides when to run those threads.
00:02:19.760
Threads can block for various reasons, primarily due to I/O. Fibers are like threads, but you control when a fiber runs, meaning you can start and stop it.
00:02:28.239
You can create new processes using `fork` or `Process.spawn`. The most important takeaway is that the kernel is in charge of deciding which of your code runs and when.
00:02:37.840
It's also worth noting that things behave differently depending on the Ruby interpreter. Prior to Ruby 1.9, MRI used green threads, which are scheduled by the interpreter itself rather than by the kernel, and that is essentially a huge trojan horse.
00:02:49.520
From Ruby 1.9 onward, every Ruby thread is backed by a native thread, allowing multiple tasks to occur concurrently, a significant shift from Ruby 1.8.
00:03:01.440
It's also different across interpreters like JRuby and Rubinius. The easiest way to get started with concurrency in Ruby is by using threads.
00:03:12.080
You can create a new thread simply by calling `Thread.new`, which will execute your code in the background. But if this is the only code in your program, you'll run into a problem.
00:03:28.960
The main issue is that you need to join your threads to ensure that your main program waits for them to finish executing. Otherwise, the main thread may exit before the child thread has a chance to run.
00:03:42.959
If you do it correctly by joining your threads, you can see output on the screen as intended. If you run it again, the output may differ.
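A minimal sketch of starting threads and joining them, using only core Ruby. The method name `greet_from_threads` is made up for illustration, and the short `sleep` is only there so the print order varies from run to run.

```ruby
# Runs `count` threads and returns what each one recorded.
def greet_from_threads(count)
  results = Array.new(count)
  threads = count.times.map do |i|
    Thread.new do
      sleep(rand * 0.01)   # simulate a little work so the print order varies
      puts "hello from thread #{i}"
      results[i] = i
    end
  end
  # Without join, the main thread could reach the end of the program
  # and exit before the child threads ever run.
  threads.each(&:join)
  results
end

greet_from_threads(3)
```

The lines can print in any order, but every thread is guaranteed to have finished once `join` returns.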
00:03:56.639
However, you need to understand that you don’t have control over when your threads execute; that is decided by the kernel. Writing code that depends on the order in which threads run can lead to issues.
00:04:07.920
Threads share the same memory space, which introduces potential problems. For example, if two threads need access to a bank account simultaneously, that’s a bad situation.
00:04:22.880
To solve this kind of problem, we use locks to make code paths exclusive. Ruby’s standard library provides a simple mutex to handle this requirement.
00:04:36.799
Locks help ensure that only one Ruby thread can execute a specific block of code at any given time.
00:04:48.560
To illustrate, if one thread is changing a balance, we can see that the balance is modified as expected.
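Here is a rough sketch of the bank account idea with a `Mutex` from core Ruby. The `Account` class and its `deposit` method are hypothetical names, and the tiny `sleep` only widens the window in which a race would happen if the lock were removed.

```ruby
class Account
  attr_reader :balance

  def initialize(balance)
    @balance = balance
    @lock = Mutex.new
  end

  def deposit(amount)
    # Only one thread at a time can run this block, so the
    # read-modify-write below cannot interleave with another deposit.
    @lock.synchronize do
      current = @balance
      sleep(0.0001)   # widen the window where a race would occur without the lock
      @balance = current + amount
    end
  end
end

account = Account.new(100)
10.times.map { Thread.new { 10.times { account.deposit(1) } } }.each(&:join)
puts account.balance   # prints 200
```

Taking the `synchronize` call out is a quick way to see lost updates: several threads read the same `current` value and overwrite each other's deposits.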
00:04:59.760
However, blocking occurs when dealing with I/O operations. When you perform network access or any type of I/O in Ruby, the calling thread will block, and the kernel will select a different thread to run.
00:05:12.400
In contrast, Node.js has non-blocking I/O, which is quite efficient. Essentially, I/O operations are what block your threads most of the time.
00:05:26.960
Moving on to processes, you typically create a new process using `fork`. Here’s an example: we fork a child process that modifies the balance while the parent process prints out the balance.
00:05:42.560
The key distinction between threads and processes is how they manage memory. Between processes, memory is not shared the way it is between threads.
00:05:54.879
When you fork a process, the child gets its own copy of the parent's memory, including all of its variables. But if you don’t manage this well, it can lead to chaotic outcomes.
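A small sketch of that separation, assuming a platform where `fork` is available. The method name `balance_after_child_update` is invented for the example.

```ruby
# Returns the parent's view of the balance after a child process
# has changed its own copy.
def balance_after_child_update
  balance = 100

  pid = fork do
    balance += 50                  # changes only the child's copy
    puts "child sees #{balance}"
  end

  Process.wait(pid)                # make sure the child has finished
  balance                          # still the parent's original value
end

puts "parent sees #{balance_after_child_update}"
```

The child prints 150 and the parent prints 100, because the write in the child never reaches the parent's memory.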
00:06:08.240
I didn’t want to expand on non-blocking I/O, as I think it requires extensive setup and significant changes in how we typically write Ruby programs.
00:06:17.600
Fibers are also important but I believe you can achieve more with threads and processes. Fibers allow you to start and stop code execution, offering another option for creating concurrent programs.
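To make the start-and-stop idea concrete, here is a minimal fiber sketch using core Ruby's `Fiber`. The method name `counter_fiber` is just for illustration.

```ruby
# A fiber runs only when resumed and pauses itself with Fiber.yield.
def counter_fiber
  Fiber.new do
    Fiber.yield 1    # pause and hand 1 back to the caller
    Fiber.yield 2    # runs only after the next resume
    3                # the block's final value is returned by the last resume
  end
end

fiber = counter_fiber
puts fiber.resume   # 1
puts fiber.resume   # 2
puts fiber.resume   # 3
```

Nothing inside the fiber runs until the caller asks for it, which is the difference from a thread scheduled by the kernel.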
00:06:34.080
Let’s look at some examples involving fetching a certain number of Google search results efficiently. You can tackle this problem with multiple threads or with multiple processes.
00:06:49.760
Before you gain any experience with threads or processes, you might just write a simple loop. This code works, but it’s not nearly as fast as it could be.
00:07:05.680
Take advantage of your multi-core computer to run multiple threads, allowing you to split up the workload using queues. In this example, I’ve created four threads and a queue to handle the work.
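A sketch of the four-threads-and-a-queue setup, not the code from the slide. `fetch_page` is a stand-in that sleeps instead of hitting the network, `fetch_all` is a made-up name, and the `:done` stop markers are one of several ways to tell workers to finish.

```ruby
require 'thread'

# Stand-in for a network request; a real version would fetch a results page.
def fetch_page(page)
  sleep(0.01)                 # pretend to wait on the network
  "results for page #{page}"
end

def fetch_all(pages, thread_count: 4)
  queue = Queue.new
  pages.each { |page| queue << page }
  thread_count.times { queue << :done }   # one stop marker per worker

  results = Queue.new                      # Queue is safe to share between threads
  workers = thread_count.times.map do
    Thread.new do
      # pop blocks until an item is available; the marker ends the loop
      while (page = queue.pop) != :done
        results << fetch_page(page)
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

puts fetch_all((1..20).to_a).size   # prints 20
```

Because each worker waits on I/O most of the time, the four threads overlap their waiting and the whole batch finishes sooner than a plain loop would.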
00:07:22.080
But why just four threads? Why not 100? That's the next question to address because simply adding more threads complicates the issue.
00:07:36.480
While a single thread might take a long time, using multiple threads will speed things up significantly. However, you need to be mindful of context switching.
00:07:52.160
If you create too many threads, you spend more time switching between them than processing.
00:08:04.480
Another interesting example can be seen with Bitcoin mining, which processes many hashes through brute force, similar to password cracking.
00:08:19.920
If you try to calculate a million hashes using threads, you could initially think that splitting this workload would be efficient.
00:08:33.120
But when you run this large computation, you might face a significant slowdown due to MRI's global interpreter lock, which limits execution to one thread at a time.
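One way to see this for yourself is to time CPU-bound hashing with one thread and with several. This is a sketch with invented method names (`hash_range`, `hash_with_threads`) and a smaller workload than a million hashes; it assumes `total` divides evenly by the thread count.

```ruby
require 'benchmark'
require 'digest'

# CPU-bound work: hash every integer in the range and count the digests.
def hash_range(range)
  range.count { |n| Digest::SHA256.hexdigest(n.to_s) }
end

# Splits the range into equal slices, one per thread.
# Assumes total is divisible by thread_count.
def hash_with_threads(total, thread_count)
  slice = total / thread_count
  threads = thread_count.times.map do |i|
    Thread.new { hash_range((i * slice)...((i + 1) * slice)) }
  end
  threads.map(&:value).sum   # value joins the thread and returns its result
end

total = 100_000
single   = Benchmark.realtime { hash_range(0...total) }
threaded = Benchmark.realtime { hash_with_threads(total, 4) }
puts format("1 thread: %.2fs, 4 threads: %.2fs", single, threaded)
```

On MRI the two timings tend to come out close to each other; on an interpreter without a global lock the threaded run can be noticeably faster. The exact numbers depend on the machine and Ruby version.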
00:08:47.600
Thus, when you want threads to run truly in parallel in Ruby, MRI isn't your best option. JRuby and Rubinius are better suited because they have no global interpreter lock.
00:09:04.000
To wrap this up, if you're writing CPU-bound multi-threaded applications in Ruby, you might want to avoid MRI.
00:09:19.680
Multi-process applications are another avenue to tackle certain problems. However, not all platforms support this. If you're on JRuby or Windows, forking processes can be problematic.
00:09:33.840
You can replace `Thread.new` with `fork` in the same existing code structure; the key difference being that you now need to wait on the child processes.
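A sketch of that swap, assuming a platform that supports `fork`. The names `crunch` and `run_in_processes` are hypothetical, and the summing loop stands in for real work.

```ruby
# Stand-in for CPU-heavy work done by each child.
def crunch(limit)
  (1..limit).reduce(:+)
end

def run_in_processes(worker_count)
  pids = worker_count.times.map do
    fork do
      crunch(50_000)
      # the child process exits when this block ends
    end
  end
  # Process.wait2 returns [pid, status]; record whether each child succeeded.
  pids.map { |pid| Process.wait2(pid).last.success? }
end

p run_in_processes(4)   # prints [true, true, true, true]
```

Waiting on each child plays the same role that `join` plays for threads; skipping it leaves finished children around as zombies until the parent exits.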
00:09:48.000
When working with multiple processes, you'll encounter inter-process communication, which this specific example avoids.
00:10:00.000
Typically, communication occurs when you pass block variables into your fork calls. This approach performs comparably to multi-threading but let's consider a more complex example.
00:10:11.120
Processes are composed of threads, meaning you can fork processes and then create threads within those processes. This allows for a good balance of performance.
00:10:24.000
There’s a subtle issue you might encounter with the Ruby standard library’s queue implementation. This may not show up with a small dataset but could become problematic with larger workloads.
00:10:41.280
If you're working with a million tasks, you can face potential deadlocks because the queue's `empty?` method is non-blocking, while `pop` is blocking.
00:10:54.560
Due to this behavior, a situation can arise where the empty check reports that the queue still has items, another thread pops the last one in the meantime, and then the pop hangs indefinitely waiting for more data.
00:11:05.440
The solution is simple: wrap the operation of checking the queue and popping it inside a lock, ensuring you'll avoid deadlocks.
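The locking fix can be sketched like this. The method name `drain` is made up, and doubling each item is a placeholder for real work; the sketch assumes queue items are never `nil`.

```ruby
require 'thread'

def drain(queue, thread_count)
  lock = Mutex.new
  processed = Queue.new

  workers = thread_count.times.map do
    Thread.new do
      loop do
        # Check and pop as one step, so no other thread can take the
        # last item between our empty? check and our pop.
        item = lock.synchronize { queue.empty? ? nil : queue.pop }
        break if item.nil?
        processed << item * 2
      end
    end
  end
  workers.each(&:join)
  processed.size
end

queue = Queue.new
1_000.times { |n| queue << n + 1 }
puts drain(queue, 8)   # prints 1000
```

Another option is the non-blocking form `queue.pop(true)`, which raises `ThreadError` on an empty queue instead of waiting, so each worker can rescue that error and stop.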
00:11:19.680
As a web developer, you likely have a keen interest in improving the speed of your web applications. Tools like Unicorn are particularly popular.
00:11:31.040
Unicorn operates as a pre-forking web server, starting a process and then forking multiple worker processes to handle requests.
00:11:46.160
Having your parent process monitor child processes is essential for ensuring they remain alive as they handle incoming requests.
00:11:59.680
When you have multiple workers, communication happens over sockets because processes cannot share memory the way threads can.
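To give a feel for talking to a child process over a socket, here is a small sketch with `UNIXSocket.pair` from Ruby's standard library. It is an illustration of the idea, not how Unicorn itself is implemented, and `ask_child` is an invented name.

```ruby
require 'socket'

# Sends `message` to a child process over a socket pair and returns its reply.
def ask_child(message)
  parent_sock, child_sock = UNIXSocket.pair

  pid = fork do
    parent_sock.close                  # the child only uses its own end
    request = child_sock.gets.chomp
    child_sock.puts(request.upcase)    # reply to the parent
    child_sock.close
  end

  child_sock.close                     # the parent only uses its own end
  parent_sock.puts(message)
  reply = parent_sock.gets.chomp
  parent_sock.close
  Process.wait(pid)
  reply
end

puts ask_child("ping")   # prints PING
```

Everything that crosses the socket is bytes, so anything richer than a string has to be serialized on one side and parsed on the other.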
00:12:12.400
There is an inherent memory cost associated with forking processes: a child initially shares its parent's memory pages, and a page is duplicated as soon as either process writes to it.
00:12:24.560
Thus, every bit of memory should be treated meticulously because changing a single page can lead to costly memory duplication for child processes.
00:12:35.520
This is where Unicorn gets complicated; managing memory efficiently while monitoring processes impacts performance.
00:12:47.680
Celluloid emerges as a powerful solution here, streamlining the complexity of managing concurrent applications. Who here has used or heard of Celluloid before?
00:13:00.000
Celluloid is based on the actor model, allowing each actor to execute in its own thread. It utilizes mailboxes for inter-actor communication, helping you avoid the manual management of threads.
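To show what "an actor with a mailbox" means, here is a toy version in plain Ruby. It is not Celluloid's API or implementation; `CounterActor` and its methods are made up, and a `Queue` plays the role of the mailbox.

```ruby
require 'thread'

# A toy actor: one thread, one mailbox (a Queue), private state.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count = 0                       # touched only by the actor's own thread
    @thread = Thread.new { run }
  end

  # Senders never touch @count; they only drop messages in the mailbox.
  def increment
    @mailbox << [:increment, nil]
  end

  # Ask for the count and block until the actor replies.
  def count
    reply = Queue.new
    @mailbox << [:count, reply]
    reply.pop
  end

  def stop
    @mailbox << [:stop, nil]
    @thread.join
  end

  private

  def run
    loop do
      message, reply = @mailbox.pop  # blocks until a message arrives
      case message
      when :increment then @count += 1
      when :count     then reply << @count
      when :stop      then break
      end
    end
  end
end

actor = CounterActor.new
4.times.map { Thread.new { 25.times { actor.increment } } }.each(&:join)
puts actor.count   # prints 100
actor.stop
```

No mutex is needed around `@count` because only the actor's thread ever reads or writes it; the mailbox is the single point of synchronization.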
00:13:18.240
Written by Tony Arcieri, who recently won a Ruby Hero award, Celluloid offers tools to help manage complexities like logging and supervision.
00:13:35.760
In essence, Celluloid alleviates many headaches associated with thread management, making it easier to build robust concurrent applications.
00:13:50.720
Celluloid’s built-in monitoring facility helps keep track of threads, and in case any worker crashes, it restarts automatically.
00:14:05.760
Additionally, you can save worker references for later use across different code segments.
00:14:19.040
Communication in Celluloid happens between threads rather than between processes, allowing you to send messages between worker objects via their mailboxes.
00:14:32.480
Here's a simple example of how messages can be sent back and forth using Celluloid's infrastructure. This is a trivial example, but the underlying concept can be applied broadly.
00:14:49.600
Celluloid allows you to make asynchronous method calls, which return immediately, enabling you to handle other tasks while the work runs in the background.
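The idea of a call that returns right away and hands back its result later can be sketched in plain Ruby with `Thread#value`. This is not Celluloid's API; `slow_square` and `square_later` are hypothetical names.

```ruby
# Stand-in for a slow call, such as a network request.
def slow_square(n)
  sleep(0.05)
  n * n
end

# Start the work in the background and return immediately.
def square_later(n)
  Thread.new { slow_square(n) }
end

future = square_later(7)
puts "doing other work while the result is computed"
puts future.value   # blocks only now, until the result is ready; prints 49
```

The caller pays the waiting cost only at the point where it actually needs the value.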
00:15:04.960
In terms of preventing deadlocks, Celluloid's scheduling method for calls is designed to mitigate these risks.
00:15:17.920
For example, by creating a pool of workers in Celluloid, you can efficiently fetch and process data with better parallel capabilities.
00:15:30.080
You can specify the number of threads equivalent to your machine's core count, but you’re also free to create more if you anticipate handling greater loads.
00:15:44.720
Additionally, there’s a part of Celluloid called DCell, which stands for distributed Celluloid. It allows you to start workers on different nodes across a network.
00:16:01.440
Using DCell, you can assign a node ID and address for each worker. Under the hood, it utilizes ZeroMQ for communication between these workers.
00:16:16.320
This capability opens up immense possibilities for scaling your programs across multiple machines without any complex setup.
00:16:32.560
You can drop cells into nodes and instantiate numerous worker threads across different machines while enjoying a simple Ruby interface.
00:16:45.280
This flexibility is eye-opening because you can distribute tasks and communicate effortlessly across systems, making it so much easier than traditional methods.
00:17:01.440
Now, let’s address non-blocking I/O. I initially avoided this topic because it requires significant setup time and alters how you write Ruby programs.
00:17:17.360
If you are interested in non-blocking I/O, EventMachine or Celluloid::IO are valid paths to explore.
00:17:30.240
With EventMachine, you configure your entire system to be asynchronous, which necessitates a shift in mindset.
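To hint at what that shift looks like, here is a sketch of readiness-based I/O with `IO.select` from core Ruby, which is the kind of building block an event loop sits on. It is not EventMachine code; `read_as_ready` is an invented name, and pipes stand in for network connections.

```ruby
# Waits on several readable IO objects at once and returns their
# lines in the order they became available.
def read_as_ready(readers)
  pending = readers.dup
  lines = []
  until pending.empty?
    ready, = IO.select(pending)        # blocks until at least one is readable
    ready.each do |io|
      line = io.gets
      if line.nil?
        pending.delete(io)             # end of stream
      else
        lines << line.chomp
      end
    end
  end
  lines
end

slow_r, slow_w = IO.pipe
fast_r, fast_w = IO.pipe

writers = [
  Thread.new { sleep(0.1); slow_w.puts("slow"); slow_w.close },
  Thread.new { fast_w.puts("fast"); fast_w.close }
]

p read_as_ready([slow_r, fast_r])
writers.each(&:join)
```

One loop services whichever source has data instead of dedicating a blocked thread to each, which is why the surrounding program has to be structured around the loop.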
00:17:43.600
However, using something like Celluloid allows for a more object-oriented approach, reducing the setup complexity.
00:17:58.560
With the use of actors and synchronization in Celluloid, you can maintain clarity in your code while benefiting from faster and more efficient runtime.
00:18:12.000
When using the actor model, you can manage state locally within each actor without worrying about sharing across threads.
00:18:26.560
However, take note that there is still complexity in managing state and communicating effectively, largely depending on your application’s needs.
00:18:41.680
If you have any questions or comments, feel free to ask now. Thank you for listening!