Dear God, what am I doing? Concurrency and parallel processing

This video was recorded on http://wrocloverb.com. You should follow us at https://twitter.com/wrocloverb. See you next year!

Here's a situation we've all been in at one point. We have a program. We
want to make that program faster. We know about threads, processes,
fibers. Then you start programming and you have no clue what to do. I
was there. It sucks. This talk guides you down the rabbit hole and
brings you out the other side.

Points covered:
Threads, Fibers
Processes, Forking, Detaching
Parallelism vs Concurrency
The many many different ways I crashed my computer learning these things
Gotchas of each
Common ways you shoot yourself in the foot
Celluloid

This is a learning and informative talk. It's targeted at intermediate
developers who have Ruby experience but have never written any multithreaded
code.

wroc_love.rb 2013

00:00:18.400 So, I am going to be talking about parallel and concurrent programming. I titled this talk 'Dear God, What Am I Doing?' because that's what it felt like for me when I just started. If you've ever felt like that, this talk is for you.
00:00:23.680 If you are scared of these concepts, like when you hear them being discussed, don't worry—you're not alone. Many people have the same reaction. I actually gave this same talk last week at RubyConf Australia, and a lot of people there had experience with these concepts.
00:00:35.840 As a result, I've changed the talk a bit to go faster through the basics and get into some more advanced topics. I named the talk the way I did primarily to show this picture.
00:00:41.280 Yeah, because the first time I actually did some concurrent work, I totally messed it up, and it was just embarrassing. I felt like that dog in the picture during my programming efforts.
00:00:47.760 But I was getting paid for that, which made it a bit more bearable. So, here's a brief introduction to the concepts we'll be using.
00:00:53.360 Computers are the most complex things humans have ever made, but we can actually break them down into a very simple diagram: imagine a hamster on a wheel.
00:01:05.360 In this analogy, the hamster represents a thread, and the wheel represents a process. So, we have processes made up of threads, and the kernel is what's running our processes.
00:01:16.720 When we want our tasks to run faster, we might think we can just do multiple things at once—magically making our code run faster.
00:01:27.119 However, what often happens is we mess up, and we end up falling off the wheel. I once kept a thread in an instance variable, and it caused chaos when I tried to kill it.
00:01:38.080 If you kill a thread inappropriately, you can really blow up your system. The key point of this talk is to explore how we can make our programs faster by having them do more things at once.
00:01:51.040 I'm a car guy, and this is a turbocharger. A turbo works by compressing more air and pushing it into the engine to create a more powerful explosion. That's how I envision speeding up my programs: how can I create more acceleration and power by doing multiple things at once?
00:02:06.320 We have three main primitives: processes, threads, and Ruby also gives us fibers. As I mentioned, a process is composed of threads, and the kernel decides when to run those threads.
00:02:19.760 Threads can block for various reasons, primarily due to I/O. Fibers are like threads, but you control when a fiber runs, meaning you can start and stop it.
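A minimal sketch of what "you control when a fiber runs" can look like, using only Ruby's built-in `Fiber` class. The `steps` array is a hypothetical name used here just to record the order of execution.

```ruby
# A fiber runs only when the caller resumes it, and it pauses itself
# with Fiber.yield. Nothing here is scheduled by the kernel.
steps = []

fiber = Fiber.new do
  steps << "fiber: first part"
  Fiber.yield                 # hand control back to the caller
  steps << "fiber: second part"
end

steps << "main: before resume"
fiber.resume                  # runs the block up to Fiber.yield
steps << "main: between resumes"
fiber.resume                  # runs the rest of the block

puts steps
```

The order of the recorded steps is always the same, which is the main contrast with threads.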
00:02:28.239 You can create new processes using `Process.spawn` or `fork`. The most important takeaway is that the kernel is in charge of deciding which code runs and when.
00:02:37.840 It's also worth noting that things behave differently based on the Ruby interpreter. Prior to Ruby 1.9, MRI used green threads, which is essentially a huge trojan horse.
00:02:49.520 With Ruby 1.9 and onward, each Ruby thread is backed by a native thread, allowing multiple tasks to occur concurrently, a significant shift from Ruby 1.8.
00:03:01.440 It's also different across interpreters like JRuby and Rubinius. The easiest way to get started with concurrency in Ruby is by using threads.
00:03:12.080 You can create a new thread simply by calling `Thread.new`, which will execute your code in the background. But if this is the only code in your program, you'll run into a problem.
00:03:28.960 The main issue is that you need to join your threads to ensure that your main program waits for them to finish executing. Otherwise, the main thread may exit before the child thread has a chance to run.
00:03:42.959 If you do it correctly by joining your threads, you can see output on the screen as intended. If you run it again, the output may differ.
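As a rough illustration of starting threads and joining them (the slide code is not part of this transcript, so the names below are assumptions):

```ruby
# Start three threads. Each prints a line and returns a message.
threads = 3.times.map do |i|
  Thread.new(i) do |n|
    puts "thread #{n} running"   # the order of these lines can vary per run
    "hello from thread #{n}"     # the block's result becomes the thread's value
  end
end

# Without join, the main thread could reach the end of the program
# before the child threads have had a chance to run.
threads.each(&:join)

messages = threads.map(&:value)
puts messages
```

The `thread N running` lines may appear in a different order on each run, while `messages` is always collected in the order the threads were created.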
00:03:56.639 However, you need to understand that you don’t have control over when your threads execute; that is decided by the kernel. Writing order-dependent code can lead to issues.
00:04:07.920 Threads share the same memory space, which introduces potential problems. For example, if two threads need access to a bank account simultaneously, that’s a bad situation.
00:04:22.880 To solve this kind of problem, we use locks to make code paths exclusive. Ruby’s standard library provides a simple mutex to handle this requirement.
00:04:36.799 Locks help ensure that only one Ruby thread can execute a specific block of code at any given time.
00:04:48.560 To illustrate, if one thread is changing a balance, we can see that the balance is modified as expected.
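One way the bank-account example could be sketched with a `Mutex`. The `Account` class and its method names are hypothetical, not taken from the talk's slides.

```ruby
# Hypothetical account whose balance is protected by a mutex.
class Account
  attr_reader :balance

  def initialize(balance)
    @balance = balance
    @lock = Mutex.new
  end

  def deposit(amount)
    # Only one thread at a time may run this read-modify-write sequence.
    @lock.synchronize do
      current = @balance
      @balance = current + amount
    end
  end
end

account = Account.new(0)

threads = 10.times.map do
  Thread.new { 1_000.times { account.deposit(1) } }
end
threads.each(&:join)

puts account.balance
```

Because every deposit goes through `synchronize`, the final balance is 10 × 1,000 = 10,000 regardless of how the threads interleave.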
00:04:59.760 However, blocking occurs when dealing with I/O operations. When you perform network access or any type of I/O in Ruby, it will block, and the kernel will select a different thread to run.
00:05:12.400 In contrast, Node.js has non-blocking I/O, which is quite efficient. Essentially, I/O operations are what block your threads most of the time.
00:05:26.960 Moving on to processes, you typically create a new process using `fork`. Here’s an example: we fork a child process that modifies the balance while the parent process prints out the balance.
00:05:42.560 The key distinction between threads and processes is how they manage memory. In a process, memory is not shared like it is between threads.
00:05:54.879 When you fork a process, a new memory space is allocated with all necessary variable references. But if you don’t manage this well, it can lead to chaotic outcomes.
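A small sketch of the difference, assuming a platform where `fork` is available (for example MRI on Linux or macOS). The variable name `balance` mirrors the example described above.

```ruby
balance = 100

pid = fork do
  # The child works on its own copy of the parent's memory.
  balance += 50
  puts "child sees #{balance}"
end

Process.wait(pid)                 # wait for the child to finish
puts "parent sees #{balance}"     # the parent's copy is unchanged
```

The child's change never reaches the parent, so any result a child computes has to be sent back explicitly.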
00:06:08.240 I didn’t want to expand on non-blocking I/O, as I think it requires extensive setup and significant changes in how we typically write Ruby programs.
00:06:17.600 Fibers are also important but I believe you can achieve more with threads and processes. Fibers allow you to start and stop code execution, offering another option for creating concurrent programs.
00:06:34.080 Let’s look at some examples involving fetching a certain number of Google search results efficiently. You can tackle this problem multi-threaded or multi-process.
00:06:49.760 Before you gain any experience with threads or processes, you might just write a simple loop. This code works, but it’s not nearly as fast as it could be.
00:07:05.680 Take advantage of your multi-core computer to run multiple threads, allowing you to split up the workload using queues. In this example, I’ve created four threads and a queue to handle the work.
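A sketch of the four-threads-and-a-queue idea. The real example fetches search results over the network. Here `fetch_page` is a hypothetical stand-in that just sleeps, and the `:done` markers are one possible way to tell workers to stop.

```ruby
require "thread"

# Hypothetical stand-in for a slow network request.
def fetch_page(n)
  sleep 0.01
  "results for page #{n}"
end

worker_count = 4
jobs = Queue.new
results = Queue.new

20.times { |n| jobs << n }
worker_count.times { jobs << :done }   # one stop marker per worker

workers = worker_count.times.map do
  Thread.new do
    # pop blocks until an item is available; :done ends the loop.
    while (page = jobs.pop) != :done
      results << fetch_page(page)
    end
  end
end
workers.each(&:join)

puts results.size
```

While one thread is blocked waiting on its "request", the others can make progress, which is where the speedup for I/O-bound work comes from.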
00:07:22.080 But why just four threads? Why not 100? That's the next question to address because simply adding more threads complicates the issue.
00:07:36.480 While a single thread might take a long time, using multiple threads will speed things up significantly. However, you need to be mindful of context switching.
00:07:52.160 If you create too many threads, you spend more time switching between them than processing.
00:08:04.480 Another interesting example can be seen with Bitcoin mining, which processes many hashes through brute force, similar to password cracking.
00:08:19.920 If you try to calculate a million hashes using threads, you could initially think that splitting this workload would be efficient.
00:08:33.120 But when you run this large computation, you might face a significant slowdown due to MRI's global interpreter lock, which allows only one thread to execute Ruby code at a time.
00:08:47.600 Thus, when you're looking to run Ruby code truly in parallel on threads, MRI isn't your best option. JRuby and Rubinius are better suited because they have no global interpreter lock.
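To see the effect for yourself, a rough benchmark along these lines can help. It is only a sketch: `hash_range` and the workload size are assumptions, and the timings depend entirely on the interpreter and machine.

```ruby
require "digest"
require "benchmark"

# CPU-bound work: hash every number in the range, return how many were hashed.
def hash_range(range)
  range.each { |n| Digest::SHA256.hexdigest(n.to_s) }
  range.size
end

total = 100_000
thread_count = 4
slice = total / thread_count

single = Benchmark.realtime { hash_range(0...total) }

threaded = Benchmark.realtime do
  threads = thread_count.times.map do |i|
    Thread.new { hash_range((i * slice)...((i + 1) * slice)) }
  end
  threads.each(&:join)
end

puts format("1 thread:  %.2fs", single)
puts format("%d threads: %.2fs", thread_count, threaded)
```

On MRI the threaded run is typically not faster for this kind of CPU-bound work, while an interpreter without a global lock can spread it across cores.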
00:09:04.000 To wrap this up, if you're writing multi-threaded or multi-process applications in Ruby, you might want to avoid MRI.
00:09:19.680 Multi-process applications are another avenue to tackle certain problems. However, not all platforms support this. If you're on JRuby or Windows, forking processes can be problematic.
00:09:33.840 You can replace `Thread.new` with `fork` in the same existing code structure; the key difference being that you now need to wait on the child processes.
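A sketch of the same hashing workload split across child processes, assuming a platform that supports `fork`. The worker count and slice size are arbitrary choices for illustration.

```ruby
require "digest"

worker_count = 4
per_worker = 25_000

pids = worker_count.times.map do |i|
  fork do
    # Each child hashes its own slice on whatever core the kernel gives it.
    first = i * per_worker
    (first...(first + per_worker)).each { |n| Digest::SHA256.hexdigest(n.to_s) }
  end
end

# Children are waited on rather than joined. wait2 returns [pid, status].
statuses = pids.map { |pid| Process.wait2(pid).last }

puts statuses.all?(&:success?)
```

The children here only burn CPU and exit, so no data has to travel back to the parent.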
00:09:48.000 When working with multiple processes, you'll encounter inter-process communication, which this specific example avoids.
00:10:00.000 Typically, communication occurs when you pass block variables into your fork calls. This approach performs comparably to multi-threading but let's consider a more complex example.
00:10:11.120 Processes are composed of threads, meaning you can fork processes and then create threads within those processes. This allows for a good balance of performance.
00:10:24.000 There’s a subtle issue you might encounter with the Ruby standard library’s queue implementation. This may not show up with a small dataset but could become problematic with larger workloads.
00:10:41.280 If you're working with a million tasks, you can face potential deadlocks because the queue's `empty?` method is non-blocking, while `pop` is blocking.
00:10:54.560 Due to this behavior, a situation can arise where the `empty?` check reports that work remains, another thread takes the last item, and the program then hangs indefinitely in `pop` waiting for data that never arrives.
00:11:05.440 The solution is simple: wrap the operation of checking the queue and popping it inside a lock, ensuring you'll avoid deadlocks.
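A sketch of that fix, with hypothetical names (`lock`, `processed`). The check and the pop happen inside one `synchronize` block so no other worker can slip in between them.

```ruby
require "thread"

queue = Queue.new
1_000.times { |n| queue << n }

lock = Mutex.new
processed = Queue.new

workers = 4.times.map do
  Thread.new do
    loop do
      # Check and pop as a single step. Since every pop goes through this
      # lock, pop is only ever called on a non-empty queue and cannot hang.
      item = lock.synchronize { queue.empty? ? nil : queue.pop }
      break if item.nil?

      processed << item
    end
  end
end
workers.each(&:join)

puts processed.size
```

An alternative with the same effect is the non-blocking form `queue.pop(true)`, which raises `ThreadError` on an empty queue instead of waiting.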
00:11:19.680 As a web developer, you likely have a keen interest in improving the speed of your web applications. Tools like Unicorn are particularly popular.
00:11:31.040 Unicorn operates as a pre-forking web server, starting a process and then forking multiple worker processes to handle requests.
00:11:46.160 Having your parent process monitor child processes is essential for ensuring they remain alive as they handle incoming requests.
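The monitoring idea can be sketched in a few lines. This is not Unicorn's implementation. `spawn_worker` is a hypothetical helper, and the workers here exit after a short sleep so the example terminates, whereas real workers would keep serving requests.

```ruby
# Hypothetical worker: a real server would accept requests here.
# crash: true simulates a worker that dies unexpectedly.
def spawn_worker(crash: false)
  fork do
    sleep 0.05
    exit!(crash ? 1 : 0)   # exit! skips at_exit handlers inherited from the parent
  end
end

workers = {}
3.times { |id| workers[spawn_worker(crash: id.zero?)] = id }

restarts = 0
until workers.empty?
  pid, status = Process.wait2   # block until any child exits
  id = workers.delete(pid)
  next if status.success?       # clean exit, nothing to do

  restarts += 1
  workers[spawn_worker] = id    # replace the crashed worker
end

puts "restarted #{restarts} worker(s)"
```

The parent never handles requests itself. Its only job is to notice dead children and replace them.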
00:11:59.680 When you have multiple workers, communication happens over sockets as processes cannot communicate simply like threads can.
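A minimal sketch of two processes talking over a socket pair, assuming a Unix-like platform. The message format is made up for illustration.

```ruby
require "socket"

parent_sock, child_sock = UNIXSocket.pair

pid = fork do
  parent_sock.close                      # the child only uses its own end
  request = child_sock.gets
  child_sock.puts "handled: #{request.chomp}"
  child_sock.close
end

child_sock.close                         # the parent only uses its own end
parent_sock.puts "GET /"
reply = parent_sock.gets.chomp
parent_sock.close
Process.wait(pid)

puts reply
```

Everything that crosses the socket is bytes, so anything richer than a string has to be serialized first.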
00:12:12.400 There is an inherent memory cost associated with forking processes, and these processes will duplicate memory pages upon creation.
00:12:24.560 Thus, every bit of memory should be treated meticulously because changing a single page can lead to costly memory duplication for child processes.
00:12:35.520 This is where Unicorn gets complicated; managing memory efficiently while monitoring processes impacts performance.
00:12:47.680 Celluloid emerges as a powerful solution here, streamlining the complexity of managing concurrent applications. Who here has used or heard of Celluloid before?
00:13:00.000 Celluloid is based on the actor model, allowing each actor to execute in its own thread. It utilizes mailboxes for inter-actor communication, helping you avoid the manual management of threads.
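To make the actor idea concrete without depending on the gem, here is a plain-Ruby sketch of one actor with a mailbox. It is not Celluloid's API or implementation. `CounterActor` and its message names are hypothetical.

```ruby
require "thread"

# One thread owns @count. Other threads only put messages in the mailbox.
class CounterActor
  def initialize
    @count = 0
    @mailbox = Queue.new
    @thread = Thread.new { run }
  end

  # Fire-and-forget message: returns as soon as it is enqueued.
  def increment
    @mailbox << [:increment, nil]
  end

  # Request/reply message: blocks until the actor answers.
  def count
    reply = Queue.new
    @mailbox << [:count, reply]
    reply.pop
  end

  def stop
    @mailbox << [:stop, nil]
    @thread.join
  end

  private

  # The actor's loop: handle one message at a time, in arrival order.
  def run
    loop do
      message, reply = @mailbox.pop
      case message
      when :increment then @count += 1
      when :count     then reply << @count
      when :stop      then break
      end
    end
  end
end

actor = CounterActor.new
senders = 4.times.map { Thread.new { 250.times { actor.increment } } }
senders.each(&:join)

total = actor.count
actor.stop
puts total
```

No mutex is needed around `@count` because only the actor's own thread ever touches it.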
00:13:18.240 Written by Tony Arcieri, who recently won a Ruby hero award, Celluloid offers tools to help manage complexities like logging and supervision.
00:13:35.760 In essence, Celluloid alleviates many headaches associated with thread management, making it easier to build robust concurrent applications.
00:13:50.720 Celluloid’s built-in monitoring facility helps keep track of threads, and in case any worker crashes, it restarts automatically.
00:14:05.760 Additionally, you can save worker references for later use across different code segments.
00:14:19.040 Communication between actors in Celluloid happens across threads, allowing you to send messages between worker objects via their mailboxes.
00:14:32.480 Here's a simple example of how messages can be sent back and forth using Celluloid's infrastructure. This is a trivial example, but the underlying concept can be applied broadly.
00:14:49.600 Celluloid allows you to make method calls asynchronously; they return immediately, enabling you to handle other tasks while the work continues in the background.
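The general pattern behind a call that returns immediately can be sketched with a plain thread acting as a future. This is not Celluloid's API. `slow_square` is a hypothetical stand-in for slow work.

```ruby
# Hypothetical slow operation.
def slow_square(n)
  sleep 0.05
  n * n
end

# Start the work in the background. Thread.new returns right away.
future = Thread.new { slow_square(12) }

# The caller is free to do other things in the meantime.
other_work = (1..3).map { |i| i * 2 }

# value blocks only if the result is not ready yet.
answer = future.value

puts other_work.inspect
puts answer
```

The caller pays the waiting cost only at the point where it actually needs the result.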
00:15:04.960 In terms of preventing deadlocks, Celluloid's scheduling method for calls is designed to mitigate these risks.
00:15:17.920 For example, by creating a pool of workers in Celluloid, you can efficiently fetch and process data with better parallel capabilities.
00:15:30.080 You can specify the number of threads equivalent to your machine's core count, but you’re also free to create more if you anticipate handling greater loads.
00:15:44.720 Additionally, there’s a part of Celluloid called DCell, which stands for distributed Celluloid. It allows you to start workers on different nodes across a network.
00:16:01.440 Using DCell, you can assign a node ID and address for each worker. Under the hood, it utilizes ZeroMQ for communication between these workers.
00:16:16.320 This capability opens up immense possibilities for scaling your programs across multiple machines without any complex setup.
00:16:32.560 You can drop cells into nodes and instantiate numerous worker threads across different machines while enjoying a simple Ruby interface.
00:16:45.280 This flexibility is eye-opening because you can distribute tasks and communicate effortlessly across systems, making it so much easier than traditional methods.
00:17:01.440 Now, let’s address non-blocking I/O. I initially avoided this topic because it requires significant setup time and alters how you write Ruby programs.
00:17:17.360 If you are interested in non-blocking I/O, EventMachine or Celluloid I/O are valid paths to explore.
00:17:30.240 With EventMachine, you configure your entire system to be asynchronous, which necessitates a shift in mindset.
00:17:43.600 However, using something like Celluloid allows for a more object-oriented approach, reducing the setup complexity.
00:17:58.560 With the use of actors and synchronization in Celluloid, you can maintain clarity in your code while benefiting from faster and more efficient runtime.
00:18:12.000 When using the actor model, you can manage state locally within each actor without worrying about sharing across threads.
00:18:26.560 However, take note that there is still complexity in managing state and communicating effectively, largely depending on your application’s needs.
00:18:41.680 If you have any questions or comments, feel free to ask now. Thank you for listening!