RubyConf AU 2013
Dear God What Am I Doing? Concurrency and Parallel Processing
Summarized using AI

by Adam Hawkins

The video titled "Dear God What Am I Doing? Concurrency and Parallel Processing" features Adam Hawkins at RubyConf AU 2013. The talk addresses the complexities of concurrency and parallel processing in programming, particularly within the Ruby programming language. Adam shares his personal journey of overcoming the confusion surrounding threads, processes, and fibers while aiming to enhance program speed. This informative session targets intermediate Ruby developers who may lack experience with multi-threaded code.

Key points covered in the talk include:

- Introduction to Concurrency and Parallelism: Adam explains the distinction between concurrency (structuring a program so that multiple tasks can be in progress over the same period) and parallelism (executing multiple tasks at the same instant, for example on separate cores).

- Machine Architecture Basics: The talk simplifies the concept of threads and processes with a humorous analogy of a hamster running in a wheel, representing threads within processes.

- Understanding Threads, Processes, and Fibers:

- Processes: They operate independently with their own memory space, making inter-process communication (IPC) necessary.

- Threads: They share memory space, making them easier to work with; however, they can lead to complexities such as race conditions requiring the use of locks.

- Fibers: Offer finer control over execution and can aid in concurrency.

- Handling I/O Operations: Ruby 1.8 and 1.9 implemented threads differently (green threads versus native threads), which affects performance, particularly during blocking I/O operations.

- Parallelizing Operations: Adam explains experimenting with multi-threading to optimize response times for tasks such as making web queries. His findings showed that while using four threads substantially reduced execution time, scaling beyond that proved inefficient due to context switching overhead.
- Global Interpreter Lock (GIL): Hawkins discusses how the GIL in MRI prevents threads from running Ruby code in parallel, a limitation that JRuby and Rubinius do not have.
- Key Libraries and Tools: The session introduces Celluloid as a library that simplifies concurrent programming and helps prevent issues like deadlocks, although it also faces GIL limitations.
- Practical Examples: Real-life scenarios, such as querying Google and computational tasks like Bitcoin hashing, illustrate the challenges and considerations when working with concurrency.

In conclusion, Adam’s talk emphasizes that understanding the nuances of concurrency and parallel processing is crucial for Ruby developers seeking to optimize their applications. He encourages exploring different Ruby interpreters and libraries like Celluloid to leverage the full potential of concurrency in programming.

00:00:09.800 Alright, if I forgot you, but I thought I was the only one.
00:00:18.600 Oh, okay, okay.
00:00:26.039 So, the title of my talk is 'Dear God, What Am I Doing? Concurrency and Parallel Processing.' I came up with the idea for this talk because once upon a time, I was very scared of my computer.
00:00:33.660 I had only written sequential programs, and I had to do some work that required me to make a new thread, start forking, and things that I had no clue about. It scared me.
00:00:47.820 So, if you find yourself in this situation, this talk is for you. If you have experience writing parallel programs and you're comfortable with that, and if you've written programs using Celluloid or your own web server like Puma or Unicorn, then this talk might be a bit boring for you.
00:01:07.100 However, it should still be educational for people like me. So, how many of you here have opened threads and forked stuff before? Okay, well, this might not apply to you, because this struggle was my experience when I first started.
00:01:32.340 For me, it went horribly wrong, but it was a great learning experience. Before we can dive into the different ways we can approach these problems, we need to take a brief detour into machine architecture.
00:01:56.299 Computers are probably one of the most complex things ever built by humans. But when we simplify it, we have threads and processes.
00:02:09.080 From a high-level perspective, it's not so complicated, but behind the scenes, it is very complex. We can break this system down into a diagram.
00:02:24.000 We have a hamster, which represents a thread, and the wheel is the process.
00:02:45.060 So now, we have threads and processes. Processes are composed of threads, and you can scale them out to perform many tasks.
00:03:01.500 When you want your computer to go faster, what you really want to do is copy the wheel and have all the hamsters running at the same time. But unfortunately, what usually happens is something like this.
00:03:18.099 And you can get back on and try again, and it just keeps happening. I had many things I wanted to discuss during this talk. The topic is vast, and I had to choose the key points I wanted to highlight.
00:03:39.540 So, if you have no experience, this should get you started, and if you do, it should reinforce what you already know.
00:03:58.379 So, let's move beyond memes and delve into the important stuff. The question we all face, especially with Ruby, is how can we make it faster?
00:04:15.659 Today, Charles talked about the JVM and how it's important to enhance performance. Ruby is generally fast enough, but it can be faster.
00:04:30.419 As a car enthusiast, I like working on cars to make them go faster because it feels great when you're in the driver's seat and hit the gas.
00:04:54.240 To make our computers faster, we need to enable them to perform multiple tasks simultaneously.
00:05:12.060 How many of you chat, browse the internet, and listen to music all at once? I do it constantly.
00:05:24.840 To accomplish concurrent jobs, we have three primitives: processes, threads, and fibers.
00:05:39.600 Processes are separate from everything else; they have their own memory space and are scheduled by the kernel.
00:05:50.039 Threads are inside processes, share the same memory space, and are also scheduled by the kernel. Fibers are similar to threads but give you control over when they start and stop.
00:06:02.220 The kernel decides when your code gets executed, and processes or threads may block depending on what's happening, especially with I/O.
00:06:21.780 The most common blocking operation is I/O, which many in the Node.js community will praise, as non-blocking I/O is considered highly desirable.
00:06:35.580 The main concern when working with threads and processes is understanding the different semantics across various interpreters.
00:06:50.760 The significant distinction is MRI, which is Ruby's original interpreter, compared to others like JRuby and Rubinius that handle threads differently.
00:07:05.520 This handling changed noticeably between Ruby versions 1.8 and 1.9.
00:07:22.680 In Ruby 1.8, there were green threads, which were not backed by native threads and thus had limitations.
00:07:36.820 However, in Ruby 1.9, every thread you instantiate is backed by a native thread object.
00:07:54.599 Now, let’s look at threads. They are an easy way to get started if you need to perform multiple tasks at once.
00:08:11.639 You can simply instantiate as many threads as you want, but you don't have control over when they run.
00:08:26.279 This leads to a common problem: if you create a thread and run your code, sometimes nothing seems to happen.
00:08:38.880 The issue is that you need to wait for the thread to finish. If you're delegating work to child threads, the parent thread must wait for them to complete.
00:08:52.080 You can do this by keeping the threads in an array and calling join on each of them.
00:09:01.380 Another thing to note about threads is that when you run a piece of code, the order of the output can change from run to run.
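The point about waiting for child threads can be sketched roughly as follows. This is not the speaker's code; `run_jobs` is a hypothetical helper name, and the `sleep` call stands in for real work.

```ruby
# Start one thread per job, then wait for all of them to finish.
# `run_jobs` is a hypothetical helper used only for this sketch.
def run_jobs(count)
  threads = count.times.map do |i|
    Thread.new do
      sleep(0.01) # stand-in for real work
      i * 2       # the block's return value becomes Thread#value
    end
  end

  # Without this, the main thread may exit before the children finish.
  threads.each(&:join)
  threads.map(&:value)
end

p run_jobs(3) # prints [0, 2, 4]
```

The results come back in the order the threads were created, even though the threads themselves may finish in any order.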
00:09:20.040 Execution order cannot be controlled, and this makes debugging a challenge.
00:09:34.740 Threads have shared memory, which is both an advantage and a source of complications.
00:09:46.200 For instance, if we have two threads, one calculating interest on a shared bank account balance and the other printing it out, we need to be cautious.
00:10:10.860 The issue arises because we need to protect shared variables using locks, so only one thread can access them at a time.
00:10:23.640 You can use a mutex and wrap access in a synchronize block, which makes other threads wait until the lock is released.
00:10:41.040 Unfortunately, using locks is required with threads. We'll talk about strategies to avoid locks later.
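A minimal sketch of protecting a shared balance with a mutex might look like this. The `Account` class and `deposit_concurrently` helper are hypothetical, and a simple deposit stands in for the interest calculation mentioned in the talk.

```ruby
# Hypothetical account whose balance is shared between threads.
class Account
  def initialize(balance)
    @balance = balance
    @lock = Mutex.new
  end

  # The read-modify-write sequence runs under the lock, so only one
  # thread at a time can update the balance.
  def deposit(amount)
    @lock.synchronize do
      current = @balance
      @balance = current + amount
    end
  end

  # Reads take the same lock so they never observe a half-finished update.
  def balance
    @lock.synchronize { @balance }
  end
end

# Hypothetical driver: several threads deposit into the same account.
def deposit_concurrently(account, threads: 10, deposits: 1_000)
  workers = threads.times.map do
    Thread.new { deposits.times { account.deposit(1) } }
  end
  workers.each(&:join)
  account.balance
end

puts deposit_concurrently(Account.new(0)) # prints 10000
```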
00:10:54.420 Blocking I/O operations can lead to significant wait times in your application.
00:11:07.200 Ruby 1.9 made improvements for handling blocking operations, allowing for parallel I/O under specific circumstances.
00:11:22.140 When multiple threads wait for I/O, Ruby selects the one that is ready to proceed.
00:11:34.860 Now, let’s talk about processes.
00:11:46.860 Processes are powerful because they can do everything threads can do, and you can open as many of them as your system resources allow.
00:11:57.120 You instantiate new processes usually with the fork method, and they inherit a copy of the parent process.
00:12:08.880 Unlike threads, processes don't share memory, so inter-process communication (IPC) becomes necessary, which can be quite complex.
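One of the simpler forms of IPC is a pipe between parent and child. The sketch below assumes a Unix-like platform where `fork` is available; `square_in_child` is a hypothetical helper, not something shown in the talk.

```ruby
# Compute a value in a forked child and send it back through a pipe.
# `square_in_child` is a hypothetical helper for illustration.
def square_in_child(n)
  reader, writer = IO.pipe

  pid = fork do
    reader.close                 # the child only writes
    writer.write((n * n).to_s)
    writer.close
    exit!(0)                     # leave without running inherited at_exit hooks
  end

  writer.close                   # the parent only reads
  result = reader.read           # blocks until the child closes its end
  reader.close
  Process.wait(pid)              # reap the child
  Integer(result)
end

puts square_in_child(7) # prints 49
```

The child gets a copy of the parent's memory, so assigning to a variable in the child would not be visible in the parent; the pipe is what carries the result back.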
00:12:20.700 As we proceed, fibers can also be used to compose applications, though each one has a small stack.
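To illustrate how a fiber hands control back and forth explicitly, here is a small sketch; `fiber_trace` is a hypothetical helper name.

```ruby
# Record the order in which the caller and the fiber run.
# `fiber_trace` is a hypothetical helper for illustration.
def fiber_trace
  trace = []

  fiber = Fiber.new do
    trace << :step_one
    Fiber.yield        # pause here; control returns to the caller
    trace << :step_two
  end

  fiber.resume         # runs the fiber until Fiber.yield
  trace << :caller
  fiber.resume         # continues after the yield until the block ends
  trace
end

p fiber_trace # prints [:step_one, :caller, :step_two]
```

Unlike a thread, nothing inside the fiber runs until the caller resumes it, so the ordering here is fully deterministic.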
00:12:30.540 When you want to make things run faster, one common approach is parallelizing your operations.
00:12:44.660 For example, we may want to query the first hundred pages of Google searches for the keyword 'Ruby' and do so quickly.
00:13:00.780 However, this task involves I/O, which can block, showcasing issues faced in MRI.
00:13:12.300 We could approach this with either threads or processes.
00:13:28.800 For instance, I attempted a multi-threaded solution that initially took about 45 seconds over my Wi-Fi.
00:13:40.440 However, I knew I could optimize it using all the resources at my disposal.
00:13:53.520 I have four cores, so I thought to instantiate four threads to pull tasks from a queue.
00:14:06.720 Using Ruby's standard library, I leveraged a thread-safe queue whose pop blocks when the queue is empty.
00:14:20.880 This brought running time down to eight seconds—quite a significant improvement.
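A pool of four threads pulling work from a queue could be sketched like this. It is not the code from the talk: `fetch_page` is a hypothetical stand-in that sleeps instead of making a real HTTP request, and the `:done` markers are one possible way of telling workers to stop.

```ruby
require "thread" # needed for Queue on older Rubies, harmless on newer ones

# Hypothetical stand-in for a blocking HTTP request.
def fetch_page(page)
  sleep(0.01)
  "results for page #{page}"
end

# Hypothetical helper: fetch all pages using a fixed number of threads.
def fetch_all(pages, thread_count)
  queue = Queue.new
  pages.each { |page| queue << page }
  thread_count.times { queue << :done } # one stop marker per worker

  results = Queue.new
  workers = thread_count.times.map do
    Thread.new do
      # pop blocks while the queue is empty, so workers simply wait for work.
      while (page = queue.pop) != :done
        results << [page, fetch_page(page)]
      end
    end
  end
  workers.each(&:join)

  # Drain the results and restore page order.
  Array.new(results.size) { results.pop }.sort_by(&:first)
end

pages = fetch_all((1..20).to_a, 4)
puts pages.size # prints 20
```

Because each worker spends most of its time waiting on (simulated) I/O, the threads overlap their waits even on MRI.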
00:14:33.480 With that success, I decided to experiment with increasing thread counts.
00:14:48.840 I adjusted my command-line arguments to test various thread counts.
00:15:04.200 My previous experiences led me to anticipate a linear performance gain, but that wasn't the case.
00:15:20.300 As seen with thread counts of ten and nine, the overhead from context switching negated performance increases.
00:15:34.800 This is primarily due to the constant context switching and extensive I/O.
00:15:46.620 Next, we discuss a math-intensive example focusing on Bitcoin hashing, which can be computationally intense.
00:16:00.240 You often need to perform thousands of hashes, especially if you're interested in cryptographic ventures.
00:16:15.240 Here's another example that doesn't involve I/O but focuses on computation.
00:16:30.180 In this case, I want to implement a queue and run multiple threads to maximize the frequency of operations.
00:16:45.480 However, as I went to run the code on my most powerful machine, it turned out to be slow.
00:16:59.940 Here, we need to address an unfortunate aspect known as the Global Interpreter Lock (GIL).
00:17:11.100 The GIL restricts MRI, ensuring that only one thread can execute Ruby code at a time.
00:17:27.720 In our context, this means that only one of my threads can do hashing at a time.
00:17:39.180 This limitation complicates true parallel programming with MRI.
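A rough way to observe this is to time the same CPU-bound hashing work sequentially and across threads. The sketch below uses SHA-256 over made-up input strings as a stand-in for real Bitcoin hashing; `hash_range` and `hash_with_threads` are hypothetical helper names, and no particular timing is claimed.

```ruby
require "digest"
require "benchmark"

# Hash a series of made-up inputs; a stand-in for real mining work.
def hash_range(nonces)
  nonces.map { |nonce| Digest::SHA256.hexdigest("block-#{nonce}") }
end

# Split the same work across several threads (total must be positive).
def hash_with_threads(total, thread_count)
  slice = (total / thread_count.to_f).ceil
  threads = (0...total).each_slice(slice).map do |nonces|
    Thread.new { hash_range(nonces) }
  end
  threads.flat_map(&:value) # Thread#value joins and returns the block result
end

sequential = Benchmark.realtime { hash_range(0...20_000) }
threaded   = Benchmark.realtime { hash_with_threads(20_000, 4) }
puts format("sequential: %.3fs, 4 threads: %.3fs", sequential, threaded)
```

On MRI the two timings would be expected to be of the same order, since only one thread runs Ruby code at a time; on an interpreter without a GIL the threaded version has room to scale with cores.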
00:17:55.680 Fortunately, JRuby and Rubinius don't have this limitation.
00:18:09.360 This is excellent news for those working with these interpreters.
00:18:22.920 Ruby web servers like Puma are designed to take advantage of this when running on JRuby or Rubinius.
00:18:35.880 Now, let's discuss multi-process setups.
00:18:48.960 Multi-process frameworks like Unicorn start your application then fork off multiple worker processes to handle requests.
00:19:04.680 There’s also the Zeus gem, which forks your Rails app during specific boot stages to avoid redundant boots.
00:19:20.400 With multiple processes, you can spawn threads inside each process for added concurrency.
00:19:35.280 This leads to exciting possibilities, but also invites complexity.
00:19:49.380 In this example, I would demonstrate a process instantiating several threads.
00:20:01.380 However, note the absence of inter-process communication in this straightforward implementation.
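The slide code is not part of the transcript, but the shape of such a setup, several forked processes that each run a few threads, can be sketched as below. It assumes a platform with `fork`; `run_workers` is a hypothetical helper, and the `sleep` stands in for handling a request.

```ruby
# Fork several worker processes, each running its own set of threads.
# `run_workers` is a hypothetical helper for illustration.
def run_workers(process_count, threads_per_process)
  pids = process_count.times.map do
    fork do
      threads = threads_per_process.times.map do
        Thread.new { sleep(0.01) } # stand-in for handling a request
      end
      threads.each(&:join)
      exit!(0)                     # leave without running inherited at_exit hooks
    end
  end

  # Wait for every child and report whether it exited cleanly.
  pids.map { |pid| Process.wait2(pid).last.success? }
end

p run_workers(2, 3) # prints [true, true]
```

As in the talk's example, the workers here never talk to each other; the parent only learns each child's exit status.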
00:20:19.920 Having a well-structured example is essential for understanding concurrent processes.
00:20:37.800 These simple examples illustrate the concepts but become complicated as real-world scenarios are introduced.
00:20:55.380 When a long-running process requires interaction with other processes, synchronization becomes crucial.
00:21:18.420 Global state management may necessitate locks or, preferably, immutable data structures.
00:21:35.220 An important quirk in the Ruby standard library is present in its queue functionality.
00:21:55.020 Using a queue object to manage the items for the threads can lead to deadlocks.
00:22:11.700 The empty? call is non-blocking, while pop is blocking, and combining the two can lead to synchronization issues.
00:22:27.600 In scenarios with multiple threads, this design exposes us to deadlock situations.
00:22:40.680 These deadlock situations occur because one thread may see that the queue is not empty while another thread pops the last item, so the first thread's blocking pop then waits forever.
00:23:00.420 To fix this, we could override the queue class to prevent the empty checking problem.
00:23:17.040 Instead of checking empty? and then calling pop as two separate steps, we pop conditionally in a single step, so no other thread can take the item in between.
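One way to avoid the check-then-pop race, without subclassing the queue, is the non-blocking form of `pop`, which raises `ThreadError` when the queue is empty. This may differ from the fix shown on the slides; `drain` is a hypothetical helper name.

```ruby
require "thread" # needed for Queue on older Rubies, harmless on newer ones

# Process every item already in the queue using several threads.
# `drain` is a hypothetical helper for illustration.
def drain(queue, thread_count)
  processed = Queue.new

  workers = thread_count.times.map do
    Thread.new do
      loop do
        begin
          item = queue.pop(true) # non-blocking: raises ThreadError when empty
        rescue ThreadError
          break                  # nothing left, so this worker stops
        end
        processed << item
      end
    end
  end
  workers.each(&:join)

  Array.new(processed.size) { processed.pop }
end

work = Queue.new
(1..50).each { |n| work << n }
puts drain(work, 4).size # prints 50
```

This only suits cases where all work is enqueued before the workers start; if producers are still adding items, an empty queue does not mean the work is finished.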
00:23:28.920 Having covered simple examples, let's delve deeper into more robust concurrency tools.
00:23:41.640 How many here have used Celluloid? And how many people have used Sidekiq?
00:23:56.940 Sidekiq is built on top of Celluloid and utilizes the actor model, where each actor operates in its own thread.
00:24:19.800 Celluloid simplifies the process of writing concurrent code by handling things like pooling, supervision, and message passing.
00:24:40.020 This structure prevents deadlocks and allows for actions that feel intuitive, akin to writing sequential code.
00:24:55.680 Using Celluloid, one can easily instantiate a worker pool, much like how Sidekiq operates.
00:25:11.100 While it's not necessarily as fast as using multiple processes and threads, the ease of understanding is a huge advantage.
00:25:30.060 Of course, this still comes with the caveat of the global interpreter lock.
00:25:45.660 As stated in Celluloid's documentation, it functions best when working within JRuby or Rubinius.
00:26:03.580 The entire landscape of concurrency is vast, extending from threads to processes, while also involving libraries like EventMachine and Grape.
00:26:22.680 Exploring this path demands careful monitoring and overhead management.
00:26:39.660 How many attended the Immutable Ruby talk earlier?
00:26:56.160 Using immutable objects greatly helps in managing global states.
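As a small illustration of the idea, a frozen object can be read from many threads without a lock because nothing can change it. The `SETTINGS` constant and both helpers below are hypothetical.

```ruby
# Hypothetical shared configuration, frozen so no thread can modify it.
SETTINGS = { "retries" => 3, "timeout" => 5 }.freeze

# Many threads can read the frozen hash without any locking.
def read_settings_concurrently(thread_count)
  thread_count.times.map { Thread.new { SETTINGS["retries"] } }.map(&:value)
end

# Any attempt to write is rejected instead of silently racing.
def try_to_mutate
  SETTINGS["retries"] = 10
  :mutated
rescue RuntimeError # FrozenError on Ruby 2.5+, which is a RuntimeError subclass
  :rejected
end

p read_settings_concurrently(4) # prints [3, 3, 3, 3]
p try_to_mutate                 # prints :rejected
```

Note that `freeze` is shallow: objects stored inside the hash would need freezing as well to be safe to share.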
00:27:12.960 So, how can we make Ruby faster?
00:27:28.860 You could contribute to MRI, collaborate on Rubinius or JRuby, or work with project leads like Evan, Brian, and Charles.
00:27:46.760 I'm somewhat embarrassed because I went through my slide deck much faster than I practiced.
00:28:02.040 Unfortunately, that's it for now.
00:28:10.500 I'd be happy to answer any questions or engage in discussions about other topics.
00:28:51.179 In a Celluloid example, you mentioned it looks at the number of cores. How does that affect multi-threaded processes?
00:29:08.300 The question was regarding the benefit of multiple threads when constrained to a single core.
00:29:22.440 Celluloid defaults the pool size to the number of cores, but is that effective if the threads are still bound to one core?
00:29:40.200 The operating system schedules those threads, and with support from the kernel, threads can run across multiple cores.
00:29:58.540 On my MacBook Air, even utilizing all available threads can max out performance.
00:30:14.900 Most examples I provided demonstrate this.
00:30:28.200 Anyone else? Yes, there's a question over there.
00:30:45.080 Ultimately, implementing a message bus for communication across long-lived processes might be beneficial.
00:31:09.240 Unicorn uses sockets for communication as shared memory is not feasible, allowing separate processes to communicate.
00:31:27.300 You might consider alternative libraries like DRb for handling communication within your system.
00:31:41.580 To my knowledge, Celluloid doesn't have extensive built-in options for distributing actors across machines.
00:31:59.760 However, it features DCell, which facilitates Celluloid's distributed functionality.
00:32:13.680 I hope this clarifies the discussion; thank you for your attention.