00:00:09.599
Hello everyone! My name is José Valim, and I am the co-founder of Plataformatec, a consultancy based in Brazil. We work with Ruby on Rails and collaborate with companies worldwide. You may know me and Plataformatec through our open-source projects, such as Devise and Simple Form.
00:00:16.960
I am also a member of the Rails Core team, and I authored the book 'Crafting Rails Applications,' which discusses Rails internals and how to better understand and utilize Rails. However, today, I'm not here to talk about Rails or even specifically about Ruby. Instead, I will focus on concurrency in general.
00:00:30.599
I want to give you an overview of why concurrency is changing and becoming an increasingly popular topic. Different programming languages are exploring various approaches to concurrency, and I hope to provide you with sufficient information so you can delve deeper into the subject later.
00:01:08.320
Let’s start with a very simple definition of concurrency: executing several tasks simultaneously. For example, imagine you have a large CSV file and want to get the total sum of a specific column. If the file is substantial, it can take a significant amount of time to process. One straightforward approach is to divide this task into two halves, processing each half concurrently to speed up the calculation.
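The split-the-file idea can be sketched in a few lines of Ruby. This is a minimal sketch, not the speaker's actual code: the rows are held in memory rather than streamed from a file, and the column index is hypothetical. (On MRI, a global lock limits parallelism for CPU-bound work, but the structure of the computation is the same.)

```ruby
# Sum one column of a table by splitting the rows in half and summing
# each half in its own thread, then combining the two partial sums.
def concurrent_column_sum(rows, column)
  half = rows.size / 2
  parts = [rows[0...half], rows[half..]]

  threads = parts.map do |part|
    Thread.new { part.sum { |row| row[column].to_f } }
  end

  threads.sum(&:value) # wait for both halves and add the partial sums
end

rows = [["a", "10"], ["b", "20"], ["c", "30"], ["d", "40"]]
concurrent_column_sum(rows, 1) # => 100.0
```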
00:01:22.680
This is a basic example of concurrency: breaking a file into parts and calculating the sums simultaneously. Another example familiar to Rails developers involves using Passenger. When deploying a Rails application, you can configure Passenger to start multiple instances. This method represents one form of concurrency—multiprocessing.
00:01:55.040
However, I won't focus on that kind of concurrency today. As we look toward the future, many of us will want to deploy software on machines with 16 or even 24 cores, and you wouldn't want to start 16 Rails instances just to utilize all those cores. Thus, my focus today is on multithreaded concurrency, where a single process uses multiple threads to take advantage of all the available cores.
00:03:23.959
Before discussing concurrency, let’s consider a theoretical world without state or concurrency—a declarative model. This model is heavily based on mathematics, allowing us to perform computations without changing state. Here, we cannot mutate anything. In programming, for instance, in Ruby, you might have an array and append an element to it, resulting in mutation. However, in a declarative model, you would create a new array instead of changing the original.
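The contrast between mutation and the declarative style can be shown with a plain Ruby array:

```ruby
# Mutation: << changes the original array in place.
list = [1, 2, 3]
list << 4
list # => [1, 2, 3, 4] -- the original object was modified

# Declarative style: build a new array and leave the original untouched.
original = [1, 2, 3].freeze   # freeze to guarantee nobody mutates it
longer   = original + [4]     # + returns a brand-new array
original # => [1, 2, 3]
longer   # => [1, 2, 3, 4]
```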
00:04:05.319
In our theoretical world, we define functional processes, where functions are deterministic: for the same input, a function yields the same output every time. If we read from a file or generate a random number, those operations are not deterministic, since the output can change each time we call them. Therefore, in this ideal world, everything in our code is free of side effects.
00:05:07.560
No function can change the output of other functions because the model relies on the same input consistently yielding the same output. This lack of side effects is crucial because it allows the language runtime to perform numerous optimizations. For instance, if we have a function that computes a value based on known input, and we frequently call this function, the language can optimize it by reusing the result each time instead of recalculating it.
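The reuse-the-result optimization described above is memoization, and it is only safe because the function is deterministic. A minimal sketch, where `slow_square` is a hypothetical stand-in for any expensive pure computation:

```ruby
# Because a pure function always returns the same output for the same
# input, its results can be cached and reused instead of recomputed.
def slow_square(n)
  n * n # imagine this being expensive
end

MEMO = {}

def memoized_square(n)
  MEMO[n] ||= slow_square(n) # compute once, then serve from the cache
end

memoized_square(7) # computes 49 and stores it
memoized_square(7) # returns the cached 49 without recomputing
```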
00:05:58.000
Next, let's introduce concurrency into our state-free theoretical world. We still have no mutable state, but we add a form of concurrency called dataflow variables. Using the same pure functions, we can declare that we want certain values computed concurrently: instead of calculating them sequentially, we use threads to compute the results simultaneously.
00:06:49.839
In this model, when the first thread calculates a value, it starts another thread, allowing both calculations to occur in parallel. The runtime simply waits for the completion of these threads and retrieves the values as needed. This method of leveraging threads works smoothly because there is no shared mutable state—each thread works independently, avoiding the race conditions that often plague concurrent programming.
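The dataflow idea can be approximated in Ruby with `Thread#value`, which blocks until the thread's result is ready. The computations here are arbitrary placeholders:

```ruby
# Each thread computes a value from immutable inputs; nothing shared is
# mutated. Reading Thread#value behaves like reading a dataflow
# variable: it blocks until the value has been computed.
a = Thread.new { (1..1_000).sum }                   # runs concurrently
b = Thread.new { (1..1_000).map { |i| i * i }.sum } # runs concurrently

total = a.value + b.value # waits for each result only when it is needed
```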
00:08:04.240
However, the need for state introduces complexity. Let’s consider a simple example involving a shared counter that multiple threads are attempting to increment. This leads to shared state issues, where race conditions can occur if multiple threads read and modify the counter simultaneously without proper control.
00:09:11.320
When two threads operate on the same counter, they may end up with inconsistent results due to their concurrent actions. Ensuring that only one thread can modify the counter at a time is essential to prevent these inconsistencies. Most programming languages solve these issues with locks or similar mechanisms.
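The lost-update problem can be made visible in plain Ruby. The `sleep` below is there only to force both threads to read the counter before either writes it back; in real code the same interleaving happens nondeterministically:

```ruby
# Two threads each try to increment a shared counter, but the
# read-modify-write is not atomic: both read 0, then both write 1,
# so one increment is lost.
counter = 0

threads = 2.times.map do
  Thread.new do
    current = counter     # 1. read the shared value (both threads read 0)
    sleep 0.05            # 2. forced interleaving point
    counter = current + 1 # 3. write back (both threads write 1)
  end
end

threads.each(&:join)
counter # => 1, even though two increments ran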
00:10:02.599
The basic solution typically involves implementing a lock to ensure mutual exclusion. When a thread reaches the code section involving counter modification, it acquires the lock, preventing other threads from entering that section until the first thread has completed its operation.
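With Ruby's stdlib `Mutex`, the same counter becomes safe: `synchronize` acquires the lock, runs the block, and releases the lock even if the block raises.

```ruby
# The read-modify-write on the counter is wrapped in a Mutex, so only
# one thread at a time may run it and no increment is lost.
counter = 0
lock    = Mutex.new

threads = 10.times.map do
  Thread.new do
    100.times do
      lock.synchronize { counter += 1 } # acquire, increment, release
    end
  end
end

threads.each(&:join)
counter # => 1000
```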
00:10:38.760
While locks control access effectively, they also place a burden on developers, who must remember to use them correctly: forgotten or misapplied locks lead to race conditions and deadlocks. Locking is also a pessimistic form of concurrency management, since it assumes conflicts will happen and allows only one thread at a time into the protected section.
00:11:15.119
An alternative approach is to use Software Transactional Memory (STM), designed to simplify concurrency by allowing multiple threads to work with shared memory more efficiently. It offers a more optimistic approach, where threads can execute concurrently, and the system only checks for conflicts when they try to commit changes.
00:12:31.920
If no other thread has modified the shared state during the transaction, it commits its changes. If a change has occurred, it rolls back and retries, maintaining consistency without requiring explicit locks in most scenarios. This model is more flexible and robust, as it avoids common pitfalls associated with locking mechanisms.
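Ruby's standard library has no STM, but the optimistic read-compute-commit-or-retry cycle can be sketched with a compare-and-set. This toy `Cell` class is an illustration only: real STM systems track entire transactions over many references, not a single value.

```ruby
# Optimistic concurrency: read the current value, compute the new one
# outside any lock, then commit only if nobody else changed the value
# in the meantime; on conflict, retry.
class Cell
  def initialize(value)
    @value = value
    @lock  = Mutex.new
  end

  def get
    @lock.synchronize { @value }
  end

  # Commit only if the value is still what we read earlier.
  def compare_and_set(expected, new_value)
    @lock.synchronize do
      return false unless @value == expected
      @value = new_value
      true
    end
  end

  def update
    loop do
      old = get
      return if compare_and_set(old, yield(old)) # retry on conflict
    end
  end
end

cell = Cell.new(0)
threads = 10.times.map do
  Thread.new { 100.times { cell.update { |v| v + 1 } } }
end
threads.each(&:join)
cell.get # => 1000
```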
00:13:30.440
Finally, let's explore the message passing approach to concurrency, which eliminates shared state altogether. Messages are sent between isolated components where state is encapsulated. This approach encourages the design of systems where components collaborate through message passing, increasing modularity and ease of maintenance.
00:14:13.719
In this model, a server function handles incoming messages. It reacts based on the message type—incrementing a counter or returning its current value. By having dedicated instances managing their own state, we can achieve concurrency without worrying about modification conflicts, as threads interact only through message exchange.
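Such a counter server can be sketched in Ruby with a thread and a `Queue` standing in for the mailbox. The message names (`:increment`, `:get`, `:stop`) are hypothetical, chosen for this example:

```ruby
# A counter "server" that owns its state and reacts to messages.
# Clients never touch the counter directly; they only send messages.
inbox = Queue.new

server = Thread.new do
  counter = 0 # private to this thread; no one else can mutate it
  loop do
    message, reply_to = inbox.pop
    case message
    when :increment then counter += 1
    when :get       then reply_to << counter
    when :stop      then break
    end
  end
end

5.times { inbox << [:increment, nil] }

reply = Queue.new
inbox << [:get, reply]
value = reply.pop # blocks until the server answers; => 5

inbox << [:stop, nil]
server.join
```

Because the mailbox is processed in order, the `:get` message is guaranteed to see all five increments, with no locks anywhere in sight.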
00:15:02.760
The message passing model results in higher scalability and simpler coordination, especially for distributed systems, since it can easily expand with additional clients communicating with the server. Implementing location transparency allows seamless integration across different nodes in a network.
00:15:52.960
In conclusion, I discussed several concurrency models: the declarative model with dataflow variables, shared state with locks, Software Transactional Memory, and message passing. Each paradigm has advantages and disadvantages, and selecting the appropriate solution often depends on the specific challenges we face in our applications.
00:17:41.360
If you want to explore these topics further, I recommend resources like 'Seven Languages in Seven Weeks,' which covers several programming languages, including Clojure, Erlang, and Haskell. These languages offer insightful approaches to concurrency and immutability that can enhance your understanding.
00:18:51.480
Lastly, there are frameworks like Celluloid that support message passing in Ruby, allowing you to experiment with concurrent programming paradigms without switching languages. Thank you very much for your attention!
00:19:41.440
Now, I'd like to open the floor for questions. Feel free to ask anything related to concurrency or the topics I covered today.
00:20:36.160
Thank you for your questions! To summarize, message passing indeed runs concurrently in isolated environments, allowing seamless communication between components while managing their state effectively.
00:20:57.680
As we delve into concurrency, it's important to evaluate each model against the problem at hand, so we can choose the one that yields the most effective and efficient design for our programs.