Delton Ding

High Concurrent Ruby Web Development Without Fear

RubyKaigi2017
http://rubykaigi.org/2017/presentations/DeltonDing.html

We've been debating concurrency solutions for Ruby for several years. Numerous custom "evented" drivers have been built, but most of these projects require developers to think in the "evented" way to get things to work properly, which not only breaks the elegance of Ruby programming but also greatly increases the complexity of refactoring.

We will then think in Ruby, looking for a solution that makes your whole web application "evented" using the meta-programming features of the Ruby language itself. That way, you can still concentrate on your business models while programming as usual, and performance may improve by 5 times or more.


00:00:00.210 It's time to get started. The next speaker is Mr. Delton Ding, and the title of the presentation is "High Concurrent Ruby Web Development Without Fear."
00:00:10.260 Hello everyone! In Japanese, konnichiwa. My name is Delton Ding, and I'm a freelancer based in Shanghai, China. I must admit that I'm feeling quite nervous standing here.
00:00:24.439 I felt a bit sleepy last night. This is my third time visiting Japan and my second time attending RubyKaigi. However, this is my first opportunity to give a talk here, and I was expecting it to be just another trip to Ruby and another trip to Japan.
00:00:42.750 Unfortunately, it turned into a fresh new adventure in Japan. Before coming here, I didn't realize how strong the typhoon would be until I saw an announcement from a nearby animal park informing us that they had removed the animals due to the typhoon. This situation could have turned serious.
00:01:18.570 On September 16th, I had just finished my talk at RubyConf China in Hangzhou. I flew back to Shanghai and attempted to take a plane to Hiroshima. Just as I was arriving at the airport, I received a phone call informing me that my flight was canceled.
00:01:54.810 I had to contact my airline, and they told me that all flights to the Kansai districts—including Osaka and Nagoya—were canceled. Luckily, I managed to land in Tokyo. I had bought a ticket for the Shinkansen, so I could take the Nozomi train without any issues.
00:02:24.830 I thought it would be safe to arrive in Hiroshima before 11:00, but when I reached Tokyo, I found out that the Tokaido-Sanyo Shinkansen was canceled. I asked a JR staff member, and they mentioned it was back on, but there were massive delays.
00:02:38.040 Although I was supposed to arrive in Hiroshima by around 11:00, I actually reached my destination the next day. By the way, it will be the tenth anniversary of RubyConf China next year, so we highly welcome everyone to join the conference.
00:02:57.770 Now, let's dive into our main topic. When I say 'high concurrency,' what do I expect? I am not referencing defeating Erlang, Elixir, Crystal, or any other language. However, that does not mean we should neglect Ruby's performance.
00:03:21.980 I want to tell a story first. I know a startup company in China called Zhu Jie; it’s an app similar to Instagram that was written in PHP for its backend. Back in March 2015, they released a very popular photo feature that garnered 300 million page views every day.
00:03:35.900 Unfortunately, their server eventually broke down. They survived intense pressure, even after expanding to ten API servers, but when they finally reconstructed their backend after a month, most of their users had moved on.
00:04:06.880 The concurrency methods I am expecting are those that could prevent such occurrences. I call it the 'sweet spot.' This sweet spot means it's easy to use, similar to your previous Ruby projects, and avoids unfamiliarity; no need to learn a lot of new things. It also gives you time to migrate to another language if it's necessary.
00:04:19.280 Here's an example: I was about to leave Shanghai Pudong Airport, in a hurry, only ten minutes before my flight was scheduled to take off. However, there was still a very long line in front of the security check.
00:04:34.199 I had to ask people if I could go ahead in line. There was a couple in front of me. Regardless of whether I spoke in English or Chinese, they couldn't understand me. I noticed they had Japanese passports, but I know very little Japanese.
00:04:54.479 I struggled to ask politely, and I said, "Tasukete kudasai," which means 'please help me'. In English, I attempted to express that I was running late for my flight, but I forgot how to say the word 'takeoff'.
00:05:14.090 So I resorted to body language, and fortunately, they understood me. That’s the sweet spot I mean: it should be relatively easy for someone like me, a one-month Japanese learner, to convey a message, allowing enough room for native speakers to grasp the meaning.
00:05:27.569 When discussing increasing concurrency, there's always debate about whether Ruby is slow. I say yes, in some cases.
00:05:37.559 Codeforces conducted a benchmark test on heapsort. They inserted 10^7 (ten million) elements into the heap and measured the execution time. The resulting graph didn’t look too bad, but the tricky part here is that it's a logarithmic scale. Each line is ten times lower than the previous one.
00:06:07.240 But this doesn't mean much on its own; I say this is not a real use case—it’s a computation-heavy benchmark. Languages that are compiled or have JIT (just-in-time compilation) have an advantage in these benchmarks. However, when we use Ruby in most cases in production—such as web servers—we are generally dealing with I/O-heavy tasks.
00:06:24.180 Therefore, we need to set a real-use case benchmark to identify the actual problems. We also need to be careful to determine whether it's Ruby's problem or the problem with Rack or Rails. I know some people can be confused between Ruby and Rails.
00:06:46.040 For example, here's a benchmark of Sinatra, which is not ready for use since it operates as a Rack server. In my opinion, that’s somewhat misleading because it handles synchronous requests and responses, but when faced with other I/O operations, like a database query, it reverts to working like other servers.
00:07:01.320 The official benchmark used a 'Hello World' case that didn't reflect this issue. Still, from this example, we see that asynchronous behavior is crucial. If we make database access asynchronous, we could gain performance, for example running 3 to 10 times faster.
00:07:26.160 This raises the question of why Ruby needs an asynchronous web server. The Ruby VM has a Global Interpreter Lock (GIL), so a multi-threading model might not provide optimal performance.
00:07:37.320 Additionally, most Ruby web applications, for example, those using Rails, might consume hundreds of megabytes to gigabytes of memory. This further impairs performance under a multi-process model. On the other hand, using a lightweight implementation—like a server that consumes only a few kilobytes—could still handle 10,000 requests per second.
00:08:05.040 However, achieving this in Ruby applications is challenging, and thus asynchronous processing may be the only viable route to enhance concurrency for Ruby web apps.
00:08:24.360 But, it breaks Ruby's programming logic. If you write asynchronous routes, they may not align with the Ruby language we’re familiar with. You have to define the running order of your code, which doesn't please many programmers.
00:08:42.150 Some people even recommend that if you truly want asynchronous behavior, you should write Erlang. About ten years ago, I remember the media claiming Erlang would soon be as popular as Java. But recent TIOBE indexes show that Erlang’s popularity has been surpassed even by Crystal.
00:09:06.879 Software development is not a one-man orchestra. Erlang is mathematically pure and great, but it can be quite challenging for a lot of beginners, making it difficult for companies to hire Erlang employees.
00:09:30.120 Ruby is not as well known for its mathematical model, but it incorporates advantages from various languages. So, how do we leverage the advantages of other languages?
00:09:54.360 At RubyKaigi this year, many speakers mentioned the callback hell phenomenon of JavaScript. They claim Ruby is designed to make programmers happy, but using EventMachine for asynchronous development leads to a similar callback hell.
00:10:17.130 For instance, if I make an HTTP request, save data to Redis, then save it to the database before sending a response to the user, I end up in callback hell. EventMachine is just another version of the callback hell we see in JavaScript.
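The nesting described here can be sketched in plain Ruby. This is only an illustration: `http_get`, `redis_set`, and `db_insert` are hypothetical callback-style helpers, not EventMachine's API, and they invoke their block immediately so the sketch runs without an event loop.

```ruby
# Hypothetical callback-style helpers (assumptions, not a real driver API).
# A real evented driver would call the block later, when the I/O completes.
def http_get(url)
  yield "body of #{url}"
end

def redis_set(_key, _value)
  yield "OK"
end

def db_insert(_row)
  yield 1 # pretend this is the id of the new row
end

# Each step can only start inside the callback of the previous one,
# so every additional I/O operation adds another level of nesting.
def handle_request_with_callbacks
  http_get("http://example.com/profile") do |body|
    redis_set("profile", body) do |_status|
      db_insert({ body: body }) do |id|
        yield "saved as ##{id}"
      end
    end
  end
end

result = nil
handle_request_with_callbacks { |response| result = response }
```

Three sequential I/O steps already cost three levels of indentation, and error handling would have to be repeated at each level.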
00:10:59.520 Combining the advantages of other languages becomes very challenging. Now let me redirect your attention to Odaiba in Tokyo, which features a mini version of the Statue of Liberty.
00:11:28.300 However, an interesting sight is that just 50 meters away, there's a large shopping mall and a giant Gundam statue, even taller than the Statue of Liberty, along with the Rainbow Bridge behind the statue.
00:11:45.840 When I first saw the bridge, I wondered if any laws regarding LGBT rights had been passed in Japan. But I later learned that it was simply the Rainbow Bridge!
00:12:02.610 When viewing these things separately, everything appears amazing, but when you appreciate all these elements together, it creates a wow factor. However, achieving such combinations is quite challenging.
00:12:26.040 For languages like Erlang or Node, you need to simulate synchronous operations with asynchronous methods. What if we could simulate asynchronous operations with synchronous methods? While it's generally not feasible, it could be possible in web servers.
00:12:44.500 As long as we don't have side effects, we could change the order in which requests are processed on an API server; the only obstacle is maintaining order in I/O inside the request processing.
00:13:00.939 Using fibers can make this task easier. I won't elaborate too much on fibers since another speaker at RubyKaigi discussed fibers and might have deeper insight.
00:13:14.820 In essence, fibers are lightweight, allowing us to switch context. For every request we receive, we can set up a fiber. When the fiber requests I/O—like a database query—it relinquishes control back to the I/O pool. When the I/O operation finishes, the I/O pool resumes the fiber.
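A minimal sketch of this fiber-per-request idea, using only Ruby's built-in `Fiber`. The `ToyLoop` class and the `query` helper are hypothetical names invented for illustration, and the "I/O" is simulated: the loop simply resumes each parked fiber with a made-up result.

```ruby
# Looks like a blocking call to the handler, but it parks the running fiber
# and hands the pending query back to whoever resumed it.
def query(sql)
  Fiber.yield(sql)
end

# Toy stand-in for the I/O pool: it keeps parked fibers and resumes them
# once their (simulated) I/O has finished.
class ToyLoop
  attr_reader :log

  def initialize
    @pending = []
    @log = []
  end

  # One fiber per incoming request.
  def handle(name)
    fiber = Fiber.new do
      @log << "#{name}: start"
      rows = query("SELECT #{name}") # reads like ordinary sequential Ruby
      @log << "#{name}: got #{rows}"
    end
    sql = fiber.resume # runs the handler until it asks for I/O
    @pending << [fiber, sql]
  end

  def run
    until @pending.empty?
      fiber, sql = @pending.shift
      fiber.resume("rows for #{sql}") # I/O "finished": continue the handler
    end
  end
end

io_loop = ToyLoop.new
io_loop.handle("a")
io_loop.handle("b")
io_loop.run
```

Request `b` starts before request `a` has received its rows, yet each handler body is written top to bottom with no callbacks.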
00:13:26.220 However, to implement this, we would need to rewrite everything from database drivers to ORMs. Is there a way to accomplish this without rewriting all that code?
00:13:43.530 One method is through metaprogramming. There's an excellent project called async-rails by Ilya Grigorik, published seven years ago when fibers and EventMachine were still new. This project faced significant challenges with Rack and Rails.
00:14:04.680 Back then, a fiber had a stack of only four kilobytes, and the Rack middleware chain ran as a call stack inside the fiber. Rails used numerous middlewares, deepening this stack and often exceeding the four-kilobyte limit.
00:14:25.559 Because it was experimental, this project is no longer maintained. A more recent attempt called Auto Fiber was proposed just four months ago, and Matz discussed it yesterday. Yet because many drivers make synchronous calls in C, Auto Fiber cannot get at their I/O objects to poll them.
00:14:39.360 Thus, you can't resolve the I/O objects via Auto Fiber or other fiber methods; you might still need to manage synchronous locks manually to make it functional. Auto Fiber sometimes works, and sometimes it doesn't, so manual intervention is still necessary.
00:14:57.690 My project aims to address these issues. We need to hack these drivers manually and ensure the web framework is lightweight. This approach will not only lessen the chances of exceeding fiber memory limits but also enhance fiber performance due to quicker context switching.
00:15:15.750 Last year, during RubyKaigi, we set specific goals: build a web framework that avoids I/O blocking and adopts a lightweight stack design. We introduced metaprogramming for database drivers, with the aim of engineering something production-ready and not merely a toy.
00:15:36.720 Here’s a demonstration: we have an empty project. We added the Gemfile, including dependencies like Redis driver, Redis ORM, and more. However, instead of requiring them directly, we metaprogrammed these drivers to work asynchronously.
00:16:01.720 Then we created the DB models—users and tasks—along with configuring various middlewares. Later on, I’ll explain why middlewares are interesting, particularly mentioning needing a CORS middleware for front-end and back-end separation.
00:16:18.193 Following that, I added routes. The API routes are quite straightforward, resembling Sinatra's design. Notably, unlike Sinatra, where routing must be determined globally, our framework allows routing hierarchy, simplifying modularity.
00:16:39.390 Moreover, our routing can be designed to have features similar to Sinatra. We utilize depth-first search (DFS) before program execution to analyze the mounting chain, flattening it to prevent stack growth.
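The flattening step can be sketched as a depth-first walk over a tree of mounted routers that runs once, before any request is served. `RouteNode` and `flatten_routes` are hypothetical names chosen for this illustration and are not taken from the framework's source.

```ruby
# Hypothetical route tree: each node has its own routes plus mounted children.
RouteNode = Struct.new(:routes, :mounts)

# Depth-first search over the mounting chain. The result is one flat table,
# so dispatching a request needs a single lookup, not one call per mount level.
def flatten_routes(node, prefix = "", table = {})
  node.routes.each { |path, handler| table[prefix + path] = handler }
  node.mounts.each do |mount_point, child|
    flatten_routes(child, prefix + mount_point, table)
  end
  table
end

tasks = RouteNode.new({ "/:task_id" => :show_task }, {})
users = RouteNode.new({ "/:id" => :show_user }, { "/:id/tasks" => tasks })
api   = RouteNode.new({ "/ping" => :ping }, { "/users" => users })

flat = flatten_routes(api)
```

The recursion happens only at boot time; at request time the nested structure no longer exists, so it cannot deepen the stack inside a fiber.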
00:16:57.330 Now, we assume, upon further development, our backend server can handle 2,000 to 3,000 requests per second on a single core and thread. Our example project is available on GitHub for reference.
00:17:23.220 To sum up, when it comes to performance benchmarks, you're still coding as if you're doing it in traditional Ruby, yet experiencing significant performance gains. Interestingly, when version 2.0 of Midori was released, it exceeded the performance of Express.js for a brief period.
00:17:40.710 However, this was short-lived as the later version of Express.js surpassed it again, but Midori continues to outperform many alternatives.
00:18:03.970 When discussing middleware in the context of Rack, middleware functions as a stack. Once a request comes in, it delves deeper through each middleware, and upon completion, the middleware stack unfolds in a pop operation.
00:18:22.850 However, by dividing this process into two phases—from the request to the API and back to the response—we introduce an optimization potential. This recursion can be optimized into loops.
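One way to picture the two-phase idea is shown below. The `before`/`after` interface and the class names are assumptions made for this sketch, not the framework's confirmed API. The point is that both phases run as flat loops, so the call stack does not grow with the number of middlewares.

```ruby
# Hypothetical two-phase middlewares: `before` sees the request on the way in,
# `after` sees the response on the way out.
class TagRequest
  def before(request)
    request.merge(tagged: true)
  end

  def after(_request, response)
    response
  end
end

class AddHeader
  def before(request)
    request
  end

  def after(_request, response)
    response.merge(headers: { "X-Demo" => "1" })
  end
end

def call_app(middlewares, request)
  # Phase 1: request side, in declaration order.
  middlewares.each { |m| request = m.before(request) }
  response = yield(request)
  # Phase 2: response side, in reverse order, as a nested stack would unwind.
  middlewares.reverse_each { |m| response = m.after(request, response) }
  response
end

response = call_app([TagRequest.new, AddHeader.new], { path: "/" }) do |req|
  { status: 200, body: "tagged=#{req[:tagged]}" }
end
```

With a Rack-style chain, each middleware's `call` stays on the stack while the inner ones run. Here the depth is constant regardless of how many middlewares are configured.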
00:18:41.370 Thus, middleware optimization results in much lighter stack consumption, which is advantageous. Additionally, there are other tricks to improve performance; for example, the MySQL server routines often do not expose the I/O object.
00:18:57.270 Many asynchronous drivers work hard on this point, while our solution simply extracts the file descriptor number and creates an I/O object tied to that descriptor.
00:19:15.560 This allows us to efficiently assess whether the descriptor is readable. Furthermore, we support real-time communications, such as WebSockets, enhancing performance—our benchmark shows it's about twice as fast as Action Cable.
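The file descriptor trick can be illustrated with Ruby's standard `IO.for_fd` and `IO.select`. In this sketch a pipe stands in for the database socket, and `fd` plays the role of the descriptor number a C driver would report. How a given driver exposes that number is not shown here.

```ruby
reader, writer = IO.pipe
fd = reader.fileno # stand-in for the descriptor number reported by a C driver

# Wrap the raw descriptor in an IO object. autoclose: false keeps this wrapper
# from closing a descriptor that the driver still owns.
io = IO.for_fd(fd, autoclose: false)

# Zero timeout: a pure readiness check that never blocks.
ready_before, = IO.select([io], nil, nil, 0)

writer.write("row") # the "server" sends its reply

ready_after, = IO.select([io], nil, nil, 1)
data = io.read_nonblock(3)
```

Once the descriptor is wrapped like this, an event loop can watch it alongside its other sockets and resume the waiting fiber only when data has arrived.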
00:19:35.310 HTTP 2.0 will be a crucial feature in our next milestone release. We're planning to release our official website and tutorial by the end of September, with a stable API rollout expected in October.
00:19:56.620 Concerning I/O improvements, we currently utilize NIO for our engine, and we've detected areas for performance enhancement. While writing C extensions can be risky, we still aim to use widely adopted methods.
00:20:17.690 We've implemented some of these improvements in EventMachine, only to discover bugs affecting performance, especially on Mac OS. The author of EventMachine has acknowledged similar issues.
00:20:39.419 During my project migration to NIO, I saw that NIO employs a select model instead of the kqueue model available on Mac OS. When I queried the NIO team, they stated that Mac's implementation differs from the OpenBSD version.
00:20:59.610 Thus, they believed no one would run a production environment on Mac OS, hence no fix was forthcoming. As a result, running Ruby could consume about 30 megabytes of memory per process.
00:21:24.900 We may need to experiment with alternative read modes for Mac OS, especially after the introduction of the Guild model in Ruby 3.
00:21:51.620 In a recent test, we conducted trials on an 8 core server, directly reaching 60,000 to 70,000 requests per second, which is promising.
00:22:10.829 Moving forward, we still plan to add scaffolding to assist Rails developers in transitioning to Midori, allowing them to retain familiar Ruby grammar while gaining boosted performance.
00:22:30.610 This integration of elements will contribute to a rare combination. Our project, which gained traction last September, has a roadmap that I initially created for myself.
00:22:54.120 Interestingly, the project gained unexpected visibility on GitHub Trending and received interest from prominent figures in the Ruby community, even before the code matured.
00:23:15.420 While responding to initial critiques pushed me to enhance the project, reaching version 0.4.1 gained the attention of other Rubyists investigating the code.
00:23:36.640 I was deeply moved by the recognition of my work and I consider the Ruby community to be undergoing a transition from disillusionment to enlightenment.
00:23:54.470 Nevertheless, not all languages experience this kind of shift. For example, Common Lisp's position on this graph may reflect that it may never return to enlightenment.
00:24:13.640 The languages on the right side are relatively new, boasting excellent features; however, older languages, including Ruby, must adopt modern languages' attributes.
00:24:38.110 While some people may not consider Ruby cool anymore, I remain optimistic due to its active community, which is eagerly contributing to its development.
00:24:58.770 Through these contributions, I believe we can continue to improve Ruby.
00:25:27.790 We have about ten minutes left, so feel free to ask. I’m a bit nervous, so please.
00:26:02.070 That's pretty interesting; there are about twenty to thirty thousand people who may understand Ruby in China, but I believe only a fraction of them—perhaps one-tenth—use Ruby daily in production.
00:26:23.290 Ruby China is an online forum with many discussions taking place about Ruby. It's one of the most vibrant communities among Chinese programming communities.
00:26:37.540 Regarding the RubyConf China conferences, are they conducted in Chinese?
00:26:57.690 No, they're open to international visitors, and they provide translation into English and other languages.
00:27:05.330 Thank you for clarifying that point; I meant the conference language rather than Chinese individuals.
00:27:17.290 I have a couple of questions. Firstly, I understand that the main target of the Midori framework is to be lightweight and not akin to Jelly or other truly concurrent implementations.
00:27:33.280 Is it due to the limitations posed by EventMachine?
00:27:45.950 I would say the fiber issue is a problem, especially since the initial implementation of Midori failed to integrate well with JRuby. We're still attempting to make it work on JRuby.
00:28:00.020 About ten minutes remain, so please go ahead and ask questions.
00:28:16.320 I have two questions. First, is your production server using Midori?
00:28:38.130 Yes, it is.
00:28:50.450 What was the most difficult task in migrating your server to Midori?
00:29:06.420 The most significant challenge was starting from scratch. Since it's not built on Rack, we had to create scripts for initiating the server and write migration scripts even when Sequel was supported.
00:29:23.930 You also included Sequel in your Gemfile but didn't require it initially. It seems you applied metaprogramming to make everything asynchronous. Can you elaborate on how that functions?
00:29:42.830 Indeed, while you don't need to require Sequel explicitly, invoking Midori necessitates Sequel. This arrangement ensures that the order of the gems and metaprogramming processes is respected.
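The kind of patching discussed in this exchange can be sketched with `Module#prepend`. `BlockingDriver` and `FiberAwareQuery` are hypothetical stand-ins, not Sequel's or Midori's actual classes. The sketch also shows why load order matters: the driver class must already be defined before the patch is prepended to it.

```ruby
# Hypothetical blocking driver, standing in for a real database gem.
class BlockingDriver
  def query(sql)
    "rows for #{sql}" # imagine the whole process waiting here
  end
end

# The patch keeps the method name, so application code does not change.
module FiberAwareQuery
  def query(sql)
    Fiber.yield([:wait_for, sql]) # park this fiber until the loop resumes it
    super                         # the reply is ready now; read it as before
  end
end

# Must run after the driver is loaded, which is why the framework would want
# to control the require order itself.
BlockingDriver.prepend(FiberAwareQuery)

events = []
fiber = Fiber.new do
  events << BlockingDriver.new.query("SELECT 1") # unchanged calling code
end

events << fiber.resume # the fiber parked itself and reported what it waits for
fiber.resume           # the loop resumes it once the I/O is ready
```

Because the patch depends on the driver's internal method names, a new gem version that renames or restructures them would require revisiting the patch.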
00:30:02.590 However, the complexity of the metaprogramming incurs difficulties, which we worked out over two months to establish functionality.
00:30:19.055 What happens if the classes you modify change with version updates? Does this necessitate reworking your metaprogramming?
00:30:41.790 That's a significant concern, and we ensure we lock our versions to mitigate issues. Still, progress updates could break integrations, meaning we have to evaluate the metaprogramming code when original gems change.
00:31:03.250 What was the unresolved issue with EventMachine that you were worried about? You mentioned a bug lingering for four years.
00:31:22.920 The main challenge is that, when using an asynchronous server with EventMachine on Mac OS, it reaches about 200 requests per second, which is quite low.
00:31:41.200 That’s peculiar. I’ve had code running with EventMachine on Mac OS achieving 15,000 requests per second.
00:32:03.760 I’ve provided a minimal code example that illustrates this issue to the author of EventMachine.
00:32:20.420 There is already an open ticket about this EventMachine problem.
00:32:34.030 They suggested they would address it in version 1.0, but that expectation has shifted to 1.4, and now it's at 1.3—it’s quite strange.
00:32:44.360 Is there a problem? Yes, it's likely that the subject hasn't been thoroughly examined, but I will investigate if I can replicate it.
00:32:59.750 I’ve been utilizing EventMachine without any issues, so I’ll run some tests to identify any inconsistencies.
00:33:15.970 Is there any other question? Thank you for your time. Enjoy your lunch!