00:00:06.000
Christian and Anatoly welcome everyone to Malmo. You are not in the GR event; this is Baltic Ruby. My name is Christian, and Anatoly and I both work as senior staff engineers at Zendesk. It's really nice to be here giving this talk.
00:00:19.279
We’re excited about this event because it’s related to what I’m going to talk about. You might be surprised, but this talk is about the history of a Rails monolith. You might wonder how this is related, but if you know a bit about the community, you understand that it’s not just about technology.
00:00:32.119
The main aspects are the politics, the drama, and the mess involved. Let’s be honest; there’s a lot of drama in the Ruby community, and it took me only a few minutes to recall plenty of stories. I’m not criticizing; I actually find the drama entertaining. As Matz once said, open-source communities tend to lose momentum when people become bored, and we seem to thrive on the drama instead.
00:01:15.040
A few years ago, I attempted to learn Scala and attended a few conferences. Unfortunately, I can’t show you actual pictures, but let’s just say the average Scala speaker’s appearance left much to be desired.
00:01:32.920
Let’s start discussing the main topic of the talk. Recently, I reflected on my career over the last ten years. There were multiple reasons for this reflection: I returned to Barcelona after ten years abroad, I reached my tenth anniversary at Zendesk, and I’ve been watching a TV show called "Great People of Culture". It’s an anime series about typical elves, warriors, and wizards, following a tradition in which magic can metaphorically represent technology.
00:02:21.000
One of my favorite examples is the novel "A Wizard of Earthsea", where the idea is that you can perform magic if you know the true names of things. This concept aligns perfectly with abstractions in object-oriented programming.
00:02:56.560
As Arthur C. Clarke famously said, 'Any sufficiently advanced technology is indistinguishable from magic.' How does this relate to my talk? In the anime, the main character is an immortal elf who has lived for 2,000 years. The show is primarily told through flashbacks, showing how certain spells and technologies revolutionized magic. This concept resonates with our view as engineers at Zendesk.
00:03:39.600
Anatoly and I feel we have a rare form of knowledge, much like the elf's forgotten magical wisdom. We have both been with Zendesk for ten years, which is quite rare in this industry. We want to share what we’ve learned, particularly about our Rails monolith.
00:04:21.600
I will discuss not only our monolith at Zendesk but also the concept of monoliths and how it has evolved over the past decade. I can take you back to a time when monolithic architectures were viewed negatively. Five years ago, they were completely out of fashion.
00:04:40.000
There's been a recent comeback, and I believe that many software debates within the engineering community behave like pendulums. Opinions often swing between extremes over the decades. While it's important to respect varied opinions, some pendulums can be dangerous if you lean too far in one direction. For instance, remember the time when NoSQL databases were expected to replace relational databases?
00:05:25.759
I doubt companies that discarded all their relational databases fared well. Personally, I think that the microservices boom was a zero-interest-rate phenomenon, in which companies had abundant funds to optimize their tech before truly needing those optimizations. It's interesting to note that after I presented these ideas a few months ago, DHH echoed a similar sentiment on Twitter.
00:06:17.560
I’m not implying you should never implement microservices; rather, they do make sense in various situations. Just months ago, a friend in Montreal described his company’s overly complex microservices architecture, and I initially thought they were doing great—only to find out they were struggling.
00:06:33.080
It's challenging now with the layoffs that many companies face. Imagine managing thousands of microservices when your workforce is cut in half. Still, let’s not lose hope; there’s a space for microservices, but it’s vital to evaluate when they make sense.
00:07:30.360
Looking at our journey at Zendesk, we initially created a monolith in Rails. The idea was to extract parts of it into gems so we could save time when building new applications. Currently, however, we are at a stage I call the 'event-driven architecture era' where different applications generate events through Kafka, consumed by other applications.
00:08:18.240
Despite the big monolith, we’re experimenting with modularization. We’re using patterns from Shopify to create clearer boundaries within our code. However, before delving too deeply into the history of Rails, I want to touch on the front-end developments over the past decade.
00:08:53.759
When I entered the industry around the early 2010s, it felt like entering the 'Great JavaScript Wars.' Every week, a new JavaScript framework emerged. Back in the day, many big applications used various front-end technologies, like Backbone, Ember, or React. The enthusiasm for creating new frameworks among JavaScript engineers is legendary.
00:09:57.880
At Zendesk, we had our in-house framework, lovingly called 'CJs' (nobody remembers the original name), developed by an unknown engineer. It was never documented and has become a daunting maintenance task over the years. If that engineer is listening, please reach out.
00:10:52.480
Eventually, like many companies, we centralized our efforts on React, rewriting most of our front-end code. I genuinely enjoy working with React, especially with React Native. However, I sometimes wonder if React triumphed simply because people grew weary of constantly learning new frameworks.
00:11:13.840
Let me shift gears and discuss infrastructure and deployment. Over the last 15 years, we transitioned from VPS services like GigaSpaces and Rackspace, eventually moving to our own metal. After using dedicated servers for five years, we decided to migrate to the cloud using AWS.
00:12:03.680
Currently, all our databases run on AWS, utilizing Aurora. I won’t go into detail about this part of our journey—there are blog posts explaining the shift, including interviews with our original CTO.
00:12:45.760
Next, I’d like to discuss how we adopt new technologies at a large company like Zendesk, which has around a thousand engineers spread across multiple global offices. Despite using a variety of technologies, Ruby remains our primary language.
00:13:32.680
In the past, integrating new technology was relatively easy; you just needed to talk to your manager. However, it led to chaos, as engineers had different needs and experiences. To address this, we implemented a 'tech menu,' requiring proposals for new technologies to be reviewed and approved by a team of architects.
00:14:07.680
Some discussions have drawn much attention within Zendesk, particularly around which languages to use for microservices. We accepted Java and Scala, but ultimately Java became the more dominant language due to hiring challenges with Scala.
00:14:57.760
Acquisitions have been instrumental in introducing new technologies. For example, we acquired Zopim, a chat company from Singapore, ten years ago, which used Python. Consequently, working at Zendesk could also involve Python.
00:15:43.680
The most intriguing aspect of acquisitions is domain fusion. As a customer support company, we manage tickets, while Zopim focused on chat. Thus, integrating the two systems raises conceptual questions about what constitutes a ticket or a chat conversation.
00:16:36.840
To create a common experience for our users, we need to ensure that the integration feels fluid. Our historical experience leads to essential agreements about shared features across various domains, aligning everything to the concept of a 'ticket' as our core building block.
00:17:19.200
Now, I'll hand over to Anatoly, who will dive into database performance.
00:17:50.200
Anatoly begins his presentation by expressing appreciation for Japanese-made products, stating that he uses Ruby at work and pointing out his Japanese-made guitar and watch. He then asks how many in the audience have used Ruby for over 15 years.
00:18:04.800
Anatoly started with Ruby in 2007, and it remains his favorite language. Shifting topics, he indicates that complex systems resemble weather systems—unpredictable. For instance, decisions within large systems may not align with expectations.
00:18:41.200
Anatoly references a saying that all happy systems are alike, while each unhappy system is unhappy in its own way. This challenges the notion that perfection can be achieved without addressing every little detail.
00:19:06.240
Anatoly emphasizes that the database is the primary bottleneck at Zendesk. When asked whether Ruby is fast or slow, he argues that database performance ultimately determines application performance.
00:19:51.760
Turning to complexity, he explains how the size of the data set affects response time. Anatoly makes the point that database efficiency is paramount, because the data structure dominates performance.
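As a rough illustration of why data set size matters, the sketch below counts how many elements a full scan inspects compared with a binary search over sorted data, which is loosely analogous to an unindexed versus an indexed lookup. The method names and data are hypothetical and do not come from the talk.

```ruby
# Hypothetical illustration: count the work done by a full scan versus a
# binary search over sorted ids (a loose stand-in for an index lookup).

# Inspects elements one by one until the target is found.
# Returns the number of elements inspected.
def scan_steps(sorted_ids, target)
  steps = 0
  sorted_ids.each do |id|
    steps += 1
    break if id == target
  end
  steps
end

# Halves the search range on every iteration.
# Returns the number of elements inspected.
def binary_search_steps(sorted_ids, target)
  low = 0
  high = sorted_ids.length - 1
  steps = 0
  while low <= high
    steps += 1
    mid = (low + high) / 2
    return steps if sorted_ids[mid] == target

    if sorted_ids[mid] < target
      low = mid + 1
    else
      high = mid - 1
    end
  end
  steps
end

[1_000, 1_000_000].each do |size|
  ids = (1..size).to_a
  target = size # worst case for the scan: the last element
  puts "#{size} rows: scan=#{scan_steps(ids, target)}, " \
       "search=#{binary_search_steps(ids, target)}"
end
```

The scan grows linearly with the number of rows, while the search grows only logarithmically, so the gap widens as the data set grows.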
00:20:51.360
He highlights three critical aspects of databases: consistency, runtime, and complexity. You cannot have all three at once, and the choice of which two to prioritize shapes your infrastructure design.
00:21:14.240
Anatoly discusses Zendesk's early days when database queries were relatively straightforward, but as data increased, performance declined, exposing the complexity of real-time queries.
00:22:05.960
He notes that as the data set expanded, complex queries would no longer work. They consequently began optimizing the database to address the increase in data while moving away from complexity.
00:22:56.320
Anatoly brings up the Pareto principle, which suggests that a small percentage of queries consume the vast majority of resources. Optimizing these queries can yield significant performance improvements.
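A minimal sketch of that idea, assuming you already have per-query totals from your database monitoring: it picks the smallest set of queries that accounts for a given share of total database time. The query names, the numbers, and the `heaviest_queries` helper are invented for illustration.

```ruby
# Hypothetical query statistics: total database time per query shape, in ms.
# The names and numbers are made up for illustration.
QUERY_TIMES_MS = {
  "tickets_by_status" => 52_000,
  "ticket_search"     => 31_000,
  "user_lookup"       => 6_000,
  "org_settings"      => 4_000,
  "audit_log_page"    => 3_000,
  "tag_autocomplete"  => 2_000,
  "macro_list"        => 1_000,
  "view_counts"       => 1_000
}.freeze

# Returns the names of the heaviest queries, in descending order of cost,
# stopping as soon as their combined time reaches `share` of the total.
def heaviest_queries(times, share: 0.8)
  total = times.values.sum
  running = 0
  selected = times.sort_by { |_, ms| -ms }.take_while do |_, ms|
    below_threshold = running < total * share
    running += ms
    below_threshold
  end
  selected.map(&:first)
end

p heaviest_queries(QUERY_TIMES_MS)
```

With this made-up data, two of the eight queries account for more than 80% of the total time, so they are the obvious first candidates for optimization.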
00:23:36.480
Turning the focus back to Christian, he shares thoughts on maintaining the monolith, echoing the importance of testing and reliability.
00:24:04.080
Christian mentions an ongoing debate in the Ruby community about testing versus typing. Traditional Ruby philosophy suggests that rigorous testing eliminates the need for strict type definitions.
00:24:38.560
Even after the introduction of the Ruby 3 type system, the Zendesk team still relies heavily on tests: the codebase has over 1.6 million lines of code and approximately 55,078 tests in place.
00:25:09.600
Christian introduces Ryan Davis, the protector of the main branch. He humorously emphasizes the importance of testing, illustrating the significance of ownership in a vast codebase.
00:25:49.840
In the codebase, ownership is assigned to teams or individual engineers to facilitate smoother code reviews and integration.
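One way such ownership could be resolved is sketched below, assuming a simple map from path prefixes to owners. The talk does not describe the actual mechanism, so the `OWNERS` map, the team names, and both helpers are hypothetical.

```ruby
# Hypothetical ownership map: path prefix => owning team.
# Prefixes and team names are invented for illustration.
OWNERS = {
  "app/models/"           => "platform-team",
  "app/models/ticket"     => "tickets-team",
  "app/controllers/chat/" => "chat-team"
}.freeze

# Returns the owner whose prefix is the longest match for the path,
# or nil when no prefix matches.
def owner_for(path)
  prefix = OWNERS.keys.select { |candidate| path.start_with?(candidate) }
                 .max_by(&:length)
  OWNERS[prefix]
end

# Groups changed files by owner so each team can review its own files.
def reviewers_for(changed_files)
  changed_files.group_by { |path| owner_for(path) || "unowned" }
end

p reviewers_for(["app/models/ticket_field.rb", "app/models/user.rb", "README.md"])
```

Choosing the longest matching prefix lets a specific team override a broader default owner for part of a directory.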
00:26:17.120
Christian then highlights Zendesk's historical reliance on Minitest, chosen for its speed and efficiency; 11,572 tests run for each pull request.
00:27:05.480
Despite efforts to enhance reliability, challenges like flaky tests persist due to the vast size of their test suite.
00:27:49.320
Historically, testing required running the entire suite when just a single test failed, but advancements now allow for more efficient testing, improving reliability.
00:28:31.480
Christian underlines Zendesk's commitment to keeping the codebase maintainable using ownership and testing without compromising quality during upgrades.
00:29:14.880
He outlines challenges faced over the years, including a heavy reliance on metaprogramming that caused difficulties when upgrading Rails versions, and draws attention to issues illustrated by deleted or modified code in their branches.
00:30:32.960
Christian reiterates how they struggled with maintaining effective features in previously diverged gems and frameworks during transitions to updated Rails.
00:31:54.480
In closing, Christian emphasizes the importance of learning from the past, sharing that they have had several first-mover disadvantages in the competitive Ruby space, often leading to larger issues down the road.
00:32:35.120
To avoid repeating those mistakes, he advises that when you discover unmet needs or challenges within the Rails environment, rather than executing significant changes independently, collaborate with the community for better outcomes.
00:33:09.520
As they wind down their presentation, they emphasize the interconnectedness between successfully scaling performance and managing engineering organizations: while mitigating performance issues is vital, understanding the organizational landscape plays an equally crucial role.
00:34:19.320
Finally, Christian shares reflections on the themes of history, lessons learned, and hopes for the Ruby community's bright future—urging participants to consider potential opportunities at Zendesk as they continue hiring.