00:00:09.160
Okay, let's start the second presentation. Welcome our second speaker.
00:00:14.759
Cristian Planas is a Staff Engineer at Zendesk, and he will talk to us about a really huge monolith. So, we are capable of doing that in Rails. Let's welcome Cristian!
00:00:22.439
Good luck, thank you!
00:00:48.800
Hello, my name is Cristian. This is a talk for the Ruby Warsaw Community Conference. You didn't enter the wrong cinema; I'm still going to talk about Rails.
00:00:55.520
Funny enough, until I literally arrived at the theater, I didn't know this talk was going to happen in a cinema.
00:01:06.200
Anyway, I wanted to start by talking about Gladiator 2, and you will see why. So, I decided that it would be nice to play the trailer; it's a short trailer, only 1 minute.
00:01:20.600
[Trailer plays] This is not a matter over which a duel should be fought to the death; it should be settled quietly. If this were to get out, war would follow. Rome, if you think this can be stopped, you're fooling yourselves. All of this madness began with Maximus Meridius, and now, years later, we are paying for his deeds.
00:01:40.920
You rest your bones; I'll finish your quest for you.
00:01:46.399
I'm tired of reliving that nightmare every day. What do you want from me? Revenge. This is the last step in a master plan, long in the making.
00:02:05.039
Let the games begin!
00:02:48.239
I want you to understand that Ruby is very special. We are a really special community. This presentation is about the history of Rails through the monolith of Zendesk, which has been around for 17 years now.
00:03:00.800
Why is storytelling important? If you like cinema, and I love cinema, there's been a lot of discussion about how historically accurate 'Gladiator 2' is.
00:03:11.560
It's not the first time we've had this debate; remember Napoleon, the previous movie by Ridley Scott? A lot of people said that they had to add drama to the story, making things more beautiful.
00:03:29.319
Some directors do this. But do you know which film didn't need any embellishment to be epic? Our community is full of dramas; it's actually fantastic. Last year, we saw DHH versus TypeScript; there used to be a website called rubydramas.com. I think they removed it. It's fascinating. Don't get me wrong, I love this; it has huge entertainment value.
00:03:50.879
For about a year or so, I tried to become a Scala developer, and it was terrible. I went to a Scala conference, and the majority of speakers looked like responsible people in nice suits. Meanwhile, I also attended a Ruby conference where the keynote speaker looked very different.
00:04:10.360
I mean, we are special, that’s for sure. Now, let me introduce why I wanted to talk about this topic. For me, it's an interesting moment in my life. Maybe that’s why I'm reflecting on the last 10 to 15 years; that's the time I've been a Rails developer.
00:04:31.080
I moved back to Barcelona, my hometown, after 10 years. I also celebrated my 10th anniversary working for Zendesk.
00:04:41.479
Additionally, I started watching this TV show called 'Frieren.' Does anyone know about it? It's an anime show in a typical fantasy environment, like 'Lord of the Rings,' featuring elves, magic, and swords.
00:04:54.240
It follows a beautiful tradition in fantasy of talking about technology and various themes philosophically through magic.
00:05:10.199
One of my favorite novels states that magic is knowing the true names of things. This resonates with me; if you create the right abstraction, you control it. It’s how I think.
00:05:22.280
This metaphor isn't new; it reflects ideas from various thinkers. What makes 'Frieren' special and relevant here is that Frieren, the main character, lives in a world where there are very few elves. Elves are immortal, and the majority of the characters are human.
00:05:36.199
She witnesses the development of magic and everything else across centuries. Events that initially didn't matter become significant over time.
00:05:45.440
This mirrors how we develop technology, particularly software. After watching the show, I started identifying with Frieren. She's an elf over a thousand years old, possessing forgotten magical knowledge.
00:05:58.400
I feel equally mythical; after spending over 10 years at Zendesk, I possess forgotten magical Rails knowledge I'd like to share.
00:06:05.919
First, I want to address the concept of a monolith. This was quite popular in the early 2010s; it was seen as the way to go. But a few bad ideas arose during that time.
00:06:12.880
By the late 2010s, it became almost taboo to speak positively about monolithic architectures. Lately, it's become more acceptable to defend them. I find it unsurprising; debates in tech, particularly in software engineering, behave like a pendulum.
00:06:29.959
First everyone favors option A, then we swing all the way to option B, and we keep oscillating between them. I really dislike when people talk about software engineering as a hard science; we are more like architects.
00:06:44.880
This doesn’t mean we should think all ideas are right; it can be a dangerous pendulum. Ideas can sometimes be wrong or impractical. I remember when many proclaimed relational databases were finished.
00:06:54.879
Everyone was told to move to NoSQL. While I use NoSQL databases sometimes, I believe relational databases are still very important.
00:07:09.839
If you thought that was spicy, here’s an even crazier take: I believe the microservices boom in the last decade was a zero interest rate phenomenon, meaning that a great deal of investment was poured into tech.
00:07:26.399
As a result, some things that shouldn't have worked well worked out just fine. I find this slide amusing because just a few weeks after I included it in my presentation, DHH said something similar.
00:07:41.920
Many people probably shared the same sentiment: microservices required an extensive workforce. I recall visiting a friend's startup in Montreal years ago.
00:07:56.480
He told me they had more microservices than customers. I was always worried he'd see my presentation and think I was making fun of him, because he said he was the only person who understood their architecture.
00:08:10.839
He felt they were not doing great. Now that there’s less money in tech, we’re seeing the trade-offs that come with using microservices.
00:08:19.480
Maintaining a service with thousands of microservices is far easier with 8,000 engineers than with just 2,000. The added complexity becomes much clearer in this scenario.
00:08:34.720
Don't misunderstand; I'm not against microservice architecture. I think it's a useful tool in the right circumstances, just like monoliths.
00:08:45.360
I'm not like the current CEO of Twitter, who calls microservices bloatware; that's pretty crazy to me. My opinion aligns more with that of GitHub's former CTO, Jason Warner, who says it's a spectrum: when starting a new company or application, you should begin with a monolith and then gradually break it down into smaller services.
00:09:05.480
This is similar to what we did at Zendesk.
00:09:20.040
Initially, we aimed to create an ecosystem of Rails applications that shared logic. The idea was to centralize logic like authentication through shared files. If you look at our monolith, we have numerous private gems shared by several applications.
00:09:34.720
Then we entered what I call the service era. This period started when we began building services, not so much out of necessity but due to acquiring new companies.
00:09:50.080
For example, we acquired a chat company; you wouldn’t want account information for the chat service mixed with those of other products. Therefore, we needed a centralized account service to share data across applications.
00:10:02.880
Currently, we are in the era of event-driven architecture, where multiple services write events via Kafka, and other services consume them, triggering additional events.
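The flow described here, where services publish events and consumers react by emitting further events, can be sketched with a tiny in-memory event bus. This is only an illustration under stated assumptions: the `EventBus` class and the topic names are hypothetical, and a real deployment would use a Kafka client instead of an in-process array.

```ruby
# In-memory stand-in for a Kafka-style event log (hypothetical; not Zendesk's code).
class EventBus
  attr_reader :log

  def initialize
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
    @log = []
  end

  # Register a handler that is called for every event published to `topic`.
  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  # Append the event to the log, then deliver it to each subscriber.
  def publish(topic, payload)
    @log << [topic, payload]
    @subscribers[topic].each { |handler| handler.call(payload) }
  end
end

bus = EventBus.new
notifications = []

# A consumer that reacts to one event by emitting another one.
bus.subscribe("ticket.created") do |event|
  bus.publish("notification.requested", { ticket_id: event[:id], kind: :new_ticket })
end

# A second consumer that only records what it receives.
bus.subscribe("notification.requested") { |event| notifications << event }

bus.publish("ticket.created", { id: 42, subject: "Cannot log in" })
```

The publisher of `ticket.created` knows nothing about notifications; the chain exists only because of the subscriptions, which is the main property of the event-driven style.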
00:10:19.360
Still, we maintain a large Rails monolith and engage with various models. We’re experimenting with different ways to use them.
00:10:30.480
Right now, we're using Prowler, but we’re still deciding how to incorporate it effectively. I didn’t want to move on from this section without mentioning front-end development.
00:10:45.440
If you've been working in tech since the late 2000s or 2010s, as I have, you've likely witnessed what I call the 'JavaScript Wars.' If you recall 2014, there was a new JavaScript framework making waves every few months.
00:10:58.200
At Zendesk, we still use a mix of Backbone, React, and Ember. I'm surprised we didn't use Angular, honestly.
00:11:09.760
We even have something amusing that we call CJs, a term that no one else understands. The reason for CJs is that one engineer named Cart wrote his own framework and then left the company.
00:11:25.440
As you can imagine, that was a challenge since we didn’t have any documentation.
00:11:32.159
Currently, we are centralizing on React for several reasons. First of all, React is a great framework—personally, I love React Native. But I believe that the community just got tired of so much change.
00:11:43.040
We needed to focus our efforts on learning one framework that would remain useful for a while.
00:11:58.400
Next, I want to discuss adopting new technologies—how we decide what we adopt at Zendesk. We were born as a Ruby company.
00:12:07.599
Yet, in many situations, additional technologies are necessary. Once upon a time, adding new technologies to the stack was relatively easy, and it proved to be a bad idea.
00:12:21.839
The challenge arises when you have a large team, like ours, with 1,900 people in product alone, and everyone adds whatever they want.
00:12:35.839
This chaos led us to establish a 'tech menu.' If you want to start writing applications in Elixir, great! You need to write a document and have it approved by the architectural team.
00:12:48.480
A notable example was deciding which language we wanted to use for writing services. The two winners were Java and Scala. I was on Team Elixir; we even wrote a proposal to adopt it, but it was rejected.
00:13:03.040
They told us that for those proposals, we already had Java and Scala. Over time, due to the size of our company, different parts were using Scala while others were using Java.
00:13:17.920
However, we faced difficulties hiring Scala engineers or teaching existing engineers how to use Scala. Now, more and more of the engineering teams at Zendesk are using Java for services.
00:13:32.080
Interestingly, most technologies are adopted not through intention but rather via acquisitions. Over time, we have acquired quite a few companies—around half of our main products come from acquisitions.
00:13:45.040
One example is our acquisition of Zopim, a company from Singapore. Zopim used Python, which has now become a Zendesk language.
00:13:57.760
A couple of years ago, I presented at RailsConf about scaling a Rails application. I shared a personal story about my experience before joining Zendesk.
00:14:11.720
I co-founded a company called Playfulbets, which, although it never made any money, became quite popular. We were one of the 50 most visited websites in Spain, but I was the only engineer, which made scaling incredibly hard.
00:14:28.060
At some point, I realized it didn't make sense to complain when companies like Zendesk, GitHub, or Shopify were successfully scaling their applications.
00:14:38.960
To understand why some claim that 'Rails doesn't scale,' it’s important to note the situation in which Rails became popular. Here's a picture from 2007 featuring the founding partners; you can see only two.
00:14:55.880
The one in front is our first CTO, Morten, and this coincided with the rise of Web 2.0. Before this, websites were static. However, with Web 2.0, users began to interact more.
00:15:10.560
This led to an increase in database interactions, especially with the emergence of platforms like Twitter, which became an example of how Rails could work.
00:15:24.680
It’s humorous that Twitter's struggles with scaling became an example of why Rails might not be reliable, even though they did have issues.
00:15:39.840
Do you know the infamous 'fail whale'? I wrote my master’s thesis based on Twitter’s architecture. The good news is that you are not alone in your experiences.
00:15:55.600
Many companies, including GitHub, Shopify, and Zendesk, have scaled Rails significantly. Shopify's work is incredible; during Black Friday, their CEO Tobi Lütke tweeted that they were receiving 60 million requests per minute.
00:16:07.520
You might wonder what Twitter leadership thinks about Rails now, considering its history.
00:16:18.760
I have enough material to produce a full presentation on that topic. But here's some shameless self-promotion—I wrote a book about it.
00:16:36.080
One of the most critical aspects of scaling an application is the database. Typically, it's the bottleneck for most applications.
00:16:54.740
There are multiple techniques for optimization, aside from caching. Sharding is one that, surprisingly, many people are not familiar with.
00:17:09.480
The basic idea is that your data can often be partitioned, especially in B2B environments, where each account only ever needs to see its own data. Keeping everything in a single huge database could lead to a huge problem.
00:17:26.000
So, splitting the database into smaller, manageable units becomes crucial. For instance, if you have a billion records and split them across 1,000 databases, each one holds about 1 million records on average.
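As a rough sketch of that arithmetic and of account-based routing: the `ShardRouter` class and the `shard_%04d` naming below are assumptions made for illustration, not Zendesk's actual sharding logic. Real systems often use a lookup table instead of modulo so that accounts can be moved between shards.

```ruby
# Hypothetical shard router: every account lives on exactly one shard.
class ShardRouter
  def initialize(shard_count)
    @shard_count = shard_count
  end

  # Map an account id to a shard number with simple modulo arithmetic.
  def shard_for(account_id)
    account_id % @shard_count
  end

  # Build a database name for the shard that holds the account.
  def database_name(account_id)
    format("shard_%04d", shard_for(account_id))
  end
end

TOTAL_RECORDS = 1_000_000_000
SHARD_COUNT = 1_000

router = ShardRouter.new(SHARD_COUNT)

# One billion records spread evenly over 1,000 shards.
average_per_shard = TOTAL_RECORDS / SHARD_COUNT

# All queries for account 12_345 go to a single, much smaller database.
target_database = router.database_name(12_345)
```

Because every query is scoped to one account, it only ever touches one shard, so each database stays small no matter how large the total dataset becomes.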
00:17:35.159
That's much easier to manage! Another technique we use is archiving. We move archived data to DynamoDB.
00:17:47.600
As for what data gets archived, that's a decision we continuously evaluate. We also regulate the functionalities afforded to archived data.
00:18:01.000
In general, we’ve found there are three properties in good databases: they maintain a consistent runtime, manage complex queries, and house a large dataset. However, you can only have two of the three.
00:18:19.920
In the startup phase, you can maintain both query complexity and runtime, but as your dataset grows too large, performance often suffers.
00:18:36.480
Microservices tend to scale better because they push you toward simpler queries. When data is distributed among numerous databases, each owned by a single service, complex queries across multiple tables become impractical.
00:18:44.560
Next, I'd like to discuss testing and reliability in a big company. We're a large organization at Zendesk, and it’s timely to talk about testing, especially since an antivirus update broke Windows systems.
00:18:56.839
Please, test your applications thoroughly.
00:19:08.720
What I want to focus on here is the traditional debate in the Ruby community: tests versus types. You may have heard that Ruby applications are challenging to maintain due to lack of types.
00:19:23.640
The original premise of Ruby was you don’t need types, as long as you write a substantial number of tests to back it up. It’s critical to have tests, regardless.
00:19:38.320
I think both sides have valid arguments, but at Zendesk, we prioritize tests; we maintain extensive test coverage.
00:19:45.440
To illustrate, we maintain over 1.6 million lines of code. Can anyone guess how many unit tests we have?
00:19:55.440
No, it's not 5 million. It starts with a five.
00:20:00.640
So, we have 557,805 tests.
00:20:11.720
One of our junior engineers came to me recently, saying that they were unable to merge a feature because there was someone asking for more tests.
00:20:27.679
I responded, 'Oh, you must be talking about Ryan—he's the author of MiniTest and its main maintainer.' The next question new engineers ask is usually: why do we use MiniTest?
00:20:38.639
There are other more popular testing frameworks in Ruby, but we have our reasons for choosing MiniTest. It's smaller than other options, has less magic, and is much faster.
00:20:53.680
When we tried to apply more features to MiniTest, we ran into challenges. One significant issue is the prevalence of flaky tests, particularly when it comes to performance testing.
00:21:06.480
Right now, we run tests in groups of 2,000. This means if one test fails, you only need to rerun those 2,000 tests, which is much more efficient.
00:21:17.760
Previously, running over 100,000 tests would be necessary to identify a single broken test.
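A minimal sketch of the grouping idea, assuming tests are identified by plain integers and split in order. The helper names `test_groups` and `group_to_rerun` are hypothetical; the figures are the ones quoted in the talk.

```ruby
# Split a list of test identifiers into fixed-size groups.
def test_groups(test_ids, group_size)
  test_ids.each_slice(group_size).to_a
end

# Find the group that contains a failing test, so only that group is rerun.
def group_to_rerun(groups, failing_test_id)
  groups.find { |group| group.include?(failing_test_id) }
end

all_tests = (1..557_805).to_a
groups = test_groups(all_tests, 2_000)

# Pretend test number 123_456 failed on CI.
rerun = group_to_rerun(groups, 123_456)
```

With 557,805 tests and groups of 2,000, this yields 279 groups, and a single failure means rerunning 2,000 tests rather than the whole suite.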
00:21:28.880
One key aspect of this process is ownership. With a significant codebase comes a lot of developers.
00:21:43.360
Historically, our monolith has seen 2,459 commits. While Zendesk has been around for 17 years—which might suggest a lot of code is from the past—570 different engineers pushed code just last year.
00:21:59.720
If we don't set limits and rules, we could easily fall into chaos. To prevent this, we use a GitHub feature that establishes ownership; all files must be owned by at least one team.
00:22:14.480
This makes changes more accountable. If an incident occurs, it's much easier to find out whom to approach.
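The rule that every file must have an owner can be checked mechanically. The sketch below assumes simplified directory-prefix rules in the spirit of a GitHub CODEOWNERS file, where the last matching rule wins. The team names, paths, and helper methods are hypothetical, and real CODEOWNERS patterns are richer than plain prefixes.

```ruby
# Simplified ownership rules: directory prefix => owning team (all names hypothetical).
OWNERSHIP_RULES = [
  ["app/models/", "@example/data-team"],
  ["app/controllers/", "@example/api-team"],
  ["app/models/billing/", "@example/billing-team"]
].freeze

# Return the owner for a path, or nil if no rule covers it.
# Later rules take precedence over earlier ones.
def owner_for(path, rules = OWNERSHIP_RULES)
  match = rules.reverse.find { |prefix, _team| path.start_with?(prefix) }
  match&.last
end

# List the files that no rule covers, for example to fail a CI check.
def unowned_files(paths, rules = OWNERSHIP_RULES)
  paths.reject { |path| owner_for(path, rules) }
end

changed_files = ["app/models/user.rb", "app/models/billing/invoice.rb", "lib/tasks/cleanup.rake"]
missing_owner = unowned_files(changed_files)
```

A check like this could run on every pull request, so that a new file without an owner is caught before it is merged.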
00:22:31.520
Next, I'd like to discuss upgrading Rails.
00:22:38.720
We've been using Rails for a long time; we started with Rails 1.
00:22:46.640
Our original CEO, Mikkel, worked with DHH before creating Rails.
00:23:02.760
Regarding upgrades, tests are invaluable in predicting what may break during an upgrade.
00:23:15.200
That's why we run tests on both the current version and the version we wish to upgrade to, which helps avoid regressions.
00:23:27.920
It's acceptable for a test to fail on the new version at first; but once it turns green, it cannot go back to red, which protects everyone's work during updates.
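The "once green, never red again" rule works like a ratchet. A minimal sketch, assuming the test results for the next Rails version are available as a hash of test name to `:pass` or `:fail`; the test names and the `ratchet` helper are hypothetical.

```ruby
require "set"

# Tests already known to pass on the next Rails version (hypothetical names).
KNOWN_GREEN = Set["UserTest#test_login", "TicketTest#test_create"].freeze

# Compare a run on the next version against the known-green list.
# results: Hash of test name => :pass or :fail
def ratchet(known_green, results)
  # A known-green test that fails now is a regression and should block the merge.
  regressions = known_green.select { |name| results[name] == :fail }
  # A passing test that is not on the list yet gets added to it.
  newly_green = results.select { |name, status| status == :pass && !known_green.include?(name) }.keys
  { regressions: regressions.to_a.sort, newly_green: newly_green.sort }
end

results = {
  "UserTest#test_login" => :pass,
  "TicketTest#test_create" => :fail,
  "OrgTest#test_merge" => :pass,
  "SlaTest#test_breach" => :fail
}

report = ratchet(KNOWN_GREEN, results)
```

Tests that have never passed on the new version, like `SlaTest#test_breach` here, are tolerated; only tests that were already green can block a change.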
00:23:44.640
Throughout 17 years of upgrading Rails, we faced many pain points.
00:23:56.480
Too much metaprogramming was a significant issue. While I adore this book that explains Ruby, I found maintaining Rails code with excessive metaprogramming incredibly challenging.
00:24:06.640
In the 2010s, we all embraced this concept until we hit a wall during the upgrades to Rails 5.
00:24:21.840
Attempting to upgrade to Rails 5 broke most of our tests; everything seemed to explode.
00:24:32.920
At that point, we decided to adopt a stricter approach. For years, we had been fixing issues with monkey patches, doing the bare minimum to continue building features.
00:24:52.880
To maintain a well-maintained application, we had to go into full cleanup mode. The transition from Rails 4 to Rails 5 took two and a half years.
00:25:07.920
We meticulously cleaned our code, removing as much magic as possible, which was intensive work.
00:25:21.520
During the last year, we documented the evolution of test failure rates as we refined our approach.
00:25:38.440
What was challenging? One challenge was strong parameters.
00:25:53.520
Initially, we implemented these slightly before Rails adopted them, necessitating updates across our endpoints to conform with Rails.
00:26:06.720
Another significant challenge arose with how dirty attributes functioned; we relied heavily on them in callbacks. The implications of these changes meant that endpoints broke or returned ambiguous results.
00:26:21.760
While the solutions weren’t incredibly complicated, they required extensive communication with countless teams worldwide.
00:26:36.560
For us, it took at least six months to update effectively. The bright side, however, is that cleaning magic out of the application has since made things significantly easier.
00:26:49.760
My concluding thoughts on this matter are that often, we faced the disadvantage of being first movers.
00:27:05.760
We tried building something ourselves before the community established solutions. Sharding, for instance, is something we've used for over 10 years.
00:27:20.160
We still have our unique logic for sharding, but one of our objectives now is to align with the standard that Rails has adopted since version 6.
00:27:35.840
I advise everyone to join the community. If you think a feature in Rails is deficient, create an issue, engage with others, and avoid working in a silo.
00:27:49.920
In conclusion, I'd like to leave you with a few thoughts.
00:28:05.760
I previously gave a presentation focused solely on scaling Rails performance but later realized that performance isn’t always the most pressing issue for companies.
00:28:20.800
Performance problems, like a slow Rails app, are obvious: they are easy to notice and often result in something breaking.
00:28:34.160
A poorly managed engineering organization, however, is much harder to detect.
00:28:49.200
What defines a well-run engineering environment can even be debated. In my opinion, knowing history helps you appreciate the feel you get when you speak with a seasoned engineer.
00:29:02.720
When you have someone who has been around longer recognizing a problem, it feels empowering to both of you.
00:29:18.320
But the past can also be perilous. Beware of relying on prior knowledge simply because it's tradition.
00:29:32.000
There’s an anti-pattern called the 'Frozen Caveman' that I discovered while researching for this presentation.
00:29:48.080
So, what’s the takeaway? Listen—listen to the history of applications and programming languages, but maintain an emotionally neutral perspective, if possible.
00:30:03.840
I’m happy to be here today because we're currently hiring!
00:30:43.559
Hi, thanks for your talk! I have a question: what's the current version you're running, and how long did the last upgrade take?
00:30:55.200
I honestly am not sure; I think it's around Rails 7.1 or so?
00:31:05.240
Yes, we've pretty much caught up. After moving to 5, we are generally on track, often just one minor version behind.
00:31:14.240
Shopify runs its large Core Monolith on Rails' main branch. Why not do that instead of running two branches?
00:31:29.880
Well, for a time, we didn’t want to jump too far ahead, and in the past, we were running version 4 while Rails 6 was already out.
00:31:41.400
Now, it makes more sense to adopt the main branch as we’re caught up.
00:31:50.480
Hi Cristian, that was an amazing talk! My quick question: I noticed much custom code and libraries at Zendesk.
00:32:02.520
How do you tackle onboarding new engineers? How long does the process take, and could you share two or three best practices?
00:32:14.320
That’s a valuable question. It varies significantly. Depending on which team someone is joining, the code base they’re dealing with may be quite different.
00:32:26.960
For example, if you work in a smaller service, the learning curve is more manageable. However, working in the monolith poses challenges.
00:32:41.679
In essence, you can join a specific team, grow within that framework, and remain there, although switching teams is also an option.
00:32:58.640
Yes, I've personally been in four or five different teams over my time at Zendesk.
00:33:09.360
Regarding event architecture, do you employ techniques like event sourcing or CQRS?
00:33:15.960
Yes, we are moving towards more event sourcing for inter-service communication.
00:33:27.960
Thank you for the talk! How long does it take to run your 55,000 tests on CI?
00:33:39.680
It varies, but it typically takes around 15 to 20 minutes.
00:33:46.480
We run them in groups of 2,000 tests for better performance.
00:33:54.920
Do you have system tests or integration tests? How do you manage these?
00:34:02.479
We have a set of feature tests that run alongside the unit tests. Before deploying, we stage our environment and run API and browser tests.
00:34:12.000
We also maintain a canary stage to check for issues before deploying more widely.
00:34:23.680
Thank you for the presentation! Considering you have numerous gems, how do you manage versioning?
00:34:30.960
Yes, we manage several gems in our code, and when conflicts occur, we typically merge the pull request from the first one.
00:34:39.720
In recent cases, we focused on collaborative efforts to manage potential conflicts.
00:34:46.560
The last question: how often do you deploy to production?
00:34:58.720
We deploy about two to three times daily. These deployments affect the entire platform.
00:35:06.920
Thank you for your inquiries, everyone!