Terraforming legacy Rails applications

by Vladimir Dementyev

The video titled 'Terraforming Legacy Rails Applications' features Vladimir Dementyev at RailsConf 2019, discussing strategies for enhancing legacy Rails applications. The main theme revolves around transforming legacy codebases into healthy, maintainable ones, analogous to terraforming Mars to make it habitable. The speaker emphasizes the challenges posed by legacy code, including technical debt and the difficulty in integrating new features.

Key points discussed include:
- Defining Legacy Projects: For Dementyev, legacy projects are those over one or two years old, often filled with technical debt and unclear coding practices.
- Development Environment Setup: The importance of setting up a clean development environment using tools like Docker for predictability and ease of use is highlighted. This setup minimizes configuration errors and allows for seamless onboarding of new developers.
- Configuration Management: Best practices for managing application configurations are presented, focusing on keeping sensitive data secure and organized. Dementyev recommends avoiding excessive use of environment variables throughout the codebase.
- Testing Frameworks: The necessity of a robust test suite is compared to maintaining a breathable atmosphere for habitation. Tools like 'Test Prof' are discussed to improve test reliability and speed, preventing them from becoming a bottleneck in development.
- Security and Consistency: The speaker touches on security practices and the importance of consistency between the code and database models. Tools like 'Bundler Audit' and 'Brakeman' are recommended for identifying vulnerabilities.
- Removing Dead Code: Strategies for identifying and removing outdated or unused code and gems are shared, using specific tools such as Traceroute and Factory Trace.
- Automation Tools: He introduces automation tools such as 'Danger' for improving code review efficiency, emphasizing the need for a solid development process to ensure project health.

In conclusion, Dementyev encourages developers to proactively 'terraform' their legacy Rails applications to create a stable environment conducive to development, similar to the ongoing efforts to make Mars habitable. He assures that the strategies and tools discussed can significantly enhance any legacy codebase.

00:00:20.570 Hey everyone! I can barely see you. I didn't think I would need sunglasses in Minneapolis, but you know it's not typical weather here. So welcome to my session. This is the last one for the day, aside from lightning talks. My name is Vladimir, and you can find me on GitHub and Twitter. I occasionally write some stuff—mostly RubyGems, sometimes blog posts, and sometimes tweets.

00:00:32.460 I work for a company called Evil Martians that is based in Brooklyn, New York, and Moscow, Russia. You might have heard about our open-source projects; there are a few of them that have their own stickers. You may have even read my blog. By the way, that's a brand new blog post of mine released on Monday regarding Rails 6 and some of its features.

00:00:41.190 Today's talk is mostly about my work in commercial development, specifically about working with client applications. Most of the projects we work on can be divided into two groups. The first group, which I enjoy most, involves building something new from scratch. That's a rare case. In most cases, we're dealing with legacy projects, which means we have to fix something that someone else built, and it doesn't look shiny. But I like it anyway—it's kind of my jam.

00:01:30.149 What is a legacy project? That’s a huge topic, but from my personal experience, legacy projects are any projects that are one or two years old. That’s typically when you accumulate a bunch of technical debt and when you don’t remember why you wrote certain pieces of code a year ago. As a team leader, I am often the first one to join a project, and one of my first tasks is to prepare the codebase for others to join and start working on features.

00:01:58.830 I can't just drop a handful of developers onto new projects and expect them to be superstars; that's not what I'm here to talk about. Instead, I'll focus on the process of dealing with legacy Rails applications. Today, I will outline the phases it consists of and the tools we use to make the process smoother. But first, let me tell you a bit more about myself. I'm a BoardGameGeek and a big fan of strategy board games. If you share this passion, you might find the title of this talk a bit familiar— and even the cover image. That's not a coincidence; the title of this talk is inspired by a board game called Terraforming Mars.

00:02:49.140 It’s a relatively new game from 2016, but it's very popular and has been a top game for a long time on the BoardGameGeek website. You should definitely check it out if you enjoy board games. The objective of the game is to make Mars a habitable planet to bring people there from overpopulated Earth. During the game, you do a lot of tasks, like raising the oxygen level, restoring the ecosystem, and so on.

00:03:28.019 While playing this game with my friends, the idea came to me that this seems very similar to what we do with legacy projects when we start working on them. So I came up with the concept of 'terraforming applications.' It's actually a similar process of transforming a legacy codebase into a habitable codebase, which means it's a good place to develop new features without expecting hidden dangers. It should be a good environment to work in and have good code to work with.

00:04:25.500 Let’s start with the first phase, which is the setup phase. In a typical board game, you start with a setup phase where you take everything from the box and put it on the table. Sometimes, this process can take longer than the game itself, and it’s not always that fun. Usually, it’s done by one person or maybe a couple of people, but usually one. The first thing I do when joining a project is setting up the development environment.

00:05:21.600 This takes some time, and the goal of this first phase, the landing phase, is to make it easier for others to cold-start the project. In terms of cold starts, it means minimizing the number of actions and time required to start running the Rails server locally, Rails console, or whatever you need to do.

00:06:05.039 Let’s talk about the development environment. By development environment, I mean all the external things like system dependencies, databases, and anything not specifically related to Ruby. We need to get our application up and running, and usually, we see something like this in the project's README. This is the old-school way of setting up a project. You just do a bunch of things: hope the instructions are up-to-date, copy and paste them into your terminal, and cross your fingers.

00:06:56.500 It's too much effort and error-prone. That's why the first thing I do when I join a project is to configure the development environment better. I prefer using new school tools like Docker and containers. By the way, did anyone attend the workshop on containerizing Rails applications? So, a few people, great!

00:07:37.529 Let me share some key points regarding Docker for development. First, it provides a repeatable and predictable environment setup, which is platform-independent. I’m not using version managers like rbenv or RVM. Just using Docker with a Dockerfile and Docker Compose file does the trick.

00:08:09.569 Secondly, it allows you to stay in sync with your development environment. You don't need to update images manually; just keep changes in the repository. The next time you run the project, everything is rebuilt, and you're using what you want without worrying about different environments and different branches.

00:08:38.200 But you should still establish some conventions on image versioning because using 'latest' for everything in your Docker Compose file could lead to breakage. Like any tool, Docker is not without its problems. One of the most common issues is that it can be slow on Macs. I know, since my Mac is five and a half years old and has only four gigabytes of RAM.

00:09:38.080 However, I'm telling you that Docker is good enough for developing Rails applications. You just need to handle your configuration carefully. There are a few tricks, like using cache folders, storing ephemeral data in volumes, or even using NFS, which can be tricky to set up but can improve performance.

00:10:14.230 Let me show you one tool we use to make the development process a little easier, called 'dip.' You may not have heard of it. It's a command-line tool originally written in Crystal, but we rewrote it in Ruby because, you know, Ruby is better. It acts like a wrapper over Docker Compose, and more importantly, it supports multiple Docker Compose files.

00:10:56.400 It provides a transparent way to configure the whole application, which may consist of multiple services, from one place. It has useful provisioning hooks and zsh integration, which allows you to run commands from your console without prefixing them with Docker Compose. Just type 'rails c,' and under the hood, it will run a container and execute the command inside it.

00:11:30.620 Now, after we configure the Docker environment, we’re ready to run our applications. What do you think happens next? In the realm of space engineering, the first launch typically fails due to some configuration problems. For example, someone might have used a database that’s not locked down.

00:12:04.120 This applies to our codebase. We can add environment variables everywhere in our codebase and then forget to add them to the sample configuration files. That's why the goal here is to ensure we have sensible defaults and minimize the number of external dependencies, like API services and other configurations.

00:12:44.440 Keeping configuration organized is also essential. Having sensible defaults means that if you need to change something, you should be able to do it without wondering where it needs to be changed in your configuration files. Each required change a developer makes in the project could lead to an error or failure—prompting a panic message in Slack saying, 'Help! I don’t know what to do.' I want to avoid that.

00:13:23.700 Again, coming back to Docker, some aspects come easy with it because you have the configuration file set up to configure services and maintain their relationships. But external dependencies, like AWS for storage services, should not be configured directly in the repository. It might make sense to do so eventually, but when you're working locally for the first time, don’t take on everything that exists in production.

00:14:11.000 Instead, use local services. We don't have credentials for AWS in our local setup, and it doesn't make sense to require them when you first work on the project—especially if you may not end up working on that part of the application. You could be a front-end developer or work on an external API that isn’t part of the main application flow. Just make sure that during development, you don’t force developers to search for credentials all over the project.

00:14:56.900 Now, let's talk about organizing configurations a bit. There’s a problem I call 'ENV hell': a situation where environment variables are used all over the application code. Environment checks, such as whether we are in production or development, also make the application harder to understand. It’s especially painful when you want to add a custom environment, because you have to hunt for every place that needs to know about it.

00:15:49.870 That's why we limit our usage of environment variables to configuration files like development.rb and application.rb. We avoid environment checks in the code and replace them with custom configuration settings, which makes the codebase more readable and easier to manage. It also lets us change a setting in production without tying the behavior to the environment name.
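A minimal sketch of that idea, in plain Ruby rather than a real Rails initializer: environment variables are read in one place, with defaults, and the rest of the code asks about behavior instead of the environment name. The setting names and the `AppSettings` struct are hypothetical, not taken from the talk.

```ruby
# Hypothetical settings object; the field names are illustrative only.
AppSettings = Struct.new(:mailer_delivery, :asset_host, :force_ssl, keyword_init: true)

# Read an ENV-like hash once, at boot, falling back to sensible defaults.
def build_settings(env)
  AppSettings.new(
    mailer_delivery: env.fetch("MAILER_DELIVERY", "file"),
    asset_host: env.fetch("ASSET_HOST", "http://localhost:3000"),
    force_ssl: env.fetch("FORCE_SSL", "false") == "true"
  )
end

# Application code checks a setting, not whether we are "in production".
def redirect_to_https?(settings)
  settings.force_ssl
end

settings = build_settings(ENV.to_h)
puts redirect_to_https?(settings)
```

With this shape, adding a custom environment such as staging means providing different values, not touching every conditional in the code.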

00:16:28.610 To keep it that way, we’ve built a custom RuboCop cop that detects unsafe usage of environment variables. This helps prevent future contributors from making the same mistakes. Additionally, we run this cop in our CI/CD pipeline.
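The check itself can be approximated without any linter API. The following is a simplified stand-in, assuming a plain regular-expression scan over file contents; it is not how the speaker's check is implemented, and the `env_offenses` helper is hypothetical.

```ruby
# Simplified stand-in for a lint rule: flag direct ENV access outside config/.
ENV_USAGE = /\bENV(\[|\.fetch\b)/

# files: hash of path => source text. Returns [path, line_number] pairs.
def env_offenses(files)
  files.flat_map do |path, source|
    # Configuration files are the one place where ENV access is allowed.
    next [] if path.start_with?("config/")

    source.each_line.with_index(1)
          .select { |line, _number| line.match?(ENV_USAGE) }
          .map { |_line, number| [path, number] }
  end
end
```

A real rule would work on the syntax tree instead of raw text, so that comments and strings are not flagged by mistake.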

00:17:22.640 Now let's discuss another common issue, which is the size of the .env file. The .env file can get really big, leading to chaos when managing multiple configuration variables. If you have a production application with more than a hundred variables, it quickly becomes difficult to track what is going on. I'm always looking for a better way to manage configuration in Rails applications.

00:18:23.999 The idea I came up with is to split configuration into two types: sensitive information (secrets) and non-sensitive data (settings). Sensitive data like API keys goes into encrypted credentials files stored in the repo. Less critical settings, like a bucket name, can go into a named YAML config.

00:19:17.879 One thing we want is the ability to override configuration when necessary. If you need to change a value without committing anything, you can override it through environment variables in production. With Rails 6, we can now store per-environment credentials. This has made me rethink how we handle configuration.

00:20:10.760 I use the Anyway Config gem, which is quite old but still useful. It handles all the complexity of fetching data from different sources and abstracts that complexity from the developer. You can override any configuration you need, whether from environment variables or by simply editing YAML files.
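The layering described above (defaults, then a YAML file, then environment variables) can be sketched in a few lines of plain Ruby. This illustrates the idea only and is not the API of any particular gem; the `load_config` helper, the `STORAGE` prefix, and the keys are hypothetical.

```ruby
require "yaml"

# Merge three layers; later layers win: defaults < YAML < environment variables.
def load_config(defaults:, yaml_text:, env:, prefix:)
  from_yaml = YAML.safe_load(yaml_text) || {}
  # Only variables named PREFIX_KEY are picked up, mapped to the key "key".
  from_env = env.each_with_object({}) do |(key, value), acc|
    next unless key.start_with?("#{prefix}_")

    acc[key.delete_prefix("#{prefix}_").downcase] = value
  end
  defaults.merge(from_yaml).merge(from_env)
end

config = load_config(
  defaults: { "bucket" => "dev-uploads", "region" => "us-east-1" },
  yaml_text: "bucket: staging-uploads\n",
  env: { "STORAGE_REGION" => "eu-west-1", "HOME" => "/root" },
  prefix: "STORAGE"
)
puts config.inspect
```

Here the bucket comes from the YAML layer and the region from the environment, while unrelated variables such as `HOME` are ignored.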

00:20:58.230 Now that we’ve finished the setup phase, we can finally start playing our game. I compare the atmosphere in the game to the test suite in your application because developing without tests is like breathing without air. You will eventually run into issues.

00:21:45.720 Breathing rarefied air requires more effort and can slow you down. Flaky tests cloud the testing environment, and it can lead to unexpected consequences. Our focus during this phase should be on making tests reliable and fast so that they do not block development.

00:22:45.300 To help with this, let me introduce a tool called Test Prof. You might have heard about it; it's been featured in Ruby Weekly. Test Prof is a collection of different profilers and extensions for frameworks, including RuboCop, designed to help refactor tests and speed them up. It was born out of production needs. We used it to speed up our test suite by a factor of four.

00:23:36.820 That being said, speeding up tests could fill two talks' worth of content. I've given this talk a few times, including at Paris.rb this past summer. One of its key focuses was the most common causes of slow tests.

00:24:30.880 One of them is the unnecessary use of Database Cleaner. In many cases, you don’t need it, because transactional tests are enough. The only problem arises when tests use multiple threads: each thread uses its own connection, which breaks transactional tests, and Rails 5.1 addressed this by sharing a single connection between threads in tests.

00:25:38.540 Another popular issue is running background jobs in tests, particularly with Sidekiq. Many developers inadvertently enable inline mode globally, wasting time by executing jobs that are not even relevant to the test.

00:26:30.080 Most tests don't need jobs to be executed. Flag only the tests that do as dependent on Sidekiq inline mode. We hit this problem ourselves and restructured our tests so that the ones requiring inline Sidekiq jobs are explicitly marked.

00:27:41.080 Now, another issue that's widespread is factory cascades, where creating one record leads to the creation of many other records. This cascading behavior becomes expensive, and thus we approached it by writing gems to detect these issues within the test suite.
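To see why cascades get expensive, here is a toy model in plain Ruby, assuming each factory simply lists the associated factories it creates. The factory names and the `records_created` helper are hypothetical, and the sketch does not use any factory library.

```ruby
# Toy factory registry: each factory lists the associated factories it creates.
FACTORIES = {
  account: [],
  user: [:account],
  project: [:user, :account],
  comment: [:user, :project]
}.freeze

# Count every record created, directly or indirectly, by one create call.
def records_created(name, factories = FACTORIES)
  1 + factories.fetch(name).sum { |dep| records_created(dep, factories) }
end

FACTORIES.each_key do |name|
  puts "#{name}: #{records_created(name)} record(s) per create"
end
```

In this model a single comment costs seven inserts, because the user and account are created again for every association instead of being reused.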

00:28:39.490 I also prepared a checklist for flaky tests. It's available on my website, with guidelines on what to check to minimize flakiness. One tool I created, a factory linter, finds columns with unique constraints in the database whose factories don't generate sufficiently random values.

00:29:20.900 This tool analyzes factories against the database to reveal potential issues, ensuring that tests are not only fast but also reliable.
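The core of such a check can be sketched as follows, assuming factories are modeled as callables that return attribute hashes and that the list of uniquely indexed columns is supplied by hand. A real linter would read both from the factory definitions and the database schema; `non_random_unique_columns` is a hypothetical helper.

```ruby
# factory: a callable returning an attribute hash, standing in for a real factory.
# unique_columns: columns covered by a unique index (supplied by hand here).
# Returns the unique columns for which repeated builds produced duplicate values.
def non_random_unique_columns(factory, unique_columns, samples: 3)
  rows = Array.new(samples) { factory.call }
  unique_columns.select do |column|
    values = rows.map { |row| row[column] }
    values.uniq.size < values.size
  end
end

counter = 0
sequenced = -> { counter += 1; { email: "user#{counter}@example.com", name: "Test" } }
hardcoded = -> { { email: "user@example.com", name: "Test" } }

puts non_random_unique_columns(sequenced, [:email]).inspect
puts non_random_unique_columns(hardcoded, [:email]).inspect
```

The hardcoded factory is the kind that passes when a test creates one record and fails with a uniqueness violation as soon as a test creates two.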

00:30:01.210 As for our project’s status, after addressing performance issues and ensuring that tests are both green and fast enough, we are now prepared to bring more team members onto the project. However, the terraforming process is still not complete. As you know, Mars goes from red to blue, and then to green to evoke a sense of transformation.

00:31:39.690 And for that, we need to ensure that it’s a livable planet—able to support life before we bring humans in. So, this includes addressing various issues of security and performance, which we will tackle now.

00:32:05.410 Firstly, about security: at this stage, I'm not talking about a full security audit, but rather a quick security analysis. We can use simple tools like Bundler Audit that scans the project’s dependencies for known vulnerabilities.

00:32:54.800 You’d be surprised to discover how frequently vulnerabilities appear and how fast they get resolved in the Ruby community. Most vulnerabilities are patched quickly. So regular upgrades to patched versions of gems are vital for keeping your application safe.

00:33:55.580 Next is a tool called Brakeman, which scans your code for dangerous patterns, such as using user-supplied input directly in the application views. RuboCop is another very helpful tool that isn’t just for formatting code; it also checks for common security mistakes.

00:34:43.940 Consistency is also crucial, mainly between the database and the business logic. When model validations and database constraints disagree, the result is confusion and errors, so keeping them in sync is key.

00:35:24.200 We also have a tool that helps ensure consistency by checking validations against the database schema. This tool reports any mismatches, whether it be missing constraints or foreign key issues, by analyzing both your ActiveRecord models and your database structure.
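A rough illustration of one such comparison, presence validations against NOT NULL constraints, is shown below. The schema and validation lists are written by hand and `consistency_report` is a hypothetical helper; a real tool would introspect the models and the database instead.

```ruby
# columns: column name => options, mimicking what a schema would declare.
# presence_validated: attributes the model validates for presence.
def consistency_report(columns:, presence_validated:)
  not_null = columns.select { |_name, opts| opts[:null] == false }.keys
  {
    # NOT NULL in the database, but the model would let nil through to an error.
    missing_validation: not_null - presence_validated,
    # Validated in the model, but the database would still accept NULL.
    missing_not_null: presence_validated - not_null
  }
end

report = consistency_report(
  columns: { email: { null: false }, name: { null: true }, role: { null: false } },
  presence_validated: [:email, :name]
)
puts report.inspect
```

Both directions matter: the first list produces database errors instead of validation messages, and the second allows invalid rows to be written by code that bypasses validations.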

00:36:18.840 Now, as for documentation and style, I'm a big advocate for maintaining a consistent code style using RuboCop. However, when working with legacy projects, it’s important to be cautious. You shouldn’t just enable the full default RuboCop configuration straight away; instead, start with a subset of cops that fits the current state of the project.

00:36:59.170 You may find the tool Standard helpful here because it offers a good base configuration for RuboCop and can help you avoid arguments about which style to use. Adopting one shared standard keeps style consistent across your projects.

00:37:42.110 Finally, concerning side effects in transactions: be wary of operations that happen inside database transactions. Performing actions like HTTP calls or enqueueing background jobs can lead to unexpected logic issues, such as a notification being sent even though the transaction was rolled back.

00:38:22.990 To tackle this, I created a gem called Isolator that integrates with different Rails frameworks to detect risky operations performed within a transaction. It can raise an exception in such cases, so developers notice the problem before data integrity suffers.
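The underlying technique can be sketched in plain Ruby: track whether a transaction is open and refuse to run a side effect while it is. This is a toy illustration under that assumption, not the gem's actual API; `TransactionGuard` and its methods are hypothetical.

```ruby
class SideEffectInTransaction < StandardError; end

# Minimal guard: tracks transaction depth per thread and raises on risky calls.
module TransactionGuard
  def self.depth
    Thread.current[:tx_depth] || 0
  end

  # Stand-in for a database transaction block.
  def self.transaction
    Thread.current[:tx_depth] = depth + 1
    yield
  ensure
    Thread.current[:tx_depth] = depth - 1
  end

  # Wrap HTTP calls, job enqueueing, and similar operations in this method.
  def self.side_effect!(description)
    raise SideEffectInTransaction, "#{description} inside a transaction" if depth.positive?

    yield
  end
end

begin
  TransactionGuard.transaction do
    TransactionGuard.side_effect!("HTTP call") { puts "request sent" }
  end
rescue SideEffectInTransaction => e
  puts "Detected: #{e.message}"
end
```

Raising in development and test surfaces the problem early; the usual fix is to move the side effect to run after the transaction commits.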

00:39:22.850 Lastly, let's not forget about dead code. Codebases often contain unused methods, templates, and even gems. Cleaning this up is crucial for maintaining a healthy codebase. One approach that helps identify unused gems is tracking object allocations during tests.

00:40:24.420 Tooling is important here, and I’ve started implementing tools that help analyze code relationships. Creating efficiencies here can reveal unused assets that can be safely removed.

00:40:34.090 So as we wrap up, remember, you're not alone in this process. While maintaining legacy code can be challenging, there are tools and methods available to help ease the burden.

00:40:52.290 In conclusion, feel free to start terraforming your legacy applications today. Remember you can seek help from others!

00:41:09.240 Special thanks to everyone here, and if you have any questions, you can find my details online. Thank you and happy coding!