Breaking up with the bundle

by Joel Hawksley

In his talk at RailsConf 2022, Joel Hawksley discusses the evolution and challenges GitHub faced with its custom CSS bundle, which expanded to over 40,000 lines over 14 years. This growth, though initially manageable, became problematic as it led to frequent visual regressions, technical debt, and difficulties in implementing changes. Hawksley highlights the following key points:

History of CSS at GitHub: The CSS started with only a few lines and ballooned to over 40,000, becoming a significant liability for development efficiency. Primarily, the use of a monolithic CSS bundle contributed to these challenges.
Design Systems: GitHub developed a design system called Primer to standardize UI elements. However, despite these efforts, much of the legacy CSS remained, highlighting a disconnect between the design system and existing code.
Measurement and Analysis: To combat the challenges posed by custom CSS, Hawksley and his team implemented measures to analyze their CSS changes. This included using data analytics tools like Datadog to create visual representations of CSS-related metrics, tracking lines added or removed.
Refactoring Efforts: Various attempts to refactor and segment the CSS bundle faced obstacles, particularly due to dependencies created by shared styles across components, leading to difficulties in maintaining reusable code.
Innovative Solutions: The team explored migrating towards a component-driven architecture inspired by React paradigms, such as CSS-in-JS for better encapsulation and critical CSS to serve styles only when needed. This included experiments with new structural approaches built on ViewComponent, a framework developed at GitHub.
Visual Regression Testing: To address visual regressions, Hawksley describes experiments with testing and diffing computed styles, akin to version control for styles, so that the effects of a CSS change become visible in test output rather than in production.
Future Direction: Ultimately, the team concluded that rewriting rather than refactoring was necessary for the long-term sustainability of GitHub's CSS. They aim to rebuild the UI incrementally using the design system and are open sourcing various components to streamline development.

Hawksley emphasizes the importance of learning from both internal and external sources to innovate beyond traditional frameworks and languages, underscoring the need for a collaborative approach to hard engineering challenges as GitHub looks to break away from its legacy CSS bundle.

00:00:12.059 Good morning everyone! It is so good to be back in person, seeing lots of smiling faces. Let's get started.

00:00:24.119 On Friday, October 19th, 2007, GitHub co-founder Chris wrote the first commit for GitHub.com. Less than an hour later, he committed the first stylesheet to the GitHub codebase. It was just a couple of lines, featuring a simple reset of margin and padding, some hard-coded colors for the body, and styled links. About two months later, there were only 238 lines of custom CSS. After another four months, that number increased by an order of magnitude to around 2000 lines. This was right around the time GitHub launched to the public, and then after about five more months, the amount of CSS doubled. Since then, our CSS has continued to grow by another order of magnitude; we eventually plateaued at around 40,000 lines for the past eight years or so. Thank goodness! If you look at this graph, you can imagine where we'd be today if we had continued on that trend.

00:01:33.420 So what happened when this graph stabilized? Around 2014 to 2015, we made a design system called Primer, which provides tools for building UI in what we call the GitHub way. It’s no coincidence that we open-sourced this work in early 2015. We created Primer, and it stabilized the amount of custom CSS we had. Unfortunately, we never finished the work; we didn't really address those 40,000 lines of CSS that continued to linger. That’s what my talk is about today. I'm going to discuss the problems we've faced with existing CSS, share some of our successes, but mostly our failures in addressing them, and how those lessons come together to inform our vision for the future of how we're going to build the UI at GitHub.

00:02:34.379 But first, hello everyone! Again, it's weird to see you all in person. My name is Joel, and I live in Colorado. As a Colorado resident, I'm fulfilling my contractual obligation to show you a picture of my dog and me on the top of a mountain at sunset. This is my dog, Captain, and me above Palisade, on the western slope of Colorado, in an area called Grand Mesa, which has hundreds of lakes over a mile above sea level. It is a wonderful, quiet place in the world.

00:03:05.459 But, as some of you know, I don't really have a permanent home right now. My wife, Caitlin, and I lost our home a couple of months ago in some really terrible wildfires in Colorado. It's been a difficult couple of months. Luckily, I wrote this talk three days before our house burned down, so I was able to prepare. I wouldn't be standing here in front of you without the support I've received from friends, family, and many people in this room.

00:03:31.800 I know way more about homeowners' insurance than I ever thought I would. In fact, I considered, as a joke, giving a talk on homeowners' insurance today, but I figured you came here for Ruby and Rails content, so we’ll stick with that. As for my role at GitHub, I'm a staff engineer on the design infrastructure team. We’re a group of designers and engineers that build and maintain Primer, our design system.

00:04:06.360 And this is probably the most appropriate time to mention we are hiring! If you're a Rails engineer who knows about accessibility and has experience in that area, please see me. We really need someone skilled at it. Anyway, what was I talking about? Design systems! Let's start with a definition.

00:04:43.620 I like this definition from the Nielsen Norman Group: A design system is a set of standards to manage design at scale by reducing redundancy while creating a shared language and visual consistency across different pages and channels. Reducing redundancy for the sake of visual consistency is another way of saying that design systems are abstractions. They take things we do many times and standardize them.

00:05:01.680 However, like most abstractions, design systems are only effective if they are actually used. This leads us to our custom CSS problem. If we look back at our graph of custom CSS, there are still at least 40,000 lines of it in our monolith, which represents an incredible number of places where we aren't using our design system. Custom CSS represents many shortcomings of our system: every single one of those lines is a place we could have used our design system, but we didn't.

00:05:49.860 You might wonder why I’m bringing this up now; it’s 2022, after all. What makes this year different from previous years when this code has been around? The reason is that our custom CSS has become a real liability to our company. Like most technical debt, it eventually catches up with you. One of the biggest problems has been being able to make efficient changes because of this issue.

00:06:03.600 If we look back at our graph, things stabilized around 2015, but if you genuinely think about the history of our company, our user interface did too. Here's an example repository: I went into the Wayback Machine and looked at the Node.js repo because it has been around for so long. Let’s take a quick tour of what Node.js looked like. Here's 2015...

00:06:41.160 In 2016, we moved the tabs to the top. In 2017, we made one of our most controversial changes ever; we made the header dark. I don’t know if you remember, but social media was ablaze with this for a while. In 2018, I can’t tell the difference; can you? Same goes for 2019. In 2020, we did a little facelift, making things a little wider with some visual tweaks, and things have remained pretty similar since then.

00:07:08.880 I don’t think this is a bad thing; I love that we’ve established some stability in a very ever-changing world. Let’s not forget that throughout that timeline, we sold our company to Microsoft for around eight billion dollars, so obviously, we’re doing something right.

00:07:43.560 But, to be honest with you, this outward-facing stability masked a lot of difficulty occurring beneath the surface. It reminds me of a favorite metaphor: when you observe a duck swimming gracefully in a pond, it appears serene, but underwater, it’s a vigorous, chaotic flurry of movement. One instance of this chaos is when we shipped dark mode, which could make for a great talk on its own, but it took us almost two years of engineering work to ship this feature.

00:08:14.460 A significant portion of that effort was because our custom CSS proved to be such a liability that we had to meticulously work through it to enable changes to our color scheme on the fly. This is what I mean by the difficulty in making changes efficiently. What truly pains me is that we have many more ideas for improving not just the look and feel, but the overall functionality of GitHub.

00:08:53.160 However, we struggle with justifying the effort simply due to the high costs associated with making sweeping changes to our codebase. Similarly, it's also been challenging to make modifications to our CSS safely. There's nothing worse than encountering a visual regression, especially when you find out about it on Twitter, which we oftentimes do.

00:09:24.780 In this case, you can see several bugs; for instance, the copy button's icon disappears when you click on it, and the button itself isn’t even the right size. This is just one of many bugs we introduced while attempting to refactor and clean up our custom CSS.

00:09:55.620 When I really think about it, the root of the problem is that bad design doesn't break Continuous Integration (CI). These visual regressions are incredibly difficult to catch, even with tedious manual testing, because they can be buried behind multiple clicks in an interface, or only appear when someone’s screen is zoomed in or set to a particular pixel width, where our responsive layout might not behave as expected.

00:10:15.360 All these assumptions rest on the awareness of which pages might be affected by the change you just made. For example, if I have a single CSS file and an index template using that file, I simply need to ensure that one file still renders correctly. But what if there are three pages utilizing the same stylesheet? Which pages do I need to verify? It depends on which rules from the stylesheet apply to those pages, and pinpointing that can be quite complex.

00:10:43.560 What happens when there are dozens, or hundreds, or in our case, thousands of pages that draw from the same bundle? These are the things that keep me up at night. The real human cost comes from the inability to make changes safely; it ultimately breaks down your confidence in your work.

00:10:59.640 Those are some of the problems we face. Now let’s discuss our successes, and mostly failures, in addressing them. One great source of comfort for me as I’ve tackled this issue at GitHub has been Jeff Atwood's 2007 blog post titled "Falling into the Pit of Success." In this post, Jeff likens working in C++ to walking on the edge of a bottomless pit.

00:11:40.800 He notes, and I’m paraphrasing, that the problem with C++ is that it does a terrible job of protecting you from your own worst enemy: yourself. When you write code in C++, you’re always circling the pit of despair, one misstep away from plunging into disaster. This resonates with me because CSS has often been my pit of failure. Jeff's article talks about the 'pit of success,' where he explains that if we design our applications properly, our users should be driven into that pit of success.

00:12:16.680 Some may take longer than others, but they should all arrive there eventually. What we needed to do was create an environment where working with CSS in our giant codebase feels like being in the pit of success. Over the past year, we've tried various approaches to better understand and communicate this problem through effective measurement.

00:12:52.860 We’ve worked to split our bundle into functional areas, experimented with various encapsulation techniques for our CSS, and explored ways of diffing the computed styles in the DOM. Let's start with measurement. When I began this project, I had what I refer to as a ‘bad spidey sense’—a hunch that we had a lot of technical debt that was, in fact, growing despite our very real remediation efforts.

00:13:26.220 However, the issue with having a spidey sense is that it's often insufficient to convince others, especially your manager, to understand the problem and provide the necessary resources, whether time or manpower. This situation brought to mind one of my favorite quotes: 'Premature optimization is the root of all evil,' perhaps one of the most recognized quotes in software engineering.

00:13:56.820 In this context, I interpret it to mean that optimization without a clear end goal is a fool's errand. This notion originated from a question my manager asked me when I discussed this work, which was, 'That’s great, but how will we know when we're done?' At our scale, there’s always something more to do. You could unleash hundreds of engineers to refactor code at GitHub, and there’d still be more to address.

00:14:31.680 So we needed a way to measure the problem so that we would have a benchmark for optimization and to determine when we were finished. Thus, we began tracing the story behind those 40,000 lines of code.

00:14:56.220 I approached my colleague, John, and asked how we might measure our custom CSS using Datadog. John is quite proficient with Datadog and created an incredible dashboard that I truly admire. What we do is run a script on every master build that analyzes our CSS and pushes a plethora of statistics to Datadog. This dashboard effectively offers a living, breathing view of how well our design system is performing in real-time.

00:15:31.740 What I particularly appreciate about this dashboard is its ability to generate an emotional response by distilling the complexity into a simple metric: the lines of custom CSS added in the past week. This number can fluctuate in the hundreds, and it even turns green if it’s a negative value, which is quite encouraging. The real beauty of this metric is that it simplifies the complex problem down to one digestible figure.

00:15:56.220 While it isn’t a perfect measurement, it helps people grasp the root of the problem in the moment. John also provided other visualizations to show the extent of our recent changes. For example, a lot of custom CSS was added around tooltips in the past week as shown in the screenshot I took, which prompted us to discuss that code and collaborate with the team responsible for it.

00:16:21.660 In a similar vein, we also track instances where our linters are disabled, akin to a RuboCop disable comment for a specific rule. We track these stylelint disables, and we monitor the broader context through our most disabled linter rules overall. This helps us identify fundamental deficiencies in our system. For instance, a common override we encounter is in spacing, hinting that our system may not accommodate all the spacing requirements within the application.
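The kind of counting described here can be sketched in a few lines of Ruby. This is a minimal illustration, not the script GitHub runs: the file names, the CSS snippets, and the rule names are made up for the example, and only block-style `/* stylelint-disable ... */` comments are handled (SCSS `//` comments are ignored).

```ruby
# Count stylelint disable comments per rule across a set of stylesheets.
# `sources` maps a (hypothetical) file path to its CSS text.
DISABLE_COMMENT = %r{/\*\s*stylelint-disable(?:-next-line|-line)?\s+([^*]+?)\s*\*/}

def disabled_rule_counts(sources)
  counts = Hash.new(0)
  sources.each_value do |css|
    # Each match captures the comma-separated rule list inside one comment.
    css.scan(DISABLE_COMMENT).flatten.each do |rules|
      rules.split(",").each { |rule| counts[rule.strip] += 1 }
    end
  end
  # Most-disabled rules first; ties are broken alphabetically.
  counts.sort_by { |rule, count| [-count, rule] }.to_h
end

# Illustrative input only; these files and rule names are assumptions.
sources = {
  "settings.scss" => <<~CSS,
    /* stylelint-disable-next-line primer/spacing */
    .settings-box { margin-top: 7px; }
    /* stylelint-disable primer/spacing, primer/colors */
    .settings-note { padding: 3px; color: #c00; }
  CSS
  "discussions.scss" => <<~CSS
    .reply { padding: 5px; } /* stylelint-disable-line primer/spacing */
  CSS
}

disabled_rule_counts(sources).each { |rule, count| puts "#{rule}: #{count}" }
```

In a real pipeline the resulting counts would be sent to a metrics service on each build; that step is left out because it depends on the service's client library.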

00:16:52.680 We also track violations according to code ownership, which is useful since we have hundreds of teams at GitHub. This graph illustrates, for example, that a significant amount of custom CSS is owned by the teams behind our marketing site, which has sparked discussions about developing a design system for them.

00:17:23.220 Sometimes the answer is as straightforward as measuring the number of lines in a file. This graph indicates that our settings and discussions styles comprise some of the largest codebase elements. While simplistic, this has proven to be one of our more effective measurements.
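As a rough sketch of the two simplest measurements mentioned so far (lines per file, and the net change in lines between two points in time), something like the following would do. The file names and contents are placeholders, and counting only non-blank lines is an assumption made for this example.

```ruby
# Non-blank line count per stylesheet; `sources` maps path => CSS text.
def line_counts(sources)
  sources.transform_values { |css| css.lines.count { |line| !line.strip.empty? } }
end

# Net change in total lines between two snapshots.
# A negative result means custom CSS was removed overall.
def net_lines_added(previous_counts, current_counts)
  current_counts.values.sum - previous_counts.values.sum
end

# Placeholder snapshots standing in for "last week" and "this week".
last_week = line_counts({
  "settings.scss"    => "a {}\nb {}\nc {}\n",
  "discussions.scss" => "d {}\n"
})
this_week = line_counts({
  "settings.scss"    => "a {}\n\nb {}\n",
  "discussions.scss" => "d {}\n"
})

largest_path, largest_count = this_week.max_by { |_path, count| count }
puts "largest file: #{largest_path} (#{largest_count} lines)"
puts "net lines added: #{net_lines_added(last_week, this_week)}"
```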

00:17:54.780 I can attest to its effectiveness because, just a few weeks after John created this dashboard, it was featured in a planning meeting with our engineering leadership. This means that by measuring the problem we were all discussing and complaining about, we made it actionable. We were able to visualize the issue and help others understand its magnitude.

00:18:26.100 Another approach we undertook to address this problem was to try and split up our bundle, an effort we're still actively working on. It all started with this insight: besides CSS being difficult to refactor, our bundles presented real performance implications for our users. For instance, if you visit a pull request page, you might find that only about three percent of the CSS loaded on that page is actually used to render it.

00:18:50.400 This means you could be downloading around 400 kilobytes of CSS, most of which is unnecessary. The challenge was compounded by the fact that since this file was one large bundle, we often broke the cache almost every time we deployed, because someone was changing something in that massive file. Given we deployed many times a day, most of those deployments would invalidate the cache.

00:19:29.520 Referring to our diagram, our solution was to eliminate a single bundle for the entire application, substituting it with bundles specific to the code needed for that particular area.

00:19:58.080 We implemented this method for three areas: marketing, settings, and discussions. We used a class method added to the application controller, which allowed us to declare which stylesheet bundles should load during specific controller actions. This enabled us to move our marketing-specific styles to their own bundle, loading that only on marketing pages.
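A controller-level declaration like this could look roughly as follows. This is a plain-Ruby stand-in so it runs without Rails: the module, the `stylesheet_bundle` method name, its `only:` option, and the controller are all hypothetical, not GitHub's actual API.

```ruby
# Plain-Ruby stand-in for a controller concern that declares per-area bundles.
module StylesheetBundles
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare an extra bundle, optionally limited to certain actions.
    def stylesheet_bundle(name, only: nil)
      declared_bundles << { name: name.to_s, only: only && Array(only).map(&:to_s) }
    end

    # Declarations are stored per class; inheritance is not handled in this sketch.
    def declared_bundles
      @declared_bundles ||= []
    end
  end

  # Names of the bundles a layout should link for the given action.
  def stylesheet_bundles_for(action)
    self.class.declared_bundles
        .select { |bundle| bundle[:only].nil? || bundle[:only].include?(action.to_s) }
        .map { |bundle| bundle[:name] }
  end
end

# Hypothetical controller: one bundle for every action, one for a single action.
class MarketingController
  include StylesheetBundles

  stylesheet_bundle :marketing
  stylesheet_bundle :pricing, only: :plans
end

controller = MarketingController.new
puts controller.stylesheet_bundles_for(:index).inspect
puts controller.stylesheet_bundles_for(:plans).inspect
```

A layout would then emit one stylesheet link per returned name, so a marketing page never pays for settings or discussions styles.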

00:20:32.160 We encountered challenges as we started separating out the custom CSS for the discussions feature. If we return to our diagram, let’s concentrate on the discussions portion. You might expect this to be a simplified representation of our codebase, but in reality, there are approximately 50 templates just within the discussions folder, many being partials.

00:21:06.900 The use of partials introduces a new complication. Partials are created with the intent to reuse view code, for example, people wish to reuse the way we implement reactions in discussions. They might reference that partial in a different template, such as for displaying a new release. Unfortunately, this leads to actual bugs that make it to production because that template, in a certain state, fails to render correctly since we’ve split out the bundle and only served that CSS with discussions, not on the releases page.

00:21:45.720 As a result, we impede our ability to reuse partials as intended, which should foster code reuse across our application. This situation becomes a dilemma, as reusing it requires loading the entire bundle again, bringing us back to square one with a central bundle of CSS.

00:22:10.560 This is a crafty challenge, and no obvious answers emerged. At our scale, my colleagues can confirm that challenges like this increasingly appear. Rails does not meet all our requirements, and Stack Overflow often proves less useful. It can feel quite isolating, to be honest.

00:22:50.040 One way we have approached these difficulties is by looking across the broader web ecosystem for inspiration. We need to investigate how others are tackling similar issues outside of Rails. Currently, a lot of innovation surrounding UI architecture is happening in React, and I find it genuinely inspiring how many bright minds are pioneering new methods for managing the complexity of web application construction.

00:23:26.040 How are they addressing this problem at scale? Here’s an example from the React implementation of Primer, our design system. To avoid overcomplicating matters, when we declare a component, we can define it as a styled div using a pseudo-CSS data structure akin to a hash. We can look up values from our colors object or spacing object.

00:24:04.380 The vital point to understand here is that rather than having global CSS bundles, those styles are defined directly within our React components, along with the template. At render time, we generate a hashed selector with those styles, inlining it at runtime on the page.
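The example in the talk is a React component. The underlying mechanism (styles kept as data next to the component, then turned into a rule with a generated, hashed selector at render time) can be illustrated in Ruby as well. This is a sketch of the idea only: the theme values, the `c-` prefix, and the use of a truncated SHA-256 digest are assumptions, not how any particular CSS-in-JS library works.

```ruby
require "digest"

# Hypothetical theme values standing in for the colors and spacing objects.
THEME = {
  colors: { danger: "#cf222e" },
  space: { 2 => "8px" }
}.freeze

# Turn a hash of declarations into a hashed class name and its CSS rule.
# Identical declarations always produce the same class name.
def scoped_rule(styles)
  declarations = styles.map { |prop, value| "#{prop.to_s.tr('_', '-')}: #{value};" }.join(" ")
  class_name = "c-#{Digest::SHA256.hexdigest(declarations)[0, 8]}"
  [class_name, ".#{class_name} { #{declarations} }"]
end

class_name, rule = scoped_rule({ color: THEME[:colors][:danger], padding_top: THEME[:space][2] })
puts rule
puts %(<div class="#{class_name}">Danger zone</div>)
```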

00:24:52.260 This architecture is referred to as CSS-in-JS and it delivers several commendable advantages. One aspect often dubbed 'critical CSS' ensures only the necessary CSS for rendering a page is served. Put differently, it means scoring a perfect 100 on this measurement instead of a mere three percent.

00:25:29.160 The second benefit is encapsulation, which fosters confidence that changes will only have the intended effect. In the context of CSS, referencing our React example, CSS-in-JS ensures that the CSS written only impacts the React component it is designed for by generating globally unique class names.

00:25:56.880 These are exciting prospects! How can we incorporate them into our Rails app? We began with a proof of concept, targeting the authors' graph on the repository pulse page. Raise your hand if you’ve seen this before! Surprisingly, more of you than I anticipated! This graph utilizes custom CSS exclusive to this page, yet it loads on every page of GitHub.com.

00:26:43.620 Thus, we initiated a simplification process: we took the class names and appended a random hash during our build step. Instead of referencing that class name directly in the template, we formulated a method that takes the original class name and returns the hashed version using a product of that build step. This technique is known as CSS Modules.
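A toy version of that build step and template helper might look like this. It is an assumption-laden sketch: the class-matching regex is deliberately naive (it would also rewrite things like file extensions inside `url(...)`), the suffix is a deterministic digest of a made-up scope name rather than a random hash, and the `css_class` helper name is hypothetical.

```ruby
require "digest"

# "Build step": rewrite every class selector in a stylesheet to a hashed name.
# Returns the rewritten CSS and the original => hashed class name mapping.
def compile_css_module(css, scope:)
  mapping = {}
  compiled = css.gsub(/\.([a-zA-Z_][\w-]*)/) do
    original = Regexp.last_match(1)
    suffix = Digest::SHA256.hexdigest("#{scope}:#{original}")[0, 6]
    mapping[original] ||= "#{original}_#{suffix}"
    ".#{mapping[original]}"
  end
  [compiled, mapping]
end

# Template helper: look up the hashed name; raises KeyError on unknown classes.
def css_class(mapping, name)
  mapping.fetch(name.to_s)
end

source = <<~CSS
  .authors-graph { height: 150px; }
  .authors-graph .bar { fill: orange; }
CSS

compiled, mapping = compile_css_module(source, scope: "pulse/authors_graph")
puts compiled
puts %(<div class="#{css_class(mapping, :"authors-graph")}"></div>)
```

Raising on an unknown class name is a deliberate choice here: a typo in a template fails loudly instead of silently rendering an unstyled element.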

00:27:10.560 That’s how we managed to encapsulate those styles! But, how could we deliver that critical CSS only when needed? Our answer arrived in the form of ViewComponents. ViewComponent is a framework that we developed at GitHub to bring some of those great ideas from React into the Rails ecosystem.

00:27:54.660 Essentially, ViewComponents typically come as pairs: a Ruby file accompanied by an associated template. We simply added a CSS file next to those component files and directed our build pipeline to compile that stylesheet into its own specific bundle just for that component.

00:28:31.740 Here is what it looks like in action: we have our graph atop, followed by our generated selector, and ultimately a link tag with our component-specific bundle. This means we can render this component multiple times on a page, and the browser recognizes that link tag and won’t reload or parse that resource again.

00:29:06.540 As of now, if you navigate to this page, it is live in production, though it’s not quite ready to be open-sourced just yet. That's our journey with CSS encapsulation thus far. This work remains quite experimental, but we believe there’s exciting potential in this approach.

00:29:57.539 The next experiment we undertook was style diffing. Earlier, I mentioned the challenge of making CSS changes with confidence, often learning the hard way through Twitter. The core challenge here is that visual regressions can be tough to identify programmatically, as CSS is notoriously difficult to test effectively. You only discover the outcome of the CSS code when the page is loaded in the browser.

00:30:34.680 One common solution people have suggested to me is to pursue screenshot testing. They recommend using a tool that captures screenshots before and after code changes and compares those two images for any differences. This particular example is from Percy, a vendor I’ve worked with in the past that has a great product. It works well, but it tends to be a mostly manual process.

00:31:05.640 Often, you can see in the screenshot that there’s a green approve button, indicating some level of manual intervention with these changes. One day, while debugging yet another visual regression, I found myself exploring the Chrome DevTools to view the computed styles, and it struck me: what if we could diff them instead of taking screenshots?

00:31:50.640 I started looking into the documentation for the Chrome DevTools Protocol, which is the programmatic method for interfacing with the Chrome DevTools. I discovered an API called `CSS.getComputedStyleForNode`, which takes a DOM node and returns a comprehensive hash detailing every possible CSS property attached to that node and its corresponding value.

00:32:29.640 Next, I contemplated how we could create a test around this data. Fortunately, much of the groundwork had already been completed with a gem called Ferrum, which wraps the Chrome DevTools Protocol, making it user-friendly. It required a quick pull request to add support for that specific API.

00:32:52.920 Now I had the ability to work with computed styles. The first step is to load a page in a browser, then select a test node to retrieve its computed styles, as outlined in the actual code using Ferrum. If we check that object, we will see it consists of a hash that mirrors what we observed in the DevTools previously.

00:33:18.960 Finally, we can assert against a lookup on that resulting hash, which brings us to a regression test for computed CSS ensuring, for example, that this text is red, which is quite interesting.
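To make the shape of such a test concrete without needing a browser, the sketch below starts from a hand-written payload in the form the Chrome DevTools Protocol method `CSS.getComputedStyleForNode` returns (a `computedStyle` list of name/value pairs), turns it into a hash, and asserts on one property. In a real test the payload would come from a live page through a browser driver; the values here are invented for illustration.

```ruby
# Hand-written stand-in for a CSS.getComputedStyleForNode result.
# A real response lists every computed property of the node.
response = {
  "computedStyle" => [
    { "name" => "color",   "value" => "rgb(255, 0, 0)" },
    { "name" => "display", "value" => "block" }
  ]
}

# Convert the list of name/value pairs into a property => value hash.
def computed_style_hash(response)
  response.fetch("computedStyle").to_h { |prop| [prop["name"], prop["value"]] }
end

styles = computed_style_hash(response)

# The regression test itself: fail loudly if the text is no longer red.
raise "expected red text, got #{styles['color'].inspect}" unless styles["color"] == "rgb(255, 0, 0)"
puts "color is #{styles['color']}"
```

Note that computed colors are reported in resolved form such as `rgb(255, 0, 0)`, so an assertion written against the keyword `red` would not match.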

00:33:46.440 However, this tests one property. What if we require a more general purpose solution? This is where we envisioned the concept of style snapshots. I won’t have time to delve deep into this idea; we essentially designed a NoSQL database for styles.

00:34:14.520 The framework allows us to compare those two style hashes and format the changed values for an entire node in a user-friendly manner. These detailed test outputs enable us to automate visual regression testing that was previously highly manual in nature.
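Comparing two computed-style hashes and printing only what changed can be sketched as below. The method names, the output format, and the sample selector are assumptions for this example rather than the framework described in the talk.

```ruby
# Return { property => [before_value, after_value] } for every changed property.
def style_diff(before, after)
  (before.keys | after.keys).sort.each_with_object({}) do |prop, changes|
    changes[prop] = [before[prop], after[prop]] if before[prop] != after[prop]
  end
end

# Render a diff for one node as readable test output.
def format_style_diff(selector, changes)
  return "#{selector}: no computed style changes" if changes.empty?

  lines = changes.map { |prop, (old, new)| "  #{prop}: #{old.inspect} -> #{new.inspect}" }
  (["#{selector}:"] + lines).join("\n")
end

# Illustrative snapshots of one node before and after a CSS change.
before = { "color" => "rgb(255, 0, 0)", "display" => "block" }
after  = { "color" => "rgb(0, 0, 0)",   "display" => "block", "margin-top" => "4px" }

puts format_style_diff(".copy-button", style_diff(before, after))
```

A snapshot test would store the `before` hash on disk for each tracked node and fail, printing this diff, whenever a change to the stylesheet alters it.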

00:34:43.920 That’s how we leverage style diffing to facilitate safer changes. But what does this imply for the future? Even with everything I’ve shared thus far, I can’t claim we’ve made substantial progress. Many of the challenges I mentioned remain unresolved, if not exacerbated since I began drafting this talk.

00:35:17.280 You may have observed that, if you looked closely, even more custom CSS was added in the past year. It’s tough to admit, but perhaps our efforts simply haven’t been enough. These methods—while interesting to discuss—might not be sustainable.

00:35:54.720 The common thread through this endeavor has been an exploration of our attempts to grapple with this technical debt, to clean it and refactor it. Even with ample time and resources—six individuals dedicated their time full-time for an entire year—it’s clear we’ve fallen short.

00:36:32.880 It has been incredibly time-consuming and costly; the expenses associated with refactoring our CSS are far greater than those for any other code we write at GitHub. To be candid, it’s been unsustainable, highlighting the necessity for us to reconsider our approach.

00:37:11.760 Consequently, our best solution has been to rebuild. Instead of wrestling with all that legacy custom CSS, we are prioritizing rebuilding our UI using our design system, essentially from scratch, on a page-by-page basis.

00:37:50.520 We’ve already made great strides; some of you might have noticed that we’ve extracted nearly 50 open-source ViewComponents for our design system, and many more are en route.

00:38:27.720 This, we envision, is the future of UI development in Rails at GitHub. We must transition away from writing custom, hard-to-maintain CSS. This reminds me of one of my favorite engineering memes featuring two O'Reilly books. Imagine applying a similarly transformative mindset to how we approach CSS.

00:39:06.120 There are countless potential solutions—it may take more than one solution to form a well-rounded approach, possibly a toolbox of solutions. Nevertheless, I am quite confident that what we currently have is not the ultimate answer. I hope we can continue to learn from our shared experiences and from different methods employed across the industry since great ideas should never be limited to the technologies in which they are implemented.

00:39:58.680 Ideas have the potential to transcend frameworks and even programming languages. We will need the greatest ideas from wherever they originate if we are ever going to break up with our CSS bundle. Thank you!