Cruft and Technical Debt: A Long View

by Yehuda Katz

Summary of "Cruft and Technical Debt: A Long View" by Yehuda Katz

In this talk, Yehuda Katz examines the concept of "cruft" and its inevitable presence in software development, likening it to a form of technical obsolescence. He discusses how code that was once well-aligned with initial assumptions may become increasingly misaligned over time, leading to decreased satisfaction for developers. Katz argues that while technical debt can be controlled, cruft accumulates regardless of how well a project is maintained.

Key Points:

  • Cruft as Inevitable: Katz introduces the notion that cruft, akin to technical debt, is an inherent aspect of programming. It is often a reflection of the changing assumptions surrounding a project rather than the result of poor design or coding practices.

  • Assumption Mismatch: Over time, the gap between initial assumptions and current realities widens, consuming more development time. Each year, the same amount of code is written, but the burden of outdated assumptions increases.

  • Open Source Examples: The talk highlights open source projects like jQuery and Rails, which have faced challenges as their foundational assumptions became outdated. For instance, jQuery faced dynamic changes in browser capabilities, while security assumptions in Rails evolved significantly, emphasizing the transient nature of technical underpinnings in software.

  • Technical Obsolescence vs. Technical Debt: Katz distinguishes technical obsolescence from technical debt. While technical debt can be intentionally taken on and managed, obsolescence is unavoidable and must be proactively planned for.

  • The Role of Assumptions: Katz emphasizes the importance of understanding and documenting assumptions within a project to address cruft effectively. Open-source projects must manage rapidly evolving technologies and community practices, necessitating constant re-evaluation of the code.

  • Integration Burden: He notes how good solutions may inadvertently offload assumptions to users, making it difficult to adapt to new paradigms when underlying assumptions change. Projects with high integration overhead risk problems when they can no longer adapt.

Important Examples:

  • jQuery's DOM Readiness API: Katz uses this API to illustrate how assumptions about browsers' capabilities can diverge over time, leading to technical obsolescence.
  • Rails Security Issues: Citing historical vulnerabilities in Rails related to CSRF, Katz discusses the inherent risks when foundational assumptions fail over the software lifecycle.

Key Takeaways:

  • Cruft cannot be wished away; it must be managed with an awareness of how assumptions evolve in the software landscape.
  • Developers should track assumptions and document cruft to facilitate more informed architectural decisions moving forward.
  • Recognizing and addressing cruft requires a mindset shift—viewing cruft not merely as legacy code but as a byproduct of a dynamic and complex environment.

This session serves as a call to action for developers to thoughtfully refactor codebases and embrace evolving best practices in the face of unsustainable cruft.

00:00:08.480 Okay, great! So, let's get going with our next talk. This is one of the longer talks, 45 minutes, which will go by at blinding speed because our next speaker is Yehuda Katz, the man who needs no introduction. So, take it away.
00:00:31.599 Thank you! So today, I'm going to be talking about, as you might have guessed, the idea behind cruft.
00:00:38.640 Yesterday, Sandy gave a really great talk about the reasons that we become less satisfied with our projects over time. Her talk mostly focused on the ways that using better object models, using better object orientation, could avoid that steep decline in satisfaction.
00:00:45.120 I liked it a lot; I felt that to the extent that the problem is poor object design or poor structure, her talk really provided some good tools that you can use to help avoid declining satisfaction over time.
00:00:51.440 My talk is about situations where we cannot avoid this declining satisfaction simply by doing a better job ahead of time. Sandy told you how to contain the mess; today, I'm going to talk about how and when to clean it up.
00:01:15.360 This talk is basically about the declining value of code over time. When you write a piece of code for the first time, it feels great because what the code does matches perfectly with the assumptions made by the code. But over time, those assumptions change—subtly at first, and then faster and faster.
00:01:40.400 In my projects, this often takes a number of years. The gap between the assumptions that were true when you started building your application and the assumptions that are true years later grows faster and faster.
00:01:58.320 Year over year, you write roughly the same amount of code. However, the weight of those mismatched assumptions, the gap between the assumptions you made when you first wrote the code and the reality years later, consumes more and more development time.
00:02:10.800 This is just a projection of the previous slide: if you assume that the assumption mismatch keeps accelerating, you are still able to write more code, but the assumptions that were correct in the beginning become incorrect and eventually catch up with you.
00:02:22.720 This sounds a lot like classic technical debt, but there's a big difference. You take on technical debt on purpose. With enough discipline, you imagine that you can eliminate it altogether by thinking about it, in the way Sandy talked about yesterday, or you imagine that you can pay it down.
00:02:35.760 The problem I'm talking about today is more like buying a bunch of stocks in 2008 and never adjusting your holdings when the companies tanked. Unlike technical debt, technical obsolescence will build up no matter how hard you try to avoid it. The question isn't how to eliminate it; it's how to plan for it.
00:03:07.280 Today, since I do so much work with open source, I'm going to give some examples from the open source ecosystem. One reason this is very clear in open source is that it works on solving problems generically. If a problem is tricky, often that gap becomes a chasm very quickly, as the generic assumptions made early on rapidly fall out of favor.
00:03:26.319 I've given a couple of talks about concepts being hard, and you might want to watch this to get a sense of what's going on behind the scenes. I think this would be a decent companion talk.
00:03:43.840 The basic idea is that something can seem straightforward: I say render, and a bunch of stuff shows up on the screen. It looks simple, but there's a lot more going on behind the scenes.
00:04:12.000 There are more assumptions building up, and that's true not only for open-source projects like jQuery or Rails but also for your own applications. Often, applications can take on the characteristics of frameworks or libraries over a long period of time.
00:04:30.479 A good example of this is jQuery's DOM readiness API. Solving the DOM readiness problem generically means there's an assumption for every scenario and every use case.
00:04:41.199 It's not just one assumption. A single assumption would be that the browser does not provide a way to find out DOM readiness, and therefore we will use a hack: one assumption and one outcome. However, there are many different browsers and many different use cases that people use this API for. This means that some assumptions may remain true for a long time, such as that in IE6 there is no way to find out DOM readiness without hacks. But there are other assumptions that drift over time.
00:05:10.880 For example, in Chrome, there is a native way to find out DOM readiness, so the no-native-support assumption no longer holds there. You end up with a situation where technical obsolescence creeps up on you because, for a long time, one of the anchoring assumptions of your API stays the same, while other anchoring assumptions slowly drift away.
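The way several assumptions can sit behind one small API can be sketched in code. This is a minimal illustration, not jQuery's actual implementation: `DocLike`, `onReady`, and `legacyHack` are hypothetical names introduced here, and only the `readyState` property and the `DOMContentLoaded` event correspond to real browser features.

```typescript
// Minimal stand-in for the parts of `document` this sketch touches (hypothetical type).
interface DocLike {
  readyState: string;
  addEventListener?: (type: string, listener: () => void) => void;
}

// Hypothetical helper: run `callback` once the DOM is ready and report which path was taken.
// `legacyHack` stands in for whatever workaround an old browser would need.
function onReady(
  doc: DocLike,
  callback: () => void,
  legacyHack: (done: () => void) => void
): string {
  // Assumption 1: the document may already be parsed by the time we are called.
  if (doc.readyState !== "loading") {
    callback();
    return "already-ready";
  }
  // Assumption 2: the browser offers a native readiness event.
  if (doc.addEventListener) {
    doc.addEventListener("DOMContentLoaded", callback);
    return "native-event";
  }
  // Assumption 3: no native signal exists, so a hack is the only option.
  legacyHack(callback);
  return "legacy-hack";
}
```

Each branch encodes a separate assumption, so the branches can go stale at different times: the hack path may stay necessary for one browser long after the native path has become the norm everywhere else.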
00:05:27.600 Rails had some examples of this with security. Almost all of the security assumptions that Rails made when it was first built turned out to be incorrect. For instance, there was a significant CSRF issue a few years ago where the core assumption that Rails was relying on was simply wrong. When that happens, it's game over.
00:05:58.479 There are also encoding issues. In Ruby 1.8, there was no encoding support, and Ruby 1.9 shipped with encoding. Now, the assumptions have changed. As more people move to Ruby 1.9, the possibility of us using new assumptions doesn’t become evident until some time later.
00:06:04.560 In Ember, we are experiencing similar dynamics early on in this big gap, where the assumptions we're making regarding data bindings or how to build your application structure are still aligned with current practices. For example, since there is no data binding support or object observation in the browser yet, we have to use set and get.
00:06:28.320 That feels good today, and it justifies several architectural decisions. But over time, that gap is going to widen. As more advanced features are shipped, like object observation in Chrome, we may not necessarily be able to adjust right away.
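The architectural consequence of "no object observation in the browser" can be sketched as follows. This is not Ember's implementation; `ObservableObject`, `ChangeListener`, and `observe` are hypothetical names used only to show why explicit `get` and `set` calls follow from that assumption.

```typescript
type ChangeListener = (key: string, value: unknown) => void;

// Hypothetical observable object: observers only learn about changes because
// callers agree to go through `set` instead of plain property assignment.
class ObservableObject {
  private values: Record<string, unknown> = {};
  private listeners: ChangeListener[] = [];

  get(key: string): unknown {
    return this.values[key];
  }

  set(key: string, value: unknown): void {
    this.values[key] = value;
    // Assumption: the platform cannot report property changes,
    // so the library has to notify observers itself.
    this.listeners.forEach((listener) => listener(key, value));
  }

  observe(listener: ChangeListener): void {
    this.listeners.push(listener);
  }
}
```

Every caller that writes `obj.set("name", value)` has baked the assumption into its own code, which is part of why a library cannot simply switch approaches the day the platform changes.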
00:06:39.200 The clear assumptions we have today ensure we handle everything ourselves, but slowly those assumptions will drift. Again, in open source, it's especially challenging because you really can't adjust until you have that giant gap.
00:06:52.319 You might find yourself in a better position in proprietary projects, which is just one of many areas where assumptions underpin every decision made. The constraints underlying those assumptions may be valid today, and following them will satisfy you as a programmer, but once those assumptions become outdated, that's one of the main reasons you become dissatisfied with the code.
00:07:10.800 An interesting thing about open source is that this problem is more pronounced with really good solutions. With partial solutions, you can often offload many assumptions onto your users, so there's not much impact on the project's functionality. However, when you have an excellent solution like Rails or Ember that assumes a lot of responsibility, any changes in assumptions can have rapid effects on the project.
00:07:54.000 If later, it turns out that assuming HTML rendering on the server is a bad idea, it won't be a simple fix. Rails took steps to mitigate this, but if client-side development becomes obsolete, Ember could face significant challenges because we took such a major bet.
00:08:41.760 This underscores an important point—people often don’t notice the difference between how much weight a partial solution carries versus a full one until it begins to break down, resulting in a checkbox-oriented feature assessment. Thus, when evaluating libraries, users often ask whether certain features exist or not.
00:09:36.480 I mention this to highlight how understanding technical obsolescence can provide insight into what choices you're going to make. If you have a project intended to last a long time, you need to really plan for technical obsolescence.
00:10:03.200 When doing feature analysis, it's crucial to think about the underlying assumptions. There tends to be an objection that emerges, which is that you can basically avoid all software problems by designing programs that perform one specific task exceptionally well.
00:10:30.960 This argument suggests if something changes or an assumption is invalid, simply remove the part of the program that violates the assumption, replacing it with an improved solution. While writing well-structured object-oriented software can help address certain issues, this viewpoint actually misses the mark.
00:10:48.239 This kind of approach causes complexity to shift to rapidly evolving public APIs, moving the complexity from private APIs, referred to as the 'omega mess', to new instability across public interfaces. Unfortunately, despite our best efforts, we usually don’t achieve perfect interfaces; thus, creating public APIs often leads to more edge cases that must be handled, thereby transferring the integration burden onto the user.
00:11:31.040 This means the user, rather than being able to simply issue a render command in Rails without concern for its underlying mechanics, must now consider where templates are located, what template engine to use, and how to render things with the necessary context without encountering hiccups.
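The difference between an integrated render call and wiring the pieces yourself can be sketched like this. It is an illustration under stated assumptions, not Rails' API: `findTemplate`, `interpolate`, `makeRender`, and the `{{key}}` template syntax are all hypothetical.

```typescript
type TemplateEngine = (source: string, context: Record<string, string>) => string;

// Piece 1: locate a template by name (someone has to know where templates live).
function findTemplate(templates: Record<string, string>, name: string): string {
  const source = templates[name];
  if (source === undefined) {
    throw new Error("template not found: " + name);
  }
  return source;
}

// Piece 2: one possible engine, replacing {{key}} with values from the context.
const interpolate: TemplateEngine = (source, context) =>
  source.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    const value = context[key];
    return value === undefined ? "" : value;
  });

// Integrated entry point: the choice of lookup and engine is made once, here,
// so callers only ever write render(name, context).
function makeRender(templates: Record<string, string>, engine: TemplateEngine) {
  return (name: string, context: Record<string, string>): string =>
    engine(findTemplate(templates, name), context);
}
```

With `const render = makeRender({ greeting: "Hello, {{name}}!" }, interpolate)`, a caller writes `render("greeting", { name: "Rails" })` and never sees the two pieces. Without the integrated entry point, every caller repeats the lookup and engine choice, and every caller has to change when either assumption does.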
00:12:07.360 A highly effective interface might streamline operations, but unless users clearly understand the integration process, they’ll ultimately bear the brunt of these changes. Additionally, I want to point out that people rarely discuss the entirety of the original Unix philosophy because it often undermines their overly simplistic point.
00:12:30.720 The Unix philosophy is about writing programs that do one thing well, that work together, and that handle text streams as a universal interface. What's lacking in this philosophy is a defined, well-understood public interface, which is necessary for smaller programs to intercommunicate effectively.
00:13:14.080 In the unstable stage of building software or defining a domain, individuals should focus more on achieving stability than on creating standards too early. An example might be how Unix originally functioned effectively with simple text formats, but improved interfaces have become essential as systems stabilize.
00:13:39.200 Thus, it’s crucial to allow interfaces to evolve based on collective understanding and reliability. When building software components, they often interconnect somewhat promiscuously. Developers may aim to manage these interconnections, as early integration has significant costs. While it may seem logical to develop a single extensive control tool that facilitates communication across component interfaces, the reality is that these connections usually dissipate as standards develop.
00:14:39.240 As clarity grows in understanding how components work together and interface, integration makes a product more robust. However, assertiveness is called for in choosing when to prioritize integration; an early push for extensive regulations may lead to diminishing returns.
00:15:10.960 Rack serves as a perfect illustration of project life cycles; during the early days of Rails, it had a very tightly integrated architecture, but hindsight reveals that early frameworks avoided built-in standards. They might improve the ecosystem, but they also cause difficulties in discerning component relationships.
00:15:51.760 Also, while there are many project aspects that would benefit from standardization, Rails decided against it initially, believing that effort spent on establishing standards meant sacrificing time spent building functionality that could deliver value. However, over time, the burden of integration leads to fatigue in the development community to remain consistent, highlighting the impact of opinionated tools in the ecosystem.
00:16:30.560 And so, issues arise even when standardization exists; dependencies can create integration costs that may go unnoticed until they reach a breaking point. The question remains, who bears these integration costs? It's challenging to imagine a scenario in which those costs amount to nothing and are no one's burden.
00:17:01.120 In rapidly evolving domains, tighter integration fosters robustness, countering the notion of standardization. The success of Unix system reliability stemmed from the prolonged effort by AT&T workers who understood the importance of compatibility from a user perspective, ensuring ease of adaptation across tools.
00:17:43.360 Now, what does this translate to in terms of software development? It means that cruft is inevitable. When launching something new, we often lack clarity around the domain; thus, cruft can't merely be attributed to personal shortcomings in design. The real world is complex; assumptions influence everything—from the platforms supported to the regulations guiding projects.
00:18:36.799 As these assumptions evolve, solutions created may appear messy. It's not that the code reflects poor programming; it demonstrates the complexities at play in creating viable solutions. The cruft created might serve as a tool for reconciliation between rapidly shifting priorities. That said, addressing cruft should not be an impulsive act; early intentions stemmed from a demand for effective problem-solving.
00:19:36.320 All too often, people will react defensively when questioned about cruft, anchoring themselves to the belief that there's inherent validity in its existence. However, as projects evolve, knowledge alters. Within the first two years, one might rightfully justify cruft's presence, but as familiarity grows, the echo of received wisdom can stifle progress.
00:20:36.000 Insistence on cautious interaction with cruft only reinforces the misconception that darkness surrounds it; that is, naturally, due to the code's past as a tangible obstacle. This phenomenon is common within the Rails ecosystem, with an abundance of seemingly trustworthy code towering over ambiguity.
00:21:16.480 As systems accumulate too much dependency clutter, developers tend to hold on to it as if the code itself contained secrets its creators knew better about, even if the rationale behind the cruft became obsolete.
00:21:50.000 The evolution of technology underscores that once assumptions change, the components of a project can quickly turn into burdens seemingly incapable of being realized until a fresh perspective emerges. When new contributors come on board with different insights, they're often posed uncomfortable questions that prompt re-evaluation.
00:22:26.640 A landscape exists where components have grown outdated, yet teams hesitate to acknowledge this because of the familiarity bred into the project's ecosystem.
00:23:02.960 Connection with obsolete software, such as IE or antiquated enterprise products, allows cruft to proliferate unnoticed as an immune system develops, but a legitimate need may arise to reconsider. The histories maintain their relevance for as long as the code remains in use—even as projects adapt. Brands like jQuery routinely iterate software layers to minimize legacy complications.
00:24:19.360 Many developers may wonder about the gap represented when older assumptions become irrelevant. Through a practice of consciously tracking cruft, as jQuery did, an organization can actively assess the perception of cost in maintaining those old dependencies over time, even realizing it makes more sense to iron out the architecture rather than simply patch the surface-level functionality.
00:25:30.520 Therefore, the life cycle of cruft encompasses simultaneous evolution and defensiveness, where the gaps yield friction between long-held assumptions and dynamic landscapes. Eventually, this leads to defensive stances, which can heighten dissatisfaction with technical debt still rooted in previously valid architectures.
00:26:46.240 In turn, projects might pay for the technical debt that errors into a scrum and further detachment from capability standards that initially facilitated project creation. Clinging to past assumptions can undermine the very frameworks designers sought to construct, and bolt-ons will stack unwieldily until the missing reality forces a rude awakening.
00:27:55.760 An example in jQuery stems from having to handle the issues that were caused by early IE bugs. Much of the cruft that is now considered untidy originated from addressing these issues to deliver superior experiences. Initially, developers thought of these workarounds as clever solutions, yet with decade-long legacies, the price for obsolescence gradually accumulates.
00:29:03.680 As the lifecycle progresses, fine-tuning occurs, compelling contributors to discourage complacency and ponder deeper reasons behind decisions. The crux of the issue lies in understanding how diverged assumptions distilled into parts that no longer meet the primary needs.
00:30:24.640 It is vital to bear in mind that once initial cruft addresses a genuine problem, the system may not entirely shed that burden. Consequently, new iterations will yield opportunities that advance architecture and avoid the unknotted assumptions lingering throughout each release.
00:31:59.920 Thus, serious attention must focus on identifying and documenting cruft within a system, for example in a file like cruft.txt, where assumptions can surface as factors impacting architectural integrity. Furthermore, as you write code that may not be the optimal solution, it is crucial to record the reason for including that code.
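One way to picture what a cruft.txt entry might hold is sketched below. The talk names the file but the transcript does not show its format, so the `CruftEntry` fields, the output layout, and both function names are assumptions made for illustration.

```typescript
// Hypothetical shape of one cruft.txt entry.
interface CruftEntry {
  location: string;    // where the workaround lives
  workaround: string;  // what the code does that it would rather not do
  assumption: string;  // what had to be true for the workaround to make sense
  stillHolds: boolean; // result of the most recent review
}

// Render one entry as a single line of text for the file.
function formatCruftEntry(entry: CruftEntry): string {
  const status = entry.stillHolds ? "holds" : "EXPIRED";
  return "[" + status + "] " + entry.location + ": " + entry.workaround +
    " (assumes: " + entry.assumption + ")";
}

// List the entries whose assumption no longer holds; these are cleanup candidates.
function expiredCruft(entries: CruftEntry[]): CruftEntry[] {
  return entries.filter((entry) => !entry.stillHolds);
}
```

Recording the assumption next to the workaround means a later reader can judge the code against today's reality instead of guessing why it exists.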
00:33:10.640 This links into the reminder that assumptions can diverge over time; sometimes, preserving support for legacy implications becomes increasingly irrelevant. In this light, open source can benefit by retaining flexible opportunities to reintegrate assumptions.
00:34:09.760 So long as you keep track of assumptions and track edge cases, you will make more informed architectural decisions. Adopting a conscious reflection process allows you to evade emotional bias when considering past impressions. Documentation ensures that new contributors navigate the ever-evolving landscape without apprehension.
00:35:47.720 Above all, I want to emphasize the importance of conversations surrounding technical assumptions and integrate opportunities for growth, so that developers feel empowered when examining cruft and encouraged to address it—all while simultaneously being aware of the potential for redemption.
00:36:39.280 Thank you very much!
00:38:13.520 Josh, do I have time for questions? Yes, my clock says 30 something minutes. Absolutely! Yes, we do have time for questions, and I'm going to start. How about the test suite as a place to track cruft? Someone else asked me this question the last time I gave this spiel.
00:43:34.720 The test suite can certainly be a good place to track cruft, and for every piece of cruft, there should be a test. However, the goal is to maintain a section with the specific job of listing assumptions and how those affect architecture, rather than just running tests. It's important to have tests that confirm assumptions remain valid, particularly for persistent bugs.
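Tests that confirm an assumption is still valid could look roughly like the sketch below. It assumes a hypothetical `Capabilities` description of the environment; the capability names, `AssumptionCheck`, and `invalidatedAssumptions` are illustrative and not taken from any real project.

```typescript
// Hypothetical description of what the current environment supports.
interface Capabilities {
  nativeReadyEvent: boolean;
  objectObservation: boolean;
}

// An assumption paired with a check that says whether it still holds.
interface AssumptionCheck {
  assumption: string;
  holds: (env: Capabilities) => boolean;
}

// Illustrative checks mirroring the examples discussed in the talk.
const exampleChecks: AssumptionCheck[] = [
  { assumption: "no native DOM readiness event", holds: (env) => !env.nativeReadyEvent },
  { assumption: "no object observation in the platform", holds: (env) => !env.objectObservation },
];

// Return the assumptions that the given environment has invalidated.
function invalidatedAssumptions(checks: AssumptionCheck[], env: Capabilities): string[] {
  return checks.filter((check) => !check.holds(env)).map((check) => check.assumption);
}
```

A failing check here does not mean the code is broken. It means a workaround has lost its justification, which is the signal a dedicated assumptions section is meant to surface.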