API by Design

00:00:08.269 Hello, my name is Wesley, as he mentioned. I go by geemus online, so that's what you’ll find me using on Twitter and GitHub. I call myself the hacker laureate of Heroku. I don’t know if anybody else actually calls me that, but it’s my story and I’m sticking to it.

00:00:13.889 To give you a little background, some of you may have never heard of Fog, which is a Ruby library that provides clients for interacting with a bunch of different cloud services. This was one of my biggest introductions to the world of APIs, because it turns out, if you want to interact with approximately 20 or 30 different cloud services, you need to work with a lot of different APIs and a variety of ideas about what constitutes a good API. It’s a mix of good, bad, and everything in between.

00:00:32.130 This was actually the second or third project I worked on in quick succession, where I was writing multiple clients and interacting with various APIs. Coming out of that experience, I was in a good position to get a job with Heroku. So, when I first started, I continued down that path by working on the command-line tool we now call the Toolbelt. This was yet another API client that gave me the opportunity to become very familiar with the Heroku API we had at the time.

00:00:50.280 Eventually, I transitioned onto the actual API team, where I aimed to address what I perceived as many flaws in the API that existed when I joined. It didn’t take long to realize that consuming and producing APIs are quite distinct. Though many people say, "Oh, you know a lot about APIs; you use them all the time," using them and actually creating them is very different. Consuming APIs is often simpler, although it can be frustrating at times because some APIs are poorly designed.

00:01:12.930 I think that perhaps the saving grace is my philosophy that, ultimately, it’s not the appearance of an API that matters, but rather how it feels to interact with it from a user perspective. After interacting with numerous APIs, I trust my gut reactions. So when we discuss new API designs, I can often say, "That doesn’t feel right to me; it gives me the heebie-jeebies," even though it took time to translate that gut feeling into concrete reasons that justify my concerns.

00:01:39.570 I think it’s really important that users feel empowered. To achieve that, I believe we should borrow insights from domains like user experience design and information design, because essentially, we are trying to provide a good experience for the user interacting with our API. It should be relatively straightforward for them to get on board and get started.

00:02:03.560 I also draw a lot from Ruby’s philosophy, which has been influential for me. For example, the principle of least surprise often comes into play. People may jokingly refer to it as the "principle of Matz’s least surprise" in the Ruby context. The idea is that if you have some knowledge about how something works, then when you’re working with something similar, it shouldn’t behave in a completely different and surprising manner. If it does, you likely have a problem.

00:02:34.209 Instead of requiring a steep learning curve each time, you should be able to build on what you already know. Consistency and predictability are what we strive for. That being said, the big, important API I work on at Heroku is actually the first real API I’ve ever produced, so no pressure! What’s the worst that could happen? Despite a couple of years in this role, I haven’t done anything too horrific, although some of my colleagues might disagree at times.

00:03:00.360 Let me share a bit about this API that I sort of inherited. It was what I would describe as organic, which, as a consequence, made it somewhat inconsistent and also private. I think these aspects tend to go hand in hand. For us, the only real consumers of this API were the command-line tool and the web page we operated. Since we controlled both sides of the equation, it didn’t matter too much what the API actually looked like, as long as it got the job done.

00:03:28.660 Over time, when people needed to add something new to the API, they would just throw in whatever was necessary, even if it wasn't entirely sufficient. As long as it served its purpose, that was what mattered. Each time, it was a different person adding to the API, and they weren’t particularly concerned with reviewing the existing API for coherence. They just wanted to get the job done. Coming from that world made perfect sense, but I found that I wanted something more carefully crafted and coherent.

00:03:54.260 For me, ideally, going from one resource to another should not be surprising; it should make sense. Having written many clients and worked against this particular API, I was frustrated since, half the time while working on the Toolbelt, I needed to dive into the API codebase to grasp what a specific resource did. They often varied widely from one another, which was not ideal. Additionally, I wanted the API to be public and something I was willing to stand behind and be proud of.

00:04:14.550 To accomplish this, I found it crucial to carefully bring together the various components of the API. This led me to adopt a divide-and-conquer approach. I began building a new version of the API that retained the same resources but introduced a more consistent format by trying to extract common patterns and eliminating all the unique snowflakes that existed within the system.

00:04:44.430 Repeating this process over and over again turned out to be incredibly tedious, especially given the large surface area of the API. We ended up organizing it in a checklist-driven manner, which proved helpful. It established a set of steps, such as whether a resource endpoint met certain criteria or was formatted in a particular way, allowing me to maintain consistency.

00:05:05.600 I didn’t love this process; it wasn’t particularly exciting. However, I recognized its importance, and I trudged through it. It quickly became clear that we were in a period of growth for the API, and we would be adding more and more endpoints. Ultimately, it was something I could no longer manage independently; I needed help.

00:05:23.680 Most of the process up to that point had existed solely in my head. It was effective, and progress was being made, but I couldn't easily bring other people into it. My colleagues within the company didn’t feel empowered to contribute to this API's development. From there, we began working on different tools and documentation to make the API design and implementation process more approachable.

00:05:54.520 I wanted to walk through some of those tools, as they might be useful or interesting for you as you approach your API concerns. The first one is the HTTP API Design Guide. We created an organization called interagent on GitHub to house these various API design tools and patterns. The rationale for this was that Heroku has a number of open-source projects, and if we just dumped all these additional items there, they would get lost in the noise. We hoped that by consolidating them, we could keep everything closely related together.

00:06:25.100 Some of you may not be familiar with the Twelve-Factor App methodology, which originated from the original CTO of Heroku. It lays out a series of twelve principles that are beneficial for web-scale applications. It articulates the ethos of Heroku’s approach to developing an app in a modern context, offering an excellent set of first principles. While it’s not prescriptive about specific technologies, it emphasizes using environment variables for configuration and maintaining similarity between development and production systems. My hope was to create something similar in the API space, covering best practices that were working for us and elaborating on the reasoning behind them.

00:07:03.500 This part is still a work in progress. People tell me often that I’ve not done a good job explaining why some of these practices are in place; it’s like I have a gut feeling about what is right, but can’t yet articulate why that is. I’ve been trying to improve my explanations, but it’s ongoing. A key takeaway is that it's more about patterns rather than concrete solutions. In many instances, you don't want to solve a specific problem just once; ideally, you want to address a whole class of problems, as similar challenges will inevitably recur.

00:07:49.600 If there’s a pattern you can apply, it’s much more powerful than saying, "Well, maybe this time we should do it this way, and next time we’ll do it differently." That is how we return to organic inconsistency. It can also leave the API effectively private: not usable in a meaningful way, even if it is technically accessible. Therefore, it was vital for us to have a straightforward method of describing the APIs we worked on, ensuring that our documentation accurately reflected the interfaces.

00:08:31.700 The challenge of writing out an API from scratch is that everyone does it differently and often leaves out key details. We sought to create something human-readable yet machine-readable. Making it machine-readable would allow us to automate various tasks related to the API. There are several formats in this space, and I’m happy to discuss the pros and cons of different options, but for our purposes, we settled on JSON Schema and JSON Hyper Schema. Both of these are established specifications.

00:09:08.200 JSON Schema allows you to create a JSON document that describes an API, while JSON Hyper Schema adds links to what would otherwise just be a description of JSON objects. This capability enables us to delineate the entire surface area of our API. This provides significant traction, as when someone proposes a new endpoint, they can write a schema and convey precisely what they mean instead of engaging in a vague discussion about what parts might be similar or different.
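To make the idea concrete, here is a minimal sketch of what such a schema might look like. It is written in YAML for readability and converted to JSON with Ruby's standard library. The "app" resource, its properties, and its two links are hypothetical and chosen only for illustration; this is not the actual Heroku schema.

```ruby
require "json"
require "yaml"

# Hypothetical "app" resource: definitions describe the JSON object,
# and "links" (the Hyper-Schema part) describe the available endpoints.
SCHEMA_YAML = <<~YAML
  "$schema": "http://json-schema.org/draft-04/hyper-schema"
  title: App
  type: object
  definitions:
    id:
      description: unique identifier of the app
      type: string
      format: uuid
    name:
      description: human-friendly name of the app
      type: string
  properties:
    id:
      "$ref": "#/definitions/id"
    name:
      "$ref": "#/definitions/name"
  links:
    - title: Create
      rel: create
      method: POST
      href: "/apps"
    - title: Info
      rel: self
      method: GET
      href: "/apps/{id}"
YAML

# Parse the YAML and emit the equivalent JSON document.
schema = YAML.safe_load(SCHEMA_YAML)
puts JSON.pretty_generate(schema)
```

A proposal for a new endpoint could then be reviewed as a small, precise change to a document like this one.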

00:09:37.900 One enhancement we made is that we actually use YAML to write the JSON schemas. Most people on the team prefer not to handwrite JSON for obvious reasons, so this optimization has made our work much easier. Now, I’d like to take a sip of water (if anyone is curious about this, I’ll explain in a moment), but the next tool is called prmd, pronounced "pyramid". You may be familiar with pyramid schemes, and there was a well-known case recently involving Bernie Madoff, who misled many into investing money with empty promises.

00:10:15.300 Similarly, the name serves as a reminder to remain skeptical. While schemas are powerful, they don't solve every problem, but they are very useful. As for the spelling, it is written prmd, short for "pyramid". This command-line tool provides functionality to help us manage our work with these schemas. It includes generators to create new resources, which typically require a fair amount of boilerplate.

00:10:49.500 With RESTful resource operations, there’s a lot of commonality; for example, CRUD operations tend to look similar across different resources. We also have verifiers that take a schema file and check against expectations to detect issues. The documentation for the Heroku API is automatically generated from our schema, which helps us maintain alignment. If we have a schema we all agree on and are confident our implementation matches, the documentation should also reflect the same accuracy, as it all derives from one source.
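The verify-then-document flow can be sketched in a few lines. This is a simplified illustration using plain Ruby, not prmd's actual implementation; the schema fragment, the list of required link keys, and the `verify` and `document` helpers are all assumptions made for the example.

```ruby
# Hypothetical schema fragment, shaped like a JSON Hyper-Schema "links" list.
schema = {
  "title" => "App",
  "links" => [
    { "title" => "Create", "method" => "POST", "href" => "/apps", "rel" => "create" },
    { "title" => "Info", "method" => "GET", "href" => "/apps/{id}" } # "rel" is missing
  ]
}

# Keys this sketch expects on every link (an assumption, not prmd's rule set).
REQUIRED_LINK_KEYS = %w[title method href rel].freeze

# Verifier: return a list of readable problems (empty means the schema passes).
def verify(schema)
  schema.fetch("links", []).each_with_index.flat_map do |link, index|
    (REQUIRED_LINK_KEYS - link.keys).map { |key| "links[#{index}] is missing #{key.inspect}" }
  end
end

# Documentation generator: one line per link, derived only from the schema.
def document(schema)
  schema.fetch("links", []).map do |link|
    "#{schema['title']} #{link['title']}: #{link['method']} #{link['href']}"
  end
end

puts verify(schema)
puts document(schema)
```

Because both helpers read the same schema, a fix to the schema shows up in the check and in the generated text at the same time, which is the point of having one source.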

00:11:27.300 Moreover, we have stubbing capabilities, allowing us to set up a server that can simulate our API. This aids in client development, enabling teams to use tools like cURL to check that things behave as intended. This tool has already proven helpful in breaking things down more easily, empowering people to use JSON Schema effectively, despite the steep learning curve when starting.

00:11:51.210 Returning to how we approach API structure, there remains a sizable gap between having a JSON schema and a fully operational API. To minimize that divide, we developed additional tools, the first of which is called Pliny. In simple terms, Pliny functions similarly to Rails, which serves as a large opinionated framework for creating web pages, while Pliny is a smaller but still opinionated framework tailored specifically for APIs.

00:12:25.890 It does not incorporate notions of views or templates; its primary focus is on API development. Many patterns found within Pliny are derived from our API codebase. Although we haven’t conducted an extensive rewrite to fully adopt its structure, it encapsulates certain useful patterns that have facilitated easier design and implementation of our APIs. When we create new APIs within the company, we generally start from Pliny as a foundation.

00:12:49.410 One significant aspect of this API framework is called Committee. This is a Rack middleware that leverages our JSON schemas to perform vital functions. One powerful function is input validation; through the schema, we define required parameters, including format specifications. For example, if an email address is being submitted, we can stipulate that it must match a particular regex; Committee can enforce this rule, returning validation errors directly.
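As a rough illustration of schema-driven input validation, the sketch below checks required parameters and a regex pattern using only plain Ruby. It does not use Committee or reflect its internals; the `USER_SCHEMA` hash, the email pattern, and the `validate_input` helper are hypothetical.

```ruby
# Hypothetical schema for a "create user" request body.
USER_SCHEMA = {
  "required" => ["email"],
  "properties" => {
    # Deliberately loose pattern, for illustration only.
    "email" => { "type" => "string", "pattern" => "^[^@\\s]+@[^@\\s]+$" }
  }
}.freeze

# Return a list of validation errors for the given params (empty when valid).
def validate_input(schema, params)
  errors = schema["required"].reject { |key| params.key?(key) }
                             .map { |key| "#{key} is required" }
  schema["properties"].each do |key, rules|
    value = params[key]
    next if value.nil? || rules["pattern"].nil?
    matches = value.is_a?(String) && value.match?(Regexp.new(rules["pattern"]))
    errors << "#{key} does not match the expected format" unless matches
  end
  errors
end

p validate_input(USER_SCHEMA, {})
p validate_input(USER_SCHEMA, { "email" => "not-an-email" })
p validate_input(USER_SCHEMA, { "email" => "wesley@example.com" })
```

In a real service the error list would be turned into an error response before the request reaches application code.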

00:13:25.920 On the output side, Committee also examines responses to ensure they adhere to specified formats. However, in this instance, we typically do not want to throw an error back to the caller, as that would create a negative user experience. Instead, we generally grab the error and report it to an error reporting service, such as Rollbar or Airbrake, which logs issues indicating non-schema-compliant responses. The preferred course of action is often to adjust the schema and regenerate the documentation to match the actual implementation.
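The "report, don't fail" behavior on the response side might look something like the following Rack-style sketch. It uses no gems; the `Reporter` class stands in for an error reporting service, and `ResponseCheck` with its `required_keys` option is a hypothetical simplification that assumes the response body is a JSON object.

```ruby
require "json"

# Hypothetical stand-in for an error reporting service client.
class Reporter
  attr_reader :reports

  def initialize
    @reports = []
  end

  def notify(message)
    @reports << message
  end
end

# Rack-style middleware: checks JSON object responses for required keys,
# reports mismatches, and always returns the original response unchanged.
class ResponseCheck
  def initialize(app, required_keys:, reporter:)
    @app = app
    @required_keys = required_keys
    @reporter = reporter
  end

  def call(env)
    status, headers, body = @app.call(env)
    payload = JSON.parse(body.join)
    missing = @required_keys - payload.keys
    unless missing.empty?
      @reporter.notify("#{env['PATH_INFO']} response is missing #{missing.join(', ')}")
    end
    [status, headers, body]
  end
end

reporter = Reporter.new
app = ->(_env) { [200, { "content-type" => "application/json" }, ['{"id":"123"}']] }
checked = ResponseCheck.new(app, required_keys: %w[id name], reporter: reporter)
status, _headers, body = checked.call({ "PATH_INFO" => "/apps/123" })
puts status
puts reporter.reports
```

The caller still receives the 200 response; only the reporter learns that the response and the schema disagree.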

00:14:07.440 Next up is the concept of an umbrella. This idea arose as we continued to grow, acknowledging the limitations of adding new endpoints to our monolith. More frequently, when adding a new resource, which is relatively independent from existing resources, a dedicated team is assigned to create a new API service using Pliny. This new service becomes tied to the existing API framework through a proxy infrastructure.

00:14:38.740 This umbrella concept effectively hides the underlying disparate components from external users. We’ve found this approach immensely beneficial, as we now create net-new APIs while making it less obvious to users that these parts are separate. Consuming APIs also often involves the use of Excon, which I originally extracted from Fog some time ago. It functions as a low-level HTTP library that has been optimized for API client use.

00:15:12.929 The API client use case is somewhat different from that of a general HTTP library. Although it can be utilized in a broad context, Excon is tailored to optimize performance through persistent connections to support multiple requests. It’s also designed to be significantly faster. While I could elaborate on the performance comparisons, the key takeaway is that Excon outperforms many commonly used libraries in speed.

00:15:47.950 I want to share some lessons we’ve learned along the way and provide a clearer picture of my philosophy, illustrating how we’ve concretely implemented some of these patterns. One common pattern we've noticed revolves around deeply nested resources. This particular issue stems from Rails, and historically, it represented how entities related to one another. In the Heroku context, everything used to belong to an app, leading to structures like app/resource/child, which could extend indefinitely.

00:16:17.850 Over time, I’ve come to believe this nesting is not ideal and that we should minimize it whenever possible, avoiding more than one level of nesting. This change means including the parent foreign key in serialized responses, so the relationship stays clear even when the URL no longer spells it out. Easy navigation in the API is critical, as users should be able to find how to interact with resources without struggling to understand the hierarchy.
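A small sketch of the flattened layout, assuming a hypothetical dyno that belongs to an app. The `App` and `Dyno` structs, the helper names, and the exact paths are illustrative choices, not the actual Heroku routes.

```ruby
require "json"

# Hypothetical records: a dyno that belongs to an app.
App  = Struct.new(:id, :name)
Dyno = Struct.new(:id, :command, :app)

# Serialize the child with a reference to its parent, so the response
# still says which app it belongs to even when the URL does not.
def serialize_dyno(dyno)
  {
    "id" => dyno.id,
    "command" => dyno.command,
    "app" => { "id" => dyno.app.id, "name" => dyno.app.name }
  }
end

# Nest at most one level: the collection is scoped under the parent, while
# an individual resource is addressable by its own identifier.
def dyno_paths(dyno)
  { "list" => "/apps/#{dyno.app.id}/dynos", "info" => "/dynos/#{dyno.id}" }
end

app = App.new("a-1", "example")
dyno = Dyno.new("d-1", "bundle exec puma", app)
puts JSON.pretty_generate(serialize_dyno(dyno))
puts dyno_paths(dyno)
```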

00:16:50.960 Additionally, there are scenarios where REST APIs don’t fit neatly within the Create, Read, Update, Delete model. In many cases, they relate better to state machine transitions. A concrete example is with dynos on Heroku. You may want to restart or stop a dyno, which doesn't intuitively fit within traditional resource manipulation. We wanted a clear mechanism to highlight this kind of behavior.

00:17:29.230 Initially, I observed that adding ad hoc endpoints for these actions led to inconsistent API structures, particularly around deeply nested identifiers. Instead, we decided to treat actions as a collection that belongs to the primary resource. This approach clarifies the intent and shows that we're defining discrete actions that can be understood as state transitions.
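One way to picture actions as a collection is the sketch below. The state names, the `ACTIONS` table, and the `/dynos/{id}/actions/{action}` path shape are assumptions made for illustration.

```ruby
# Hypothetical state machine for a dyno: each action lists the states it
# may start from and the state it leads to.
ACTIONS = {
  "stop"    => { from: %w[up starting], to: "down" },
  "restart" => { from: %w[up], to: "starting" }
}.freeze

# Actions live in a collection under the resource, so every transition has
# the same URL shape: /dynos/{id}/actions/{action}
def action_path(dyno_id, action)
  raise ArgumentError, "unknown action: #{action}" unless ACTIONS.key?(action)
  "/dynos/#{dyno_id}/actions/#{action}"
end

# Apply an action to the current state, returning the new state.
def transition(state, action)
  rule = ACTIONS.fetch(action)
  raise ArgumentError, "cannot #{action} from #{state}" unless rule[:from].include?(state)
  rule[:to]
end

puts action_path("d-1", "stop")
puts transition("up", "restart")
```

Adding a new action then means adding one row to the table, and clients can guess its URL from the ones they already know.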

00:18:09.580 We've experienced situations where a case for singleton patterns emerges, like having a URL for account settings that operates on the current user. However, we found this frustrating, especially in administrative contexts where modifying another user necessitates impersonation. This inconsistency presents auditing complications, so we adopted a tilde, ~, to represent the current user context.
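The tilde convention can be sketched as a single lookup that handles both cases. The `USERS` table and the `resolve_user` helper are hypothetical.

```ruby
# Hypothetical lookup table standing in for a user store.
USERS = {
  "u-1" => { "id" => "u-1", "email" => "admin@example.com" },
  "u-2" => { "id" => "u-2", "email" => "dev@example.com" }
}.freeze

# Resolve the identifier from a path such as /users/~ or /users/u-2.
# "~" means "whoever is authenticated", so one endpoint serves both the
# current user and, for an administrator, any other user.
def resolve_user(identifier, current_user_id)
  USERS.fetch(identifier == "~" ? current_user_id : identifier)
end

puts resolve_user("~", "u-1")["email"]
puts resolve_user("u-2", "u-1")["email"]
```

With this shape an administrator can address another user directly, so the audit trail shows who acted and on whom without impersonation.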

00:18:47.120 Moreover, we strive for input/output parity, meaning the format for creating a resource should closely resemble what’s returned. This similarity simplifies understanding and should carry through the API's design. We leverage UUIDs for scalability and provide friendly identifiers (names) for better flexibility.
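Both ideas, parity between input and output and accepting either a UUID or a name, fit in a short sketch. The `AppStore` class is a hypothetical in-memory stand-in, not how the Heroku API stores anything.

```ruby
require "securerandom"

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_PATTERN = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

# Hypothetical in-memory store of apps.
class AppStore
  def initialize
    @apps = []
  end

  # Input/output parity: the attributes sent in come back under the same
  # keys, plus the server-generated id.
  def create(attributes)
    app = { "id" => SecureRandom.uuid }.merge(attributes)
    @apps << app
    app
  end

  # Accept either the UUID or the friendly name as the identifier.
  def find(identifier)
    key = identifier.match?(UUID_PATTERN) ? "id" : "name"
    @apps.find { |app| app[key] == identifier }
  end
end

store = AppStore.new
created = store.create({ "name" => "example" })
puts store.find("example") == store.find(created["id"])
```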

00:19:28.600 As we operate in a service-oriented architecture, we’ve implemented request IDs, which allows us to provide traceability across our network of services. Whenever a request is made, you may provide a request ID, or we will generate one for you, enabling us to monitor the request's journey through complex systems.
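A minimal sketch of that behavior, assuming the ID travels in a `Request-Id` header. The header name and the `forward` helper are assumptions for the example.

```ruby
require "securerandom"

# Return the caller's request ID when one is supplied, otherwise generate one.
# The header name "Request-Id" is an assumption for this sketch.
def request_id_for(headers)
  provided = headers["Request-Id"].to_s.strip
  provided.empty? ? SecureRandom.uuid : provided
end

# Tag a log line and the headers forwarded to a downstream service with the
# same ID, so one request can be traced across services.
def forward(headers, service)
  id = request_id_for(headers)
  { log: "request_id=#{id} service=#{service}", headers: headers.merge("Request-Id" => id) }
end

puts forward({ "Request-Id" => "abc-123" }, "billing")[:log]
puts forward({}, "billing")[:log]
```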

00:20:08.850 However, we do face challenges, such as the steep learning curve of the various tools we employ. The allure of specialization has its drawbacks; many desire a finely tailored API for personal needs, which hinders our capacity to deliver APIs that serve broader purposes.

00:20:29.720 Another challenge involves change management. As it stands, we monitor stability at the resource level. Our approach includes communication for breaking changes: a week’s notice for prototype stability, a month for development, and a year for production.
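The notice periods can be written down as a small lookup. This sketch treats "a month" and "a year" as calendar months, which is an assumption; the `NOTICE` table and helper name are hypothetical.

```ruby
require "date"

# Minimum notice before a breaking change, per stability level:
# a week for prototype, a month for development, a year for production.
NOTICE = {
  "prototype"   => ->(date) { date + 7 },
  "development" => ->(date) { date >> 1 },
  "production"  => ->(date) { date >> 12 }
}.freeze

# Earliest date a breaking change may ship, given when it was announced.
def earliest_breaking_change(stability, announced_on)
  NOTICE.fetch(stability).call(announced_on)
end

announced = Date.new(2024, 3, 1)
%w[prototype development production].each do |level|
  puts "#{level}: #{earliest_breaking_change(level, announced)}"
end
```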

00:20:53.650 Already, we recognize that applying stability contracts entirely at the endpoint level presents issues. The method may require us to explore more intuitive versioning strategies. Versioning is inherently challenging: how do you release a new version without causing disruptions to existing users who did not opt into that change?

00:21:36.050 In fact, I would recommend against having a default version in new APIs. This approach often defaults users into unintended changes, putting their integrations at risk. Instead, expecting explicit version definitions from users mitigates disruption.
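Requiring an explicit version could look like the following sketch, which assumes the version is carried in the Accept header. The media type `application/vnd.example+json`, the supported version list, and the status codes chosen are all hypothetical.

```ruby
# Hypothetical media type; clients must name a version explicitly.
VERSION_PATTERN = %r{\Aapplication/vnd\.example\+json;\s*version=(\d+)\z}

SUPPORTED_VERSIONS = [2, 3].freeze

# Return [status, message]. There is no default version: a request that does
# not ask for one is rejected instead of being routed to the latest version.
def negotiate_version(accept_header)
  match = VERSION_PATTERN.match(accept_header.to_s)
  return [400, "specify a version, e.g. Accept: application/vnd.example+json; version=3"] unless match

  version = match[1].to_i
  return [406, "version #{version} is not supported"] unless SUPPORTED_VERSIONS.include?(version)

  [200, "using version #{version}"]
end

p negotiate_version(nil)
p negotiate_version("application/vnd.example+json; version=3")
```

Rejecting the unversioned request up front is the trade-off: new users hit an error on their first call, but nobody is silently moved to a version they never chose.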

00:22:10.060 Additionally, the new versions of APIs may or may not necessitate total rewrites of all existing endpoints, which often leads us down a route of frustration. There are meaningful trade-offs regarding operational overhead when managing multiple versions.

00:22:32.100 Ultimately, my conclusion is that I still feel uncertain about many aspects of API design and operation. This lack of clarity is common among those in the field. There’s a reluctance to admit uncertainty in public spaces, but open discussions about challenges and lessons learned can move the industry forward.

00:23:14.440 I welcome you to join in these discussions. Sharing feedback on what works and what doesn’t helps foster a community. I encourage you to reach out to collaborate; it's valuable to discuss the experiences and feelings that shape our perspectives on APIs.

00:23:53.270 You can find me under the handle @geemus and browse the various projects at interagent on GitHub. I attached links to this slide deck so you can access any relevant resources I discussed. Thank you for your attention.