Broken APIs Break Trust: Tips for Creating and Updating APIs

by Alex Wood

In this session from RailsConf 2018, Alex Wood explores the critical theme of API design, particularly focusing on how broken APIs can damage customer trust. The presentation emphasizes the importance of maintaining backwards compatibility when designing APIs and client libraries.

Key points discussed include:
- Understanding API Changes: Alex outlines what breaking changes are and highlights the significance of avoiding them to maintain user trust and avoid unintended consequences.
- Common Language: Establishing a common terminology is essential for discussing API design effectively. Terms like resources, shapes, and operations help define the parts of an API clearly.
- Client Lifecycle: The interaction between a client and an API is detailed through examples, emphasizing the necessity for older clients to remain functional even as new features are added.
- Design Principles: Strategies for ensuring backwards compatibility are discussed, such as adding new optional parameters instead of changing existing ones, and the potential pitfalls of changing data types or required parameters.
- Safe vs Unsafe Changes: Alex illustrates safe modifications—like adding optional fields—versus unsafe ones, such as removing or renaming existing fields that could lead to client breakage.
- Empathy and User Experience: A core message revolves around understanding the user experience; changes should not disrupt existing workflows unnecessarily.
- Constraints and Validation: Modifications to constraints must be handled delicately to avoid breaking existing client implementations.
- Future Proofing: Encouraging developers to design with future compatibility in mind, considering how new states or exceptions might affect clients across various programming languages.
- Conclusion and Recommendations: In the final segment, Alex shares specific do’s and don’ts of API design, recommending that developers focus on discoverability and usability while avoiding practices like removing or renaming existing API members.

This session is an insightful guide for developers aiming to improve API reliability and user satisfaction.

00:00:11 So we're going to go ahead and get started. I know people are kind of trickling in here, but I'm going to take a little bit of time for introductions.

00:00:17 I'm really happy to see how many people are interested in API design. I'm super stoked about that. This talk is called 'Broken APIs Break Trust,' or as someone from the audience suggested, 'Broken APIs, Broken Hearts,' which is awesome. Thanks again!

00:00:30 Today, we'll be talking about API design, client design, and a bit about backwards compatibility and why it's important.

00:00:45 Hi, I'm Alex. I've noticed a trend in some of the talks I've attended where people are posting embarrassing photos from their youth...

00:00:59 I was recently on vacation in Japan, where I have some step family, and I found this photo that I haven't seen in a long time. Unfortunately, I got a little sick on that trip and I'm still recovering, so if I accidentally cough into the mic, just remember that every time I do, somewhere, a puppy is getting adopted.

00:01:25 I work on the AWS SDK for Ruby. Just out of curiosity, could I see a show of hands from those who have used the AWS SDK for Ruby before? That's super awesome! Feel free to ask questions or heckle me later after the talk, depending on how you feel about it.

00:01:41 You can find me on Twitter at Alex Wood with two 'w's because the one without my middle initial was taken. I kind of evaluate how well I've done on a talk by how many people tweet about it, so feel free to tweet. If I see everyone on their laptops and phones, I'm just going to assume they are excitedly tweeting about everything they are learning.

00:02:05 Additionally, I do have some bad jokes; I'm sorry or you're welcome in advance. Alright, today we're going to walk through rules and strategies for API design.

00:02:18 Essentially, we want to understand what breaking API changes are, how to avoid them, and how to design your APIs to minimize the need for breaking changes.

00:02:35 We're also going to think about ways we can anticipate how your users may be relying on your API and client designs in ways that you may not have considered.

00:02:48 As I mentioned, I work on the AWS SDK. Another part of my job involves conducting API reviews for every API that Amazon releases. We have over a hundred and ten services, with a large number of APIs that we review.

00:03:06 There are many lessons we've learned over time about decisions we can make in our API design to improve the customer experience. My hope is to convey that to you today.

00:03:22 To start off, I want to establish some definitions and a common language for how we're going to describe APIs.

00:03:31 This is a Rails and services track talk, so we're going to address both how your API design can affect your Rails applications and also how it can influence client library design.

00:03:49 This interaction is important because your community may have a large number of client users, especially if you are releasing Rails APIs, which are becoming more and more common.

00:04:02 Many of your client users may not want to think too hard about the details of your web API implementation; they simply want their code to work.

00:04:21 You might even have community client authors who want to ensure a consistent web API experience.

00:04:27 Now, I want to add a quick note. Could everyone in the room raise their hands one more time if you have seats next to you to help people out? Thank you so much!

00:04:34 A full room is a high-quality problem to have.

00:04:41 Next, I have here a list of terms that we use for naming parts of an API at AWS.

00:04:39 These are not universal terms, nor are they the only way to describe things. It is just a way to have a common vocabulary for discussions.

00:04:54 We use terms like resources, which is probably familiar if you're a Rails developer, and we also refer to shapes as a way to describe the input and output models, as well as shared components of the API interface itself.

00:05:11 These include input shapes, output shapes, error shapes, and even sub-shapes. We'll talk about all of these in more detail.

00:05:31 A member is a single property that exists in a shape. Often the closest equivalent would be a single column in a database.

00:05:47 However, your APIs are not necessarily going to be one-to-one with the database. So what we're talking about specifically are the members of API shapes.

00:06:05 Operations are exposed by a service and can be invoked. In Rails, anything you're writing in your routes file is likely to be one-to-one with an operation.

00:06:25 This is similar for controller actions that will be generated by any kind of scaffolding, so these are basic Rails concepts.

00:06:41 Consider this basic scenario: you have a client interacting with the Ruby SDK since many of you mentioned using it.

00:06:58 The client requests data from an API.

00:07:02 This is a straightforward lifecycle of a web service interaction, and eventually, we'll add new features to our API and launch a new client version that supports those features.

00:07:18 However, as new iterations of an API are released, people will continue to use the older clients, and they will still need to work.

00:07:31 Raise your hand if you're a fan of forced upgrades. Just one? You are an amazing person living on the edge! But for the rest of us, continuously updating clients can lead to a spaghetti nesting problem that can escalate very quickly.

00:07:56 It is very important that older clients continue to function. Updating your clients should not be a requirement to keep things operational.

00:08:12 New client versions should be for new features, not mandatory housekeeping just to keep the lights on.

00:08:23 If you ensure backwards compatibility in your API, the mental model becomes much simpler: your API evolves, your clients evolve to provide new functionality, and users of older clients can continue to use them seamlessly.

00:08:40 If there is no problem, then there is no problem, resulting in little to no forced migration.

00:08:51 There's a recipe for happiness here. Ruby is a language designed to maximize developer happiness, and we should take this into consideration as a design principle.

00:09:06 We want APIs to be backwards compatible so that existing calling patterns and output usage continue to work indefinitely. We also want clients to be forwards compatible, so the only changes required in your code are to support new features.

00:09:37 Other than that one brave soul who raised their hand, nobody enjoys the idea of mandatory updates.

00:09:50 Let's talk about how to model a resource for an API. For our example, we'll use a trip. The trip has travelers, a description, and a flight shape, which we'll discuss in a moment.

00:10:15 These are common representations we use to describe API concepts. You'll notice that we type our members, which is significant.

00:10:28 Even though we're discussing Ruby, where everything is an object, consider that when we open up our customer base, we are not just talking about Ruby.

00:10:44 A flight is a special member, as shapes can be complex. In this case, we have a shape referencing another shape.

00:11:04 Nested shapes allow us to avoid duplication. There might be several points in our data model where we're talking about a flight, using the same information.
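As a rough sketch of the trip and flight shapes described here, the following models them as plain Ruby structs. The `Flight` member names (`number`, `status`) and all sample values are assumptions for illustration, not the speaker's slide.

```ruby
# Hypothetical shapes for illustration; member names are assumptions.
Flight = Struct.new(:number, :status, keyword_init: true)

# Trip references the Flight shape instead of duplicating its members,
# so any other shape that needs flight data can reuse the same definition.
Trip = Struct.new(:travelers, :description, :flight, keyword_init: true)

trip = Trip.new(
  travelers: ["Alex"],                                   # list of strings
  description: "Conference trip",                        # string
  flight: Flight.new(number: "XX123", status: "delayed") # nested shape
)

puts trip.flight.status
```

Because `Flight` is defined once, a change to it shows up everywhere it is referenced, which is both the benefit and the risk of nested shapes.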

00:11:24 If you flew to this conference, your flight status may have changed a few times, just like mine did. Changes to that shape need to be reflected everywhere it's used, across input and output shapes.

00:11:42 Now, let's discuss the operations: Get, Create, Update, List, and Delete are fairly standard controller actions. In fact, they usually map one-to-one with the default routes that Rails would create for a resource.

00:11:59 This mapping is useful because it helps you understand the implied model your API provides, and it's essential to think about it with intentionality.

00:12:19 From an API client perspective, let's look at the request-response lifecycle. Users will make a client request, such as retrieving trip information.

00:12:42 This request translates into a web API request, which yields a response shape like the ones we discussed earlier.

00:12:56 Your client can turn this into a language object, so users don’t need to worry about the details while your web API processes the request.

00:13:12 This example might be slightly altered; initially, I had to change my original flight number due to a significant delay. I had to take a different flight.

00:13:35 You might also receive an exception from the service and surface it through the client. For instance, your system may have deleted the trip from the flight database because you drove instead of flying, which would then be represented as an error raised, and your client would handle it appropriately.

00:14:02 Again, these definitions are useful for designing and reviewing APIs; they're not the only possible descriptions, but they work.

00:14:17 Now, let's move to safe and unsafe API changes. Avoiding breaking changes is crucial, especially if you've experienced a bundle update where everything exploded in production.

00:14:43 Such experiences are terrifying for customers, so it is essential to be intentional in avoiding that kind of pain.

00:14:58 Going back to our trip shape, what happens if we add a new boolean member to indicate whether a trip is confirmed or unconfirmed?

00:15:11 Adding new optional members to a shape, like new output members or new optional inputs, is entirely acceptable. But if I realize that I’ve designed travelers as an array when my implementation has only ever treated it as one-to-one, I still should not change it.

00:15:29 Changing the type will break older clients. Imagine the poor developer who wrote conditions around the travelers shape suddenly getting NoMethodError exceptions and nil values. It’s a horrifying experience. We must also consider that our clients may use multiple languages.
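To make that failure concrete, here is a small sketch of caller code written against a list-typed `travelers` member, run against a response where the member has become a single string. The hash-based responses and the helper name are assumptions for illustration.

```ruby
# Hypothetical older caller code, written when `travelers` was a list.
def traveler_summary(trip)
  trip["travelers"].map(&:upcase).join(", ")
end

old_response = { "travelers" => ["alex", "sam"] }
new_response = { "travelers" => "alex" } # type changed from list to string

puts traveler_summary(old_response)

begin
  traveler_summary(new_response)
rescue NoMethodError => e
  # String does not respond to #map, so the old code fails at runtime.
  puts "older caller broke: #{e.class}"
end
```

Nothing warns the caller ahead of time in Ruby; the break only appears when the changed response is actually processed.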

00:15:59 Once a type has been set, it's a commitment, so sticking with that is crucial. Now, if we return to that happy change where we added a new boolean member to the output shape, let’s see how that would look for older clients.

00:16:58 If we're using the latest version of our client, we can access our new value returned by the service. But if we're using an older client, we access the output values we knew about before, and everything remains functional. If nothing is broken, there’s no need to worry.
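One way to picture this is a client that only maps the members it was built with and ignores everything else. This is a minimal sketch under that assumption; `parse_trip`, the member lists, and the JSON body are hypothetical, not how any particular SDK is implemented.

```ruby
require "json"

# Hypothetical: the member list an older client release was built with.
OLD_CLIENT_MEMBERS = %w[travelers description flight].freeze

# Keep only the members this client version knows; ignore anything newer.
def parse_trip(json, known_members)
  JSON.parse(json).select { |name, _value| known_members.include?(name) }
end

# The service has since added an optional `confirmed` boolean.
body = '{"travelers":["Alex"],"description":"Trip","flight":null,"confirmed":true}'

old_view = parse_trip(body, OLD_CLIENT_MEMBERS)
new_view = parse_trip(body, OLD_CLIENT_MEMBERS + ["confirmed"])

puts old_view.key?("confirmed") # the older client never sees the new member
puts new_view["confirmed"]
```

The older client's view of the response is unchanged, so code written against it keeps working.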

00:17:17 Now let's consider pagination when listing trips from a service. If you don't have some way to manage the response, dealing with a million items can become overwhelming.

00:17:39 Pagination allows you to manage your API responses better. If we initially launched with sufficient pagination options, we can now add a new optional parameter called 'confirmed' to limit our list to confirmed trips.

00:18:01 New optional parameters on input shapes are fine, but as we run into scaling problems, we shouldn't make existing optional parameters required, as that would break older clients.

00:18:20 Consider the lack of flexibility for older clients: they must either upgrade or break when faced with added required parameters.
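The difference between an optional and a required input can be sketched with Ruby keyword arguments. The handler below is hypothetical; `list_trips`, `max_results`, and the sample data are assumptions for illustration.

```ruby
# Hypothetical in-memory data standing in for the service's trips.
TRIPS = [
  { id: 1, confirmed: true },
  { id: 2, confirmed: false },
  { id: 3, confirmed: true }
].freeze

# `confirmed` was added later as an optional filter. Callers that never
# send it get exactly the behavior they had before it existed.
def list_trips(max_results: 2, confirmed: nil)
  trips = confirmed.nil? ? TRIPS : TRIPS.select { |t| t[:confirmed] == confirmed }
  trips.first(max_results)
end

# Older-style call: still works. Had `confirmed:` been declared as a
# required keyword, this call would raise ArgumentError instead.
p list_trips(max_results: 3).map { |t| t[:id] }

# Newer-style call that opts in to the filter.
p list_trips(max_results: 3, confirmed: true).map { |t| t[:id] }
```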

00:18:42 Empathizing with users is critical because breaking changes can lead to ill will. Many of you might be thinking of mandatory deprecations you found frustrating.

00:19:18 It can be tempting to implement changes that seem beneficial, but are really just causing additional overhead for your users.

00:19:30 An ounce of prevention through careful API reviews before launch can help avoid most breaking change scenarios.

00:19:57 Respecting these rules leads to happier users. Although many of these breaking changes might cause compile-time issues in languages like Java, they can show up at runtime in Ruby.

00:20:05 Be aware that not all users are writing tests, so good behavior, including proper error handling, is paramount.

00:20:13 Now let's address modifications regarding shapes. Subtle changes regarding constraints and exceptions are also crucial.

00:20:37 Again, avoid adding new required parameters—doing so breaks existing code.

00:20:48 However, it is acceptable to change existing required parameters to optional ones.

00:20:57 Constraints on both the server and client sides are crucial if you're writing APIs or clients.

00:21:17 If I set a maximum value for responses, callers can reasonably rely on that limit without the client needing to validate it. If I later want to lower that limit, I shouldn't, because requests that used to pass validation would start to fail.

00:21:45 Old code relying on previously accepted values will stop functioning properly, which can be problematic.

00:22:00 On the other hand, increasing allowable values by loosening constraints is permissible, and clients can continue to function normally.
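A tiny sketch of loosening versus tightening a constraint, using a page-size limit as the example. The validator and the specific limits are assumptions for illustration.

```ruby
# Hypothetical validator for a page-size parameter.
def valid_page_size?(value, max:)
  value.is_a?(Integer) && value.between?(1, max)
end

request = 100 # a value existing callers already send

puts valid_page_size?(request, max: 100) # original limit: accepted
puts valid_page_size?(request, max: 500) # loosened: still accepted
puts valid_page_size?(request, max: 50)  # tightened: old request rejected
```

Loosening only grows the set of accepted requests, so everything that worked before still works. Tightening shrinks it, and some existing caller is likely to be in the part that was removed.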

00:22:22 At AWS, we do not validate non-required parameters. API constraints might be checked on the server side.

00:22:38 For instance, there was a time when we had eight-digit instance IDs in EC2, and we ran out. Users of older Ruby SDKs didn’t have problems even if they didn’t upgrade.

00:23:02 However, if we had validated that length constraint in the client, customers would have needed to upgrade their SDK to continue functioning. That's not a good user experience.
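The instance ID story can be sketched as two hypothetical clients: one that bakes the length constraint into client-side validation and one that leaves it to the service. The regular expression, method names, and sample IDs are all made up for illustration.

```ruby
# Hypothetical strict check an older client release could have shipped.
STRICT_ID = /\Ai-[0-9a-f]{8}\z/

def strict_client_accepts?(id)
  STRICT_ID.match?(id)
end

# A lenient client passes the value through and lets the service decide.
def lenient_client_accepts?(id)
  id.is_a?(String) && !id.empty?
end

short_id = "i-1a2b3c4d"          # made-up ID in the older, shorter format
long_id  = "i-1a2b3c4d5e6f7a8b9" # made-up ID in a newer, longer format

puts strict_client_accepts?(short_id)  # fine before the format changed
puts strict_client_accepts?(long_id)   # rejected until the client is upgraded
puts lenient_client_accepts?(long_id)  # keeps working without an upgrade
```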

00:23:25 Now let's shift to the concept of states. In our flights example, the status value is a string over the wire, but it is effectively an enum in many languages.

00:23:52 Consider the possibilities of flight statuses. If we want to split 'landed' into 'landed on time' and 'landed late,' it would get confusing.

00:24:15 By adding new terminal states, however, we risk breaking existing functionality for those clients expecting only certain statuses.

00:24:48 Instead of polling repeatedly for statuses, we might introduce interim states as lifecycle events that clients can react to.

00:25:11 Interim states allow us to monitor progress without creating infinite loops. If it never reaches a terminal state, our clients should still handle known issues.

00:25:31 It's crucial to add new terminal states with caution and to future-proof clients so they can handle unexpected enum values.
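Here is a minimal sketch of a polling loop that tolerates status values it has never seen. The status names and the `wait_for_flight` helper are assumptions for illustration, and the array argument stands in for successive responses from the service.

```ruby
# Hypothetical terminal states known to this client version.
TERMINAL = %w[landed cancelled].freeze

# Poll until a known terminal state appears. Any status this client does
# not recognize is treated as "still in progress", and max_attempts
# guarantees the loop ends even if no known terminal state ever arrives.
def wait_for_flight(statuses, max_attempts: 5)
  statuses.first(max_attempts).each do |status|
    return status if TERMINAL.include?(status)
  end
  :timed_out
end

puts wait_for_flight(%w[scheduled boarding in_air landed])
puts wait_for_flight(%w[scheduled deicing in_air landed]) # new interim state
puts wait_for_flight(%w[scheduled in_air diverted])       # new terminal state
```

The new interim state passes through harmlessly. The new terminal state is never recognized, so the older client waits until it gives up, which is why new terminal states deserve the extra caution.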

00:26:10 Now let’s move on to exception handling. Consider when we specify a unique ID that isn’t present.

00:26:25 Receiving a tidy exception is manageable, but splitting exception behavior might lead to deeper issues.

00:26:48 If the existing code expected a certain type of exception, it could cause runtime crashes due to unexpected exception handling.
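A short sketch of how splitting an exception can slip past existing handlers. The error class names and the `describe_trip` helper are hypothetical.

```ruby
# Hypothetical error classes; names are assumptions for illustration.
class ServiceError < StandardError; end
class TripNotFound < ServiceError; end
class FlightNotFound < ServiceError; end # introduced later by the split

# Older caller code, written when a missing ID always raised TripNotFound.
def describe_trip(error_to_raise)
  raise error_to_raise, "no such resource"
rescue TripNotFound
  "handled: not found"
end

puts describe_trip(TripNotFound)

begin
  describe_trip(FlightNotFound)
rescue ServiceError => e
  # The old rescue clause did not match, so the error escaped the caller's
  # handler and would have crashed code that had no outer rescue.
  puts "escaped old handler: #{e.class}"
end
```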

00:27:06 Implementing solid API design means ensuring clients can gracefully handle exceptions, rather than breaking unexpectedly.

00:27:19 Adding new fields in exceptions is also useful, as it gives clear error messages while maintaining clean core functionality.

00:27:38 Clients will appreciate better error logging and clearer responses.

00:27:54 As customers can be clever, consider the potential pitfalls of your API interactions.

00:28:06 Hiding behaviors can lead to added complexity, and unexpected quirks can emerge.

00:28:15 Even in well-intended scenarios, hasty changes might incur backlash from your users.

00:28:30 Make empathetic decisions during API enhancements to avoid creating burdens for customers.

00:28:46 As we begin to wrap up, let's outline some API design rules. We can also explore related talks that could further pique your interest.

00:29:14 Much of this discussion has been inspired by a talk from my colleagues at this year's re:Invent conference, focusing on embracing change without breaking the world.

00:29:36 Kyle and Jim’s talk is especially useful in understanding the static language side of this problem more deeply.

00:30:02 Another talk at this conference discussed building APIs with Ruby on Rails, and how our API Gateway service can generate SDKs automatically in multiple languages, offering a bridge between these concepts.

00:30:21 For this reason, I want you to check out this slide, which summarizes some critical rules.

00:30:43 For APIs, do add new members and shapes, do add intermediate workflow states carefully, and do add detail to existing exceptions.

00:31:07 Also, do add new opt-in exceptions and loosen constraints. For clients, consider forward compatibility.

00:31:24 Focus on discoverability, which is significant.

00:31:34 As for things to avoid, do not remove or rename members or shapes, ever, and do not change member types.

00:31:52 Avoid adding new terminal workflow states, new exceptions that are not purely opt-in, forking exception behavior, and tightening constraints.

00:32:09 Lastly, while you could validate API constraints on the client side, do so with caution.

00:32:34 I hope you found this useful! I've got AWS stickers, so if you have questions, please feel free to ask, or come forward for stickers.

00:32:50 Thank you!