Designing APIs: Less Data is More

by Damir Svrtan

In the talk "Designing APIs: Less Data is More," Damir Svrtan, a senior software engineer at Netflix, discusses effective strategies for designing APIs that avoid the common pitfalls of data overexposure. He emphasizes the importance of creating minimal APIs and operating under the principle that less is often more. Throughout his presentation at RailsConf 2021, he explains how developers frequently build APIs that include excessive data and functionality that go unused, leading to unnecessary maintenance and complexity.

Key Points Discussed:

- Relevance of the Topic: Svrtan outlines his extensive experience in API development, highlighting that many developers create APIs with too much data, leading to bloated implementations.

- Two Principles for API Design:

1. Designing the Minimal API Surface: Focus on not overexposing data, illustrated through examples of unnecessary fields that can complicate future modifications.

- Example: Exposing an author's email field in an API might seem beneficial initially; however, it creates privacy concerns and complicates future changes, such as removing the field to comply with GDPR.

2. Designing from Strict to Loose: Avoid unnecessary flexibility in API design. For instance, making certain input fields optional for clients can lead to unforeseen maintenance burdens.

- Avoiding Redundant Elements: Svrtan discusses the drawbacks of exposing redundant fields and relationships that clients do not use, as well as the negative performance impacts that can result from redundant API queries.

- Defensive Programming: He advocates for protective measures in API design, such as implementing pagination limits to manage resource usage and maintain performance, reflecting on scenarios where developers underestimate future needs.

- Iterative Development: Svrtan stresses the importance of an iterative approach, encouraging developers to engage with clients to understand their real needs and adjust API features accordingly.

- Conclusion: He concludes that reducing unnecessary API complexity allows teams to focus on improving core functionality, thereby enhancing productivity and reducing the chance of errors. He urges developers to challenge the common assumption that more flexibility is always better, stating that good intentions can lead to inefficient designs unless grounded in actual user needs.

Svrtan's talk serves as a critical reminder of the importance of restraint in API development, ensuring that what is built is not only functional but also maintainable and efficient. He encourages attendees to reach out for further discussion on the topic, emphasizing community engagement.

00:00:05.000 Hi folks, welcome to my talk. Today, I'm going to be talking about designing APIs and why less data is more. I'm super glad I could present to you all today.
00:00:14.400 I'm sad that we're not all together at a nice location where we could meet up, but at the same time, it's great that we can all join this conference from across the globe.
00:00:26.640 Let me introduce myself. My name is Damir Svrtan, my pronouns are he/him, and I work as a Senior Software Engineer at Netflix where I spend most of my time building APIs. I'm recording this from San Francisco, which is where I'm currently based, having moved here about three years ago. Prior to that, I lived in Zagreb, the capital city of Croatia, a small country in Southeastern Europe. That's where I grew up and spent the first 28 years of my life. I used to organize a local Ruby Meetup called Ruby Zagreb, so a big shout-out to all the members of that community.
00:00:51.480 Now, why is this topic relevant to me? I've been building APIs for the last seven years and have seen a pattern where developers like to expose more data than is actually needed. I want to talk about avoiding overhead when designing APIs. The kind of overhead I mean is bloated, overly flexible APIs with queries that nobody asks for, endpoints that nobody's using, and generally unused functionality, like extra fields and relationships.
00:01:11.940 All of these things often stem from developers trying to be speculative about what's going to be needed in the future. They build things up front that they may never need, which they end up having to maintain. What kind of APIs am I talking about? I'm referring to HTTP-based APIs, such as REST APIs, JSON APIs, even GraphQL, and so on. However, this talk will be API technology agnostic and applicable to all kinds of APIs.
00:01:43.440 Throughout this talk, we're going to be building a blogging platform, something similar to Medium or Dev.to. It will be fairly simple, featuring authors who can release posts, and each post can have comments. We'll discuss two principles for building APIs: the first being designing the minimal API surface, or how not to overexpose data on your APIs; and the second being designing from strict to loose—how to avoid building extra flexibility that nobody asked for.
00:02:07.200 So let's start with the first principle: designing the minimal API surface. I often see a pattern where developers try to be speculative about what's going to be needed in the future, so they overbuild their APIs. I'll break this down into three patterns of a bloated surface that I usually see: redundant fields, redundant relationships, and redundant input fields.
00:02:40.500 First, let's discuss avoiding redundant fields. Imagine we have a requirement from our product management that we need to show the author of the blog post. Let's say we're storing authors in a database and it includes fields such as ID, first name, last name, email, and an avatar URL. In a design drawing inspiration from Medium, we have fields like avatar, first name, and last name, but notice we're not exposing the email.
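As a rough illustration of the point above (not taken from the talk itself), the sketch below builds the author representation from an explicit allowlist, so the email stays in the database but never reaches the API. The `Author` class, `PUBLIC_AUTHOR_FIELDS`, and `serialize_author` are hypothetical names chosen for this example, and the sample data is made up.

```python
from dataclasses import dataclass


@dataclass
class Author:
    # Hypothetical database record; the fields follow the talk's description.
    id: int
    first_name: str
    last_name: str
    email: str
    avatar_url: str


# Only the fields the current design displays (assumed allowlist).
PUBLIC_AUTHOR_FIELDS = ("first_name", "last_name", "avatar_url")


def serialize_author(author: Author) -> dict:
    """Build the API representation from an explicit allowlist of fields."""
    return {name: getattr(author, name) for name in PUBLIC_AUTHOR_FIELDS}


author = Author(1, "Ada", "Lovelace", "ada@example.com", "https://example.com/ada.png")
payload = serialize_author(author)
print(payload)
```

With an allowlist, adding a column to the database never changes the API by accident; exposing a new field becomes a deliberate one-line change.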
00:03:14.400 The friendly API developer might suggest exposing the email because someone might find it useful in the future. They might think, "Why wouldn't we just expose that email field right away? It saves time for the business, since it's easier to expose now than later." However, what if we later need to remove that field for privacy reasons, such as compliance with GDPR or California privacy laws?
00:03:44.760 If we have to deprecate the email field, we might need to go through a deprecation cycle that involves a lot of communication with clients. This could mean sending out emails to clients to inform them of the change and giving them lead time to adjust. For private APIs with only a handful of clients, this process can be manageable, but for platform APIs with many users, this can become overly complex and time-consuming.
00:04:09.600 Having an API where clients can pick and choose which fields they want can help mitigate this issue, as seen in GraphQL or sparse fieldsets in the JSON:API specification. However, without proper observability, you won't know whether a field is actually being used, and unnecessary exposure can still lead to complications down the road.
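One way to picture the observability point is a small sketch that counts which fields clients request when they can select fields. This is an assumption-laden illustration, not the speaker's implementation: `AUTHOR_FIELDS`, `field_usage`, `parse_requested_fields`, and `unused_fields` are hypothetical names, and the in-memory counter stands in for whatever metrics system a real service would use.

```python
from collections import Counter

# Hypothetical set of fields the author resource can return.
AUTHOR_FIELDS = {"first_name", "last_name", "avatar_url", "email"}

# In-memory usage counter; a real service would likely report to a metrics system.
field_usage = Counter()


def parse_requested_fields(fields_param: str) -> list:
    """Parse a sparse-fieldset style value such as "first_name,avatar_url"."""
    requested = [name.strip() for name in fields_param.split(",") if name.strip()]
    unknown = [name for name in requested if name not in AUTHOR_FIELDS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    # Record every requested field so usage can be inspected later.
    field_usage.update(requested)
    return requested


def unused_fields() -> set:
    """Fields no client has requested so far."""
    return AUTHOR_FIELDS - set(field_usage)


parse_requested_fields("first_name,avatar_url")
parse_requested_fields("first_name,last_name")
print(unused_fields())
```

A field that never shows up in the counter over a long enough window is a candidate for removal without a lengthy deprecation cycle.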
00:04:42.300 The technical aspect of removing a field is straightforward, but the communication and coordination required to inform stakeholders and clients can be exhaustive. Next, let's move on to the second part of the first principle: not exposing redundant relationships. This is similar to avoiding redundant fields but has some nuances.
00:05:17.580 Let's say we have a requirement to indicate whether a post has been reviewed. The friendly API developer might also include a reviewer field for future use cases, such as showing who reviewed the post. However, just like the previous example, this can lead to unnecessary complexity and maintenance work.
00:05:45.480 What happens if we later decide that instead of one reviewer, we now need to show multiple reviewers? This would involve another round of deprecation communication, further complicating the client experience and possibly breaking existing implementations. Thus, it's critical to delay decisions where possible and avoid overengineering.
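The reviewer example can be sketched as follows, under the assumption that the only stated requirement is a yes/no "reviewed" indicator. The `Post` model, its `reviewer_ids` attribute, and `serialize_post` are hypothetical; the point is only that the response shape stays the same whether the server tracks one reviewer or several.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    # Hypothetical internal model; it may track any number of reviewers.
    id: int
    title: str
    reviewer_ids: List[int] = field(default_factory=list)


def serialize_post(post: Post) -> dict:
    """Expose only what the requirement asks for: has the post been reviewed?"""
    return {
        "id": post.id,
        "title": post.title,
        "reviewed": len(post.reviewer_ids) > 0,
    }


single = serialize_post(Post(1, "Hello", reviewer_ids=[7]))
multiple = serialize_post(Post(2, "World", reviewer_ids=[7, 8]))
draft = serialize_post(Post(3, "Draft"))
print(single, multiple, draft)
```

Because clients only ever see a boolean, moving from one reviewer to many is an internal change and needs no deprecation cycle.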
00:06:13.620 Finally, in this first principle, let’s talk about avoiding redundant input fields. This pertains particularly to the payload your API accepts when mutating or changing data, such as in a REST API. An example scenario could involve enabling readers to create and update comments.
00:06:38.160 In this situation, we might define an input for creating a comment that requires two fields: post ID and body. The friendly API developer might suggest mirroring this for the update comment input, which would lead to exposing a post ID field there as well, even though we only needed the comment ID and the body.
00:07:07.680 This approach unnecessarily complicates the API's schema, requiring clients to handle logic that shouldn't be their concern. Instead, we should ensure that the logic of our application remains on the server side rather than on the client's side.
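A minimal sketch of the two comment inputs, assuming the create input takes a post ID and a body while the update input takes only the comment ID and a body. The field-set constants and `validate_input` are hypothetical names for this example, not an API from any library.

```python
# Assumed input definitions, following the comment example in the talk.
CREATE_COMMENT_FIELDS = {"post_id", "body"}
UPDATE_COMMENT_FIELDS = {"id", "body"}


def validate_input(payload: dict, allowed: set) -> dict:
    """Require exactly the allowed fields; reject anything missing or extra."""
    keys = set(payload)
    missing = allowed - keys
    extra = keys - allowed
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    return payload


created = validate_input({"post_id": 10, "body": "Nice post"}, CREATE_COMMENT_FIELDS)
updated = validate_input({"id": 1, "body": "Nice post!"}, UPDATE_COMMENT_FIELDS)

try:
    # Moving a comment to another post is not a supported operation here.
    validate_input({"id": 1, "post_id": 11, "body": "Moved"}, UPDATE_COMMENT_FIELDS)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Rejecting the extra `post_id` on update means the server never has to decide what "moving a comment to another post" should do.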
00:07:47.880 It's far easier to add things in the future than to remove them. Starting from a strict definition lets you loosen it later without breaking existing clients. Conversely, if you start with flexibility and later need strictness, you impose breaking changes on your clients.
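The asymmetry between loosening and tightening can be checked with a small sketch. It assumes a hypothetical `accepts` helper and two made-up versions of an input definition: a strict one where `post_id` is required and a looser one where it is optional.

```python
def accepts(payload: dict, required: set, optional: set = frozenset()) -> bool:
    """True if the payload has every required field and no unknown fields."""
    keys = set(payload)
    return required <= keys and keys <= (required | optional)


# Hypothetical strict and loose versions of the same input.
STRICT = {"required": {"post_id", "body"}, "optional": set()}
LOOSE = {"required": {"body"}, "optional": {"post_id"}}

old_client_payload = {"post_id": 10, "body": "Hi"}  # written against STRICT
new_client_payload = {"body": "Hi"}                 # written against LOOSE

# Strict -> loose: payloads from existing clients keep working.
strict_to_loose_ok = accepts(old_client_payload, **LOOSE)

# Loose -> strict: some payloads from existing clients start failing.
loose_to_strict_ok = accepts(new_client_payload, **STRICT)
print(strict_to_loose_ok, loose_to_strict_ok)
```

Every payload valid under the strict definition is still valid under the loose one, but not the other way around, which is why tightening is the breaking direction.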
00:08:18.000 The key takeaway here is to avoid exposing redundant fields, relationships, and input fields to minimize bloat in your APIs. Now let's move on to the second principle: moving from strict to loose in API design. This principle is about understanding the balance between flexibility and stability in your API.
00:08:57.960 The first step is to avoid unnecessary flexibility. Your APIs should be designed with a clear understanding of the needs of your clients. This means that if an input is required, make it required, rather than overly flexible.
00:09:28.680 An example of this would be an endpoint designed to fetch comments on a post, which should ideally accept a post ID. The friendly API developer might make the post ID optional, thinking that clients might want to fetch all comments in the future without specifying a post. However, this strategy introduces unnecessary complexity to the application logic.
00:10:02.160 This leads to increased maintenance work when changes arise. More code translates to a higher chance of bugs and performance issues. Therefore, it’s crucial to develop a coherent logic that maintains consistency and effectively serves your clients.
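As a sketch of the comments endpoint with a required post ID, the function below has no "all comments" code path at all. The `COMMENTS` list is made-up sample data and `fetch_comments` is a hypothetical name; a real endpoint would query a database instead.

```python
# Made-up sample data standing in for a comments table.
COMMENTS = [
    {"id": 1, "post_id": 10, "body": "Great post"},
    {"id": 2, "post_id": 10, "body": "Thanks for sharing"},
    {"id": 3, "post_id": 11, "body": "Interesting"},
]


def fetch_comments(post_id: int) -> list:
    """post_id is required, so there is a single, simple lookup path."""
    if not isinstance(post_id, int):
        raise TypeError("post_id is required and must be an integer")
    return [comment for comment in COMMENTS if comment["post_id"] == post_id]


comments_for_post = fetch_comments(10)

try:
    fetch_comments(None)  # no "fetch everything" fallback exists
    rejected_missing_id = False
except TypeError:
    rejected_missing_id = True
print(len(comments_for_post), rejected_missing_id)
```

If a client later has a real need to list comments across posts, the parameter can be made optional at that point without breaking anyone.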
00:10:40.680 Next, let’s talk about defensive programming. It’s essential to ensure your API is built to prevent abuse. For instance, if an API endpoint is designed to fetch comments, consider implementing pagination to avoid potential performance issues.
00:11:06.180 It's better to limit the number of comments returned per request than to push the server beyond its capacity. Establishing these limits early prevents larger scaling issues later on.
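A pagination cap of this kind could look like the sketch below. The default and maximum page sizes are assumed values chosen for the example, not numbers from the talk, and `paginate` is a hypothetical helper operating on an in-memory list.

```python
DEFAULT_PAGE_SIZE = 20   # assumed default, not from the talk
MAX_PAGE_SIZE = 100      # assumed hard cap, not from the talk


def paginate(items: list, page: int = 1, per_page: int = DEFAULT_PAGE_SIZE) -> dict:
    """Return one page of items, never more than MAX_PAGE_SIZE per request."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    per_page = min(per_page, MAX_PAGE_SIZE)  # clamp oversized requests
    start = (page - 1) * per_page
    chunk = items[start:start + per_page]
    return {
        "items": chunk,
        "page": page,
        "per_page": per_page,
        "has_more": start + per_page < len(items),
    }


comments = list(range(250))  # stand-in for 250 comments
first_page = paginate(comments, page=1, per_page=1000)
last_page = paginate(comments, page=3, per_page=100)
print(len(first_page["items"]), len(last_page["items"]))
```

Clamping rather than rejecting an oversized `per_page` is one possible choice; returning an error instead would be equally defensible and arguably stricter.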
00:11:30.780 By focusing on these two principles—minimizing exposure of unused data and maintaining a strict yet flexible API structure—you can ensure a smoother experience for both API developers and clients. Additionally, you can save time and resources in the long run.
00:12:12.120 In conclusion, I hope you take away the importance of avoiding redundancy in API design. Redundant structures slow down progress on more critical features. By prioritizing effective API documentation and streamlining queries, you'll optimize your API's performance.
00:12:36.000 Less is more when it comes to data exposure. Although we often think we are helping our clients by building unnecessary features, we may inadvertently hinder development. Thus, it's crucial to communicate effectively with your clients while maintaining control over API structures.
00:13:12.840 Thank you all for listening. If you want to discuss this topic further, feel free to reach out to me afterwards or connect with me on Twitter.