00:00:00.560
Are we ready to move on to our next speaker? Our next speaker is Lisa Karlin Curtis, and she is going to talk about how to stop breaking other people's things.
00:00:08.720
Lisa is a full-stack developer at GoCardless. She started out as a consultant working with the HMRC and then smart meters before accidentally becoming a developer.
00:00:15.200
I would like to hear more about that. She works mainly on a Rails app with some forays into the JavaScript front end and legacy PHP applications.
00:00:21.520
She loves building stuff but is also really interested in how people interact with each other in a work environment.
00:00:26.760
Particularly in software engineering, having seen the old way at Accenture with large-scale waterfall projects, she is now looking at taking the lessons from that environment to the startup scene.
00:00:33.440
About the talk: breaking changes are sad. We’ve all been there; someone changes their API in a way you weren’t expecting, and now you have a live ops incident you need to fix urgently to get your software working again.
00:00:38.879
Of course, many of us are on the other side too—we build APIs that other people's software relies on.
00:00:44.280
All the discussions and later on questions for Lisa can be put in the stream chat. So, please, Lisa, the floor is yours; you can start your presentation.
00:02:04.320
Hi, um, thanks so much for having me. You’re okay? This is really cool! So, yeah, I'm Lisa Karlin Curtis, born and bred in London, England, and I'm a software engineer at GoCardless.
00:02:12.560
I work in our financial orchestration team. We are a payments company that focuses on recurring payments, and I’m going to be talking today about how to stop breaking other people's things.
00:02:18.319
We’re going to start with a sad story. A developer notices they have an endpoint that has a really high latency compared to what they'd expect.
00:02:28.879
They find a performance issue with the code, which is essentially an exacerbated N+1 problem, and they deploy a fix.
00:02:33.920
The latency on the endpoint goes down by half. The developer stares at the beautiful graph with a lovely cliff shape, right? You know, really high and nicely dropping, and they feel really good about themselves.
00:02:39.200
They pat themselves on the back and move on. Somewhere else in the world, another developer gets paged; their database CPU has spiked, and it's struggling to handle the load.
00:02:44.560
They’ve got a bit of service degradation, so what happened here? They start investigating; there’s no obvious cause, and no recent changes were deployed.
00:02:51.280
The request volume is pretty much what they’d expect. They start scaling down their queues to relieve the pressure, which seems to solve the problem.
00:02:56.720
The database recovers, and then they notice something strange: they’ve suddenly started processing webhooks much more quickly than they used to.
00:03:02.640
So it turns out that our integrator, which is on the right-hand side of this slide, had a webhook handler that would receive a webhook from us.
00:03:08.959
Then it would make a request back to find the status of the resource; the reason they needed to do that was that the events could be delivered out of order.
00:03:15.519
They wanted to make sure that their status reflected what was in our database, and this was actually the endpoint that we fixed earlier that day.
00:03:22.080
I’m going to use the word integrator a lot, and what I mean is people who are integrating against the API that you are maintaining.
00:03:28.000
Sometimes that will be inside your company, like another team, or sometimes it might be a customer.
00:03:34.000
So back to our story: that webhook handler spent most of its time waiting for our response; it was very I/O bound.
00:03:39.840
Then it would update its own database, so the slow endpoint was essentially rate-limiting the webhook handler's interaction with its own database.
00:03:45.120
It's worth noting that at GoCardless, our webhooks are often a result of batch processes, which means they're really spiky.
00:03:50.480
We send big sets of them a couple of times a day, so as the endpoint got faster during those spikes, the webhook handler started to apply more load to the database than normal.
00:03:56.480
To such an extent that an engineer got paged to resolve a service degradation. The fix here is fairly simple: scale down those webhook handlers so they process fewer webhooks and the database usage returns to normal.
00:04:02.159
Alternatively, beef up your database, but it shows us just how easy it is to accidentally break someone else's stuff even if you're trying to do right by your integrators.
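As a rough back-of-the-envelope sketch of why this happens: assume each webhook worker makes one API call and then one database write, one after the other. The worker count and latencies below are made-up illustrative numbers, not figures from the talk.

```python
def db_writes_per_second(workers: int, api_latency_s: float, db_write_s: float) -> float:
    """Steady-state database writes per second across all workers.

    Each worker loops: wait for the API response, then write to the database,
    so one worker completes one write every (api_latency_s + db_write_s) seconds.
    """
    return workers / (api_latency_s + db_write_s)


# Hypothetical numbers: 20 workers, 100 ms per database write.
before = db_writes_per_second(workers=20, api_latency_s=0.4, db_write_s=0.1)
after = db_writes_per_second(workers=20, api_latency_s=0.2, db_write_s=0.1)

print(f"before the fix: {before:.0f} writes/s, after the fix: {after:.0f} writes/s")
```

With these assumed numbers, halving the API latency raises the write rate from about 40 to about 67 per second with no change on the integrator's side, which is the implicit rate limit disappearing.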
00:04:08.239
So to set the scene, here are some examples of changes that have broken code in the past. Traditional API changes, right? Adding a mandatory field, removing an endpoint, or changing validation logic.
00:04:15.680
I think we're all comfortable with this stuff and why it could break things.
00:04:20.400
Introducing a rate limit or even changing your rate limiting logic—Docker did this reasonably recently, and I think they communicated it very clearly, but it obviously impacted lots of integrators.
00:04:32.560
They also worked hard to provide tooling for integrators to self-serve and understand their impact, which I thought was really cool.
00:04:46.240
Changing an error response string—GoCardless had an issue where we basically found a bug in our own code.
00:04:52.400
We weren’t respecting the Accept-Language header on a few of our endpoints, so somebody would request us with 'Accept-Language: fr' and we would respond with English errors.
00:04:59.200
We noticed this, thought it was bad, and we fixed it. Then we received a call from an integrator claiming we broke their stuff.
00:05:05.280
We were confused, as we hadn’t realized we were breaking their integration.
00:05:11.040
It turned out that they were relying on the previous behavior where we ignored their Accept-Language header and always responded with an English error.
00:05:17.600
They were using that English error to match it against a string, translating it, and displaying something in the UI.
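From the integrator's side, one way to avoid depending on the wording of an error is to key translations off a stable machine-readable code. This is a minimal sketch that assumes the API returns such a code; the `code` field, the code values, and the translations are all hypothetical and not GoCardless's actual error format.

```python
# Hypothetical error codes and translations owned by the integrator.
TRANSLATIONS = {
    "en": {
        "insufficient_funds": "Insufficient funds.",
        "mandate_cancelled": "The mandate was cancelled.",
    },
    "fr": {
        "insufficient_funds": "Fonds insuffisants.",
        "mandate_cancelled": "Le mandat a été annulé.",
    },
}
FALLBACK = {"en": "Something went wrong.", "fr": "Une erreur est survenue."}


def user_message(error: dict, locale: str) -> str:
    """Pick the UI text from the error code, ignoring the human-readable message."""
    messages = TRANSLATIONS.get(locale, TRANSLATIONS["en"])
    fallback = FALLBACK.get(locale, FALLBACK["en"])
    return messages.get(error.get("code"), fallback)
```

Because the human-readable `message` is never inspected, the result is the same whether the API answers in English or in French.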
00:05:23.040
Breaking apart a database transaction might seem obvious in some ways when we think about our own systems.
00:05:30.080
We all know that internal consistency is really important, but it's relevant for your integrators too.
00:05:40.160
For example, let's say you have a resource that can be either active or inactive, and when it's deactivated, you create a row in an events table explaining why that happened.
00:05:46.640
It would be quite natural for an integrator to build a UI that explains why this resource was deactivated, helping the user understand what happened.
00:05:54.879
If in the past that event was created inside a database transaction with the status change, and now we break apart that transaction, it creates new scenarios for the integrator to handle.
00:06:01.759
There’s now a possibility for the integrator where the resource can be inactive, but there’s no corresponding event to tell them why.
00:06:07.520
It's entirely plausible that the integrator has assumed this would never happen because it never had, and thus their UI will error if they cannot find the corresponding event.
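A defensive version of that integrator UI might look something like the sketch below. The resource and event shapes (`status`, `resource_id`, `action`, `reason`) are invented for illustration.

```python
from typing import Optional

REASON_PENDING = "Reason not available yet"


def deactivation_reason(resource: dict, events: list[dict]) -> Optional[str]:
    """Return why a resource is inactive, tolerating a missing event.

    Returns None for active resources, the event's reason when one exists,
    and a placeholder when the resource is inactive but no event is visible yet.
    """
    if resource["status"] != "inactive":
        return None
    for event in events:
        if event["resource_id"] == resource["id"] and event["action"] == "deactivated":
            return event["reason"]
    # The status change and the event may no longer be written atomically.
    return REASON_PENDING
```

The placeholder branch is the new scenario: it never ran while the status change and the event shared a transaction, which is exactly why it is easy to leave out.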
00:06:13.520
Changing the timing of your batch processing can also lead to issues. As I mentioned, GoCardless is a payments company, and we have a daily batch process that submits instructions to the banks.
00:06:21.679
We can see from our logs that certain integrators create lots of payments just in time, right before our daily payment run.
00:06:29.759
So we know that if we were to change our timings without communicating with them, it could cause significant issues, as a lot of their payments might be delayed.
00:06:38.479
The last example here is reducing the latency on an API call, which is, kind of, what we discussed in that first example.
00:06:43.920
This is probably a good thing overall, but it can have some negative side effects.
00:06:50.320
So today, I'm going to define a breaking change as something where I, as the API developer, do something and someone's integration breaks.
00:06:58.080
That happens fundamentally because an assumption made by that integrator is no longer correct.
00:07:03.760
When this happens, it’s easy to criticize the engineer who made that assumption, but I don’t think that’s particularly productive for a couple of reasons.
00:07:09.840
Firstly, assumptions are inevitable; as a developer, you cannot get anywhere without them.
00:07:15.760
So if you want people to write code, they’re going to make assumptions.
00:07:22.960
Secondly, even if it’s their fault, it’s often your problem.
00:07:31.680
Possibly not if you’re Google or AWS, but for most companies, if your integrators are feeling pain, then you’ll feel it too.
00:07:38.240
Either immediately or in the long-term, when you’re trying to renew contracts.
00:07:45.440
So how do these assumptions actually develop? We can think of these in two categories: explicit and implicit.
00:07:51.520
Explicit assumptions occur when an integrator asks a question, gets an answer, and then builds their system based on that answer.
00:07:58.080
So your first step, if you’re building an integration, is to look at the documentation.
00:08:03.520
It's worth noting that people are quite lazy, and they often skip to the examples without reading any of the narrative text.
00:08:09.280
You need to make sure that your snippets are super representative of how your system is going to behave.
00:08:16.080
They might also look at support articles or blog posts, perhaps stuff you’ve published or something a third party has online.
00:08:21.760
Then, you have ad hoc communication, which includes random emails or phone calls with like a pre-sales team or your solutions engineers.
00:08:27.040
This ad hoc communication drives the assumptions that integrators make about how your software behaves.
00:08:33.280
Other assumptions are more implicit. Industry standards are quite interesting here; if you send me a JSON response, you’re going to give me a 'Content-Type: application/json' header.
00:08:41.920
I won't need to tell my HTTP client that it's going to be JSON because it can work it out for itself.
00:08:48.560
I, as an integrator, will assume this never changes. Similarly, I assume that you will keep my secrets safe.
00:08:55.279
If you tell me my access token was used to create something, I will assume it was probably me.
00:09:01.440
This is fine, but in some cases, you can find yourself in trouble, particularly if these standards change.
00:09:07.680
We had a bad incident at GoCardless when we upgraded our HAProxy version, which was observing the new industry standard.
00:09:14.000
The new version was down-casing all of our outgoing HTTP headers. According to the official spec, HTTP header names should not be treated as case-sensitive.
00:09:20.800
But a couple of key integrators had been relying on the previous behavior and had a significant outage.
00:09:26.560
That outage was exacerbated by the fact their requests were being processed, but they weren’t processing our responses.
00:09:34.480
That meant we had two systems that were out of sync in a really unfortunate way.
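On the integrator side, the safe habit is to look headers up without caring about case. Many HTTP client libraries already do this for you; the helper below is a minimal sketch for when you only have a plain dictionary, and the header names in the usage are just examples.

```python
from typing import Mapping, Optional


def get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Look up a header by name, ignoring case, as HTTP field names require."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


# The same lookup works before and after the proxy starts lowercasing names.
old_style = {"Content-Type": "application/json"}
new_style = {"content-type": "application/json"}
```

A lookup such as `old_style["Content-Type"]` keeps working only as long as the server never changes its casing, which is the assumption that broke here.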
00:09:38.679
Finally, let’s talk about observed behavior. As an integrator, you want the engineers running the services you use to be constantly improving them and adding features.
00:09:46.480
But you also want them to not touch anything, ensuring that its behavior won't change.
00:09:52.160
As soon as a developer sees something, whether that's an undocumented header on an HTTP response, a batch process that happens at the same time each day, or a particular API latency, they assume it's reliable.
00:09:58.560
They build their systems accordingly. Humans also pattern match aggressively, not just in software but in all walks of life.
00:10:05.760
We see this in the theory of language acquisition; we find it easy to convince ourselves that correlation equals causation.
00:10:11.040
That means that particularly if we can come up with an explanation of why A always means B, however far-fetched, we are quick to accept and rely on it.
00:10:18.560
It’s quite ironic given that we are all developers who are employed to make changes to our own systems.
00:10:25.520
We should understand that they are constantly in flux, yet we all encounter interesting edge cases every day.
00:10:31.040
Someone hits an incredibly unlikely scenario that causes our code to misbehave, but somehow we assume others' code will behave consistently and remain the same forever.
00:10:37.440
None of this stuff is new. A great example, if a bit retro, is MS-DOS, which is an old operating system from Microsoft.
00:10:43.920
MS-DOS was released with a number of documented interrupt calls, hooks, and all that retro stuff, but early application developers weren't able to achieve everything they wanted.
00:10:50.720
This was compounded because Microsoft used undocumented calls in their own software, making it impossible to compete using only what was in the documentation.
00:10:57.200
So, like all good engineers, they started decompiling the operating system and wrote lists of undocumented information.
00:11:03.920
The most famous of which is probably Ralph Brown's interrupt list, which became widely shared.
00:11:10.560
Using these undocumented features became so widespread that Microsoft couldn't change anything without breaking these applications.
00:11:16.000
Particularly as an operating system, these applications were a core part of their value proposition, so breaking them clearly wasn't an option.
00:11:22.400
We can think of the interrupt list as analogous to someone writing a blog on Medium called '10 Things You Didn't Know That So-and-So's API Could Do.'
00:11:28.240
It seems innocuous at first, but it can cause problems down the line.
00:11:35.520
Some of these assumptions are also totally unconscious. Once something is stable for a while, we tend to assume it will never break.
00:11:42.080
This is particularly obvious when it comes to resource choices, such as how much CPU or memory to allocate to a particular pod.
00:11:48.480
The napkin math we do is always pretty haphazard. If we’re all being honest, we pull numbers out of thin air, watch them for a bit, and then change them until they seem happy.
00:11:55.120
That works fine as long as what that pod is being asked to do is reasonably consistent over time.
00:12:01.760
But as we've discussed, this might not be true.
00:12:07.840
We can think about this in our first story: the database had plenty of resources until our endpoint got faster.
00:12:13.760
If we want to stop breaking other people's things, we need to help our integrators stop making bad assumptions.
00:12:20.160
When it comes to your documentation, document edge cases. Discoverability is also crucial.
00:12:27.680
Think about SEO, which is search engine optimization, and also the search within your docs site.
00:12:34.240
If something is subject to change, don’t ever deliberately leave it undocumented; document it and call out clearly that it may change.
00:12:40.800
This gives integrators the best chance of making a good choice.
00:12:47.520
Support articles and blog posts must be kept religiously up to date, and again, try to ensure they’re quite searchable.
00:12:54.079
If you come across third-party blogs that are incorrect, try contacting the author or commenting with the fix needed.
00:13:02.080
You can also point them to an equivalent page on your own doc site.
00:13:10.000
If you get unlucky, that third-party blog content can become the equivalent of Ralph Brown's interrupt list and can lock you into contracts you really don’t want.
00:13:16.240
When it comes to ad hoc communication, consistency is key.
00:13:23.280
If a developer wants to understand what might break someone else's stuff, they need to know what communication is going out.
00:13:30.080
Ideally, this should be in a super-searchable format so they can understand what assumptions might have been made.
00:13:37.600
Many B2B software companies just email random PDFs around or create shared Slack channels.
00:13:43.840
At that point, as an engineer, you don't stand a chance of knowing what assumptions might have been made.
00:13:50.720
If you’re able to have a central repository for those kinds of materials, it helps.
00:13:57.120
It doesn’t have to be public, just somewhere you’re repeatedly sharing the same information.
00:14:03.760
Ideally, this information isn't static, and there's an expectation from your integrators that it might change.
00:14:10.160
When it comes to industry standards, just follow them wherever you can.
00:14:17.440
And flag loudly if you can't or where the industry hasn’t yet settled.
00:14:23.680
Also, there’s a lot to think about with observed behavior so we'll give it its own slide.
00:14:30.720
Naming is really important because developers often don’t read narrative docs and instead look at examples.
00:14:37.120
One example is numbers that begin with zero, which often get truncated, such as company registration numbers.
00:14:44.480
We have a field in our API called account number ending, and unfortunately, in Australia, some account numbers have letters in them.
00:14:50.320
This results in confusion for integrators, even though that field is a string.
00:14:56.000
We try to call that out clearly in our docs, even providing examples that highlight those edge cases.
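A quick illustration of why identifier-like fields need to stay strings, even when the name contains the word "number". The sample values are invented.

```python
def survives_int_round_trip(value: str) -> bool:
    """True only if converting to int and back gives the original text."""
    try:
        return str(int(value)) == value
    except ValueError:
        # Not purely numeric, e.g. an account number containing a letter.
        return False


for sample in ["123456", "00123456", "4A"]:
    print(sample, survives_int_round_trip(sample))
```

Only the first sample survives: the second loses its leading zeros and the third is not a number at all, so code that parses either as an integer will corrupt or reject valid data.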
00:15:01.760
You also want to use your documentation to combat pattern matching. If batch timings could change, call that out in the documentation.
00:15:09.600
If you say, 'We currently run this once a day at 11 a.m.,' make sure it’s clear that this timing is likely to change.
00:15:16.000
Expose information about your API that might change, to signal to integrators that what they see now may not always be true.
00:15:22.240
And restrict your own behavior: document a limit and implement it in code to ensure you keep that commitment.
00:15:29.760
We had an issue at GoCardless where an integrator started adding a lot of extra events to their webhooks.
00:15:36.560
Our webhook handlers ran out of memory because they were trying to load way too much data.
00:15:43.760
So if we had known there was a limit on the number of items in a webhook, we could have tested against it.
00:15:50.080
We could have made sure that our pods were resourced appropriately.
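Documenting a limit and enforcing it in code could be as simple as the sketch below. The limit of 250 and the event shape are hypothetical; the point is that the sender can never exceed the number the docs promise.

```python
from typing import Iterator, Sequence

# Hypothetical documented limit: no webhook ever carries more events than this.
MAX_EVENTS_PER_WEBHOOK = 250


def batch_events(
    events: Sequence[dict], limit: int = MAX_EVENTS_PER_WEBHOOK
) -> Iterator[list[dict]]:
    """Split events into webhook payloads of at most `limit` events each."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(events), limit):
        yield list(events[start:start + limit])
```

With a fixed upper bound, whoever receives the webhooks can load-test a handler against a maximum-size payload and size its memory accordingly.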
00:15:55.840
Complex products make it unlikely that all your integrators will avoid bad assumptions.
00:16:02.560
We need to find strategies to mitigate the impact of our changes.
00:16:10.080
The first thing to remember is that a change isn't either breaking or not. I think this is a completely false binary.
00:16:17.360
If an integrator has done something strange enough, almost anything can be breaking.
00:16:25.760
This binary has historically been used to assign blame. If it’s not breaking, then it’s the integrator’s fault.
00:16:31.920
But as we discussed earlier, it may not be technically your fault, but it’s probably still your problem.
00:16:38.960
If your biggest customer's integration breaks, the fact that you didn’t break the rules will be little consolation.
00:16:45.040
So instead of viewing it as a yes/no question, we should think in terms of probabilities.
00:16:52.960
How likely is it that someone has made this assumption? How likely is it that this will cause an issue?
00:16:59.520
How severe do we think that issue might be? Not all breaking changes are equal.
00:17:05.680
Some changes are 100% breaking—killing an endpoint, for example. You'll have a lot of unhappy integrators.
00:17:12.160
But many changes fall somewhere between 0% and 100% breaking.
00:17:19.840
Try to empathize with your integrators about the assumptions they might have made.
00:17:26.800
Use people in your organization who are less familiar with the specifics than you are as rubber ducks.
00:17:34.240
If possible, talk to them. The more you talk to your integrators, the more you will understand the mistakes they might make.
00:17:40.800
If you can find ways to dogfood your APIs, this can help you find tripwires.
00:17:47.680
This is particularly good as an onboarding exercise; we ask our new joiners to build an integration against our API, putting them in the shoes of integrators.
00:17:54.080
This also helps you keep your docs and guides up-to-date, introducing them to your product in an accessible way.
00:18:02.720
Sometimes you can measure this: add observability to help you look for people relying on undocumented behavior.
00:18:08.960
For example, I've mentioned we see a spike in payment create requests every day just before our payment run.
00:18:15.760
This approach can help identify which integrators might be impacted so you can reach out to them specifically.
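One way to turn those logs into a list of integrators to contact is sketched below. The input shape (pairs of integrator ID and request time for a single day), the 15-minute window, and the 11:00 cutoff in the usage are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable


def just_in_time_share(
    requests: Iterable[tuple[str, datetime]],
    cutoff: datetime,
    window: timedelta = timedelta(minutes=15),
) -> dict[str, float]:
    """Fraction of each integrator's requests made in the window before the cutoff."""
    total: Counter = Counter()
    late: Counter = Counter()
    for integrator_id, created_at in requests:
        total[integrator_id] += 1
        if cutoff - window <= created_at <= cutoff:
            late[integrator_id] += 1
    return {integrator_id: late[integrator_id] / total[integrator_id] for integrator_id in total}


cutoff = datetime(2021, 1, 1, 11, 0)
sample = [
    ("integrator_a", datetime(2021, 1, 1, 10, 55)),
    ("integrator_a", datetime(2021, 1, 1, 10, 50)),
    ("integrator_a", datetime(2021, 1, 1, 9, 0)),
    ("integrator_b", datetime(2021, 1, 1, 8, 0)),
]
shares = just_in_time_share(sample, cutoff)
```

Integrators with a high share are the ones most likely to be relying on the current run time, so they are the ones to reach out to before moving it.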
00:18:22.320
Some of you may be wondering, "What about semantic versioning (semver)?"
00:18:28.080
Now, don't get me wrong, semantic versioning is awesome, provided it's used appropriately and the release type is identified correctly.
00:18:34.560
It is a great way to release potentially breaking changes.
00:18:40.720
We should use this not just for packages but also for APIs and webhooks wherever possible.
00:18:47.040
This solves some of our problems but not all of them. As someone who maintains a public API, there are lots of changes that can't be applied this way.
00:18:53.120
For instance, the timing of our batch processing or reducing the latency on an endpoint.
00:19:00.080
Not everything can be applied on an opt-in basis at a merchant-by-merchant level.
00:19:06.720
Additionally, every new version you support increases the complexity of your system.
00:19:13.120
Complexity leads to risk; it makes it harder to debug things and can cause other issues.
00:19:19.520
There’s a trade-off to make. If a major version doesn't work for your use case, I recommend scaling your release approach.
00:19:25.520
This depends on how many integrators you think have made bad assumptions and what impact those might have.
00:19:31.840
We want different strategies at different levels; if we over-communicate, we get into a 'boy who cried wolf' situation.
00:19:38.160
No one reads the emails we send them, and their integrations end up breaking anyway.
00:19:45.120
Strangely enough, the email in their inbox that they didn’t read doesn’t seem to make them feel any better.
00:19:52.000
To handle this, start with pull communications. Update your docs or a changelog.
00:19:58.800
This is particularly useful to help integrators recover after they've found an issue.
00:20:05.040
Then you can upgrade to push communications, like a newsletter or an email.
00:20:11.440
This is where it gets tough. We all ignore many emails every day, so try to ensure the content is as relevant as possible.
00:20:18.160
Don’t tell integrators about changes to features they don’t use and resist the temptation to include marketing content.
00:20:24.720
If you’re really worried, use explicitly acknowledged communications. This works well if you have a few key integrators you want to check in with.
00:20:31.680
For instance, if these are the only people relying on this functionality or just a couple of particularly important integrators.
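The idea of scaling the release approach to the risk can be sketched as a rough scoring function. The three inputs mirror the questions asked earlier (how likely is the assumption, how likely is it to cause an issue, how severe would it be); the 0 to 1 scales and the thresholds are arbitrary assumptions, not a recommended calibration.

```python
def expected_impact(p_assumption: float, p_breaks: float, severity: float) -> float:
    """Rough risk estimate: all three inputs are judgment calls between 0 and 1."""
    return p_assumption * p_breaks * severity


def communication_tier(impact: float, key_integrators_affected: bool = False) -> str:
    """Map estimated impact to pull, push, or explicitly acknowledged communication."""
    if key_integrators_affected or impact >= 0.3:
        return "acknowledged"  # check in directly and get confirmation
    if impact >= 0.05:
        return "push"  # targeted email or newsletter
    return "pull"  # docs update or changelog entry
```

For example, `communication_tier(expected_impact(0.5, 0.5, 0.8))` lands in the push tier, while a low-risk wording fix stays a changelog entry unless a key integrator is known to depend on it.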
00:20:38.080
It’s important to make these kinds of changes often. It’s a muscle you need to practice; otherwise, both you and your integrators get scared.
00:20:45.360
You may forget how or lose the infrastructure to do it.
00:20:52.720
And if you’re really unlucky, the cultural incentive is to argue that a change isn’t breaking and release things without rigor.
00:20:59.760
We can also mitigate the impact of a breaking change by considering how to release it.
00:21:06.160
If possible, make those changes incrementally to give early warning signs to your integrators.
00:21:13.040
For example, apply the new behavior to a percentage of requests; this helps integrators avoid performance cliffs.
00:21:20.080
It could turn a potential outage into a minor service degradation.
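A percentage rollout is often implemented by hashing a stable key into a bucket. This is a minimal sketch; keying on a request ID follows the per-request example above, and the choice of SHA-256 and 10,000 buckets is an assumption.

```python
import hashlib


def in_rollout(key: str, percent: float) -> bool:
    """Deterministically decide whether `key` gets the new behavior.

    The key is hashed into one of 10,000 buckets, so `percent` can be
    raised in steps of 0.01 and a key that is in at 10% stays in at 50%.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


def handle(request_id: str, percent: float) -> str:
    """Route a request to the new or old code path."""
    return "new" if in_rollout(request_id, percent) else "old"
```

Keying on an integrator ID instead would give each integrator consistent behavior, at the cost of the gradual per-integrator ramp that softens the cliff.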
00:21:27.760
Many integrators will have near-miss alerting to help them identify problems before they cause significant damage.
00:21:35.280
If you have a sandbox environment, it's a great candidate for applying changes.
00:21:42.720
Making changes there, as long as integrators are actively using it, can act as the canary in the coal mine.
00:21:50.360
This helps flag changes you didn’t think were dangerous but might be a little bit trickier than you thought.
00:21:57.920
Finally, think about rolling back. If your biggest integrator calls you to tell you that you’ve broken their integration, it’s nice to have a kill switch.
00:22:05.040
This depends on the nature of the change, but it's good to know what your kill switches are and to be clear about when they are possible.
00:22:12.080
As soon as that call comes in, you want to know your options and be able to react quickly.
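In code, a kill switch can be as small as a flag checked at the point where the new behavior starts. The sketch below uses an in-memory set as a stand-in for whatever flag or config store you actually have, and reuses the Accept-Language fix from earlier as the change being guarded; the class and flag names are hypothetical.

```python
class KillSwitches:
    """In-memory stand-in for a real feature-flag or config store."""

    def __init__(self) -> None:
        self._disabled: set[str] = set()

    def disable(self, change: str) -> None:
        """Flip the kill switch: revert `change` to the old behavior."""
        self._disabled.add(change)

    def is_enabled(self, change: str) -> bool:
        return change not in self._disabled


def error_language(accept_language: str, switches: KillSwitches) -> str:
    """New behavior respects the header; the old behavior always answered in English."""
    if switches.is_enabled("respect_accept_language"):
        return accept_language
    return "en"
```

When the call comes in, reverting is then a single `disable("respect_accept_language")` rather than an emergency deploy.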
00:22:18.560
The only way to truly avoid breaking other people’s things is not to change anything at all, and often even that is not possible.
00:22:25.760
So instead, we should think in terms of managing risk.
00:22:32.240
We've talked about ways to prevent these issues by helping your integrators make good assumptions in the first place.
00:22:39.760
It is crucial to build and maintain the capability to communicate when making potentially breaking changes.
00:22:46.560
But you aren’t a mind reader, and integrators are sometimes careless under pressure, just like you.
00:22:53.200
Be cautious and assume that your integrators didn’t read the docs perfectly or maybe at all and may have cut corners.
00:23:00.000
They may not have the observability of their systems that you might hope or expect.
00:23:06.320
You need to find the balance between caution and product delivery that's right for your organization.
00:23:13.600
For all the modern talk of 'move fast and break things,' it is still painful when things break.
00:23:21.440
Recovering can take a lot of time and energy.
00:23:29.680
Building trust with your integrators is critical to the success of a product.
00:23:36.480
But so is delivering features. We may not completely stop breaking people's things, but we can make it much less likely and much less severe.
00:23:43.480
I really hope you’ve enjoyed the talk. Thank you so much for listening. Please find me on Twitter @paprikati_eng if you’d like to chat about anything we’ve covered today.
00:23:50.800
I hope you all have a great day.
00:26:29.679
Thank you, Lisa! I know it's really hard to deliver a talk without an audience, so let me read some feedback for you.
00:26:35.840
People enjoyed your talk. Such great feedback! Wow! That was the best talk of the day so far.
00:26:42.320
Thank you! The audience really enjoyed your speech. Are we ready to go to the questions? Yes? Let's go!
00:27:10.240
What about decisions? How should we document past decisions, like why did we do that?
00:27:16.320
Um, I think this is interesting, as there's an internal and an external side to this.
00:27:25.600
Internally, I believe the best documentation is in git because it sticks around longer and is the easiest way to make stuff discoverable.
00:27:32.320
There are a bunch of talks about this, but include the 'why' in commit messages.
00:27:45.280
Try to ensure your commits are atomic, following the best practices learned early in development.
00:27:50.800
For larger decisions impacting your integrators, pushing that information out is key.
00:27:56.480
A blog is a great way to communicate that kind of 'point-in-time' reasoning—this is why we’re doing this.
00:28:02.200
It helps get people's buy-in. If they need to make a change to mirror what you've done, you want to convey that there’s a good reason behind it.
00:28:09.200
Communicate the benefits it will bring them while being sorry for the pain it may cause.
00:28:16.000
I think a blog really distinguishes that type of communication from your documentation, which should be static.
00:28:24.480
Thank you for the good idea.
00:28:32.320
The next question is: can you apply the insights from this talk to user experience?
00:28:45.280
Certainly, users are often surprised by sudden changes on the website or app.
00:28:52.640
I think one of the best things to do is A/B testing and to roll things out to a percentage of your users.
00:29:01.280
This is particularly useful for a big user base and can provide early warning signs if something isn’t right.
00:29:07.680
When it comes to UX, the best thing you can do, as horrifying as it is, is to watch people using your tool.
00:29:14.720
Incredibly painful, but it’s truly the best way to learn about expectations.
00:29:22.480
We would love to see the best memes about this in our chat or later on Twitter.
00:29:28.720
Now, let’s move on to naming. How do you convince people that naming is critical?
00:29:37.640
It's obvious to me, but I’ve struggled to convince others.
00:29:42.720
I think it's about observed behavior. Developers don’t read documentation; they read a minimum number of words to get stuff working.
00:29:49.920
When you explain that, everyone internally goes, 'Oh, yes! I do that too.'
00:29:56.320
If you utilize examples, they will see the name as the most front-and-center information about what that field means.
00:30:02.800
If you get that wrong, you’ll spend the rest of your life putting signposts and flags everywhere.
00:30:08.240
Framing it like that really helps convey its importance.
00:30:15.200
You can also share horror stories, like the time Australian account numbers threw me for a loop.
00:30:21.440
These anecdotes help illustrate the importance of naming.
00:30:28.080
Let’s move to the topic of versioning APIs.
00:30:36.000
Breaking changes should not be in the same version of the API, right? That’s where versioning comes in.
00:30:43.440
Adding new attributes is often considered a non-breaking change.
00:30:50.080
However, what happens when a client breaks because they received unexpected attributes?
00:30:55.840
In that case, whose fault is it? I don’t think knowing whose fault it is is a useful question.
00:31:03.040
The industry standard increasingly is that adding fields should not be a breaking change, so clients should discard unexpected keys in a JSON response.
00:31:10.160
It's essential to signal that this standard exists in your documentation.
00:31:17.200
You should also note that many libraries auto-generate clients, which helps maintain expected behavior.
00:31:23.680
If you can provide those libraries, it greatly aids integrators.
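A client that discards unexpected keys can be sketched like this. The `Payment` fields are invented for illustration; the technique is simply to keep the keys you know about and ignore the rest.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Payment:
    id: str
    amount: int
    status: str


def parse_payment(data: dict) -> Payment:
    """Build a Payment from a JSON object, ignoring keys this client does not know."""
    known = {f.name for f in fields(Payment)}
    return Payment(**{key: value for key, value in data.items() if key in known})


# A response that has gained a new attribute since the client was written.
response = {"id": "PM123", "amount": 1000, "status": "paid", "new_attribute": "x"}
payment = parse_payment(response)
```

The fragile alternative is `Payment(**response)`, which raises a `TypeError` the moment the API adds a field.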
00:31:30.000
Regarding the assumptions: if assumptions are unavoidable, what should we leave out of documentation?
00:31:36.640
This is a trade-off, and it's difficult to give a generic answer.
00:31:42.960
The point about communication comes down to the 'boy who cried wolf' situation.
00:31:50.080
If you receive too many emails that are unnecessary, you'll stop reading them.
00:31:56.800
So, push communications can be dangerous, especially when inundated with marketing material.
00:32:03.680
Narrowing your audience is helpful; tell only the relevant people about the product changes.
00:32:10.080
You should document the most important things—the ones with the highest impact.
00:32:17.760
For example, if someone misinterprets something and that leads to double charging a customer, that is critical.
00:32:25.040
Conversely, if the outcome is simply displaying a string that isn’t quite right, you can be more relaxed.
00:32:31.920
You need to manage the likelihood of assumptions alongside potential impacts.
00:32:41.120
Now, do you ever skip nice changes because you know it will create a lot of work to communicate the change?
00:32:48.160
Of course, I would never do that! Yes, everybody has faced this situation.
00:32:54.960
The problem typically is that either you don’t change it, or you do change it but don’t tell anyone.
00:33:02.080
You need to keep the incentives aligned with treating integrators with respect, which means reducing friction.
00:33:08.680
Build tools that make it easy to communicate, get feedback on that communication, and adhere to style guidelines.
00:33:15.280
Clarifying responsibility for communication can alleviate confusion.
00:33:22.600
Have the tooling and processes to reduce friction; otherwise, integrators may end up with a worse service, or their systems will break.
00:33:30.080
What about deprecated fields? How long should we keep them around?
00:33:37.480
I apologize to our integrators! We often keep them forever.
00:33:44.720
Your policy should consider the cost to the person making the change versus the risk of keeping multiple versions.
00:33:52.640
People often mistakenly think keeping deprecated fields doesn’t impact anyone, but having multiple versions introduces complexity.
00:34:00.080
Anything that can help reduce complexity is a positive thing, including eliminating deprecated elements.
00:34:08.080
Set a hard line—commit to a specific day, and if the world isn't on fire, the deprecated elements will be removed.
00:34:14.840
Make sure every team feels empowered to enforce this policy; it's critical for system health and helps everyone out.
00:34:24.080
Thank you, Lisa! You truly are talented, and the audience has greatly enjoyed your presentation.
00:34:30.720
Huge thanks to you, and let’s hope to see you in the chat! You can find Lisa on Twitter!