00:00:16.199
Okay, hi everyone.
00:00:25.439
My name is Michał Zajączkowski de Mezer, and I'm here from Naguro. My first takeaway is that no matter how much you prepare, you always encounter technical issues.
00:00:32.399
But the team here is amazing, so let's give them some applause.
00:00:44.460
I feel really privileged and honored to be here, to share my thoughts and perspective with you.
00:00:49.620
Especially since my presentation doesn't include a single line of Ruby code; in fact, it doesn't have any code at all. It's language agnostic.
00:00:56.120
Gathering all these thoughts has been very useful to me, and I hope that at least some of you wonderful people will find a useful or fresh insight here.
00:01:21.540
Before I start, I would like to give special thanks to Mikhail Bronikowski, who contacted me to give this speech. I wouldn't be here without him.
00:01:34.200
Today, I'm going to discuss how to ensure systems do what we want and take care of themselves.
00:01:41.759
Sounds bold, right? Stability heaven.
00:01:50.159
I would really like to know about your experiences, but in my career as a back-end engineer, I have often found myself and my colleagues working directly on production.
00:01:55.640
More often than I would like, we are hot-fixing production, manipulating production data, and ensuring everything goes smoothly. This is bad for many reasons.
00:02:15.720
I believe at least some of these issues can be avoided by design. What I often see is that the trouble comes from certain processing guarantees not being respected or not being provided at all.
00:02:27.360
Of course, bugs create many issues too. I won't give you any silver bullets, but I hope to give you a bag of hints and a useful perspective on systems that will lead you toward this goal.
00:02:40.819
Let's start with a broad view. Any system we build is made up of many components that process data and communicate with each other.
00:02:48.480
Here's my first piece of advice: when connecting these components, use a simple abstract pattern.
00:03:00.540
This will help you design and code components that take care of themselves.
00:03:08.940
We have various backgrounds here, and since these elements may sound a bit vague, let's drill down into the details.
00:03:24.060
I won’t be saying anything new; this is knowledge gathered from wiser people through many experiences. For the sake of simplicity, I'll use the abstraction of message-passing.
00:03:39.900
In message-passing, various actors send or receive messages and process data in between. A very important thing to remember is that failures can happen at any time.
00:03:45.239
No matter how good your system is, anything can break, and this is the truest fact in this world.
00:03:52.380
This message-passing concept can be applied to almost anything. We had great talks before about various communication patterns.
00:04:02.400
We've had discussions about queuing jobs and event sourcing. Many times, I was inspired by previous speakers discussing event-passing.
00:04:16.320
To continue, we need a few more terms to remember. There are three of them, and you may see them called execution modes, processing guarantees, or delivery semantics.
00:04:31.680
Let’s go through them, and you will see they’re not too difficult. The first one is 'at most once.' This means that whenever I want to do something, I trigger it just once.
00:04:45.860
To really understand what that means, you have to be a bit paranoid about how much you actually know about what you did.
00:05:00.540
If I’m not sure whether I did something, even if I have some signs that suggest I might have done it, I won’t try again.
00:05:06.380
This may mean the action didn't actually happen, and that is the risk. The next execution mode is 'at least once,' which is the other side of being paranoid.
00:05:39.060
If I'm uncertain whether something was completed or executed, I will attempt to do it as many times as needed until I’m sure about it.
00:06:02.700
As you may imagine, the risk here is that an action can potentially happen multiple times, which is also not ideal.
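To make the two risks concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `ScriptedReceiver` class, the outcome names `"lost"` and `"ack_lost"`, and both sender functions are hypothetical and do not come from the talk or from any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScriptedReceiver:
    """Hypothetical receiver whose behaviour per delivery is scripted.

    "ok"       -> message is processed and the sender gets a confirmation
    "lost"     -> message never arrives, nothing is processed
    "ack_lost" -> message is processed, but the confirmation is lost
    """
    script: List[str]
    processed: List[str] = field(default_factory=list)

    def deliver(self, message: str) -> bool:
        # Once the script is exhausted, every delivery succeeds.
        outcome = self.script.pop(0) if self.script else "ok"
        if outcome == "lost":
            return False
        self.processed.append(message)
        return outcome == "ok"  # False when the confirmation was lost


def send_at_most_once(receiver: ScriptedReceiver, message: str) -> None:
    """Trigger the action once and never retry, whatever happened."""
    receiver.deliver(message)


def send_at_least_once(receiver: ScriptedReceiver, message: str,
                       max_attempts: int = 5) -> int:
    """Resend until a confirmation arrives; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        if receiver.deliver(message):
            return attempt
    raise RuntimeError(f"no confirmation after {max_attempts} attempts")
```

With a receiver scripted as `["lost"]`, `send_at_most_once` leaves `processed` empty: the action never happened. With a receiver scripted as `["ack_lost"]`, `send_at_least_once` ends up with the same message in `processed` twice: the action happened more than once.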
00:06:15.479
What we all actually want is 'exactly once.' When a sender sends a message, the intention is to send it once. Similarly, when the receiver gets the message, they want it processed only once.
00:06:38.640
Achieving this is hard without a reliable mechanism to ensure it.
00:06:44.400
But we have a recipe, so let’s look at the first component of the recipe: 'at least once.' What does this mean in practice? It means we need to retry our messages.
00:07:00.540
Messages are crucial, and we don't want to lose them. If we send a message and something breaks, we need to retry.
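One possible shape for such a retry is sketched below in Python. The function name `retry_until_confirmed`, its parameters, and the choice of exponential backoff between attempts are assumptions for illustration only; the talk does not prescribe a specific retry policy.

```python
import time
from typing import Callable


def retry_until_confirmed(send: Callable[[], bool],
                          max_attempts: int = 5,
                          base_delay: float = 0.1,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` until it returns True; return the number of attempts.

    `send` is a hypothetical callable that returns True only when the
    receiver confirmed the message. A ConnectionError is treated the
    same as a missing confirmation.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if send():
                return attempt
        except ConnectionError:
            pass  # something broke on the way: unconfirmed, so retry
        if attempt < max_attempts:
            # Wait longer after each failure: base, 2x base, 4x base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise TimeoutError(f"message not confirmed after {max_attempts} attempts")
```

The attempt limit is a design choice in this sketch: retrying forever would hide a receiver that is permanently down, so after `max_attempts` the failure is raised to the caller instead.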
00:07:12.799
We want to ensure we receive confirmation of success. That's great, but what happens if the receiver is in trouble and the sender doesn’t know?