Dirty Magic for Resilient API Dependencies

Imagine that you want to build a system that depends on an external service, e.g., a logistics, payments, or notifications service. These systems have their own life cycles, which you have to stay in sync with. Sergey will share how to handle the issues you could face, using DHL, UPS, Zoho, and eBay integrations as examples.

Sergey is an experienced software developer who is interested in building a strong and sustainable community around OSS and likes to discuss different ways to profile, debug, and optimize applications. He also loves dogs and is a drummer and musician.

RubyConf TH 2019

00:00:06.720 Hello, everyone! To start things off, I have several questions for you. How many of you are using a microservices architecture in your projects? Okay, and how many of you use a single-page application where the front end and back end are separate systems that communicate through an API? And lastly, how many of you depend on external systems like notification, logistics, or payment services? If that's the case, this talk will definitely make sense to you. For those who don't have such dependencies, don't be upset; the use cases I'll discuss today can be applied to any distributed systems, and sooner or later, you'll likely face similar issues.
00:00:20.890 I want to start by introducing myself. I come from Russia, specifically from Saint Petersburg, where I live with my dog and my drum kit. This year, I've changed my lifestyle and started traveling a lot. I've visited various events, meetups, and conferences in different countries because I believe strongly in open-source solutions. I couldn’t imagine creating something valuable without discussing ideas and solutions with others; it's crucial for me to know how I can help. Today, I'm here representing a company called Evil Martians. Our goal is to help our customers build their businesses at warp speed. We achieve this by creating, using, and contributing to open source. The project I'm currently working on is called eBay for Business, or eBay Mac, which aims to improve the selling experience for eBay sellers globally. We also focus on logistics solutions for our clients.
00:01:34.000 Interestingly, eBay Mac is entirely built on API integrations, meaning almost every action a user takes results in changes to external systems that we don’t have direct access to. For instance, let me share a story about an eBay bug: at one point, eBay started sending us responses that contained an address with no values. The response was fully compliant with the schema but lacked essential information regarding the address where we were to ship a parcel. Naturally, we were unable to react quickly and had to fix the issue in a hurry. During the entire lifecycle of our project, we’ve encountered countless situations where external systems introduced bugs in our application.
00:02:11.000 Over time, our application became so unstable that we could not produce features. This forced us to stop and carefully consider how we could improve our stack and handle these challenges. The approach that helped us was what we call the 'contracts approach.' I don’t mean contracts in the legal sense, but as a general framework. When discussing contracts, we must keep in mind three key elements: preconditions, postconditions, and invariants. This approach allows us to separate our system into functionally cohesive blocks and apply those preconditions to our input while establishing postconditions for the output. Invariants represent basic validations.
00:02:57.000 If those validations pass, we can proceed as usual. However, if something goes wrong and any of the validations fail, we prevent the propagation of the error and switch our system into an emergency mode where that error or behavior does not disrupt it. So, how does this work with our integrations? At a high level, when eBay starts sending us confusing data, which sometimes happens, we switch our application to fallback mode automatically. This gives us time to adapt the contract to that specific API and manage the situation. If there's a temporary outage in the service, we can remain in fallback mode until the service is restored and everything goes back to normal.
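The contract idea described above can be sketched in plain Ruby. This is a minimal illustration and not the speaker's implementation: the `Contract` class, its keyword arguments, and the fallback payload are all hypothetical names chosen for the example.

```ruby
# Hypothetical sketch: a contract wraps a call to an external API with a
# precondition on the input and a postcondition on the output.
class Contract
  def initialize(precondition:, postcondition:, fallback:)
    @precondition = precondition
    @postcondition = postcondition
    @fallback = fallback
  end

  # Yields the input to the block only when the precondition holds, and
  # returns the block's result only when the postcondition holds.
  # In every other case (including an exception) the fallback is returned.
  def call(input)
    return @fallback.call(input, :precondition_failed) unless @precondition.call(input)

    output = yield(input)
    return @fallback.call(input, :postcondition_failed) unless @postcondition.call(output)

    output
  rescue StandardError
    @fallback.call(input, :service_error)
  end
end

address_contract = Contract.new(
  precondition: ->(order_id) { order_id.is_a?(Integer) && order_id.positive? },
  postcondition: ->(address) { address.is_a?(Hash) && !address[:street].to_s.empty? },
  fallback: ->(order_id, reason) { { order_id: order_id, status: :needs_review, reason: reason } }
)

# Stand-in for an external API that returns a schema-valid but empty address.
broken_api = ->(_order_id) { { street: "", city: "" } }

result = address_contract.call(42) { |id| broken_api.call(id) }
puts result[:reason]
```

Here the empty address fails the postcondition, so the caller receives the fallback value instead of propagating bad data further into the application.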
00:04:11.000 We strive to introduce contracts for every significant API integration within our application. This ensures a better understanding of how we create our solutions and how we generate ideas. Let me share a story about our evolution, particularly how we learned to first control the input and output for our API dependencies, how we learned to control their state, and finally how we designed effective solutions around these principles. In eBay Mac, I lead the logistics development branch, so all my examples will be drawn from logistics. We began our integration work with DHL, which initially seemed quite straightforward: we needed to propagate eBay orders directly to DHL with the click of a button.
00:04:57.000 It sounds simple enough; we had the address and other necessary information. However, we released it too quickly, and it didn't work. The problem arose because the entire system did not function as documented. There were many more rules than we initially anticipated—obscure rules for extracting data and handling errors. This forced us to adapt our solutions, and eventually, we understood the importance of validating and filtering data before sending it to external systems. In summary, our overall solution involved a multi-step process.
00:05:40.000 We first converted eBay orders into parcels through straightforward validations, discarding any orders that we could not process, such as those that lacked an address. Next, we performed additional validations before sending requests to DHL. We then prepared for any confusing content DHL might return, attempting to recognize and validate it again before relaying any information to the user. At this point, we built our first chapter, which we called the 'non-policy.' Our overall solution resembled a pipeline where we processed eBay orders through validations before passing them on to DHL.
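As a rough sketch, the three validation stages described above might look like the following. The structs, method names, weight limit, and tracking number format are assumptions made up for illustration; they are not DHL's real rules or the project's code.

```ruby
Order = Struct.new(:id, :address, :weight_kg, keyword_init: true)
Parcel = Struct.new(:order_id, :address, :weight_kg, keyword_init: true)

# Stage 1: turn orders into parcels, discarding orders we cannot process.
def to_parcels(orders)
  orders
    .reject { |order| order.address.to_s.strip.empty? }
    .map { |order| Parcel.new(order_id: order.id, address: order.address, weight_kg: order.weight_kg) }
end

# Stage 2: validate a parcel before sending it to the carrier.
# The 30 kg limit is an invented number, not a real carrier rule.
def sendable?(parcel)
  parcel.weight_kg.is_a?(Numeric) && parcel.weight_kg.positive? && parcel.weight_kg <= 30
end

# Stage 3: check that the carrier's response is something we recognize
# before relaying it to the user. The 10-digit format is an assumption.
def recognized_response?(response)
  response.is_a?(Hash) && response[:tracking_number].to_s.match?(/\A\d{10}\z/)
end

# `carrier` is any callable that takes a parcel and returns a response.
def ship(orders, carrier)
  to_parcels(orders).select { |parcel| sendable?(parcel) }.map do |parcel|
    response = carrier.call(parcel)
    status = recognized_response?(response) ? :shipped : :unexpected_response
    { order_id: parcel.order_id, status: status }
  end
end
```

Passing the carrier in as a callable keeps the sketch runnable without a network connection and makes each stage easy to test in isolation.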
00:06:43.000 Additionally, we employed sampling, which allowed us to record requests and responses, tagging them with relevant information. Initially, we used two tags: 'success' and 'unexpected behavior.' We discovered that only a minor percentage of requests were successful and began investigating other tags that indicated various errors. We started digging into the recorded requests and responses to extract valuable insights. From this experimentation, we established what we called a 'policy object.' This is a straightforward abstraction over validations.
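To make the sampling idea concrete, here is a small in-memory sketch. The `Sampler` class and its methods are hypothetical; a real setup would presumably persist samples somewhere durable instead of keeping them in an array.

```ruby
# Hypothetical in-memory sampler: records each request/response pair
# under a tag so the pairs can be counted and inspected later.
class Sampler
  Sample = Struct.new(:tag, :request, :response)

  def initialize
    @samples = []
  end

  def record(tag, request, response)
    @samples << Sample.new(tag, request, response)
  end

  # Share of each tag among all recorded samples, as a float in 0..1.
  def tag_shares
    return {} if @samples.empty?

    @samples.group_by(&:tag).transform_values { |group| group.size.to_f / @samples.size }
  end

  # All recorded samples for one tag, for manual investigation.
  def samples_for(tag)
    @samples.select { |sample| sample.tag == tag }
  end
end

sampler = Sampler.new
sampler.record(:success, "<request 1>", "<response 1>")
sampler.record(:unexpected_behavior, "<request 2>", "<html>error</html>")
p sampler.tag_shares
```

Looking at the share of `:unexpected_behavior` samples, and then reading the recorded pairs behind it, is what suggests which new tags and validations are worth adding.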
00:08:01.000 Now, if the initial validations succeeded, we could map our data and send requests. The 'mapper' served as a service object that transformed the Active Record model into a suitable XML request for a specific endpoint. For example, we created a mapper for tariff requests to DHL. It was a straightforward solution, leveraging a library called 'Brandish Eliezer,' which is effective for handling such tasks.
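A mapper of this kind could be sketched as below. This version uses only plain Ruby and does not use the library mentioned in the talk; the `Shipment` struct stands in for an Active Record model, and the XML element names are invented for illustration and do not reflect DHL's actual schema.

```ruby
# Stand-in for an Active Record model (hypothetical attributes).
Shipment = Struct.new(:country_code, :city, :weight_kg, keyword_init: true)

# Hypothetical mapper: turns a shipment into an XML request body.
class TariffRequestMapper
  ESCAPES = {
    "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;"
  }.freeze

  def initialize(shipment)
    @shipment = shipment
  end

  def to_xml
    <<~XML
      <TariffRequest>
        <Destination>
          <CountryCode>#{escape(@shipment.country_code)}</CountryCode>
          <City>#{escape(@shipment.city)}</City>
        </Destination>
        <WeightKg>#{escape(@shipment.weight_kg)}</WeightKg>
      </TariffRequest>
    XML
  end

  private

  # Escapes the five XML special characters so user data cannot break the markup.
  def escape(value)
    value.to_s.gsub(/[&<>"']/, ESCAPES)
  end
end

shipment = Shipment.new(country_code: "TH", city: "Bangkok", weight_kg: 2.5)
puts TariffRequestMapper.new(shipment).to_xml
```

Keeping one mapper per endpoint means a change in a carrier's request format touches a single small class.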
00:08:29.000 During our integration process with UPS—while the general structure seemed similar to that of DHL—we faced a challenge. We were unable to reuse any lines of code from the DHL integration because it was messy. This prompted me to think critically about how to refactor it in order to allow code reuse, ultimately making the solution more elegant. It took almost two years of searching and implementing before I found a reliable, maintainable solution that successfully addressed our requirements.
00:09:51.000 Interestingly, one of my colleagues suggested I explore functional programming principles, particularly Haskell. Learning a new programming paradigm may have contributed to the lengthy process, but I certainly gained valuable insights into data validation approaches. I want to share insights that could save you valuable development time. The two key takeaways involve composition and error processing. Specifically, in functional programming, there exist constructs known as 'algebraic data types.' This sounds like a complex concept, but in practice it means we can define operations on our own types, much like defining an algebra on anything, yes, even pizzas.
00:11:31.000 To apply these principles to our 'policy objects,' I implemented a system where a product of two policy objects creates another policy object that only validates when all arguments are valid. Conversely, the sum of two policies results in a policy object that is valid if at least one of the underlying policies is valid. This powerful system allows us to express complex validation logic simply. When validating responses from our external systems, we first ensure the response is proper JSON; only then do we verify the content against the expected output or any known errors.
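The product and sum of policies can be sketched with operator overloading. The `Policy` class and the three example policies are hypothetical, and the JSON keys (`tracking_number`, `error_code`) are invented for the example; this shows the composition idea only and is not the speaker's library.

```ruby
require "json"

# Hypothetical policy object: a named-by-constant wrapper around one rule.
class Policy
  def initialize(&rule)
    @rule = rule
  end

  def valid?(value)
    @rule.call(value) ? true : false
  end

  # Product: valid only when both policies are valid.
  def *(other)
    Policy.new { |value| valid?(value) && other.valid?(value) }
  end

  # Sum: valid when at least one of the policies is valid.
  def +(other)
    Policy.new { |value| valid?(value) || other.valid?(value) }
  end
end

# Returns the parsed JSON object, or nil when the body is not a JSON object.
def json_object(body)
  parsed = JSON.parse(body)
  parsed.is_a?(Hash) ? parsed : nil
rescue JSON::ParserError, TypeError
  nil
end

IS_JSON_OBJECT = Policy.new { |body| !json_object(body).nil? }
HAS_TRACKING = Policy.new { |body| json_object(body)&.key?("tracking_number") }
KNOWN_ERROR = Policy.new { |body| json_object(body)&.key?("error_code") }

# "Proper JSON first, then either the expected output or a known error."
RESPONSE_POLICY = IS_JSON_OBJECT * (HAS_TRACKING + KNOWN_ERROR)

puts RESPONSE_POLICY.valid?('{"tracking_number": "1234567890"}')
puts RESPONSE_POLICY.valid?("<html>Bad Gateway</html>")
```

Because `*` and `+` return new `Policy` objects, composed policies can themselves be composed, which is what lets complex validation logic be expressed as a short formula.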
00:13:24.000 Within this level of abstraction, we articulate how we communicate with UPS in terms of expectations and handling errors. Next, we introduced 'refinement types'—these are formed from algebraic data types and predicated on rules to define valid states. A straightforward example might involve validation of cat types. Imagine needing to introduce a refinement type into your project; you’d need to think in terms of specification boxes with labels and accompanying validation rules. Our entire architecture was organized around these concepts, and when errors arise, they clearly inform us about validation failures. Implementing this methodology requires a consistent interface to manage the outputs we expect.
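One way to picture a refinement type in Ruby is as a labeled box that can only be obtained by passing a validation rule. The sketch below is an assumption about the general shape, with hypothetical class names (`Refined`, `ValidationFailure`, `PostalCode`) and an invented five-digit rule.

```ruby
# Returned instead of a refined value when the rule is violated.
class ValidationFailure
  attr_reader :type, :value, :reason

  def initialize(type, value, reason)
    @type = type
    @value = value
    @reason = reason
  end

  def valid?
    false
  end
end

# Base class: a value wrapped in a label (the class) plus a rule.
class Refined
  attr_reader :value

  # Instances can only be obtained through .match, never through .new.
  private_class_method :new

  # Returns an instance of the type when the value satisfies the rule,
  # or a ValidationFailure that says which rule was violated.
  def self.match(value)
    reason = violation(value)
    reason ? ValidationFailure.new(self, value, reason) : new(value)
  end

  # Subclasses return nil for an acceptable value or a symbol naming the problem.
  def self.violation(_value)
    nil
  end

  def initialize(value)
    @value = value
  end

  def valid?
    true
  end
end

# Example refinement: a string of exactly five digits (invented rule).
class PostalCode < Refined
  def self.violation(value)
    return :not_a_string unless value.is_a?(String)

    :invalid_format unless value.match?(/\A\d{5}\z/)
  end
end

result = PostalCode.match("ABC")
puts result.valid? ? result.value : "#{result.type}: #{result.reason}"
```

Both outcomes respond to `valid?`, which gives callers the consistent interface mentioned above, and a failure carries the type, the offending value, and the reason.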
00:16:22.000 As we delve deeper, we find that unpacking these types is pretty straightforward. With predefined types that detail why a validation failed, our overall solution aims to simplify the understanding of our contract obligations with external APIs. When constructed correctly, our interfaces can be resilient enough to handle external service inconsistencies and ultimately deliver a smooth user experience. We maintain strong relationships between our various components. When errors arise, the context is preserved, allowing us to troubleshoot effectively by pinpointing exactly where failures occurred within the pipeline.
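Preserving context on failure could look roughly like this. The `Pipeline` class, the step names, and the `[:ok, value]` / `[:error, reason]` convention are all assumptions chosen for the sketch.

```ruby
# Describes where a pipeline stopped: the step name, the input that step
# received, and the reason it gave.
PipelineFailure = Struct.new(:step, :input, :reason)

# Hypothetical pipeline: runs named steps in order. Each step returns
# [:ok, value] to continue or [:error, reason] to stop.
class Pipeline
  def initialize(steps)
    @steps = steps
  end

  def call(input)
    @steps.reduce(input) do |value, (name, step)|
      status, payload = step.call(value)
      return PipelineFailure.new(name, value, payload) if status == :error

      payload
    end
  end
end

STEPS = {
  check_address: ->(order) { order[:address].to_s.empty? ? [:error, :missing_address] : [:ok, order] },
  check_weight: ->(order) { order[:weight_kg].to_f.positive? ? [:ok, order] : [:error, :invalid_weight] }
}.freeze

outcome = Pipeline.new(STEPS).call({ address: "Main St 1", weight_kg: 0 })
puts "failed at #{outcome.step}: #{outcome.reason}" if outcome.is_a?(PipelineFailure)
```

Since the failure records the step name and the exact input that step saw, troubleshooting can start from the point of failure instead of from the beginning of the pipeline.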
00:20:04.000 We also created instrumentation tools to monitor the number of contract matches and validate the accuracy of data being exchanged across systems. The tools we used include Yabeda, for gathering metrics in our Ruby or Rails applications, and Sniffer, for capturing HTTP session information. This effort allows us to continue refining our overall approach. Throughout this presentation, we've looked at the evolution of our system, and naturally, our future focus will include the continued evolution of sagas to manage distributed transactions more effectively. We are aiming to introduce a fully distributed Ruby library for event processing to enhance our ecosystem.
00:26:01.000 I believe this will pave the way for the eventual rollout of the dirty pipeline, along with adequate documentation and examples that make it easier for developers to adopt our systems. If you're interested in delving deeper into the concepts we covered today, I recommend checking out some key resources. One must-read is 'Category Theory for Programmers,' one of the best introductions to Haskell and functional programming available. For those anxious about Haskell, consider starting with 'Haskell by a Third,' which offers tidbits to ease you into functional programming from a Ruby perspective.
00:27:08.000 In conclusion, I hope you've found this talk valuable and that these insights will assist you in building more resilient integrations. Thank you for your attention!