Issues with asynchronous interaction

00:00:12.250 Hello, my name is Anna. Today, I am learning Luxembourgish as part of my language education. We are going to talk about issues related to asynchronous interaction between clients and servers, and/or between servers.

00:00:30.649 I work at a company called Tech 3D as a backend developer. I enjoy working with different programming languages because it can be interesting. The last language I tried to learn was C#. I found that C# has many features that I miss in Ruby, for example, method overloading or interfaces.

00:01:06.260 In our company, every backend developer is responsible for a part of our infrastructure because we do not have dedicated DevOps or SRE engineers. This can be tricky because when you are responsible for server infrastructure, you cannot just be a developer; you need to possess skills from different areas. Ideally, this responsibility should be shared among two or three different people, but in our reality, it falls solely on our backend team. However, this gives us great benefits as we can think about a feature not just from a coding perspective, but also at the infrastructure level.

00:01:57.320 Six years ago, we started a project called Code View Shape. This project is related to 3D scanning and allows our clients to upload their 3D models, share, and rotate them. At the start, we hosted it on Heroku, where we had one free dyno and were happy with our small application and limited clientele.

00:02:12.310 Things changed when we scanned a popular donut, and its creator, Adele Morse, was in Russia. This model gained a lot of attention from the media, and many news agencies published articles about it. Unfortunately, we were totally unprepared for this increase in traffic, and we experienced significant downtime for several days after it. Eventually, we received a bill at the end of the month that made us realize we had to improve our infrastructure.

00:03:00.489 Now, I would like to ask you, have you ever experienced downtime? Last year? Last month? Last week? Well, I didn't experience downtime personally. However, I think downtime can be acceptable because you never know what kind of load you will get, and sometimes it can grow in unpredictable ways. So let's discuss interaction patterns and how they can help you avoid downtime, particularly in terms of requests.

00:04:06.629 When a small application starts, it often begins with synchronous requests, which expect a reply shortly after. As it evolves, it might implement some asynchronous interaction. However, a myriad of patterns exist that go beyond these two terms. For instance, there's asynchronous interaction with a timeout, where your application sends a request and waits for a response for a specific time, until either the response arrives or a timeout error occurs. Once the timeout occurs, the application no longer waits for a response. This can be useful when you need to implement part of a feature that has time-sensitive limits, such as booking a flight where you must receive information within 30 seconds.
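
The timeout pattern described here could look roughly like the following Python sketch. It is only an illustration: `slow_booking_service` and `request_with_timeout` are hypothetical names, and the remote call is simulated with a sleep rather than a real network request.

```python
import asyncio


async def slow_booking_service(delay: float) -> str:
    # Hypothetical stand-in for a remote call that takes `delay` seconds.
    await asyncio.sleep(delay)
    return "seat confirmed"


async def request_with_timeout(delay: float, timeout: float) -> str:
    try:
        # Wait for the reply, but no longer than `timeout` seconds.
        return await asyncio.wait_for(slow_booking_service(delay), timeout)
    except asyncio.TimeoutError:
        # After the timeout the caller stops waiting; wait_for cancels the call.
        return "timed out"


if __name__ == "__main__":
    print(asyncio.run(request_with_timeout(delay=0.01, timeout=0.5)))
    print(asyncio.run(request_with_timeout(delay=0.5, timeout=0.01)))
```

In the flight-booking example the timeout would be the 30 seconds mentioned above; the short values here only keep the sketch fast to run.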

00:04:53.220 Another pattern is asynchronous requests with a notification timer. In this instance, the client sends a request, the server processes it, and a timer starts on the client side. The client waits until the timer expires or the reply arrives, whichever comes first. There are also patterns involving one request and multiple responses, some mandatory and some optional. For example, when placing an order, the client may receive information about stock availability in addition to the confirmation that the order was created. There are also unsolicited notifications, where the client receives updates without requesting them, similar to how bank applications notify customers when new credit is available.
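
As a rough sketch of one request with a mandatory and an optional response, combined with a client-side timer, consider the following Python example. The service functions `create_order` and `check_stock` are hypothetical and simulated with sleeps; the assumption is that the order confirmation is mandatory and the stock information is optional.

```python
import asyncio


async def create_order(order_id: int) -> dict:
    # Mandatory response: confirmation that the order exists (hypothetical service).
    await asyncio.sleep(0.01)
    return {"order_id": order_id, "status": "created"}


async def check_stock(order_id: int, delay: float) -> dict:
    # Optional response: stock availability, which may arrive late or not at all.
    await asyncio.sleep(delay)
    return {"order_id": order_id, "in_stock": True}


async def place_order(order_id: int, stock_delay: float, timer: float) -> dict:
    stock_task = asyncio.create_task(check_stock(order_id, stock_delay))
    order = await create_order(order_id)  # the client always waits for this one
    try:
        # Client-side timer: the reply or the expiry, whichever comes first.
        stock = await asyncio.wait_for(stock_task, timer)
    except asyncio.TimeoutError:
        stock = None  # carry on without the optional response
    return {"order": order, "stock": stock}


if __name__ == "__main__":
    print(asyncio.run(place_order(1, stock_delay=0.01, timer=0.5)))
    print(asyncio.run(place_order(2, stock_delay=0.5, timer=0.02)))
```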

00:08:02.410 Let’s journey back 20 to 25 years when requests were more synchronous and error-prone, and scaling often meant buying the biggest hardware available. During that time, most applications were monolithic, relying solely on hardware upgrades to scale. However, once the biggest hardware was purchased, further scaling called for reevaluating infrastructure. It’s important to not just scale up but also scale out.

00:09:00.920 To illustrate different scaling strategies, let’s explore three axes of scaling: the X, Y, and Z axes. X-axis, or horizontal, scaling involves duplicating resources, such as frontend servers behind a load balancer. Y-axis scaling, on the other hand, splits your application by resources or functions, enabling you to scale specific parts of your application as necessary. For example, if everything is hosted on a single server with a database, codebase, background workers, and caching, scaling any one component could impact the others. Separating components onto different hardware allows each of them to scale independently.

00:10:09.670 Z-axis scaling refers to partitioning your data by an attribute such as the customer. For instance, if you're operating an online sales platform with numerous small shops, and one particular shop starts achieving significantly more sales, it might be beneficial to allocate dedicated resources for that shop. However, scaling can be hindered when parts of the application rely on the same hardware, because they then compete for resources and are difficult to manage separately. To mitigate this, it’s crucial to keep your database on separate hardware, so that every component, from the database to caching to background workers, can operate optimally.
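
The shop example can be sketched as a small routing function. This is a minimal illustration, not a production design: the pool names, the `DEDICATED` mapping, and `pool_for` are all made up for the example.

```python
import zlib

# Hypothetical pool names; a real deployment would map these to actual hosts.
SHARED_POOLS = ["pool-a", "pool-b", "pool-c"]
DEDICATED = {"big-shop": "pool-big-shop"}


def pool_for(shop_id: str) -> str:
    # A shop with unusually high traffic gets its own resources.
    if shop_id in DEDICATED:
        return DEDICATED[shop_id]
    # All other shops are spread over the shared pools with a stable hash.
    # crc32 is used because the built-in hash() of a string varies per process.
    index = zlib.crc32(shop_id.encode("utf-8")) % len(SHARED_POOLS)
    return SHARED_POOLS[index]


if __name__ == "__main__":
    for shop in ("big-shop", "corner-bakery", "tiny-books"):
        print(shop, "->", pool_for(shop))
```

The point of the split is that a busy shop can be moved into `DEDICATED` without touching how the remaining shops are routed.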

00:11:17.000 Moreover, your application should be capable of scaling down as well. For that, it should not keep large state between requests or background jobs. For example, when a sizable file is being uploaded, consider where to store that content temporarily without overloading your instances. Additionally, your application must be prepared for failures. Failures are an unavoidable part of our profession as developers, and the cause does not always lie with us; server interruptions or data center issues can occur. Therefore, we need to devise methods to make our applications resilient against such failures.

00:12:23.020 To ensure your application is robust, proper monitoring is crucial. This includes monitoring API calls, hardware state such as CPU and memory usage, and backup status. Monitoring tools can help identify bottlenecks within your application before they become performance issues. Imagine that users run into problems with your application and reach out to the support team, the business owners are kept in the loop about the trouble, and meanwhile you are out having a good time, unaware of any of it. That isn’t ideal, and monitoring is essential to avoid such situations. Additionally, log aggregation is vital, especially with horizontal scaling: multiple instances process requests at their own pace, so you need to view their logs in one place rather than on each machine.

00:14:07.870 A good practice is implementing distributed tracing, especially in a microservices architecture. When a request hits different parts of your application, you need to trace it effectively. By assigning an identification number to each request, you can track its progress through various components. In doing so, it becomes clear what happens at each stage of the request lifecycle. For instance, if you can see that a request was processed on the frontend server, created a background job, and then passed to another backend server before a response was returned to the client, it becomes easier to troubleshoot issues. Exception tracking is also an important practice. While simple methods such as email notifications can be useful, they become overwhelming as the number of users grows.
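
The idea of tagging each request with an identifier can be sketched in Python as follows. This is an in-process simplification: in a real distributed system the ID would also have to travel between services, for example in a request header. All names here (`request_id`, `trace_log`, `handle_request`, and the stage functions) are hypothetical.

```python
import contextvars
import uuid

# Holds the ID of the request currently being processed.
request_id = contextvars.ContextVar("request_id", default="-")
# (request_id, stage) pairs; a stand-in for an aggregated log sink.
trace_log: list[tuple[str, str]] = []


def record(stage: str) -> None:
    # Every log entry carries the ID of the request it belongs to.
    trace_log.append((request_id.get(), stage))


def background_job() -> None:
    record("background job")


def backend_service() -> None:
    record("backend server")


def handle_request() -> str:
    rid = uuid.uuid4().hex  # the identification number for this request
    token = request_id.set(rid)
    try:
        record("frontend server")
        background_job()
        backend_service()
        record("response sent")
    finally:
        request_id.reset(token)  # do not leak the ID into the next request
    return rid


if __name__ == "__main__":
    rid = handle_request()
    print([stage for r, stage in trace_log if r == rid])
```

Filtering the log by one ID reconstructs the path of a single request, even when entries from many requests are interleaved.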

00:17:07.490 With a high volume of errors, you can easily drown in an influx of notifications. Specialized tools can help aggregate information about exceptions, allowing you to see which exceptions affect the most users, how often they occur, and at what intervals. More importantly, you want to learn from every failure, so discussing problems and solutions with your teammates is important. Sharing knowledge aids in improving future responses to critical situations and enhances overall team performance.
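
What such a tool does with incoming exceptions can be approximated in a few lines. The sketch below is a deliberate simplification: it groups exceptions only by class name and message, and `ExceptionAggregator` is a hypothetical name, not the API of any real tracking service.

```python
from collections import defaultdict


class ExceptionAggregator:
    def __init__(self) -> None:
        self.count = defaultdict(int)   # occurrences per exception group
        self.users = defaultdict(set)   # distinct affected users per group

    def report(self, exc: Exception, user_id: str) -> None:
        # Simplified grouping key: exception class plus message.
        key = f"{type(exc).__name__}: {exc}"
        self.count[key] += 1
        self.users[key].add(user_id)

    def top_by_users(self) -> list[tuple[str, int, int]]:
        # Rows of (group, affected users, occurrences), most users first.
        rows = [(key, len(self.users[key]), self.count[key]) for key in self.count]
        return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    agg = ExceptionAggregator()
    agg.report(ValueError("bad input"), "u1")
    agg.report(ValueError("bad input"), "u2")
    agg.report(RuntimeError("boom"), "u3")
    for row in agg.top_by_users():
        print(row)
```

Instead of one email per error, the team sees one row per kind of error, ordered by how many users it affects.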

00:18:32.020 QA teams do not inherently increase the quality of your applications; they mitigate risks. It is up to developers to ensure high quality within the codebase, while QA can help minimize the risks by providing checks and tracking recurring issues. Regularly checking your backup systems is essential as well: make sure that your database and static resources are actually backed up correctly, because neglecting this can lead to significant issues. A backup ticket left unaddressed is not an acceptable solution. Also ensure that you have a robust deployment process in place that allows you to roll back changes if needed. Automating this process is advisable.

00:19:43.560 Monitoring continues to be a key aspect of employing autoscaling strategies; accurate metrics must be established to indicate when scaling up or down is required. Having dashboards to visualize this data can significantly improve decision-making processes, allowing quick responses to potential issues. Finally, the importance of minimizing single points of failure cannot be overstated. Implementing practices such as master-slave database setups, hot swappable failover, and effective load balancing is necessary to maintain the system's integrity.
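
To make the autoscaling point concrete, here is a minimal threshold-based decision rule in Python. The CPU thresholds, the instance limits, and the function name `desired_instances` are illustrative assumptions; real autoscalers usually also average metrics over a time window and apply cooldown periods.

```python
def desired_instances(current: int, cpu_percent: float,
                      scale_up_at: float = 75.0, scale_down_at: float = 25.0,
                      minimum: int = 1, maximum: int = 10) -> int:
    # Decide the next instance count from a single monitored metric.
    if cpu_percent > scale_up_at:
        target = current + 1      # overloaded: add an instance
    elif cpu_percent < scale_down_at:
        target = current - 1      # idle: remove an instance to save cost
    else:
        target = current          # within the comfortable range
    # Never go below the minimum or above the maximum.
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    print(desired_instances(current=2, cpu_percent=90.0))
    print(desired_instances(current=2, cpu_percent=10.0))
```

The rule is only as good as the metric behind it, which is why accurate monitoring comes first.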

00:20:56.820 Applications should evolve in a way that embraces monitoring and continuous improvement. You need to understand where and on what basis scaling decisions are made if you want a responsive system. Discussing scaling objectives, setting up monitoring dashboards, and employing smooth autoscaling strategies prepares your application for growth and also keeps operations sustainable. With the right approach, you can save costs while keeping your company ready to handle large increases in load.

00:22:31.260 Let’s return to interactions; we’ve discussed many patterns before, particularly synchronous and asynchronous methods, but we must not forget about one-way messages, where the sender does not wait for a reply. They matter because our application must perform its core functions even without a stable internet connection. For example, a user should still be able to create tweets or shares without interruption. It is better if an application handles these situations seamlessly rather than alerting users to an internet outage. I hope applications across the board will become increasingly user-friendly and stable.
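
One possible way to handle this is a local outbox that accepts the message immediately and delivers it once the connection is back. The sketch below assumes a hypothetical `send` callable that returns `True` on successful delivery; `Outbox` and its methods are illustrative names only.

```python
from collections import deque
from typing import Callable


class Outbox:
    def __init__(self, send: Callable[[str], bool]) -> None:
        self._send = send          # returns True when delivery succeeded
        self._pending = deque()    # messages not yet delivered, oldest first

    def post(self, message: str) -> None:
        # The user action always succeeds locally; delivery may happen later.
        self._pending.append(message)
        self.flush()

    def flush(self) -> int:
        # Deliver in order and stop at the first failure; return how many went out.
        delivered = 0
        while self._pending and self._send(self._pending[0]):
            self._pending.popleft()
            delivered += 1
        return delivered

    def pending(self) -> int:
        return len(self._pending)
```

A client would call `flush()` again whenever it detects that the connection has returned, so the user never has to retry by hand.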

00:23:56.260 Ultimately, it’s critical to remain competent, learn from past failures, and grow by engaging in discussions with your peers. Read extensively, converse at conferences, and engage critically with ideas; the sharing of information fosters a healthy collaborative environment. Thank you for your attention.