RailsConf 2019

Background Processing, Serverless Style

by Ben Bleything

In the presentation "Background Processing, Serverless Style," delivered by Ben Bleything at RailsConf 2019, background processing within modern application development is explored, particularly focusing on the integration of serverless architecture. Bleything covers the evolution of background processing, emphasizing its necessity for improving user experience by offloading time-consuming tasks from the main application cycle. He introduces the concept of serverless computing and its benefits, detailing how it can enhance backend efficiency.

Key points discussed include:
- Definition of Serverless Computing: Serverless computing allows developers to execute code without needing to provision or manage servers. Users are charged only for actual computing time, leading to potentially lower costs.
- Background Processing Scenarios: Typical examples include handling video uploads or processing text for searches in applications. These scenarios illustrate the importance of removing long-running tasks from the main request cycle.
- Advantages of Serverless: The serverless model scales automatically, allowing applications to handle varying workloads without manual intervention. This flexibility reduces operational overhead, making it easier for developers to focus on writing code rather than managing infrastructure.
- Trade-offs and Caveats: Despite its advantages, serverless computing can introduce complexities, such as difficulties in monitoring and understanding the flow of data. Bleything highlights the challenges of maintaining oversight in systems designed with serverless architectures, emphasizing the need for thoughtful planning and understanding of limits.
- Implementation Examples: Practical examples such as using webhooks from GitHub for continuous integration and handling video uploads through event-driven functions are provided to illustrate how serverless processing can simplify application design.
- Monitoring and Alerting: The necessity for monitoring serverless functions is stressed, as well as the importance of understanding potential pitfalls, such as message duplication in event-driven systems.

Conclusion: Bleything concludes by encouraging developers to experiment with serverless architectures, weighing benefits against the complexities they introduce. He urges community members to share their experiences with these technologies, fostering a collaborative learning environment.

00:00:20.810 Hi everybody! My name is Ben Bleything, and my pronouns are he/him. I'm here to talk to you today about what I think is absolutely the most exciting aspect of modern application development: background processing.
00:00:33.210 Thank you! I also want to talk about serverless technology and how you can combine it with background processing to achieve interesting results. A little bit about me: I've been involved with Ruby for a long time, almost 15 years. I've worked on some pretty interesting applications, including GitHub, LivingSocial, and White Pages.
00:00:46.589 I've also been involved with smaller but equally fascinating projects in animation, financial services, and indie music licensing. Throughout my career, I've focused mostly on infrastructure, operations, and architecture.
00:01:03.270 Currently, I am a Developer Advocate at Google, where my job includes thinking about how to modernize our systems and architectures to make them better for both developers and users by adopting new technologies. By 'new,' I don’t necessarily mean emerging technologies, but rather those that are new to us. My goal is to help improve your development experience.
00:01:46.800 Before we dive in, I want to make a few disclaimers. First of all, this is not a sales pitch. I'm not here to convince you to adopt any specific technology or to use Google Cloud Platform (GCP). You can use whatever works for you, and if you already have a cloud provider, that's great! Ultimately, I don't want you to feel pressured to switch to anything if you're already happy with what you have.
00:02:21.720 So, the truth is that you probably don't need serverless technologies. I think they're exciting and hold potential for innovation, but exploring this as a community will reveal interesting ways to use these technologies to enhance our applications.
00:03:00.280 My aim today is to share insights gained during my research for this talk, which may inspire you to experiment on your own or save you some time. With that introductory material out of the way, let’s discuss background processing. I suspect that most of you are familiar with this concept.
00:03:31.100 This is when you have tasks that take too long during a request cycle. You want to move those tasks out of the main processing flow to avoid timing out for your users or creating a poor experience. For example, think of building a new YouTube-like platform where users upload videos. You certainly don't want users to wait through the transcoding process.
00:04:10.910 I apologize; I developed a cough this morning, so I'll be hitting the water a lot and potentially muting myself to cough. Anyway, with background processing, most folks are likely familiar with tools like Sidekiq or Resque, and there are many others like Delayed Job, Sneakers, Backburner, and Sucker Punch. It's a common practice in modern application development.
00:04:59.990 Just out of curiosity, how many of you have used something like that before? Good, I thought so. Is anyone using what they would consider serverless solutions for background processing right now? I'd love to talk to you later, as it's possible I might say something you disagree with, and I want to learn from you.
00:06:02.090 What is serverless? I came into this discussion with a somewhat vague understanding, having been in this field for a while. I asked my friends for a tweet-sized definition of serverless, and I received some enlightening responses. Jason Watkins, an experienced Rubyist from Portland, said, 'Function as a service works fine for me,' and I believe that's a common sentiment.
00:06:55.889 Asha mentioned that serverless typically means being charged for every unit of compute or instance used, and you don’t pay when nothing is being executed. At Google, we refer to this as 'scale to zero.'
00:07:42.769 Before cloud services emerged, you had to buy hardware in advance, which often required significant planning. For example, a year and a half ago, I had to order a quarter-million-dollar setup for a client, which took us nine months to obtain due to a global SSD shortage.
00:08:16.859 But the advent of cloud services like EC2 changed that, allowing resources to be provisioned on-demand. Serverless takes this a step further; rather than paying for unused capacity, you only pay for what you actually use.
00:09:11.009 While VMs require you to specify how many CPU cores, how much RAM, and storage beforehand, serverless means that if no one is using the function, you’re not charged. You can dynamically scale to meet demand and ignore the complexity of capacity planning.
00:09:31.699 My colleague Sandeep said serverless implies no manual scaling or provisioning. We’ve had auto-scaling for a while at VM and Kubernetes levels, but serverless reduces your operational workload significantly. Although I come from an operations background, it’s worth noting that while operations responsibilities might theoretically decrease, they’re often just pushed to the cloud provider.
00:10:05.490 As a result, while your teams won't handle certain operational tasks, you're outsourcing them to providers that take care of managing functions for you. Serverless can indeed reduce overhead, and I think it can benefit development teams at various levels.
00:10:52.160 To clarify, the concept of Functions as a Service (FaaS) is where you take a small chunk of code and run it on a serverless framework, which can receive HTTP requests or respond to events triggered by your application. Most providers offer an array of event-driven triggers. This is an area with immense potential.
00:11:43.880 Now let's consider a basic example of a webhook handler. Imagine you decide to create a continuous integration (CI) tool that interacts with GitHub webhooks to build artifacts. In a simple flow, GitHub sends a webhook to your Rails application, which queues that job in Sidekiq.
00:12:16.750 However, if you want to make this serverless, you could use the aforementioned HTTP method and leverage AWS API Gateway to manage incoming requests and trigger functions running on AWS Lambda. The idea of an API Gateway is that it helps to standardize incoming webhooks, allowing you to route the request directly to a function.
00:13:02.540 This means you don’t need a full Rails app to handle webhooks; just a function that responds to events. HTTP has become the standard transport in our industry, so if you need to get into serverless, using an API Gateway is a low barrier to entry.
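The webhook flow described above can be sketched as a single function. This is a minimal illustration, not the speaker's code: the name `handle_webhook` is hypothetical, and the `event` hash is assumed to follow the proxy-style shape (a `"headers"` hash and a string `"body"`) that an API gateway typically passes to a function. It relies on GitHub's `X-GitHub-Event` header and on the `after` and `repository.full_name` fields of a push payload.

```ruby
require "json"

# Hypothetical handler in the style of a function behind an API gateway.
# `event` is assumed to be a hash with "headers" and "body" keys.
def handle_webhook(event:, context: nil)
  # Header casing can vary, so normalize the keys before looking anything up.
  headers = (event["headers"] || {}).transform_keys { |key| key.to_s.downcase }

  # GitHub names the event type in the X-GitHub-Event header.
  # This sketch only cares about pushes and ignores everything else.
  return { statusCode: 204, body: "" } unless headers["x-github-event"] == "push"

  payload = JSON.parse(event["body"] || "{}")
  build = {
    repo: payload.dig("repository", "full_name"),
    commit: payload["after"]
  }

  # A real function would start the build here; this sketch only reports it.
  { statusCode: 202, body: JSON.generate(build) }
rescue JSON::ParserError
  { statusCode: 400, body: "invalid JSON" }
end
```

A production handler would also verify the webhook signature before trusting the payload; that step is left out here to keep the sketch short.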
00:13:43.800 Now, you might wonder what the real advantages are for this setup. While using serverless can simplify your codebase, there are benefits to separating it from your current infrastructure.
00:14:10.350 In addition, handling background tasks serverlessly enables you to scale more easily. For instance, if your popularity spikes and you notice increased demand, serverless solutions can automatically scale to match your usage.
00:14:56.690 Moreover, decoupling your processing code can lead to a cleaner architecture, especially if you choose to integrate it with diverse tech stacks. For instance, if your main application is built with Rails but you need to run machine learning tasks, you might want to create Python functions instead.
00:15:27.430 However, you should be mindful of potential downsides. Serverless systems can sometimes be opaque, and with increased complexity comes additional moving parts that require careful monitoring. When you're not running the core processing in your application, it's essential to ensure everything is functioning correctly.
00:15:57.120 Now, let’s move to an example involving text processing, such as a new social network. Users will post updates, so you’ll need to index those for searchability, implement spam filtering, and analytics to track user interactions. When you receive those updates, you must decide whether to handle them synchronously during the request cycle or offload them to background processing.
00:16:30.030 If you opt for background processing, you could use tools to simultaneously handle database interactions and save data in Elasticsearch. Although synchronous processing can work for low-volume applications, scaling these operations can become challenging.
00:17:15.770 If your Rails app triggers background processing, it can pull data from the database, process it, and write back to your database or Elasticsearch, significantly speeding up your request times.
00:17:59.090 So how about serverless processing for these types of tasks? You could take a message queue approach, where the Rails app pushes a message to a queue that subsequently triggers functions.
00:18:42.500 For example, Google Cloud offers Pub/Sub that can manage this messaging. Your processing tasks such as spam filtering, indexing, or performing analytics can run simultaneously in the cloud. This decoupling through messaging can help systems scale effectively in case of high user demand.
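To make the decoupling concrete, here is a small in-memory sketch of the fan-out pattern. The `Topic` class is a hypothetical stand-in for a managed topic, not a real Pub/Sub client, and the three subscribers are placeholder logic. In a managed service each subscriber would be a separate function that can run in parallel; here they simply run one after another.

```ruby
# In-memory stand-in for a managed topic; not a real Pub/Sub client.
class Topic
  def initialize
    @subscribers = []
  end

  # Register a callable that receives every published message.
  def subscribe(&handler)
    @subscribers << handler
  end

  # Deliver one message to each subscriber independently and collect the results.
  def publish(message)
    @subscribers.map { |handler| handler.call(message) }
  end
end

topic = Topic.new
topic.subscribe { |post| [:spam_check, post[:body].include?("buy now")] }
topic.subscribe { |post| [:index, post[:body].downcase.split] }
topic.subscribe { |post| [:analytics, post[:user_id]] }

# The publishing app's only job is to publish; it does not know who is listening.
results = topic.publish({ user_id: 42, body: "Hello RailsConf" })
```

The point of the design is that adding a fourth consumer means adding another subscriber, with no change to the code that publishes.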
00:19:21.960 Moreover, if you suddenly gain millions of users who generate significant traffic, serverless infrastructures can handle this demand automatically. You can focus on improving your core application without worrying about capacities.
00:20:04.300 Yet it’s essential to remain aware of the limitations and intricacies of your setup for proper management. If you are utilizing messaging queues, either for processing or integrating with external systems, the value is derived from reduced complexity.
00:20:42.330 Let's take a scenario involving uploading a video to a Rails application, such as a new YouTube. Initially, the uploaded file could go into an object store like S3 or Google Cloud Storage. While this handles basic file upload management, transcoding the file into a suitable format afterwards is a good fit for background processing.
00:21:29.110 Background processing lets you extract metadata, use event-driven functions to trigger video processing workflows, and manage these tasks without hindering the user experience.
00:22:09.510 For instance, when a file upload completes to a cloud storage service, it can signal your functions to execute transcoding, notify users, or perform additional tasks, while still following best practices to avoid loops and duplicative processing.
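The loop risk mentioned above comes from a function writing its output to the same bucket that triggers it. A minimal sketch of one guard follows. The `handle_upload` name, the `transcoded/` prefix, and the event shape (a hash with `"name"` and `"contentType"`) are all assumptions made for illustration; real storage events carry more fields and differ between providers.

```ruby
# `event` is an assumed hash describing a finished upload.
# `output_prefix` is a hypothetical location for transcoded files.
def handle_upload(event, output_prefix: "transcoded/")
  name = event.fetch("name")

  # Writing output back to the same bucket would fire this function again,
  # so skip anything the function itself produced.
  return :skipped_own_output if name.start_with?(output_prefix)

  # Ignore uploads that are not video at all.
  return :skipped_not_video unless event["contentType"].to_s.start_with?("video/")

  target = "#{output_prefix}#{File.basename(name, '.*')}.mp4"
  # A real function would run the transcode here; this sketch only plans it.
  { source: name, target: target }
end
```

For example, `handle_upload({ "name" => "uploads/cat.mov", "contentType" => "video/quicktime" })` plans a write to `transcoded/cat.mp4`, and the event for that output file is then skipped rather than transcoded again. Writing output to a separate bucket is another way to avoid the loop.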
00:22:59.160 Video transcoding can be computationally intensive, which justifies the use of serverless due to elastic capabilities. Ensuring you have access to sufficient resources is critical to maintain performance.
00:23:36.900 To clarify, serverless solutions enable you to leverage cloud-managed services, event architectures, and third-party integrations exposed to messaging infrastructures, helping you focus on your user-centric features.
00:24:14.170 As previously referenced through text processing and video transcoding examples, serverless technology can provide significant advantages like streamlined operations over traditional setups, all while scaling easily.
00:25:05.420 I want to remind you about important considerations. For instance, while the cloud providers allow automatic scaling, they do impose limits on simultaneous executions. This prevents one malicious user from overwhelming the system and incurring immense costs.
00:25:53.380 Engaging in capacity planning exercises will unveil aspects of operational design that developers may not consider regularly. Part of this initiative involves understanding the nuances and semantics of the systems and services in use.
00:26:40.290 Speaking of messaging systems, another aspect to keep in mind is potential duplicate messages. Understand that most cloud providers offer at-least-once delivery, and they do not guarantee order. So it might be beneficial to implement deduplication logic once you gain familiarity.
00:27:29.220 Being cautious here helps you avoid processing the same message twice or triggering unintended actions in your applications, ensuring functional correctness.
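One common way to cope with at-least-once delivery is to make the consumer idempotent by remembering which message IDs it has already handled. The sketch below is illustrative only: `IdempotentConsumer` is a hypothetical class, messages are assumed to carry a unique `:id`, and the in-memory `Set` would need to be replaced by a shared store in practice, because separate function instances do not share memory.

```ruby
require "set"

# Wraps a handler so that a redelivered message ID is processed only once.
class IdempotentConsumer
  def initialize(&handler)
    @handler = handler
    @seen = Set.new # in-memory only; an assumption made for this sketch
  end

  def receive(message)
    id = message.fetch(:id)
    return :duplicate if @seen.include?(id)

    @handler.call(message)
    # Record the ID only after success, so a failed attempt can be retried.
    @seen << id
    :processed
  end
end

sent = []
consumer = IdempotentConsumer.new { |message| sent << message[:to] }

first = consumer.receive({ id: "msg-1", to: "user@example.com" })
second = consumer.receive({ id: "msg-1", to: "user@example.com" }) # redelivery
```

Here the second delivery is reported as a duplicate and the handler runs once.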
00:28:17.979 Moreover, added complexity brings a need for monitoring. From logging to alerting and operational dashboards, all systems should be monitored closely to ensure everything operates as intended and that issues are addressed promptly.
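As a small starting point for the logging side of monitoring, a function body can be wrapped so that every invocation emits one structured log line. This is a sketch under assumptions: `with_monitoring` is a hypothetical helper, and the JSON field names are arbitrary choices rather than any provider's format.

```ruby
require "json"
require "logger"

# Hypothetical helper: runs a block and emits one JSON log line about it.
def with_monitoring(name, logger:)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  logger.info(JSON.generate({ function: name, status: "ok", seconds: seconds.round(3) }))
  result
rescue StandardError => e
  # Log the failure, then re-raise so the platform can retry or alert on it.
  logger.error(JSON.generate({ function: name, status: "error", error: e.class.name }))
  raise
end

logger = Logger.new($stdout)
with_monitoring("transcode", logger: logger) { :done }
```

Structured lines like these are easier to turn into alerts and dashboards than free-form text, since the status and duration can be queried directly.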
00:29:00.739 As we wrap up, if you want to learn more about serverless technology, I recommend exploring the product landing pages of major providers like AWS, Azure, and Google Cloud Platform. They contain a wealth of information on their offerings.
00:29:50.379 On the left side, you'll discover the main products, namely AWS Lambda, Azure Functions, and Google Cloud Functions. On the right, several open-source projects provide similar functionality, including Apache OpenWhisk and Oracle’s Fn.
00:30:40.650 Additionally, Fission.io serves as one of numerous frameworks that put Functions as a Service on top of Kubernetes.
00:31:38.309 Thank you all so much for attending! You can find me online; I’m Bleything everywhere. Feel free to reach out via GitHub, Twitter, or my website. I’ll be at the Google booth in the exhibition hall for the rest of the day, so please drop by to chat.
00:32:00.000 I especially want to hear from you if you are working with some of these technologies, and I’d love to know where I’m getting it wrong. Thanks again!