How to Get to Zero Unhandled Exceptions in Production

In this talk, I explain how to categorize exceptions by their level of impact and present use cases and code samples of common problems in a Rails application. I also cover how to make sure your background workers run without issues and how to debug exceptions.

RubyConf TH 2019

00:00:07.120 Hello everybody.
00:00:10.080 Thank you, Radoslav, for the introduction. You can find me on the internet at various places.
00:00:13.510 I come from a small country in Europe called Bulgaria. It looks bigger in this picture than it actually is.
00:00:21.200 Currently, I’m the Head of Engineering at a startup called Product Hunt. I usually like to include a lot of code in my slides.
00:00:26.500 All my slides are already available at this address on Speaker Deck. I mention this because I've noticed that during my talks, many people spend time taking photos of the slides.
00:00:44.300 So, if you find this talk interesting, you can check it out later. One of my core beliefs about technology is that context is king. You cannot do anything if you don't understand where something comes from.
00:01:07.250 To understand this better, let’s talk about production. Our production is a traditional Ruby on Rails application that has transitioned to a single-page application using React and GraphQL. Right now, we are beta testing a brand new application called YourStacks, which is built with a very similar stack.
00:01:25.969 Currently, our engineering team consists of seven people. Our production architecture has three tiers: a Node.js app responsible for server-side rendering, a Rails API server, and another group of containers running background jobs with Sidekiq, which most of you probably know.
00:02:02.780 When you start a new application, like YourStacks, which we're developing now, it can feel like driving a fancy new car where everything is fast and enjoyable.
00:02:15.050 Initially, you're very happy developing and everything looks great. However, as you try to fix some issues, exceptions start appearing. You may begin to view these exceptions as cute little bunnies, considering them harmless, but over time, it becomes clear that something is seriously wrong.
00:02:37.000 The problem arises when you allow the situation to linger for too long. A few years ago, because the rapid pace of product iterations kept leaving bugs behind, we introduced an initiative called Happy Friday, a day when engineers could focus on fixing them.
00:03:17.590 Happy Fridays allowed us to tackle technical debt and other issues, including two hours every Friday set aside for fixing exceptions. This process gave us a lot of flexibility, and even now, when we look at exceptions, we see that most of them are resolved very quickly.
00:04:19.750 Now, on a Friday, I can open the exception list and hardly see anything because most of the exceptions have been addressed. This leads me to my first tip: maintain a report around exceptions. This report should be specific to your organization and relevant to your projects.
00:05:23.460 It’s essential to have a structured approach to treat exceptions like regular work. The rest of this talk will provide actionable tips to help you handle exceptions more effectively.
00:06:10.970 There’s a great resource, a book by Avdi Grimm called Exceptional Ruby. It’s the best book I’ve read on exception handling and provides valuable insights into how the exception system works in Ruby.
00:06:41.990 In typical code, a bare 'rescue' can silence important errors without you realizing it, because it catches far more than the one failure you had in mind. The goal of an exception tracker is to provide you with clear information about your application’s errors.
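To make the point about 'rescue' concrete, here is a minimal sketch, not code from the talk. The `Report` struct and both method names are hypothetical. It shows a bare rescue swallowing a programming bug that a targeted rescue lets through.

```ruby
# Hypothetical example: Report and the method names are made up for illustration.
Report = Struct.new(:path)

# Bare rescue: hides every StandardError, including programming bugs.
def read_silently(report)
  File.read(report.path)
rescue
  nil
end

# Targeted rescue: only the expected failure (a missing file) is handled.
def read_explicitly(report)
  File.read(report.path)
rescue Errno::ENOENT
  nil
end

missing = Report.new("/no/such/file.txt")
read_silently(missing)   # nil, the expected "file is gone" case
read_explicitly(missing) # nil, same result

# A bug: nil was passed instead of a report.
read_silently(nil)       # nil again, the NoMethodError is swallowed
begin
  read_explicitly(nil)
rescue NoMethodError => e
  puts "Surfaced: #{e.class}" # the bug reaches the caller (and the tracker)
end
```

With the targeted version, the bug would show up in the exception tracker instead of quietly turning into a nil.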
00:07:04.120 When dealing with exceptions, keep the error messages informative. If you’re rescuing from a specific error, be sure to add a comment on why that exception can occur, especially if it’s not obvious, as with file errors or network failures.
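One way to combine an informative message with a "why" comment is sketched below. `ExportError`, `load_export`, and the path are hypothetical names chosen for illustration, and the stated reason in the comment is an assumed scenario.

```ruby
# Hypothetical names: ExportError, load_export, and the path are illustrative.
class ExportError < StandardError; end

def load_export(export_id, path)
  File.binread(path)
rescue Errno::ENOENT => e
  # Assumed scenario: exports are cleaned up after a while, so the file can
  # legitimately be gone by the time this runs. Re-raise with enough detail
  # to act on without having to reproduce the failure.
  raise ExportError, "export file missing: export_id=#{export_id} path=#{path} (#{e.class})"
end

begin
  load_export(7, "/no/such/export.csv")
rescue ExportError => e
  puts e.message
  puts e.cause.class # Ruby keeps the original exception in #cause
end
```

Because the new error is raised inside the rescue block, Ruby sets `cause` automatically, so the low-level detail is not lost.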
00:08:10.360 Avoid adding specific user names in the notes, as this information can become outdated or hard to trace. Instead, document the context and reasoning in your comments.
00:08:45.600 To improve your system's resilience to exceptions, be explicit about the exceptions you are handling. Monitoring is also crucial. If you lack monitoring, it’s difficult to understand what’s going on inside your application.
00:09:02.020 Using monitoring tools such as Sentry can help you track exceptions effectively. Separate projects for different server types can also simplify exception handling, because exceptions from different sources no longer blend together into noise.
00:09:37.910 My systematic approach helps reduce this noise by filtering out non-actionable errors, making it easier to focus on significant exceptions that genuinely need attention.
00:10:58.120 For instance, we learned that a common exception, 'invalid byte sequence in UTF-8', could be resolved by changing the encoding of input, which we track systematically.
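As a rough illustration of fixing the encoding of input, the sketch below normalizes raw bytes to valid UTF-8. The helper name `normalize_utf8` and the ISO-8859-1 fallback are assumptions; the right fallback depends on where your input comes from.

```ruby
# Hypothetical helper; the ISO-8859-1 fallback is an assumption about the input.
def normalize_utf8(input, fallback: Encoding::ISO_8859_1)
  text = input.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  # Reinterpret the raw bytes in the fallback encoding, then transcode.
  input.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

raw = "caf\xE9".b          # raw bytes, e.g. from a form post or an uploaded file
puts normalize_utf8(raw)   # café

# When the source encoding is unknown, String#scrub replaces the bad bytes.
puts raw.dup.force_encoding(Encoding::UTF_8).scrub("?") # caf?
```

Transcoding preserves the original characters when the guess is right, while `scrub` is the safer choice when you cannot guess and only need a valid string.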
00:11:43.020 My third tip is to reduce the noise by filtering exceptions to only show issues you can act on. This way, when you confront a legitimate issue like 'undefined method status for nil', you're more likely to tackle it promptly.
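The filtering idea can be sketched in plain Ruby as follows. This is a tracker-agnostic illustration, not the configuration used in the talk: `ClientDisconnected`, `NON_ACTIONABLE`, `actionable?`, and `report` are hypothetical, and real trackers provide their own ignore settings.

```ruby
# Hypothetical noise: an error nobody on the team can act on.
class ClientDisconnected < StandardError; end

# Assumed ignore list; which errors belong here is a team decision.
NON_ACTIONABLE = [ClientDisconnected, Errno::EPIPE].freeze

def actionable?(error)
  NON_ACTIONABLE.none? { |klass| error.is_a?(klass) }
end

# `sink` stands in for the exception tracker.
def report(error, sink:)
  sink << error if actionable?(error)
end

reported = []
report(ClientDisconnected.new, sink: reported)
report(NoMethodError.new("undefined method 'status' for nil"), sink: reported)
p reported.map(&:class) # [NoMethodError]
```

Only the error that points at a real bug reaches the tracker, which is what keeps it noticeable.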
00:12:49.000 If you hide a problem instead of addressing it, it can lead to greater issues down the line. So, ensure you fix the root cause rather than masking exceptions.
00:14:24.860 By investigating the underlying problems, you can improve the integrity and performance of your system, learning valuable lessons along the way.
00:15:14.870 Implementing guards against common issues, like ensuring an account has a subscription, helps prevent exceptions stemming from logical flaws in your code, ultimately enhancing your application’s reliability.
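A guard of this kind might look like the sketch below. The `Account` and `Subscription` structs and the `plan_label` method are hypothetical stand-ins for real models.

```ruby
# Hypothetical stand-ins for real models.
Subscription = Struct.new(:plan)
Account = Struct.new(:name, :subscription)

def plan_label(account)
  # Guard: an account without a subscription is a valid state, not a bug,
  # so handle it explicitly instead of letting `nil.plan` raise.
  return "free" if account.subscription.nil?

  account.subscription.plan
end

puts plan_label(Account.new("acme", Subscription.new("pro"))) # pro
puts plan_label(Account.new("solo", nil))                     # free
```

The guard documents the business rule in code, whereas a rescue around the call would only hide that the case exists.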
00:16:05.450 Adding strategic logging to your exception tracker provides a clear record of issues encountered, enabling quicker troubleshooting. It’s important to keep monitoring tools updated with new context as you refine your application.
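A sketch of attaching context when reporting an exception is shown below. `FakeTracker` is a hypothetical in-memory stand-in, not a real SDK, and `charge` with its parameters is made up for illustration.

```ruby
# Hypothetical in-memory tracker standing in for a real SDK.
class FakeTracker
  attr_reader :events

  def initialize
    @events = []
  end

  def capture(error, context = {})
    @events << { error: error.class.name, message: error.message, context: context }
  end
end

def charge(account_id, amount_cents, tracker:)
  raise ArgumentError, "amount must be positive" unless amount_cents.positive?

  :charged
rescue ArgumentError => e
  # Record the values needed to debug later, then re-raise so the
  # failure is not hidden from the caller.
  tracker.capture(e, account_id: account_id, amount_cents: amount_cents)
  raise
end

tracker = FakeTracker.new
begin
  charge(42, -100, tracker: tracker)
rescue ArgumentError
  # still raised, so callers can react
end
p tracker.events.first[:context]
```

The captured event carries the account and amount, so whoever picks up the exception does not need to guess the inputs.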
00:17:02.270 Networking exceptions can be particularly tricky, because each library and dependency in your application raises its own error classes. Keeping track of these can streamline your debugging process. One option is to create a module that groups network-related exceptions, so you can handle them in one place.
00:19:04.200 When you start using these tools, especially in code that calls external APIs heavily, you'll tend to see your exceptions becoming more manageable over time. Using Active Job's 'retry_on' in Rails 6 can also automate retrying jobs that fail with network-related exceptions, so you don't need to worry about them constantly.
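One possible shape for such a module is sketched below. The name `NetworkErrors`, the list of classes, and the `with_network_fallback` helper are assumptions; the real list depends on the HTTP clients your application uses.

```ruby
require "net/http"
require "openssl"
require "socket"
require "timeout"

# Hypothetical module; extend the list with your HTTP client's own errors.
module NetworkErrors
  ERRORS = [
    EOFError,
    Errno::ECONNREFUSED,
    Errno::ECONNRESET,
    Errno::EHOSTUNREACH,
    Net::OpenTimeout,
    Net::ReadTimeout,
    OpenSSL::SSL::SSLError,
    SocketError,
    Timeout::Error
  ].freeze

  # `rescue` matches with ===, so this lets `rescue NetworkErrors`
  # catch any of the listed classes.
  def self.===(error)
    ERRORS.any? { |klass| klass === error }
  end
end

def with_network_fallback(default)
  yield
rescue NetworkErrors
  default
end

p with_network_fallback(:offline) { raise Errno::ECONNRESET } # :offline
p with_network_fallback(:offline) { :online }                 # :online
```

Errors outside the list, such as a NoMethodError, still propagate, so the module does not turn into a blanket rescue.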
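The retry idea can be illustrated outside Rails with the plain-Ruby sketch below. It is not Active Job's implementation: `with_retries` and its parameters are hypothetical, and it retries inline, whereas Active Job re-enqueues the job.

```ruby
# Hypothetical helper: retries the block when one of `errors` is raised,
# waiting a little longer after each failed attempt.
def with_retries(*errors, attempts: 3, wait: 0.5, sleeper: Kernel.method(:sleep))
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *errors
    raise if tries >= attempts # give up and let the tracker see it

    sleeper.call(wait * tries)
    retry
  end
end

calls = 0
result = with_retries(Errno::ECONNRESET, attempts: 3, wait: 0) do
  calls += 1
  raise Errno::ECONNRESET if calls < 3 # fail twice, then succeed

  :ok
end
p result # :ok
p calls  # 3
```

Only the final failure is re-raised, so transient network errors do not reach the tracker unless retrying did not help.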
00:21:25.500 Lastly, it’s essential to foster a culture around managing exceptions. Ensure your teams share knowledge on common exception types and standardized handling mechanisms. This approach can greatly enhance your applications' stability.
00:25:04.870 Thank you all for listening to my talk! I hope my tips for managing exceptions in production have been helpful.