00:00:11.900
Hi everyone, thanks for sticking around this late. I'm not sure if you're actually staying to see me or if that's because this is the room where they're serving happy hour after my talk. Hopefully, it's for me! So thank you. My name is Brad Urani, I work at Procore.
00:00:22.920
We make software for the construction industry. Procore is a giant suite of tools for the industry, so many features that I don’t think one person knows them all. It’s one of the oldest, largest, and most mature Rails apps on earth, with about 2 million lines of code and around 90 to 100 people working on it every single day.
00:00:39.149
We deploy it four or five times a day and encounter a lot of errors. This isn't because we're not careful; it's just that we release a lot of really advanced business tools. We are constantly surprised by the interesting and creative ways our customers use them, ways we often can't predict.
00:00:59.219
As a result, we often find ourselves shocked by some of the exceptions that pop up in our logs when customers do things we weren’t expecting. We have to react quickly because we care about our customers. They pay a lot of money for our software and expect us to be right there answering their support questions.
00:01:22.020
A lot of the content of this talk comes from techniques I’ve learned, and our team has learned over the years, to make errors more useful, create architectures that facilitate troubleshooting potential problems, and simply make our lives easier.
00:01:40.229
I really enjoy this subject. I know I'm a bit strange, but I get excited about error handling. Maybe it’s because when I go to sleep at night, at least I know that if something goes wrong, I will be able to troubleshoot it easily and figure out what happened.
00:02:06.180
By the way, this is RailsConf, right? Everyone makes their presentations funny, putting in lots of memes. I’m not very good at picking out memes; technical content comes naturally to me, but not humor. I put some funny memes in here, but I didn’t like them, so I took them out and put them all at the end. Stick around, and you’ll find a random selection of completely irrelevant memes.
00:02:28.410
Let’s see... this talk starts off with fundamentals, going through some basic and simple things, but it gets advanced pretty quickly. So if you're here for more advanced architectures, stick around. It ramps up.
00:02:41.910
A word of caution: when you start talking about what you can do with errors and exceptions (I'll explain the difference in a second), it's really easy to overdo it. I'll show you a lot of tools and techniques that could add complexity to your application. Make sure that complexity is actually necessary.
00:03:00.570
If you're working with vanilla Rails, like MVC and a lot of CRUD operations, don't start using everything I’m showing you here. These architectures are best suited for reactive situations where you discover gaps in your knowledge or find that your error reports are coming through without the context you need.
00:03:24.450
I’ve seen this scenario play out with junior and senior engineers who get overly excited about the new power they’ve learned and create something elaborate that isn’t necessary. So, don’t overdo it.
00:03:42.720
By the way, most of these techniques apply to any language that supports exceptions. A lot of the best literature on this subject comes from the Java world, and it really originates all the way back to the C++ days.
00:03:56.160
To get started, here's a simple example: in Ruby, raising an error is straightforward. You can simply pass a string, which is equivalent to raising a 'RuntimeError'. In Ruby, errors form a class hierarchy. At the top of the hierarchy, we have the 'Exception' class, and everything below it extends from 'Exception', including what we mostly deal with today: 'StandardError'.
00:04:15.660
I mentioned the difference between an exception and an error. The distinction is that what we usually call an error is 'StandardError' or one of its subclasses, and 'StandardError' is itself a subclass of 'Exception'. For those who are visually inclined, you can picture the hierarchy as a tree: 'StandardError' extends 'Exception', and many subclasses like 'ZeroDivisionError' extend 'StandardError'.
00:04:49.050
When rescuing, we typically use a bare 'rescue', which rescues just 'StandardError' and its subclasses. It will not rescue exceptions that sit elsewhere in the hierarchy.
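As an illustration of the points above (not the speaker's slide), here is a minimal plain-Ruby sketch. It raises with only a string, rescues with a bare 'rescue', and then walks up the class tree from 'ZeroDivisionError'. The variable names are made up for the example.

```ruby
# Raising with only a string raises a RuntimeError.
raised =
  begin
    raise "out of nachos"
  rescue => e # bare rescue: StandardError and its subclasses only
    e
  end

puts raised.class
puts raised.message

# Walk up the tree from ZeroDivisionError to the root of the hierarchy.
chain = []
klass = ZeroDivisionError
while klass <= Exception
  chain << klass
  klass = klass.superclass
end
puts chain.inspect
```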
00:05:05.400
Never do this: rescuing from 'Exception' at the root of that hierarchy can lead to capturing all sorts of unpredictable errors, like 'NoMemoryError' and 'SignalException'.
00:05:23.820
'NoMemoryError' is raised when the program runs out of memory; rescue it, and you get to see the consequences of a program that keeps executing after running out of memory. 'SignalException' is the family you hit when you press Control-C, which raises its subclass 'Interrupt'.
00:05:45.330
Rescuing from 'StandardError' avoids these pitfalls, allowing you to manage what errors you want to capture.
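To make that concrete, here is a small sketch in plain Ruby. The 'attempt' helper is hypothetical; it rescues only 'StandardError', so a 'ZeroDivisionError' is caught while an 'Interrupt' (the class raised on Control-C) passes straight through it. The outer 'rescue Interrupt' exists only so the example can show that the exception escaped.

```ruby
# Hypothetical helper: runs a block and rescues only StandardError.
def attempt
  yield
  :ok
rescue StandardError => e
  "rescued #{e.class}"
end

# ZeroDivisionError is a StandardError, so the helper catches it.
result = attempt { 1 / 0 }

# Interrupt is not a StandardError, so it escapes the helper untouched.
escaped =
  begin
    attempt { raise Interrupt }
  rescue Interrupt => e
    "escaped: #{e.class}"
  end

puts result
puts escaped
```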
00:06:01.520
You can make your own error class, for example 'NachosError', by extending 'StandardError'. Rails itself and most popular gems typically ship with an exception hierarchy already created.
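A custom error like that might be sketched as follows. 'NachosError' comes from the talk; 'OutOfCheeseError' and 'serve_nachos' are invented here purely to show that rescuing the parent class also catches its subclasses.

```ruby
class NachosError < StandardError; end

# Hypothetical subclass, only for illustration.
class OutOfCheeseError < NachosError; end

# Hypothetical method that fails with the more specific error.
def serve_nachos(cheese_left)
  raise OutOfCheeseError, "no cheese left" if cheese_left.zero?

  "nachos served"
end

# Rescuing the parent class catches every error in its subtree.
message =
  begin
    serve_nachos(0)
  rescue NachosError => e
    "#{e.class}: #{e.message}"
  end

puts message
```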
00:06:13.740
Let’s talk more about Rails. This is one area where I had a good meme, but I won't show it now. However, if you're thinking of having children, avoid it because you end up watching insufferable cartoons like Thomas the Tank Engine.
00:06:31.280
Yet, if you do have children, make sure they watch it because it's about trains on the Island of Sodor that cooperate. It’s a great parable for creating a well-functioning development team.
00:06:51.960
Here's a typical Rails controller, straight from the scaffold. We call 'user.save', which triggers validations. It can still raise in scenarios you didn't plan for, like when you can't connect to your database. That kind of error can occur even without using 'save!'.
00:07:12.010
If you think you can just rescue everything with a catch-all, that's an example of poor error handling. It leads to situations where unexpected errors get hidden instead of surfaced. Rule number one is: don't rescue just because you can. Sometimes, the best option is to let the error show the generic 500 error page.
00:07:52.400
Consider who your audiences are: the users, developers, and computers. For users, we need to show proper navigational error messages, redirect them accordingly, and ensure they have a good experience.
00:08:12.150
For developers, we want descriptive error messages to troubleshoot effectively, including additional metadata relevant to the error.
00:08:29.350
For computers, we must use status codes properly for APIs to facilitate programming against them.
00:08:44.580
Our goals include ensuring user-facing error messages are helpful. We should control status codes, add contextual data, and send notifications properly, whether through email, SMS, or Slack, ensuring they reach the right team.
00:09:14.970
This is our method again. There are problems if we rescue without understanding the context. If we simply swallow errors, we end up with no evidence of something having gone wrong, which makes troubleshooting much more difficult.
00:09:40.430
Do not swallow errors. Always ensure that if something bad happens, there’s traceable evidence of it in your system.
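The difference between swallowing an error and leaving evidence can be sketched in plain Ruby. Everything here is hypothetical: 'ERROR_LOG' is an in-memory stand-in for a real logger or error reporter, and the two 'save_*' methods stand in for code that wraps a failing database call.

```ruby
# Hypothetical in-memory stand-in for a logger or error reporter.
ERROR_LOG = []

# Anti-pattern: the failure disappears and nothing records it.
def save_swallowing
  yield
  true
rescue StandardError
  false
end

# Better: record what happened, then re-raise the same exception.
def save_with_evidence
  yield
  true
rescue StandardError => e
  ERROR_LOG << "#{e.class}: #{e.message}"
  raise
end

failing_save = -> { raise IOError, "could not connect to database" }

swallowed = save_swallowing(&failing_save)
log_after_swallow = ERROR_LOG.dup # still empty: no trace of the failure

reraised =
  begin
    save_with_evidence(&failing_save)
  rescue IOError => e
    e
  end

puts swallowed
puts ERROR_LOG.inspect
```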
00:10:05.050
Error reporting tools have greatly improved since the days of building solutions ourselves. They've become highly configurable and easy to use. For instance, we use Bugsnag at Procore, which has proved to be reliable.
00:10:30.600
They provide features such as graphs, charts, and excellent filtering capabilities. It’s easy to set up with minimal configuration. Most of them offer a free tier, so take advantage of that. However, understand the power-user features to unlock even more utility.
00:10:55.970
For your error handling, when you start your projects, I recommend creating a utility class with a method called 'handle'. This gives you one place to pass your errors through and hand them off to the error reporting solution.
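One possible shape for such a utility, as a sketch only. The module name 'ErrorHandler', the keyword arguments, and the 'REPORTS' array are all assumptions; in a real application the body of 'handle' would call whichever reporting client you use instead of appending to an array.

```ruby
module ErrorHandler
  # Hypothetical stand-in for the reporting service's client.
  REPORTS = []

  # Single entry point for every error the app decides to report.
  def self.handle(error, severity: :error, metadata: {})
    REPORTS << {
      error_class: error.class.name,
      message: error.message,
      severity: severity,
      metadata: metadata
    }
    nil
  end
end

begin
  Integer("not a number")
rescue ArgumentError => e
  ErrorHandler.handle(e, severity: :warning, metadata: { input: "not a number" })
end

puts ErrorHandler::REPORTS.first[:error_class]
```

Routing everything through one method means that switching reporting tools, or adding default metadata, later touches a single file.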
00:11:26.090
Most systems report severity levels, which can be useful for filtering alerts. In the reporting systems, you can configure what alerts you want and their severity to avoid being overwhelmed by notifications. Customize severity levels to ensure you only get alerted about critical issues.
00:11:50.230
Also, having a way to add metadata for filtering will enhance your team's ability to respond to alerts effectively, letting you adjust notifications to avoid unnecessary noise.
00:12:16.890
Adding contextual data and flexibility in notifications can help your teams significantly by targeting alerts to the right audiences. It’s a very powerful feature for controlling how your team engages with reported errors.
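Reporting services usually let you configure this kind of filtering in their own settings, so the following is only a plain-Ruby sketch of the idea. The 'Report' struct, the severity ranking, the ':team' metadata key, and the channel naming are all invented for the example.

```ruby
# Hypothetical report shape: what was raised, how bad, and extra context.
Report = Struct.new(:error_class, :severity, :metadata, keyword_init: true)

SEVERITY_RANK = { info: 0, warning: 1, error: 2 }.freeze

# Returns the channel to alert, or nil when the report is below the threshold.
def alert_channel(report, threshold: :error)
  return nil if SEVERITY_RANK.fetch(report.severity) < SEVERITY_RANK.fetch(threshold)

  team = report.metadata.fetch(:team, "on-call")
  format("#%s-alerts", team)
end

billing_failure = Report.new(error_class: "PaymentError", severity: :error, metadata: { team: "billing" })
slow_query      = Report.new(error_class: "SlowQuery", severity: :warning, metadata: { team: "data" })
unowned_failure = Report.new(error_class: "RuntimeError", severity: :error, metadata: {})

puts alert_channel(billing_failure).inspect
puts alert_channel(slow_query).inspect
puts alert_channel(unowned_failure).inspect
```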
00:12:38.110
Additionally, creating custom error classes can significantly enhance your application architecture's power, especially in scenarios with deep call stacks. For example, in an online shoe store, checking inventory may involve multiple layers of processing.
00:12:55.780
If there’s an issue when a customer tries to purchase an item that’s out of stock, we need to communicate that error up to the controller level from deep in the call stack.
00:13:12.740
Using exceptions appropriately allows us to raise errors deep in the call stack, and then catch them at the top where we can effect change, like showing user messages.
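The shoe store scenario might look roughly like this in plain Ruby. All names here are hypothetical, 'create_order' stands in for a controller action, and the 409 status is just one reasonable choice for the example. The layers in the middle know nothing about the error; it passes through them untouched.

```ruby
class OutOfStockError < StandardError
  attr_reader :sku

  def initialize(sku)
    @sku = sku
    super("SKU #{sku} is out of stock")
  end
end

# Hypothetical inventory data.
INVENTORY = { "boot-42" => 0, "sneaker-9" => 3 }.freeze

# Deepest layer: the only place that knows about stock levels.
def check_inventory(sku)
  raise OutOfStockError.new(sku) if INVENTORY.fetch(sku).zero?
end

# Middle layers: no rescue, the error simply bubbles through.
def reserve_item(sku)
  check_inventory(sku)
  :reserved
end

def place_order(sku)
  reserve_item(sku)
  { status: 201, body: "Order placed" }
end

# Top layer (stand-in for a controller action): turn the error into a response.
def create_order(sku)
  place_order(sku)
rescue OutOfStockError
  { status: 409, body: "Sorry, that item just sold out." }
end

puts create_order("boot-42").inspect
puts create_order("sneaker-9").inspect
```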
00:13:41.100
This raises the question of control flow: do we want to affect the user experience and provide them useful feedback? By allowing these errors to bubble up, we can control how the application behaves when errors occur.
00:14:02.920
Finally, remember that if there’s ambiguity in user-facing messages or if they could expose sensitive information, avoid using exception messages directly. This is especially important in production environments.
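One way to keep internal details out of the UI, sketched under assumptions: each application error carries a separate, safe 'user_message', and anything else falls back to a generic message. The class names and the 'user_message' method are hypothetical, not an existing Rails API.

```ruby
GENERIC_MESSAGE = "Something went wrong. Please try again."

# Hypothetical base class: the developer-facing message and the user-facing one are separate.
class ApplicationError < StandardError
  def user_message
    GENERIC_MESSAGE
  end
end

class PaymentDeclinedError < ApplicationError
  def user_message
    "Your payment was declined. Please try another card."
  end
end

# Never falls back to error.message, which may contain internals.
def message_for_user(error)
  error.respond_to?(:user_message) ? error.user_message : GENERIC_MESSAGE
end

declined = PaymentDeclinedError.new("gateway returned code 51 for account 12345")
internal = RuntimeError.new("PG::ConnectionBad: could not connect to db-primary")

puts message_for_user(declined)
puts message_for_user(internal)
```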
00:14:24.960
For authorization errors, creating a robust hierarchy can be essential for managing permissions across your application. This allows you to recover gracefully from errors that may arise from all over the application.
00:14:41.680
For instance, instead of failing with a generic error, you can route users to specific guidelines related to their situation by raising contextual errors.
00:14:55.410
Creating a custom exception tree can help gather information that allows you to tailor your responses and improve workflows based on different error types.
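A small authorization hierarchy along those lines could be sketched like this. The class names and paths are invented for illustration. The order of the 'when' branches matters: specific subclasses come first, and the parent class acts as the fallback.

```ruby
# Hypothetical authorization error tree.
class AuthorizationError < StandardError; end
class NotLoggedInError < AuthorizationError; end
class SubscriptionExpiredError < AuthorizationError; end
class InsufficientRoleError < AuthorizationError; end

# Maps each kind of authorization failure to where the user should go next.
def redirect_for(error)
  case error
  when NotLoggedInError then "/login"
  when SubscriptionExpiredError then "/billing"
  when AuthorizationError then "/access-denied"
  else raise error # not an authorization problem: let it bubble up
  end
end

puts redirect_for(NotLoggedInError.new)
puts redirect_for(SubscriptionExpiredError.new)
puts redirect_for(InsufficientRoleError.new)
```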
00:15:23.030
By enhancing our error handling system, we've enabled a structured way to manage errors and also apply forwards and backtracking for resolution efforts.
00:15:48.220
The overall point of this is to provide meaningful interactions for your team and end-users by establishing a robust error handling strategy that scales.
00:16:12.780
Wrapping errors correctly allows us to add context and clarity. For instance, when working in API-driven architectures, we can derive better status codes and user-friendly messages.
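Wrapping can be sketched in plain Ruby as follows. When you raise inside a 'rescue' block, Ruby records the original exception as the new one's 'cause', so the low-level detail is preserved for developers while the wrapper carries the status code and a friendlier message. 'ApiError', 'UpstreamTimeoutError', and 'fetch_inventory' are hypothetical names.

```ruby
# Hypothetical API error tree where each class knows its HTTP status.
class ApiError < StandardError
  def status
    500
  end
end

class UpstreamTimeoutError < ApiError
  def status
    504
  end
end

# Stand-in for a call to another service that fails at a low level.
def fetch_inventory
  raise IOError, "socket read timed out"
rescue IOError
  # Raising inside rescue sets #cause to the IOError automatically.
  raise UpstreamTimeoutError, "The inventory service did not respond."
end

response, logged_cause =
  begin
    fetch_inventory
  rescue ApiError => e
    [{ status: e.status, error: e.message }, "#{e.cause.class}: #{e.cause.message}"]
  end

puts response.inspect
puts logged_cause
```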
00:16:34.000
To summarize what we've accomplished: we've looked at how enhancing user-facing messages improves user experience, we gained control over HTTP status codes, and we learned to gather and apply contextual metadata in handling errors.
00:16:54.630
We’ve changed how notifications are sent to ensure the right team gets the right message at the right time. By building effective reporting structures, we can target alerts more effectively.
00:17:16.430
All this can tie into logs that help with tracking errors, offering a comprehensive view from user impact to developer needs.
00:17:38.040
In project implementations, you might want to visualize complex interactions within your error-handling structure for clarity, as this can get complicated.
00:18:01.050
For further reading, I can share some great blog posts that delve deeper into error handling and reporting tools, providing additional capabilities.
00:18:21.840
I'm Brad Urani. Follow me on Twitter. I love to connect with professionals on LinkedIn, and I'm also on Mastodon for those on decentralized social networks.
00:18:37.220
I work in Santa Barbara at Procore, a growing company with an incredible team.
00:19:01.260
I invite each of you to check out my colleague Derek’s talk tomorrow about building a powerful API structure.
00:19:19.060
Now, here are some unrelated memes as promised! They are completely irrelevant, but you've made it this far!
00:19:45.880
Thank you for your time, and I would love to take questions afterwards.