The Doctor Is In: Using checkups to find bugs in production

In this RailsConf 2018 talk, Ryan Laughlin explains why checkups are needed to detect bugs in production and how to implement them. He emphasizes that while development often focuses on writing tests to catch bugs before deployment, it is just as important to detect unforeseen issues after code is live.

Key points discussed include:
- Recognition of Bugs in Production: Every engineer makes mistakes, and it is vital to acknowledge that bugs can and often will occur in production environments.
- Limitations of Testing: Although testing is fundamental for catching bugs during development, it cannot identify every potential issue, particularly edge cases that nobody anticipated when the tests were written.
- The Unique Nature of Production Environments: Production setups often differ greatly from testing setups, leading to unique bugs that may not emerge until users interact with the live application.
- Monitoring Tools: He discusses existing tools like exception reporting but highlights their limitations, including the noise generated by false alarms and their inability to catch non-exceptional bugs.
- Introduction to Checkups: Checkups are proposed as a proactive approach, akin to tests for production. They allow developers to set expectations for app behavior in a live environment and periodically check these assertions. In the case of a failure, alerts are generated to prompt investigation and resolution.
- Real-World Example: Ryan describes checkups on user email addresses in an app he developed, which revealed that a race condition had left users with zero email addresses. The checkups let the team quickly identify and fix a problem that standard testing had missed.
- Best Practices for Implementation: Checkups can be integrated through various methods, including rake tasks or Active Record callbacks, to ensure continuous monitoring.
- Insights & Outcomes: Checkups not only help detect bugs but also mitigate issues in real-time, thereby preserving user trust and application reliability.
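
The checkup idea described in the points above can be sketched in plain Ruby. This is a minimal illustration, not the code shown in the talk: the `User` struct, the `EmailCheckup` class, the `run_checkups` helper, and the notifier lambda are all hypothetical names chosen for the example, and in-memory data stands in for what would be Active Record queries in a real Rails app.

```ruby
# In-memory stand-in for a user record (an assumption; a Rails app would
# query its database instead).
User = Struct.new(:id, :emails)

# Hypothetical checkup: every user is expected to have at least one email.
class EmailCheckup
  def initialize(users)
    @users = users
  end

  # Returns the ids of users that violate the expectation.
  def failures
    @users.select { |user| user.emails.empty? }.map(&:id)
  end

  def healthy?
    failures.empty?
  end
end

# Hypothetical runner: evaluates each checkup and hands a message to the
# notifier for every checkup that fails.
def run_checkups(checkups, notifier:)
  checkups.each do |checkup|
    next if checkup.healthy?

    ids = checkup.failures.join(", ")
    notifier.call("#{checkup.class.name} failed for ids: #{ids}")
  end
end

users = [
  User.new(1, ["a@example.com"]),
  User.new(2, []) # the kind of record a race condition might leave behind
]

alerts = []
run_checkups([EmailCheckup.new(users)], notifier: ->(message) { alerts << message })
puts alerts
```

In a Rails app, a runner like this could plausibly be invoked from a scheduled rake task, with the notifier forwarding messages to whatever alerting channel the team already uses. Keeping the expectation (`failures`) separate from the alerting step makes each checkup easy to test on its own.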

Conclusions & Takeaways:
- The speaker encourages embracing the concept of checkups as a vital aspect of production health, akin to how tests are considered essential for development. By incorporating checkups, developers can transform silent problems into visible issues during the crucial operational phase of an application, ultimately improving system integrity and user experience.

The speaker makes the slides available and invites further discussion via social media on improving production monitoring practices.

Ryan Laughlin • April 17, 2018 • Pittsburgh, PA

RailsConf 2018: The Doctor Is In: Using checkups to find bugs in production by Ryan Laughlin

A good test suite can help you catch errors in development, but how do you know if your code starts misbehaving in production?

In this talk, we’ll learn about checkups: a powerful and flexible way to ensure that your code continues to work as intended after you deploy. Through real-world examples, we’ll see how adding a simple suite of checkups to your app can help you detect unforeseen issues, including tricky problems like race conditions and data corruption. We’ll even look at how checkups can mitigate much larger disasters in real-time. Come give your app’s health a boost!
