Knobs, buttons & switches: Operating your application at scale

Amy Unger's RailsConf 2018 talk, "Knobs, buttons & switches: Operating your application at scale," covers strategies for keeping an application resilient when things fail. Aimed at backend engineers, it stresses building in controls that let an application respond gracefully to different kinds of failure, particularly when it is under stress.

Key Points Discussed:

- Introduction and Motivation: Amy begins with a metaphor comparing the control that pilots and Captain Kirk have over their environments to the control developers should maintain over their applications. This sets the stage for discussing failure management strategies.

- Scope of Discussion: The focus is on daily failures rather than catastrophic ones, implying that developers should prepare for typical issues their applications may face instead of solely planning for disasters.

- Tools for Control:

  - Maintenance Mode: Establishing a clear maintenance mode that can be activated easily and that conveys important information to users.

  - Read-Only Mode: Letting users keep reading information while changes are disabled, for times when accepting writes is risky.

  - Feature Flags: Using feature flags not just to roll out new features but also to selectively disable or enable parts of the app during an incident.

  - Rate Limiting: Implementing controls that mitigate excessive or malicious traffic, so that important requests are prioritized.

  - Stopping Non-Critical Work: Being able to halt low-priority tasks when the application is under heavy load, to free up resources.

  - Known Unknowns: Putting new features or code changes that might affect performance unexpectedly behind flags, so they can be switched off.

  - Circuit Breakers: Mechanisms that avoid overwhelming dependent services by limiting calls based on success rates or response times.
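To make the circuit breaker idea concrete, here is a minimal sketch in Ruby. It is not taken from the talk: the class name, the parameters, and the injectable clock are all hypothetical, and it trips on a count of consecutive failures, which is a simpler trigger than the success rates or response times mentioned above.

```ruby
# Hypothetical circuit breaker sketch. Names and defaults are illustrative
# assumptions, not an API from the talk or from any library.
class CircuitBreaker
  # Raised when a call is skipped because the breaker is open.
  class OpenError < StandardError; end

  attr_reader :state

  def initialize(failure_threshold: 3, reset_timeout: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold # consecutive failures before opening
    @reset_timeout = reset_timeout         # seconds to wait before a trial call
    @clock = clock                         # injectable so tests can control time
    @failures = 0
    @state = :closed
    @opened_at = nil
  end

  # Runs the block unless the breaker is open. Errors raised by the block are
  # counted and then re-raised to the caller.
  def call
    if @state == :open
      raise OpenError, "circuit open; skipping call" unless timeout_elapsed?
      @state = :half_open # let one trial call through
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    record_success
    result
  end

  private

  def timeout_elapsed?
    @clock.call - @opened_at >= @reset_timeout
  end

  def record_failure
    @failures += 1
    # A failed trial call re-opens immediately; otherwise wait for the threshold.
    if @state == :half_open || @failures >= @failure_threshold
      @state = :open
      @opened_at = @clock.call
    end
  end

  def record_success
    @failures = 0
    @state = :closed
  end
end
```

A caller would wrap each request to the dependent service in `breaker.call { ... }` and rescue `CircuitBreaker::OpenError` to serve a fallback. Note that this sketch keeps its state in one process's memory, so each process would trip independently.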

- Implementation Considerations: Amy emphasizes choosing an appropriate place to store switch state (e.g., environment variables or a database) and the need for clear visibility into these settings during incidents.

- Caveats for Developers: She highlights the need to test that switches actually work and to keep clear documentation of their status.

- Conclusion: The main takeaway is to prepare applications for failure ahead of time by building in controls that give developers flexibility over application behavior during a crisis. The goal is that when issues arise, developers already have the means to limit the impact rather than being caught off guard.
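As an illustration of the implementation considerations above, the following sketch reads switch state from environment variables and exposes a single `status` view of every switch. Everything here is an assumption made for the example: the `Switches` class, the `SWITCH_` prefix, the switch names, and the `handle_write` helper are hypothetical and do not come from the talk.

```ruby
# Hypothetical switch registry backed by environment variables.
# The class name, the SWITCH_ prefix, and the switch names are assumptions.
class Switches
  TRUE_VALUES = %w[1 true on yes].freeze

  # names: the complete list of known switches, so status can show them all.
  # env:   anything that responds to fetch(key, default); defaults to ENV.
  def initialize(names, env: ENV, prefix: "SWITCH_")
    @names = names.map(&:to_s)
    @env = env
    @prefix = prefix
  end

  # True when the switch's variable holds a truthy value. Unknown names raise,
  # so a typo in a switch name fails loudly instead of silently reading "off".
  def on?(name)
    key = name.to_s
    raise ArgumentError, "unknown switch: #{key}" unless @names.include?(key)
    TRUE_VALUES.include?(@env.fetch(env_key(key), "").to_s.strip.downcase)
  end

  # Every switch with its current state, for display during an incident.
  def status
    @names.to_h { |name| [name, on?(name)] }
  end

  private

  def env_key(name)
    "#{@prefix}#{name.upcase}"
  end
end

# Hypothetical use in a write path: returns a [status_code, message] pair.
def handle_write(switches)
  return [503, "Down for maintenance"] if switches.on?(:maintenance_mode)
  return [503, "Read-only mode: changes are temporarily disabled"] if switches.on?(:read_only)
  [200, "ok"]
end
```

Environment variables are simple to reason about, but in many deployments changing one means restarting the process. A database-backed store could be swapped in by passing a different `env` object that responds to `fetch(key, default)`.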

Amy Unger • April 17, 2018 • Pittsburgh, PA

RailsConf 2018: Knobs, buttons & switches: Operating your application at scale by Amy Unger

Pilots have the flight deck, Captain Kirk had his bridge, but what do you have for managing failure in your application?

Every app comes under stress, whether from downstream failures, unmaintainable high load, or a spike in intensive requests. We'll cover code patterns you can use to change the behavior of your application on the fly so that it fails gracefully.

You’ll walk away from this talk with tools you can have on hand to ensure you remain in control even when your application is under stress.
