Knobs, Levers and Buttons: tools for operating your application at scale

In her presentation at RubyConf AU 2018, Amy Unger discusses operational tools that make an application more resilient, likening them to the knobs and switches in an aircraft cockpit. The talk focuses on how developers can adjust an application's behavior under stress and keep small failures from escalating into critical incidents.

Key Points Discussed:
- Application Resilience: The importance of being able to adjust application behavior during both minor failures and serious outages.
- Seven Tools for Resilience:
  - Maintenance Mode: Implement a simple environment variable switch to activate a maintenance page during emergencies, ensuring users are informed without needing complex configurations.
  - Read-Only Mode: Offer users access to non-modifiable information even if backend services are down, helping maintain user engagement.
  - Feature Flags: Utilize global feature flags to control access to features based on user groups, which is particularly useful for multi-tenant applications.
  - Rate Limiting: Protect against denial-of-service attacks and manage performance by limiting the number of requests your application accepts during peak times.
  - Stopping Non-Essential Work: Prioritize essential tasks by pausing non-critical jobs when resource limits are hit, thus conserving resources for important functions.
  - Deployment Flags: Manage deployments under uncertain conditions by using flags that allow for quick rollback if issues arise with new code.
  - Circuit Breakers: Implement circuit breakers to stop sending requests to downstream services that are experiencing high error rates, thus preventing overload and maintaining user experience.
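
To make the last item concrete, here is a minimal circuit breaker sketch in Python. It is an illustration, not the implementation shown in the talk. The names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold` and `reset_timeout` are hypothetical, and it trips on a count of consecutive failures rather than a measured error rate, which is a simplification. It is also not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures (an assumption)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: fail fast instead of hitting the struggling service.
                raise CircuitOpenError("downstream calls are paused")
            # Timeout elapsed: fall through and allow a trial request.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial request.
                self.opened_at = self.clock()
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each downstream request, for example `breaker.call(fetch_invoice, invoice_id)` with a hypothetical `fetch_invoice`, and treat `CircuitOpenError` as the signal to serve a fallback response.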

Unger emphasizes that successful implementation of these tools requires visibility and control over application status at all times. She encourages developers to test these strategies in various scenarios to ensure they work effectively during critical times. The talk concludes with the notion that having these adjustable controls allows for better adaptability and a more resilient application environment.
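
One way to get that visibility is to keep every switch in one place and make its current state easy to report. The Python sketch below assumes two environment variables, `MAINTENANCE_MODE` and `READ_ONLY_MODE`. Both names, and the helpers `switch_on`, `switch_status` and `handle_request`, are hypothetical and not taken from the talk.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}

# Hypothetical switch names; a real application would choose its own.
SWITCHES = ("MAINTENANCE_MODE", "READ_ONLY_MODE")


def switch_on(name, environ=None):
    """Return True if the named switch is set to a truthy value."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "").strip().lower() in TRUTHY


def switch_status(environ=None):
    """Report the state of every known switch, e.g. for a status endpoint or log line."""
    return {name: switch_on(name, environ) for name in SWITCHES}


def handle_request(method, environ=None):
    """Return an (HTTP status, body) pair for a request with the given method."""
    if switch_on("MAINTENANCE_MODE", environ):
        # Maintenance mode blocks everything and tells the user why.
        return 503, "Down for maintenance"
    if switch_on("READ_ONLY_MODE", environ) and method not in ("GET", "HEAD"):
        # Read-only mode still serves reads but rejects writes.
        return 503, "Writes are temporarily disabled"
    return 200, "OK"
```

Passing `environ` explicitly makes each switch easy to exercise in tests, which fits the advice to try these controls before an incident rather than during one.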

Unger invites the audience to consider building these practices into their own applications to improve operational resilience, and thanks Heroku's Sydney office for its support.

In summary, the key takeaway is that developers who build adjustable controls into their applications ahead of time, and know how to operate them, can significantly improve resilience and reliability when unexpected failures occur.