Screaming Zombies and Other Tales: Race Condition Woes

The video "Screaming Zombies and Other Tales: Race Condition Woes", presented by Joshua Larson at RubyConf 2020, explores race conditions in software development. A race condition occurs when multiple processes operate on shared data and the result depends on the relative timing of their operations. The session opens with a whimsical question, "What is the sound of a zombie screaming?", and ties it to the theme of race conditions. Key points discussed throughout the talk include:

Understanding Race Conditions: The session explains that operations in programming, such as incrementing a variable, may not be atomic. This lack of atomicity can lead to unexpected behaviors when multiple processes attempt to read and write shared data simultaneously, resulting in race conditions.
Consequences of Race Conditions: Larson highlights how race conditions surface as bugs in running systems, particularly in concurrent environments.
Examples of Race Condition Stories: Four specific stories are shared to illustrate the impact of race conditions:
- Smurfy's Law: A scenario where simultaneous data reads caused test failures due to shared test buckets.
- Out of the Void: A problem in transaction processing where voided transactions incorrectly settled due to timing discrepancies.
- Ghost in the Machine: A situation with IoT devices and stale sensor data, revealing the server's inability to handle messages arriving out of order.
- Screaming Zombies: This story details how a metrics reporting tool at Braintree produced incorrect data outputs due to improperly managed reporting processes.
Solutions for Managing Race Conditions: The talk emphasizes methods to mitigate race conditions, such as separating data instances, managing operation sequences, and removing redundant processes.
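
As a rough illustration of the non-atomic increment described above (not code from the talk), the sketch below first interleaves the read and write steps of two workers by hand so that the lost update is deterministic, then shows one common remedy, a Mutex around the read-modify-write. The variable names (counter, a_read, b_read, safe_counter, lock) are hypothetical, and the Mutex is a general technique rather than one of the fixes the talk is reported to recommend.

```ruby
# An "increment" is really three steps: read, add, write.
# Interleaving two workers' steps by hand makes the lost update deterministic.
counter = 0

a_read = counter      # worker A reads 0
b_read = counter      # worker B reads 0 before A has written
counter = a_read + 1  # worker A writes 1
counter = b_read + 1  # worker B also writes 1, overwriting A's update

lost_update_result = counter # two increments ran, but only one is visible

# One common remedy (an assumption here, not taken from the talk):
# make the read-modify-write a critical section guarded by a Mutex.
safe_counter = 0
lock = Mutex.new

threads = 2.times.map do
  Thread.new do
    1_000.times do
      # Only one thread at a time may run this block.
      lock.synchronize { safe_counter += 1 }
    end
  end
end
threads.each(&:join)

puts "hand-interleaved counter: #{lost_update_result}"
puts "mutex-protected counter:  #{safe_counter}"
```

The first half needs no threads at all: the bug lies in the ordering of reads and writes, which is why it can appear between processes, services, or devices as well as between threads.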

Larson concludes the presentation with a reminder of how complex race conditions are in modern software architectures and of the need for a robust approach to system design, while humorously reaffirming that the scream of a zombie is ultimately silence.

Joshua Larson • December 18, 2020 • Online

What is the sound of a zombie screaming?

Race conditions are a problem that crops up everywhere. This talk will go over what a race condition is, and what it takes for a system to be vulnerable to one. Then we’ll walk through four stories of race conditions in production, including one that we named the “Screaming Zombies” bug.

You’ll leave this talk with a greater appreciation for how to build and analyze concurrent systems, and several fun stories about how things can go amusingly wrong.

And if you were wondering about the question at the top, the answer is: Silence

Josh Larson
Josh is a full-time programmer, part-time human, whose interests include weird programming, physics, math, and trying to make software reliably be better. When he’s not writing code or equations, he’s probably biking somewhere or watching something on HBO.

RubyConf 2020