

Workshop: Run your first game day

Thai Wood • November 08, 2022 • Denver, CO

In the workshop titled "Run Your First Game Day" presented by Thai Wood at RubyConf 2021, attendees learn about the significance and mechanics of conducting tabletop exercises to improve incident response capabilities without waiting for actual incidents. The workshop covers the structure and setup of game days, with the aim of enhancing communication, collaboration, and comfort among team members during crisis situations. Thai shares insights from his background as an EMT, emphasizing the need for practice scenarios akin to how emergency responders train for real-life incidents.

Key Points Discussed:

- Definition of Game Days: Game days are simulations designed to practice incident response through tabletop exercises, where participants discuss and navigate theoretical scenarios collectively.

- Benefits:

  - Develops comfort with processes and communication during incidents.

  - Improves team collaboration and exposes assumptions that may exist within teams.

  - Facilitates learning how to interact in high-pressure environments without the stress of real incidents.

- Setup Instructions:

  - Participants should collaborate on creating scenarios based on past incidents or hypothetical situations.

  - A simple formula is provided for developing scenarios: choose a trigger, a complication, and a resolution goal.

- Facilitator's Role: Emphasizes improvisation and being prepared to guide participants through scenarios without needing specialized training. The facilitator's familiarity with the process is more important than the intricacies of the scenario itself.

- Post-Game Day: Gathering feedback is crucial for continuous improvement. The facilitator should encourage discussions to evaluate comfort levels and identify areas for future focus.

- Real-World Applications: The workshop encourages using creative and absurd scenarios, such as being paged for an alien incident, to alleviate the pressure of perfection and stimulate engagement among participants.

In conclusion, the workshop serves to empower attendees to implement game days effectively within their teams, fostering an environment of continuous learning and process enhancement in incident response preparedness.


Want to get better at incident response without waiting for actual incidents? Learn how to use tabletop exercises to practice your incident response framework, develop common ground, and improve communication during an incident.

You'll learn how to run tabletop game days, including how to set them up, how to design scenarios, and how to encourage participation, all with techniques supported by real-world experience and by research.

RubyConf 2021

00:00:10.880 I don't know about all of you, but I'm slowly remembering how this in-person stuff works.
00:00:11.040 It's harder than I remember. Thanks to our AV crew for responding well to this morning's incident, as it were.
00:00:24.000 It's somewhat related to what we're going to talk about today.
00:00:31.119 I'll be telling you a bit about how you can run your first game day, and my hope and goal here is that by the time you go home, you will be confident in doing at least your first one, and hopefully many more.
00:00:39.360 So to start, why am I talking about this? That question sort of works both ways. I'm talking about this because I've applied it and I've seen it work in a lot of different areas. There's research that says it works really well. A lot of smart people say so, so I believe them.
00:01:01.280 We'll be focusing on how to do this to improve your incident response, but if you have some other similar type process that you want to apply this to, there's no reason you can't do that.
00:01:05.680 Why I am talking about this is because I first was introduced to this world of game days and simulations through my previous career as an EMT. I worked for a 911 service for a time and was trained in search and rescue, and this was a very common way of learning. You might get a lecture, read a book, or something else, and eventually, you would have to apply it in the field.
00:01:30.799 But there was always a chance to have this in-between step, where you had the chance to kind of play with these ideas and work them out. When I came to tech, I kept kind of waiting for that in-between moment, the difference between someone sort of throwing documentation at me and me actually having to carry a pager and do the thing.
00:01:56.000 At least in my career, that moment never really came. I wanted there to be something between "Hey, read this" and "Hey, now you're doing it potentially at 3 AM!" I kept waiting, thinking surely it was going to come.
00:02:03.520 Surely no one would expect me to be responsible for potentially millions of dollars of infrastructure sales without giving me some ground to practice, right? That would be crazy! We would never do that... Oops, I was wrong.
00:02:25.000 So I wanted to bring that world that I had experienced into tech. I kept hearing folks talk about game days. I'd talk to folks at conferences, like this one or DevOps days, and I'd ask, 'Are you doing game days?' They would reply, 'Oh yeah, I heard about game days!' or sometimes people would call it 'Ops D&D.' I don't play D&D, so that was lost on me a little bit, but I thought, 'Yeah, you know, that's neat. Do you do it?' Everyone said, 'No, but I want to.'
00:02:51.360 So I'm curious, does anyone here actually do game days today? One? Cool, two? Okay, so not zero! I'm happy about that. I'm hoping by the end that if I were to ask this again next year, we would get the majority of the room here.
00:03:10.000 So here's what we're going to go over today. I will say in advance, these slides are text only and designed to hopefully be easy to read. There is no motion, music, or video that you'll need to follow along with.
00:03:21.840 I'll cover more than what's on the slide, so if you are reading a transcription of this or listening to the audio only, you'll still be able to follow along. I'm going to go over what a game day is, some of the things it's made of, how to actually put that together and turn that into an event, how to make it actually stick.
00:03:42.560 So even though I say you'll be able to run your first one, I also hope it won't be your last. And finally, at the end, we'll do some Q&A, and I would like all of you to start thinking now about what questions you have that, if answered, would help you feel comfortable starting when you get home. I'll hopefully answer most of those as we go, but I'm going to leave lots of time at the end to make sure that I have the opportunity to address all of your concerns.
00:04:10.479 So first, what is a game day? To me, a game day is time you take outside of your work, probably coding or what have you, to practice using some sort of simulation, usually a tabletop exercise. When we're talking about simulations, we're not trying to clone anything. Don't worry, it's just talking through a scenario.
00:04:33.680 Now, you might have heard a lot about tabletop exercises from some sources, like the U.S. federal government, which is a fan, and the World Health Organization. Different organizations do this a lot. DHS says a tabletop exercise is a discussion-based exercise in response to a scenario intended to generate a dialogue about various issues to facilitate a conceptual discussion.
00:04:55.560 So, really, what is this? A tabletop exercise is a lower-speed, lower-tempo chance to work with ideas, to work with people, and to work with procedures. You do so by talking through what you want to do, what you're thinking about doing, what you feel like you should do, and asking questions along the way.
00:05:10.679 I will say, not being a tabletop RPG person myself, I am a little hesitant to make the analogy to actual tabletop games. The difference here is that no one is going to be trying to trick you or surprise you in some way. If you're participating in one that someone else is hosting, hopefully you are not required to be especially creative with your actions.
00:05:47.840 I want to go over some of the things I said about what this does for you, and one of the things it does is help you be more comfortable with a process. I don't know how it works for you folks, but you ever read something and you get to the end, and you're like, 'Yeah, okay, I got it,' and then the next day or even later that day as you go to actually do it, you kind of stop and think, 'What was I supposed to do?' It felt so easy when you were reading it, right?
00:06:04.609 Because someone wrote it in a cohesive set of steps that sounds right and feels comfortable. We have this inclination to fill it in and say, 'Yeah, okay, I'm kind of nodding along,' but it's a whole different matter when we actually have to do it. That is the real point of these tabletop exercises: to build comfort with it and also experience in working and communicating with others.
00:06:40.000 I'm sorry, folks, if you don't like talking to people. This is probably not for you. The communication part is one of the biggest benefits of a tabletop exercise. This might sound strange, but I've noticed that in technology, at least in my career, we say things like, 'Maybe your incident response framework documents, if you have them, will say, 'Tell the channel that you are the incident commander.' Ask someone about something.' That sounds fine until you have to do it.
00:07:01.160 What do you say? How do you ask it? Are you ordering them? Are you asking them? Do you say please? How do you do this? It can be very complicated. If you don't believe me, I can tell you that in other domains, this is heavily practiced. When you go to school to become an EMT, you will go through up to two years of school or training. The federal government sends you a nice little patch that you will never do anything with because no one sews patches onto their uniforms.
00:07:34.800 But even then, when the federal government has said, 'Great, you are official,' when you go to get your first job, your first few weeks will actually be with a field training officer. So even then, you’ll still have someone training and working with you. You might think, 'Well, that's because it's medicine; it's very serious.' But actually, one of the things a good field training officer does is she has a card in her pocket, and the very first time you are expected to talk on the radio, to call ahead and let the hospital know what you're doing, she will pull it out and hand it to you because she knows people get nervous.
00:08:06.640 You've been through two years of school and clinical hours, and all this, but people still get nervous in that moment. They don't know what to say. They understand that, yes, they have to communicate this information, but how? What's the etiquette? What's the protocol? Even then, we practice that communication, and that is a level of detail and practice that, after you've done it, you might be very comfortable forever after giving that report over the radio.
00:08:37.200 But without that practice, how long might it take for you to develop that comfort or even know that you were doing it right? It's also a chance to learn about how other people work. This is especially good if you're doing it with your incident response team, group of SREs, or if you're all on call for your application. This gives you a chance to develop some intuition and the ability to predict what someone's going to do.
00:09:01.440 You might be in a tabletop, and you might go through something and say, 'Jane has this great idea.' You would have never thought of that approach; your reading of the document was completely different. Now all of a sudden, you have an idea about how that person will work in the moment, and that can be useful later.
00:09:27.440 This also works kind of two ways. This isn't just you conforming to a process; it lets you surface assumptions. Assumptions that you might be making about your coworkers, about how the process works, or the tools available, and it also lets you identify issues in that process. The time where we're reading and nodding along is not always great at detecting things that don't make sense.
00:09:53.440 When we're reading it, we're not in the same mindset as when we are actually doing it. This is also a chance to affect the process, not just have the process affect you. It should work both ways, and there's no real right or wrong; it's for you to discover a fit somewhere in between.
00:10:14.560 So I want to touch on two things: What do tabletop exercises not do? They are not about the scenario. Sorry, really creative DMs in here. Those skills may be useful later, but for your first several or hundreds of game days, you probably won't need them. It can be useful to design a very specific scenario.
00:10:38.399 But reality is messy. The point of the game day is the friends you made along the way! Sorry, I couldn't help myself. It's working through it. The odds that you will make a scenario that exactly reflects the reality you will face on call are just astronomical. It is so unlikely that I suggest you don't try.
00:11:04.280 Trying to create a scenario that perfectly mirrors reality can give folks the wrong idea. The point of tabletops is not to say, 'Okay, we've trained for every scenario, now we're ready.' No, we're saying we can't ever know what's going to happen. I'm sure every one of us in this room has a weird story about something that broke, and in the moment, you said things like, 'Wait, how did this ever work? It should never have worked!'
00:11:25.000 It's so difficult for us to predict those, and we should be focusing on the parts that we can, like how we communicate, how we work together, and how we decide how this process should function. Those are the things we can prepare for in advance; it's not the scenario that matters.
00:11:49.280 Which is funny because I've actually found that it's the scenario that's the number one thing that keeps people from being able to run their first tabletop. When I've talked to folks and interviewed them about this, they say, 'I don't really know what we would do.' And I tell you, it's not about the scenario. Don't worry!
00:12:09.760 This is an overview of just how to get started. I know this is very like draw the owl when you read this, but we'll go through each one. In large part, you'll make up a scenario. You'll get some folks together in a room, virtual, some combination. I suggest that whichever you pick, I'm not going to say one is better than the other, but I suggest that it reflect how you intend to work in the future.
00:12:48.000 If you think it's likely that you're going to be virtual, or you know for a fact that you are, do it that way. If you happen to have an offsite or something where you will be seeing people in person, this might be useful to do in person. But I'm going to suggest that if you have to choose one, you choose the way that you actually intend to work.
00:13:09.520 You'll talk through the scenario, and then you'll ask how it went. That's about it. But as I said, this is the point where folks tend to get hung up, which is developing these scenarios. So I'm going to give you a few pieces of advice. For one, you can just look through the incidents you've already had.
00:13:32.560 If you write these things down, if you have some sort of process, if you don't, maybe this is a good reason to get one. You can think of one that you've participated in; it doesn't have to be in your current environment. It doesn't even have to be one you participated in directly. Maybe you read a really cool one.
00:13:47.840 The scenario is essentially a story you're going to work through, so you can take pieces that are useful and discard pieces that aren't. You can smash two, three, or four pieces together; that's fine. But if you really, really can't think of anything, I'm going to remind you that the scenarios are not that important, especially for your first one, two, or ten game days.
00:14:08.000 So this is the formula that I will give you: pick a trigger, pick something that complicates it slightly, and again, we're not trying to trick people here. There are no gotchas here. Then pick a goal and resolution. The main point is to work with people and process, not against the scenario.
00:14:31.680 I'm going to ask each of you to take a few minutes to make your own scenario. Again, if you don't have any ideas, I've put the steps up here. We're only going to take about two minutes here. I suggest you write it down: paper, pen, type, whatever works best for you! Pick a trigger. This is typically a page of some sort. I always start mine with, 'You are awoken by PagerDuty. The alert says something...'
00:15:05.080 It can be that the database appears to be down, the disk appears to be full. I think that for the folks here, I don't think we're short on the number of alerts we've seen. Next, you'll pick a complication, and again, a complication is just a chance to think about how to extend this scenario in a way that wasn't just me sitting by myself solving it, because chances are that's not the way you're going to work.
00:15:34.279 We want to add a complication that makes it something else. It can be as simple as: in the scenario, I happen to not be a database expert, and it was a database issue. There's the complication. It can be that the disk was full, but I don't actually have access to that machine, or rather the participant, whoever is doing this scenario, doesn't have access to that machine.
00:16:01.680 Then, pick a goal and a resolution. Let's say you're going through this with someone across the table or across the Zoom. How do you know when they're done? They're not going to know because you have a scenario.
00:16:25.280 You’ll want to pick in advance when it's done. Maybe they restarted the database, or they had to edit the MySQL config file for whatever reason and restart the service. You know that when they've done that, it's done. Or you can time-box it too; say they reach a certain condition.
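The trigger, complication, and goal formula described above can be written down as a small data structure. This is a minimal sketch in Python, not something shown in the talk; the class name `Scenario`, its field names, the 30-minute time box, and the example values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    """One tabletop scenario: a trigger, a complication, and a way to know it's done."""

    trigger: str                             # the page or alert that starts the exercise
    complication: str                        # something that keeps it from being a solo fix
    resolution: str                          # the condition the facilitator treats as "done"
    time_box_minutes: Optional[int] = None   # hypothetical optional cap on the exercise

    def opening_prompt(self) -> str:
        """Text the facilitator reads aloud to start the exercise."""
        return f"You are awoken by a page. The alert says: {self.trigger}"

    def facilitator_notes(self) -> str:
        """Notes only the facilitator sees; participants learn them by asking questions."""
        lines = [
            f"Complication: {self.complication}",
            f"Done when: {self.resolution}",
        ]
        if self.time_box_minutes is not None:
            lines.append(f"Stop after {self.time_box_minutes} minutes regardless.")
        return "\n".join(lines)


# Example values taken from the kinds of triggers and complications mentioned in the talk.
example = Scenario(
    trigger="the database appears to be down",
    complication="the responder is not a database expert",
    resolution="the responder restarts the database service",
    time_box_minutes=30,
)
print(example.opening_prompt())
print(example.facilitator_notes())
```

Keeping the resolution and time box in the facilitator's notes mirrors the point that the participant does not know when the scenario ends; only the facilitator does.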
00:16:50.240 I'll give you folks just a couple of minutes to develop your own scenario, and then I'll show you how, as simple as that may seem, it's enough to get you started.
00:17:07.760 And then I'll ask if anyone else has a scenario they're especially proud of, if anyone wants to share theirs.
00:17:15.120 The reason I say here that this scenario doesn't matter, and this might feel awkward at first, is because just as the people you run this game day with will become more familiar with the process, so will you. Running your first game day is key to being able to run the second and to gain the experience to make this feel more comfortable.
00:17:44.480 I've worked with a lot of different folks and teams where this can feel awkward for some people, like it seems too simple. And we know the reality is messier, but this is the way to start to interact with that and to have your first game day. And then after you run your first tabletop exercise, or the second, you'll also slowly get more familiar with this. You'll feel more comfortable writing the next scenario, or you'll find yourself seeing something and thinking, 'Oh, that's good. I'm going to use that.'
00:18:05.040 So does anyone have a scenario or piece of a scenario that they want to share? It can just be a trigger you like or a complication or something like that.
00:18:26.760 We'd like to hear some complications, such as if a deployment failed to build and then also failed to roll back. So that was fun. I like it. I think I saw a hand over here.
00:18:34.560 Keys got committed and pushed to a repository.
00:18:43.360 That's exactly right, and I can tell you folks have no shortage of these. Each one of these and any combination of the trigger and complications will give you plenty of material. Any change you make in these, you can run again. This isn't something that you need to reinvent each time.
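The idea that any combination of trigger and complication gives you material can be sketched as a simple cross product. This is an illustrative Python sketch, not part of the talk; the function name `scenario_seeds` and the list contents are assumptions, with the example strings paraphrased from triggers and complications mentioned in the session.

```python
from itertools import product

# Hypothetical starting lists; swap in alerts and complications from your own incidents.
triggers = [
    "the database appears to be down",
    "the disk appears to be full",
    "a deployment failed to build",
]
complications = [
    "the responder is not a database expert",
    "the responder has no access to the affected machine",
    "the rollback also failed",
]


def scenario_seeds(triggers, complications):
    """Pair every trigger with every complication to get reusable scenario seeds."""
    return [
        f"Trigger: {t}. Complication: {c}."
        for t, c in product(triggers, complications)
    ]


seeds = scenario_seeds(triggers, complications)
print(len(seeds))
for seed in seeds[:3]:
    print(seed)
```

Not every pairing will make sense as a story, which is fine: the facilitator can discard the ones that don't fit and rerun the ones that do with small changes.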
00:19:03.920 So I want to talk a bit about what you'll actually do once you sit down to do your game day itself. In my experience, the best way to start is to have it last only a couple of hours at most. You don't really need your computers unless you're doing it virtually.
00:19:21.440 I actually suggest that unless you're using them to follow along, pull up documentation, take notes, you don't need technology here at all. You really just want to get people together in a room, real or otherwise.
00:19:39.520 So put some time on the calendar. I want to talk about, I call it positions here, just so we can talk about roles separately. Usually, we say roles. We think of maybe incident commander versus subject matter expert versus all these different things. For tabletop exercises, there are only two big divisions: the facilitator, which you folks will be ready to be, and the participants. Anyone else who is doing it with you.
00:19:54.080 That's the minimum. I want to point out that I call them 'participants,' not 'learners,' because, facilitators, you'll be learning too. Nor are they 'trainees,' because, facilitators, you're gaining experience as well: experience with how the participants are thinking, where they struggle, and what surprises them, all while gaining experience with running the exercise itself.
00:20:27.360 As the facilitator, you know the scenario, and you are in a position to answer questions. This is typically how participants will interact with you. Say that, in one of the scenarios we heard, the facilitator told me that there seems to be a DNS issue. I might say, 'Okay, I try to ping the DNS server,' because its address is a series of easily memorable repeating numbers for a reason. 'Does it work?' I don't actually have to type it; I just ask, 'Does it work?'
00:20:55.760 And as a facilitator, it's up to you to improvise if you need to. Some of the questions will feel easy, and some of them will not, and that's fine. Just make something up. 'No, it doesn't work.' Okay, I'm going to do something else. You can also fill in for other roles if needed; just if you happen to be in a small group and you're practicing your incident response, and you say, 'Well, I need someone to be communications liaison,' depending on how your framework works.
00:21:29.760 If you're short on people, you can say, 'Okay, I'll be your communications person. Tell me what you need.' Then you're still back in that role of facilitator, where you are still just gathering information from the participant, and it's up to you to make decisions. If there's a question or a conflict, it's up to you. There is no real wrong answer here; I encourage you to improvise.
00:21:50.560 This is the other thing other than scenarios I find people getting hung up on: 'Do you need special training to be a facilitator?' This is a complicated enough subject that I made a whole slide for it: No, you don't. Again, this is a chance to work with your peers, work with process. No one is requiring you to be perfect at this. No one is requiring you to get every technical detail of the scenario right. That's not what this is about.
00:22:26.799 You don't need special training. Obviously, the more you've done this, the more comfortable you'll be with it, but that doesn't need to stop you from doing it. There's no class you need to take, no certification you need; this isn't Agile! It's fine; it's all right.
00:22:45.840 So I want you to put yourself in the role of a participant, and I've briefly put up here the things that you're going to do. You're going to talk through your action, you're going to ask questions to get information, and you'll fill a role in the scenario—usually just one. It's up to you if you want to do more, but I find it makes it easier for participants to say they are essentially only playing one character.
00:23:09.440 It's hard to play both. It works for you as a facilitator because you know to a degree how this will unfold. I want to give you some examples of talking through your actions. I kind of visited this at a high level in the beginning. I mentioned that sometimes people don't know what to say, and this can be a little uncomfortable for some folks or feel a little strange.
00:23:36.160 I'm going to suggest that you literally narrate your actions, what you would do, and then let the facilitator respond to that. You might say, 'I'm going to check to see if the DNS server is up.' As the facilitator, I respond, 'Great! How do you do that?'
00:24:01.040 You say, 'I'm going to ping it.' 'Okay, it doesn't respond.' 'Okay, I am going to go look at a graph to see if it has higher error rates.' 'Great! How do you do that?' 'I'm going to search for it, click the link, use my eyes.' 'Great! The dashboard shows no errors.' Cool. That's the level you will probably want to discuss at, where your words marry your actions.
00:24:29.120 You don't want to just say, 'I'll find out if it's working.' As a facilitator, if you hear that, it's perfectly normal. It's easy for us, as experts in our field, to sort of think, 'I'll know in the moment,' and maybe we will, but again, part of this process is surfacing where we are maybe making assumptions and where we're running up against friction with the process.
00:24:52.560 I said that a participant fills a single role, but another way to do this is to have multiple participants sort of act as one responder. This can work well if you're onboarding a group for the first time to, say, being on call. You can have them discuss amongst themselves and then tell you what they've decided to do. That lets them collectively be in the role of the responder without having to run, say, four separate sessions with however many people that you're onboarding.
00:25:38.560 And it also gives folks, if this is their first brush with this, a chance to be more comfortable as participants. Talking through everything, on the principle that 'if they didn't say it, it didn't happen,' can be hard. As a facilitator, try to watch out for it. I make mistakes with this sometimes. Ask for details: 'How might you do that?'
00:25:59.680 This isn't nitpicking. When you ask folks, 'How might you do that?' the goal isn't to catch them out. We're not asking people to justify themselves or prove that they know it. Participants should feel free to ask questions as well.
00:26:10.960 This is probably a bit more verbose than you might already be if you do similar exercises or are on call. I find that that level of communication works very well. Later this afternoon, we'll hear from some folks who are going to talk about distributed cognition, both across people and across our machines. I strongly recommend you check that one out.
00:26:52.080 When we understand that idea, it makes sense that we must have some sort of signal to the people about what we're thinking and what we're doing. 'How might you do that?' and 'Tell me more about that' are good questions to ask as facilitators. If you don't develop any more questions than those two, you'll be fine.
00:27:14.720 So I also don't want you folks to worry too much about the research portion that I mentioned or get caught up with taking notes, so I just want to remind you that I'll upload the resources afterwards and put a link in Discord as well.
00:27:39.760 I can also put some examples of scenarios I've made. The other thing I'm going to suggest as a facilitator is to be comfortable with silence. This can take practice, especially with new participants.
00:27:56.480 Normally, participants, you know, haven't done this before. If this is your first game day, it's likely that it's theirs too. When you rush ahead and don't give them a chance to respond, or they don't feel like they've had that chance, you may feel like you're waiting an awfully long time on that Zoom, staring at those windows.
00:28:24.160 But I'm going to suggest you hold out a bit longer; even double it if you can. Most of the time, folks will come up with something. They're either just working it through or, if you're doing it in a group, I find that folks wonder if you're talking to them even though there are only four of you.
00:28:52.160 So finally, I want to talk about what you do afterwards, and then we'll talk more about your questions. Afterwards, I want to get feedback. I don't have a lot of recommendations as to how. Unsurprisingly, you folks know your teams much better than I do.
00:29:23.840 I don't know if they're bombarded by surveys all the time. If the next SurveyMonkey you send to someone makes them flip a table, I don't know. If that's your team, don't send a survey. If surveys are rare and you think people might engage with them, send a survey.
00:29:39.640 You may notice a theme here: It's okay to not be perfect at this. You will get more comfortable; you will learn more as you do it! One thing I learned in being an EMT is that it's okay to not always know what you're doing, but that also doesn't keep you from proceeding and doing the best you can.
00:30:10.720 There are a number of sayings in it, but one of them is: 'Never say oops on scene.' You know, you can look cool and collected, you may or may not be—that's fine. You know, proceeding and doing this has value.
00:30:29.920 It doesn't have to be perfect to be useful. You do not have to be perfect in order to be useful to the team. Participants don't have to be perfect at this in order to be useful to them or to you. No one has to be perfect at this.
00:30:48.080 So your feedback might be formal; it might be informal. A five-minute discussion at the end—that's feedback. Whatever way you think you're going to get information about how folks experience things is what I'm recommending.
00:31:09.440 And one of the reasons we want feedback is so that we can keep this going. As I said, both participants and facilitators are going to become more comfortable the more they do this.
00:31:30.480 I think we all know what quarterly or annual trainings feel like; that's where we wail on the next button, right? Or we just guess at the quiz answers because that seems better than reading the slides. Okay, officially, we don't do that—for our managers and wellness compliance team watching at home—we do not do that, we absolutely do not!
00:31:51.680 We don't want this to be that, so I'm going to say that the more often, the better. Is that daily? Probably not! Is it weekly? Maybe it is. I spoke with a company that does logging in a non-traditional sense. They remove trees in very dangerous situations, usually near power lines during and after winter storms.
00:32:13.200 They found that they were able to carve out a day; they've now called it 'drill day.' They don't call it 'game day' because they're big, serious people with chainsaws, I suppose, and they don't play games. I don't know. They called it 'drill day,' and that worked for them. That was a cadence where they could all bring their scenarios, their experiences, and their things to the group. It was their chance to practice.
00:32:40.640 That's the key: Making room for this. One of the ways to do this is to ask folks on the stage to just do the best practice thing, right? Ignore the reality. Because no one in this room is busy, of course, so you should just totally do the best practice thing, right? But that's partly why we're getting the feedback: The learnings that you get for a variety of reasons should feed back into the process.
00:33:01.400 For one, if you're not listening to the humans who are in it and you're not improving it, what's the point? And for two, I find that people at a certain level and above in the org chart find it an easier sell. They ask, 'Why do you even need this specialized time?' Okay, you get it once, maybe, but then you can say, 'Look at how we've improved the process.'
00:33:30.880 I think we all have had our fill of business continuity documents—'Quick, we need you to document everything you've ever known about all the possible ways this could fail because of business continuity planning!' We've had a lot of that lately. How do we know that's right?
00:33:42.720 So how do we prepare? This is a chance to improve that—to say, 'Well, actually, this thing we wrote down is wrong, and we've now made it more correct, easier to follow,' what have you. I find that that makes this a much, much easier way to get buy-in.
00:34:02.000 If you can't get a couple of hours, I know I said if you cut it down, do an hour. I would suggest there's a point where it's too short; you probably can't do this in five minutes. But if you can get an hour weekly, that's probably better than two hours once a month. Having space where you can return to this process, where you can continue to build common ground with your peers, is going to be very important.
00:34:23.680 So I've left a lot of time for Q&A because I want to return to the questions that I asked you folks to keep in mind. I personally don't like when folks disappear and say, 'Questions now!' But my answers tend to oscillate between 'No' and 'Yes, one million.' So I'm going to ask this of the group: Do you feel ready to do this when you go back home?
00:34:46.400 And if not, what else can I tell you? What else would you like to know that will help you feel comfortable starting a game day when you get back?
00:35:06.800 Uh, to the mic, please, or if someone could pass one down if you have a question. Thank you.
00:35:34.000 There we go, now it works! I'll re-ask: What sorts of roles do you usually assign people? I know it'll be different per team, but what are some examples you've used, or do you just go based on what they actually perform on the team?
00:35:50.480 I think you're asking about what roles folks would be assigned: like what participants would they be—incident commander? Would they be what have you? Is that sort of what I'm hearing? Yeah, um, for me, I work with teams to do the roles they would likely find themselves in.
00:36:06.320 You know, if you have a sort of, I guess now more traditional incident response mechanism, then I'm going to say that the role you start with is probably incident commander.
00:36:23.520 If you know you have three people and the incident commander person, that's just why we have that complication spin-up. You say, 'Oh, I need a subject matter expert on database.' Well, there's your next person and so on and so forth.
00:36:40.720 If it doesn't map well, that's where you can say, 'Well, you're all just responders.' I'll play the incident commander as a facilitator, and then you all respond. But yes, typically, whatever process you are thinking of adopting, whatever roles exist there, whatever roles you're likely to be.
00:37:02.720 You know, if you're on call, maybe incident commander. If you are at risk of being paged, subject matter expert. You probably should do one of each.
00:37:27.040 Okay, that makes sense. Another question I had was, I guess, what level should the facilitator know the answers? Should they be familiar with the exact process you should go through? Should they just have a general understanding? What level of research does the facilitator need to do about the scenario beforehand?
00:37:42.720 So for the scenario, the facilitator is going to have crafted it. They don't have to have written a lot; they can be prepared to improvise. So it's up to your comfort level. I'm going to suggest that you literally just fill in the three portions: you know, a trigger, a complication, and an end state.
00:38:05.680 Be more familiar with the process you're going to be working with than this scenario. I'm going to suggest that just because I think if you go the other way, you can create this cycle where you say, 'I'm not familiar enough with it yet; I have to flesh this out a little more.' Then you never do it.
00:38:29.200 If you do find that you have a scenario you're very comfortable with because it's one you've experienced or you just have that, then that's great—there's no maximum level. But I'm going to say that the minimum is pretty low.
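The three portions mentioned above (a trigger, a complication, and an end state) can be written down as a very small data structure. This is only an illustrative sketch in Python: the class name, field names, `is_ready` check, and example text are all assumptions made for this note, not something shown in the workshop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A tabletop scenario: the three portions a facilitator fills in."""

    trigger: str       # what kicks off the incident, e.g. a page
    complication: str  # what makes it harder partway through
    end_state: str     # how the facilitator knows to say "you won"

    def is_ready(self) -> bool:
        # The minimum bar is low: each portion just needs some text.
        parts = (self.trigger, self.complication, self.end_state)
        return all(part.strip() for part in parts)


# Hypothetical example content, invented for illustration.
scenario = Scenario(
    trigger="You are paged: checkout error rate is climbing.",
    complication="A database subject matter expert is needed but not on the call.",
    end_state="Error rate is back to normal and the responders have said so.",
)
print(scenario.is_ready())
```

Keeping the structure this small matches the advice that the minimum level of preparation is low; anything beyond these three fields can be improvised during the exercise.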
00:39:01.760 And it might work to start with a scenario that you already know and feel more comfortable with. It might not help you to learn the process, but it's about doing the extra exercise, and that's right.
00:39:26.520 So if you have the scenario and your vision as facilitator, your participants may still have never actually been through the incident response process; you know, that's probably the bit they're learning. You know, how do I page someone? How do I ask for more help?
00:39:48.240 How often do I check on people to make sure they did the thing? Do I check on people? Do they open a Slack channel? Do I do a Zoom? What do I do? It's a lot of what we're trying to get at, and the scenario is often a backdrop for us to realize that we have these questions.
00:40:05.520 Okay, thank you! Thank you. Yes, so our team has done tabletop exercises before, but it sometimes feels like they turn more into trainings, in the sense that it's trying to remember which document to check or which one to read.
00:40:21.680 And it sounds like the process is more about becoming comfortable with responding; not necessarily knowing exactly what resource. If that's true, it occurred to me, is it helpful, or is it fair, to do something like say, 'You got paged' and have it be a page that doesn't actually exist?
00:40:43.920 As if you're doing something that we've never seen before. How would you walk through it? Absolutely. I've talked to folks in other scenarios. I talked to someone recently who did logistics for the U.S. Army.
00:41:05.760 One of their tabletop scenarios had to do with HR logistics and was, 'You are visited by aliens. How do you onboard them into the system?' It was clever because what they discovered is, 'Oh wait, we need a translator.' Oh, we can't predict what religion they'll be, so we need someone to help with that.
00:41:31.360 How do we anticipate that? What do we do if we don't know the answer? What if we can't possibly find the answer? So scenarios—not that important; they can be ridiculous. I have a colleague who I'm pushing very hard to write a blog post. She told me that she wants to do one around Jurassic Park as a scenario.
00:41:54.880 I'm going to suggest that if it feels very checkbox, you know, 'Did you get the right Confluence page?' kind of questions, and that level of detail and memorization is needed, which is why it's being checked in this way, then that is a signal for change in the process.
00:42:26.720 Because I don't know about you folks, but there's no amount of me using Confluence that makes me type the right word into that the first time at 3 AM! It just doesn't happen. If you're so concerned that that's a necessity to your process, to me, that's a signal that thankfully you've surfaced through a tabletop exercise, and you realize how much your process, as performed, requires you to be a perfect memorizing human at 3 AM.
00:42:45.440 That seems unrealistic to me, and I think that's the part that I would suggest working on as opposed to changing other things.
00:43:04.000 So I'm going to ask you again, what else would you like to know that will help you feel comfortable starting a game day? I’m going to ask that you imagine yourself back at home, ready to do it.
00:43:25.760 Anything feel uncertain? Have you managed to think back and feel ready to facilitate your first one?
00:43:56.080 Hello, so I have a question on that. Do you find or do you have experience or any opinions on whether game days or similar activities work better in a one-on-one scenario or in a one-to-many context?
00:44:17.440 You know, keeping people's attention and letting them absorb things if they're not the active participant? Do you have any opinions on sort of that dynamic?
00:44:38.720 Yeah, absolutely! Typically, I don't find there are a lot of situations where one-on-one is especially helpful. I think it can depend on your relationship with whoever that other person is. There is a danger that it too easily becomes a quiz.
00:45:06.480 If we're one-on-one, I can very quickly feel like I'm testing you: Did you get it right? Did you get it right? And it can feel very much like you are being judged, you're being tested on this. This isn't something we're trying out together anymore.
00:45:31.040 If you have a different relationship with whomever the other person is, where you're 100% certain that is not a risk, I would also question whether or not your assessment is accurate. If you're sure that's not a risk, then I think one-on-one can be fine.
00:46:17.440 I think that, realistically, I want to say the more the merrier! Obviously, that has an upper limit. I'd say usually you want to work around team boundaries, whatever that means to you.
00:46:56.240 I've worked with organizations where there's this other person who, if you looked at the org chart or asked someone, is not on the team, but if you look at their incidents or how they ask for help, they might as well be—they always get paged, they always deal with that.
00:47:25.280 So when in doubt, I'd say work around the team boundary as your team or the rest of the organization perceives it. If you're just a huge team, work in small groups and then mix it up. A big part of what is effective about this is getting to watch those folks you work with or hear them say something that would have never occurred to you.
00:47:48.080 Whether right or wrong doesn't matter—someone's going to say, 'Oh, I would just restart the Atlas service,' and you'll find yourself at some point thinking, 'What? Why would you do that?'
00:48:06.560 Without that chance for the people there to have these differing mental models of how the systems work, without them being there to share their views, a lot of this is going to be less effective. That's why I want to look at the team boundary, the people you are likely to work with. It doesn't have to be strict; you don't have to say, 'Oh, they're not on my team so they for sure are not invited.' I'm going to suggest that when in doubt, invite them.
00:48:33.120 Do you have a list of roles that are useful in incident response that would be helpful to outline for people who aren't familiar with the process?
00:49:02.800 I have a number of them, mostly because I'm a bit nerdy about this. I think that the sort of traditional model that I've seen in more places, the one based on the U.S. Incident Command System, says that your first couple of roles usually start with incident commander (or incident manager, as some folks like to call it).
00:49:37.320 You'll often have someone in scribe or communications. This tends to be a role, and that's someone who either takes notes or communicates into other channels or communicates with other folks. Then usually, you'll have someone generically titled SME, subject matter expert, which is just someone you call for help with something specific.
00:50:05.120 I think of those as the three key roles, regardless of what your incident response framework calls for. Those probably represent a fair bit of the work and activity that's actually going on.
00:50:52.080 And if it helps you feel more comfortable, you can write that out when you make your scenario. You can say, 'Okay, I'm going to make this person that role, and this person that role.' That helps you feel more comfortable.
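Writing the roles out ahead of time, as suggested above, could look something like the following Python sketch. The three roles come from the answer above; the assignment rule (first person is incident commander, second is scribe, everyone else is a subject matter expert) and the participant names are assumptions made purely for illustration.

```python
from enum import Enum


class Role(Enum):
    INCIDENT_COMMANDER = "incident commander"
    SCRIBE = "scribe / communications"
    SME = "subject matter expert"


def assign_roles(participants: list[str]) -> dict[str, Role]:
    """Assumed rule: first person commands, second scribes, the rest are SMEs."""
    ordered = [Role.INCIDENT_COMMANDER, Role.SCRIBE]
    return {
        name: ordered[i] if i < len(ordered) else Role.SME
        for i, name in enumerate(participants)
    }


# Placeholder names, invented for illustration.
assignments = assign_roles(["Ana", "Ben", "Chidi", "Dee"])
for name, role in assignments.items():
    print(f"{name}: {role.value}")
```

If the roles don't map well onto a team, the earlier suggestion still applies: the facilitator can play incident commander and everyone else can simply be a responder.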
00:51:04.720 You mentioned scribe just now. How much documentation do you do during the event, and post-mortem? You kind of alluded to how you're developing this from your reservoir of documentation.
00:51:20.320 How much do you flow back, and how much do you pull from that? You kind of lose that with those Confluence-based searches; how much are you actually using that live in the event, and then flowing that back?
00:51:40.640 I'm going to suggest that a lot more of it should flow back because the point is to interact with this process. It shouldn't be a quiz on what you know. It should feel open book—that it's not a test of did you memorize what you're supposed to do.
00:52:06.560 If your process is written down in a step-by-step place, feel free to pull it up; feel free to follow it. I think that's great. It's probably the way you would work in real life. If you find that's burdensome, I'd say don't.
00:52:34.080 But I would then wonder, like, what the process is, and in that case, that's again a place where I want you to feed back. You know, why is it so burdensome that you can't pull it up? Is it that there's so much documentation that there's not even like, 'What do I do?' It's there!
00:53:01.920 It doesn't have to be documentation heavy; it should feel kind of like an open book sort of thing, where I have it there if I need it. It doesn't mean I look at it. As for what you produce: I usually don't produce a lot of documentation out of this at first if I don't have to. I take notes, but that helps me learn and remember things.
00:53:25.440 I sometimes write down questions I have, things that surprised me, just to discuss after. That sort of data doesn't usually end up directly back in anything.
00:53:51.680 But for example, if I note for myself that I actually didn't know how to open an incident response channel correctly—oh, is there a naming convention? Is there something? Is it public or is it private? I might write that down, and then hopefully what flows back is some answer on that.
00:54:11.200 Thanks very much for the talk! The question that came to my mind as you were talking and I was thinking about incidents I've been involved in, is that often the most challenging and interesting parts of the incident are what happens after the immediate issue is resolved.
00:54:35.040 You know, we've rolled back; we fixed it, but now there's a flood of emails to customer service, or there's the actual discovery of what really went wrong. I'm wondering to what extent the tabletop exercise can take those post-incident issues and make that part of the experience?
00:55:02.680 Sure! I think it all depends on what those things are. If you're saying, 'Oh, we learned as a result of this incident that this API we wrote functions in an entirely different way than we thought it did,' then I don't see much to tabletop there.
00:55:25.680 If we agree that we intended it to work like A and in fact, it worked like B—that was surprising—I see that as something later stage. The benefit of eventually building so much comfort and familiarity around our incident response is we can then devote energy to the next stage, looking at our incidents, learning from them, changing things so that they work the way we want.
00:55:59.680 If there's not really a process to interact with or a sort of cohort to surface, then I'm going to say that the tabletop is probably not the right format. But by doing them earlier in the incident cycle, you've now created a situation where instead of the response taking up so much energy, so much focus.
00:56:23.520 When it becomes just this thing we do, we respond to incidents—they're difficult, but it's just a thing we do— that leaves room for that later analysis, that learning, and that change.
00:56:39.680 So I know we are approaching lunchtime, but I'm going to ask you folks again, what else can I tell you? What else can I answer for you? What can I speculate wildly upon to help you feel comfortable and ready to do this when you get back?
00:56:58.560 Is there something stopping you?
00:57:02.560 I’d be happy to hear it too.
00:57:14.560 Alright, so do you have any metrics or any kind of impact that can demonstrate and tell leadership—like, 'Hey, if we dedicate an hour of our week to do game days to more development?' Any metrics in terms of that? To justify this kind of investment?
00:57:22.560 So for me, I tried to make that the benefit of process improvement. 'Oh hey, we learned that all our documentation was pure optimistic fantasy, and now we've brought it one inch closer to reality.' That's the metric I prefer to surface.
00:57:49.680 Because I'm assuming here we're talking about numbers, some sort of quantitative metric, and I firmly believe that a lot of this is going to be qualitative. You might, depending on organizations, see some number shift.
00:58:10.160 If you implement this program, if you have volunteers for on-call, maybe you have more people who are comfortable with it and thus volunteer—I don't know—maybe you don't.
00:58:33.360 But so much of this is familiarity, comfort, and learning that is very difficult to measure in a single number, and I would suggest not trying to do so.
00:58:56.240 That's why I suggest trying to make that metric process improvement. I am kind of obnoxious when asked those questions by certain people, so I would be inclined to challenge them.
00:59:20.000 You know, when they send folks to a three-day React (whatever type coding) thing, what metric do they use to evaluate it? Do they comb your PRs to find out if you are some number better at your code? Surely not, right? So the notion that we train without a quantitative improvement is well understood.
00:59:40.560 I think sometimes folks just have trouble realizing that that's what this is, and that may not always be an easy sell, depending on where you are.
01:00:02.240 Oh, on the slides you see, you had, you know, pick a scenario, pick a complication, and pick a goal, which implies that's part of the facilitator's job to have a goal in mind.
01:00:20.560 Yes, is there value, or it seems to me that there's value in having that be part of the tabletop if you're in an incident response. Part of what I want out of training someone is that they understand when their job is done. Is that a good way to run it?
01:00:45.760 Yes, absolutely! And that's what I'm trying to get at. I just want you to go into it as a facilitator not saying, 'Oh, I'll know they're done when I see it,' because they might ask you, 'Is the service back up yet?' You need to know when to tell them, 'Yes, you won!'
01:01:04.320 Instead of saying, 'Continue to play for some amount of time,' it doesn't have to be an especially hard goal, but you just want to have something for yourself that says, 'Yup, you did it, we're done,' and you can wrap that up and step out of it and now talk about how that went instead of sort of being, um, I find that without that, it can be a guessing game.
01:01:34.240 Very easily. Just like eventually, as I know for me as a participant/responder, I'm just going to be like, 'I type the following Linux commands; I don't know, like, one of these is going to work. I don't know; I burn down the data center. Like, I don't know!'
01:02:01.840 So you want to have a goal in mind such that it doesn't feel like a guessing game; just so that there is you know, there is an end, you recognize what happens, they know they've gotten there.
01:02:18.000 So what else has occurred to you folks? I asked you at the beginning to note down some questions that might be keeping you from doing this. I'm presuming by being here, there's some amount of interest, even if that calculus was, 'I am less interested in Docker.' That's fine; I understand.
01:02:53.840 But at the beginning, when I asked you what questions are keeping you from it, are there any that remain for you? Any that occurred to you as we've gone through this? As you imagine yourself going back home, do you feel like you know how you're going to do this?
01:03:07.440 In case you didn't notice, I'm practicing my facilitator skill of being comfortable with silence. I have a timer here, so I know how long it's actually been—not how uncomfortably long it might feel.
01:03:22.960 But if there are no further questions, I'm going to suggest that the scenario you developed, or put together from other people's, is something to put in your pocket. And how you do it—if you're on a team where you have the ability to schedule that—I'm going to suggest that you use the time between now and lunch to send out an invite and schedule it.
01:03:38.560 If you don't feel ready to do that and don't feel comfortable asking a question of me here, I will be in the Discord. I will also be around if you would like to ask a question you're not able to ask here.
01:04:01.680 If you are on a team where you can't, if you don't feel that you have the power just to send that invite, I would suggest that you send an invite for gauging interest in this: talk to people about it, suggest that you do it, and at least see if there's a possibility.
01:04:23.920 I'm hoping that whoever does traffic analysis or whatever on the Wi-Fi is going to make a note that a bunch of ICS files just flew across the network for calendar invites, right? That's good! I'm pretty sure that's what's going to happen.
01:04:46.320 Alright, alright folks, with that, I'll let you get some extra time so you can start getting ready for lunch. Thank you!