ChatOps

by Josh Nichols

In the video titled "ChatOps" presented by Josh Nichols during the Rocky Mountain Ruby 2012 event, the speaker introduces the concept of ChatOps, a blend of 'chat' and 'operations'. This approach leverages electronic chat platforms to streamline day-to-day business operations, enhancing collaboration and task management through the use of automation tools, specifically chatbots such as Hubot. Josh shares his experiences with Rails Machine, outlining the challenges his small team faced due to losing key personnel and the consequent operational chaos. This difficulty prompted the need to innovate and rethink operational strategies.

Key points discussed include:

  • Definition of ChatOps: ChatOps combines communication and operational activities using chat tools to facilitate engagement and task execution within teams.
  • Rails Machine's Background: Josh describes the evolution of Rails Machine from a simple hosting service to offering comprehensive support and operational consulting for their clients.
  • Challenges in Operations: With a small team and the departure of several engineers, operational knowledge gaps became evident, leading to struggles in managing multiple tools and communication streams.
  • Introduction of Automation: Chatbots (like Hubot) emerged as a solution for unifying operations and simplifying task management. They let users interact with systems via chat, reducing the complexity of accessing information and managing workloads.
  • Implementation Examples: Josh shares how he developed scripts for Hubot to retrieve server status, manage on-call duties, and track customer account details, enhancing operational efficiency and team communication.
  • Personal Impact: The adoption of ChatOps not only improved operational productivity but also positively influenced Josh's personal life, contributing to improved well-being and work-life balance.

In conclusion, ChatOps transformed task management at Rails Machine through automation and interactive communication, illustrating that effective use of technology can lead to significant improvements in both team morale and productivity. The video emphasizes the importance of enjoying one's work and adapting iteratively to challenges, showcasing a successful case study in utilizing ChatOps effectively.

00:00:09.519 Good morning! My name is Josh Nichols. Here's a picture of me wearing a tuxedo and wielding a sword. This was from an awesome wedding, incidentally. I work at Rails Machine.
00:00:16.480 Officially, I'm the Chief Technology Officer, but unofficially, I'm the Chief Technical Pickle.
00:00:26.960 Today, I'm going to be talking about ChatOps. Now, the first thing you might be thinking is, 'What the heck is that?' What is ChatOps?
00:00:33.680 Whenever I come across a new term, I usually think, 'Okay, I'll look it up in the dictionary.' Well, it's not going to be there, so let's try breaking it down into these two parts.
00:00:40.559 The first part is 'chat.' Now, when you think of chatting, you might think of talking like I’m talking to you right now, using spoken words, which can sometimes be slow and inefficient.
00:00:46.239 However, that's not what I'm talking about. I'm referring to electronic chat – using platforms like IRC, Campfire, Jabber, or any of those tools. So when I say 'chat,' I mean internet chat.
00:01:05.280 The second part is 'ops,' short for operations. This is not the kind of operation where you have a doctor slicing you open, nor is it about servers. If you know me or Rails Machine, you might initially think I’m referring to server operations, but that's not it.
00:01:16.320 I’m talking more about the day-to-day operations of running a business; the essential tasks that keep everything running smoothly. For example, if you have a product, you need to support your customers, handle development, manage sales, or if you are a consultancy, work with ticketing systems and customer interaction. At Rails Machine, we deal with servers, applications, code, Rails, and all that good stuff.
00:01:39.680 So when we put them together, what do we get? ChatOps. It involves using electronic chat for day-to-day business operations. But that, in itself, could be pretty boring.
00:01:44.720 If you were just using Campfire for all that, it wouldn’t be worth discussing. So what makes it interesting? Let's say you're discussing business in chat. What if that discussion included a robot helper that provides additional context or even does some of the tasks for you?
00:02:06.080 I like to say, 'Anything is possible with enough code.' The essence of ChatOps is using code to create a robot helper that resides in your chat, assisting you with your work on a daily basis.
00:02:50.160 So what's the big deal? That's why I'm here to hopefully provide you with insights into what it actually looks like and why it's interesting. Before we delve into that, I'd like to share a story about Rails Machine and how we arrived at ChatOps and how we're using it today.
00:03:11.920 Rails Machine has been around for a while; I believe we are one of the first Rails-specific hosting companies. Everything began quite simply, with a Virtual Private Server (VPS) configured for running Rails. We also had the Rails Machine gem, which was akin to the idea of Rails having a fifteen-minute blog.
00:03:36.000 We had deployment in 15 minutes, and it worked well. However, as our customers grew and their needs became more complex, we had to adapt. They required additional support and more hands-on assistance, which led us to offer what we call managed hosting.
00:04:03.920 In addition to providing physical servers, we started offering consulting to help take a Rails app from development to production, including deployment and monitoring. If a system goes down in the middle of the night, our customers won’t need to worry about it; we’ll handle it and help identify what went wrong and how to get it back online.
00:04:25.919 Additionally, there’s a lot that goes into managing Rails applications in production, and we offer operational consulting to help ensure that they are available, performant, and capable of growing consistently.
00:04:44.000 However, we’ve faced some challenging times recently. Rails Machine has always been a small company; our current team is just four people while we serve hundreds of customers. We have been small for a long time; I don’t think we’ve ever exceeded ten employees.
00:05:02.800 About a year ago, we had a situation where three of our four engineers opted to take different professional opportunities. The only engineer left is me, with two thumbs, standing here to give this presentation.
00:05:24.320 During that tough period, which was certainly challenging, I realized the true value of our employees lies in the knowledge they possess. It's not merely about knowing how to complete tasks, such as being proficient in Rails or Apache; it's about understanding how the business operates.
00:05:53.680 When you lose that knowledge, you can manage to keep going with what you have, but it's comparable to having a blank area in your brain. It happens in companies; you must be resilient and keep moving forward. All this while, we are still trying to deliver quality service to our customers.
00:06:06.080 We were handling support tickets, on-call work, monitoring, consulting, building tools, and engaging in open-source development. Putting all of these tasks into one picture gets cluttered; it's tricky to visualize everything that happens on a daily basis. The core of the issue is that there wasn't a single dashboard or centralized system to manage everything.
00:06:36.000 This made it increasingly difficult; it felt almost as if we were trapped. When you’ve got numerous tools, learning how to use each effectively and interacting with them can become a grueling task. I realized I had to remember multiple URLs, search for items, click, copy, and paste information from one tool to another.
00:07:06.880 We managed to navigate through the challenges thanks to our outstanding staff, but losing that talent only compounds the issue. If you're short-handed, the feeling of being constrained becomes overwhelming, and I can deeply relate to that.
00:07:27.480 Personally, I felt like I was trapped; I was struggling and overwhelmed. It took a toll on my personal life: I wasn't sleeping well, I was gaining weight, and I hadn’t been on a date in years. It was just awful. I felt stuck in carbonite, devoid of options.
00:07:48.240 In a way, it felt pretty dark. If Rails Machine had a movie saga, we were at the end of 'The Empire Strikes Back'. However, things were set to improve, and I wouldn't be here today if they hadn’t.
00:08:08.160 This challenging situation persisted for several months. Earlier this year, it became evident that we needed to make a change. Our CEO, Bradley Taylor, came in and declared that 'This has to change.'
00:08:20.319 I'll never forget that morning when I woke up and checked my email and saw a subject line that read 'New World Order'. This email outlined a new division of responsibilities: 'Kevin, you're doing this. Ernie, you're doing that. You’re no longer responsible for this aspect.' It was amazing how a slight shift in perspective made things so much better.
00:08:44.480 The only odd thing about that email was that my name wasn't mentioned at all, which momentarily made me wonder if I still had a job or what was happening. Fortunately, it turned out I just needed a break.
00:09:06.240 For a while, I stepped away to gather my thoughts, and eventually, I decided to recharge—perhaps hanging out in a hammock on the beach could help me refocus my life and work.
00:09:26.560 However, I found it challenging to disconnect because I tended to check support tickets—only to remind myself that wasn’t my responsibility anymore. It was tough, but eventually, I started contemplating the solution, which led me to the concept of ChatOps.
00:09:39.840 After discussions with Bradley and my friend Jesse Newland at GitHub, we came to the idea of utilizing chat as a platform for managing tasks. We didn’t have a centralized dashboard to navigate tasks, but we constantly used Campfire in our daily operations, so what if we could leverage that platform?
00:10:03.520 I needed to formulate a plan and define some goals before embarking on this journey. I wanted the work to be enjoyable, to give me a sense of joy and fulfillment. My heart needed to glow with joy, not just in terms of the development but also in terms of sharing our work and achievements with others.
00:10:34.720 I wanted this workflow to be iterative because the workload at Rails Machine fluctuates considerably. We do consulting, and tasks come and go unexpectedly; it's challenging to pin down a week’s worth of work on a single project at a time. Hence, I needed something feasible that I could work on steadily each time.
00:11:01.920 Moreover, I wanted the ChatOps process to be well-documented—not just in a plain text file, but more interactively so that its functionality explains itself. Whenever someone uses this ChatOps tool, it should help new users seamlessly understand how to participate.
00:11:26.560 Now with those goals in place, I could explore each individual task to see if it aligned with our goals. If a task didn’t meet those criteria, I would reconsider whether or not to pursue it.
00:11:58.240 Several months prior, GitHub had released their Campfire bot, Hubot. After eagerly discussing it, I finally realized it was an obvious path for us to follow. We had a Campfire bot built with a Ruby library called Fire Tower, but I wanted to move on from that.
00:12:09.440 The original bot wasn’t very enjoyable to work with, so I ended up taking over maintenance of it. Hubot promised to be a much more enjoyable project and a way to engage with a wider community, where others were already blogging about what it could do.
00:12:27.200 There was a repository of scripts out there, some of which are whimsical and delightful, while others are genuinely useful. I didn’t expect that, after getting comfortable with Hubot, I might end up as a maintainer for the bot itself.
00:12:47.440 Now, rather than attempting to do live coding here, I can demonstrate Hubot’s functionality live. Unfortunately, I won’t have enough time for an in-depth 'getting started' session with Hubot, as that could take an entire talk by itself.
00:13:10.080 However, I’ll give you a rapid glimpse into how it functions. Hubot is a Node package, and we have a package.json file for managing dependencies in Node. This file outlines the required dependencies for our Hubot implementation.
00:13:41.680 We have a Hubot scripts file (hubot-scripts.json) that lists the community scripts we want to load. Hubot also reads a source scripts directory and loads the CoffeeScript or JavaScript files it finds there.
00:14:07.920 Having already set my goals, I can then think of how to ask questions in Campfire and how to answer them. For instance, it’s a workday morning where I’m a bit tired, contemplating whether I want to make a latte or iced coffee.
00:14:30.720 I want to know if it’s good weather for iced coffee or if I should stick to hot drinks. There’s this website called 'Is It Iced Coffee Weather?' that serves this purpose. Unfortunately, they don't have an API, so I’ll have to do some simple screen scraping.
00:15:01.840 Therefore, I wrote a script to retrieve this information using the Cheerio library to extract the necessary elements from the website to respond to my inquiries.
00:15:34.080 The bot responds to the command with whatever relevant output it gathers. Initially, I wanted to give the bot a way to respond accurately for my current location, since it can't determine that automatically.
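As a rough illustration of the command described above, here is a minimal sketch of a Hubot script written in JavaScript. It is not the script from the talk. The optional "in <location>" phrasing, the `DEFAULT_LOCATION` value, and the `fetchAnswer` helper are all assumptions made for the example, and the screen-scraping step (which the talk did with Cheerio) is deliberately left behind that helper rather than guessed at.

```javascript
// Description:
//   Answers whether it is iced coffee weather (illustrative sketch only).
//
// Commands:
//   hubot is it iced coffee weather [in <location>]? - Reports iced coffee suitability

// Hypothetical default; the talk does not say which location was used.
const DEFAULT_LOCATION = "Boulder, CO";

// Pull the optional location out of the regex match, falling back to the default.
function locationFrom(match) {
  return (match[1] || DEFAULT_LOCATION).trim();
}

// fetchAnswer(location, callback) is a hypothetical stand-in for the scraping step.
function icedCoffeeWeather(robot, fetchAnswer) {
  robot.respond(/is it iced coffee weather(?: in (.+?))?\??$/i, (msg) => {
    const location = locationFrom(msg.match);
    fetchAnswer(location, (err, answer) => {
      if (err) {
        msg.send(`I couldn't check the weather for ${location}: ${err.message}`);
        return;
      }
      msg.send(`${location}: ${answer}`);
    });
  });
}

// Placeholder so the script loads even before a real scraper is wired in.
function notWiredUp(location, callback) {
  callback(new Error("scraper not configured"));
}

// Hubot calls the exported function with the robot instance.
module.exports = (robot) => icedCoffeeWeather(robot, notWiredUp);
```

Passing the fetch step in as a function keeps the chat handling separate from the scraping, so the command can be exercised with a fake robot object and a canned answer.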
00:15:57.680 Once I have a functioning bot, it's essential to think carefully about its name. As a fan of Borderlands, I decided to name my bot Claptrap. So now I can ask, 'Is it iced coffee weather?' and Claptrap replies, presuming that the bot is alive.
00:16:21.680 This interaction is simple and powerful, providing me with the necessary information promptly. Now that I have the coffee query sorted out, it’s time to delve into work-related monitoring.
00:16:46.680 I need to check the status of certain servers and ask if they are being monitored, how much memory they have available, and so forth. We have an internal application called Imperial Probe Droid that tracks the state of these servers.
00:17:11.680 If I need to know about the Claptrap server, I can easily search within this application to find a concise overview of its status. This system will help me establish whether or not it's being monitored effectively.
00:17:32.240 However, the traditional way of looking this up involves logging into the app and running manual searches, which can be tedious. Wouldn’t it be much easier if I could just retrieve this crucial data directly from Campfire instead?
00:18:07.200 To make that happen, I wrote a script that pulls the necessary information. The script follows a boilerplate structure that keeps it clear and concise.
00:18:35.760 Within the script, I ensure documentation is available for user guidance. Creating a range of command styles with regular expressions facilitates interaction with the bot. This allows it to respond accurately to the different ways a user might phrase a request.
00:19:00.720 Additionally, there are various methods available within the context of this application to assist with querying and gathering relevant data. This helps streamline the process of obtaining information rapidly.
00:19:27.680 Next, we have a paging system, known as PagerDuty. This system ensures we have coverage, so someone is always on call to address issues as they arise.
00:19:50.160 It's beneficial to inquire who is on call, so I implemented a way to ask Claptrap who is currently responsible for monitoring. This functionality adheres to similar patterns established in previous scripts for consistency.
00:20:12.640 We also made the decision to automate announcements, notifying the team whenever someone comes on shift for monitoring. This helps ensure everyone is aware without needing to query the bot constantly.
00:20:39.680 Additionally, we rely heavily on New Relic for performance monitoring. It allows us to gauge the health of our servers and address slow performance issues. However, it can be challenging to keep track of which customers have active New Relic accounts.
00:21:05.680 By implementing a script that enables us to identify and find information about customer accounts, we can minimize the hassle of manually querying New Relic and gathering insights.
00:21:32.880 This allows us to effectively ask if a customer has a New Relic account and quickly obtain the necessary details to correlate back to their application.
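A customer account lookup of this kind might be sketched as follows. The account record shape (`customer`, `accountId`, `apps`), the command phrasing, and the `loadAccounts` helper are assumptions for illustration; how the account list is actually obtained from New Relic is not covered in the talk and is not shown here.

```javascript
// Case-insensitive substring search over account records.
// The record shape ({ customer, accountId, apps }) is an assumption.
function findAccounts(accounts, query) {
  const needle = query.trim().toLowerCase();
  return accounts.filter((a) => a.customer.toLowerCase().includes(needle));
}

// Format the matches as chat lines, one account per line.
function describeAccounts(query, matches) {
  if (matches.length === 0) return `No New Relic account found for "${query}"`;
  return matches
    .map((a) => `${a.customer}: account ${a.accountId} (${a.apps.join(", ")})`)
    .join("\n");
}

// loadAccounts() is a hypothetical helper returning the known account records.
function newRelicAccounts(robot, loadAccounts) {
  robot.respond(/new ?relic (?:account )?(?:for )?(.+)/i, (msg) => {
    const query = msg.match[1];
    msg.send(describeAccounts(query, findAccounts(loadAccounts(), query)));
  });
}

// Hubot calls the exported function with the robot instance.
module.exports = (robot) => newRelicAccounts(robot, () => []);
```

Listing the application names next to each account is what lets the answer be correlated back to a customer's application.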
00:22:00.320 To wrap up the examples, I wanted to conclude with something fun. Our office environment often features shared audio speakers, and we frequently receive inquiries about the current song playing.
00:22:28.480 A fun script allows us to discern the current song's name by utilizing some form of audio recognition or track identification to provide quick answers.
00:22:58.320 Using metadata about the songs being played, we built an implementation that reports that information back to users within our chat application.
00:23:26.960 Encouraging a fashionable culture via data sharing and enthusiastic collaboration makes working with our tools much more enjoyable.
00:23:57.120 In conclusion, ChatOps has transformed our day-to-day tasks into more manageable and interactive experiences, which contributes significantly to morale and productivity within our team.
00:24:07.920 So the main takeaway from my presentation is to enjoy what you do while progressing iteratively through challenges. Since adopting ChatOps, I’ve shed 30 pounds, met an amazing partner, and now serve as CTO—things are going really well.
00:24:57.040 If anyone is utilizing Hubot, please reach out to me! I’d love to hear about your experiences. Here’s my contact info: josh@railsmachine.com, and don’t forget to enjoy our technical pickles!
00:25:22.400 Are there any questions?
00:25:31.440 Regarding the question about my gaming adventures, I have unfortunately not played Borderlands 2 yet, but I look forward to it once I return to the office.
00:26:07.440 There was also a question about keeping the bot active. We use ‘god’ for monitoring the process, which ensures it stays alive, but disconnections can still occur occasionally, something I'm actively working on.
00:26:36.960 If there are no more questions, I want to express my gratitude to you all for being such an engaging audience.