Zachary Feldman

Home Automation with the Amazon Echo and Ruby

@zachfeldman
The Amazon Echo recently debuted and made a big splash with its incredibly accurate voice recognition, capable of hearing and transcribing commands from 20-30 feet away. Home automation enthusiasts and hackers alike wondered if it would be possible to intercept commands from the device and trigger custom actions. While device traffic is encrypted, the device pushes commands to a history page in a web application. Using Watir WebDriver, which is normally used for feature testing, we've created a proxy that can be run on a Raspberry Pi as well as a modular Ruby framework based on Sinatra to run custom commands, allowing us to control the Hue wireless lighting system, Nest, and even request an Uber!

Talk given at GORUCO 2015: http://goruco.com


00:00:14.340 Hello everyone! I'm here to give you a talk about home automation using the Amazon Echo and Ruby. I hope this presentation goes well since I’ve never tried this setup in front of 300 people before. To start, I wanted to create an API to help solve some of my first-world problems, because why would anyone want to get up off the couch? My name is Zach Feldman, and you can find me as @zachfeldman on GitHub and Twitter. I'm the co-founder and Chief Academic Officer at the New York Code and Design Academy, one of those new coding boot camps you’ve been hearing about. One of our students is actually in the audience, so shoutout to NYCDA! We offer classes ranging from basic web development to advanced iOS development and UI/UX design. Check us out, and feel free to ask me any questions afterward. We're launching in Amsterdam this September, and I’m really excited about it.
00:00:26.800 So, the Amazon Echo is this device right next to me that resembles a garbage can, kind of like R2-D2 in a way. It has seven microphones, which is incredible. I can confidently say that no smartphone, Android models included, has this level of hardware. All other methods of speech recognition so far have been on inferior hardware. While I may not be an expert in parsing waveforms into text, I can guess that it's easier to recognize speech from seven microphones than from one or two. This is why the Echo is so amazing. I'm often sitting on my couch or even in my bathroom, and I can issue commands from over 20 feet away. When I speak to it, even without using the wake word, it understands my commands clearly due to the current generation of voice recognition technology.
00:01:11.229 When I first got my Echo around eight or nine months ago (don’t ask how I got it this quickly), I initially enjoyed its built-in functionality. For instance, I can say 'Alexa, set an alarm for 41 seconds from now,' which I think is fantastic. It's great for checking the International Space Station's location, asking for music, and getting weather updates, all with just my voice! Sometimes, I ask my Echo to do things it can't understand, and it’ll tell me so politely. It comes with outstanding built-in functionality, and while I was impressed upon unboxing, I also thought, 'Why can’t it do more? This seems a bit ridiculous. Surely I can make it control more things in my life!' That thought led to the development of enhanced features.
00:02:17.500 I created a kind of proxy and an API to enable these extended functionalities. I’ll go ahead and start it up now to demonstrate. Without touching my computer, I’m putting out a request to add an event to my calendar. I can say, 'Alexa, add an event, Breakfast at Tiffany’s at 5:30.' Just like that, if all goes according to plan, it should acknowledge the event despite my use of 'stop' in the command. This workaround is necessary, as not every function Amazon intended is accessible to us yet. If anyone from Amazon is present—sorry, but I figured out ways to extend the Echo's capabilities.
00:03:01.190 Let me show you another functionality. I can say, 'Alexa, tell the world Gotham Ruby Conference rocks.' It should trigger a tweet. The command itself is usually interpreted correctly, even if the text that follows can come out garbled in a crowded room. It generally works better at home, as you might imagine. This system is the Alexa Home project, which has a GitHub repository available for those interested in examining the code further. The primary concept is that when a new command is posted, it is sent to a server that receives and interprets it. A scraper picks up the commands, and the server matches each one to its respective action.
00:03:41.300 The scraper employs Watir WebDriver to log into the Amazon Echo web application, allowing it to monitor what commands you give it. Whenever a command is received, it generates a corresponding response, which is then sent back to the server for parsing. Using regular expressions, we can systematically navigate through the commands and ensure proper handling without compromising user privacy. I assure you that all requests sent via the Echo are encrypted.
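Editor's sketch: the regular-expression step described above could look roughly like the following. This is a minimal illustration, not the project's actual code. The pattern table, the handler names, and the `route_command` helper are all hypothetical, and it assumes the scraped text may still begin with the word "Alexa".

```ruby
# Hypothetical table mapping command patterns to handler names.
# None of these names come from the talk; they are for illustration only.
COMMAND_PATTERNS = {
  /\Aturn (?<state>on|off) the lights?\z/i => :lights,
  /\Atell the world (?<message>.+)\z/i     => :tweet,
  /\Aadd an event,? (?<details>.+)\z/i     => :calendar
}.freeze

# Returns the first matching handler and its named captures, or nil.
def route_command(text)
  # Drop a leading "Alexa" (assumption: the scraped text may include it).
  cleaned = text.to_s.strip.sub(/\Aalexa[,\s]+/i, "")
  COMMAND_PATTERNS.each do |pattern, handler|
    match = pattern.match(cleaned)
    return { handler: handler, captures: match.named_captures } if match
  end
  nil # no module claims this command
end
```

Anchoring each pattern with `\A` and `\z` keeps a phrase in the middle of a longer sentence from triggering an action by accident.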
00:04:37.720 After experiencing some issues with executing duplicate commands, I had to insert a mechanism to ensure that each command was processed only once. Hence, we injected JavaScript into the web page to watch for the Ajax complete event, which signals that a new command has been added. This way, whenever a new command gets pushed to the Echo's history page, we can decode it and trigger the server to perform the desired action. It's crucial to avoid redundant actions, such as lights randomly turning off mid-conversation due to past commands being re-triggered. Our monitoring system employs pattern recognition, so it filters through and avoids so-called ghost commands before they execute.
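Editor's sketch: one simple way to make sure each command runs only once is to remember which history entries have already been handled. The talk does not say how uniqueness was actually determined, so the per-entry `:id` key, the `CommandDeduplicator` class, and the `new_commands` helper below are assumptions made for illustration.

```ruby
require "set"

# Remembers which history entries have already been processed.
class CommandDeduplicator
  def initialize
    @seen = Set.new
  end

  # True only the first time a given entry id is offered.
  # Set#add? returns nil when the element is already present.
  def fresh?(entry_id)
    !@seen.add?(entry_id).nil?
  end
end

# history: array of hashes such as { id: "abc", text: "turn on the lights" }
# (hypothetical shape). Returns only the texts that have not been seen before.
def new_commands(history, dedup)
  history.select { |entry| dedup.fresh?(entry[:id]) }
         .map { |entry| entry[:text] }
end
```

Feeding the whole history page through `new_commands` after every refresh is then safe, because entries that were already handled are filtered out before any action fires.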
00:05:35.860 To achieve this setup, I used my existing knowledge of Sinatra. Each of my modules communicates with the main application through their respective APIs. For instance, a specific module is created for controlling the Hue lights; simply scanning for the 'turn on' trigger activates the Hue API, effectively managing the lights around me. To keep things straightforward, I built a modular structure so other developers could contribute easily. Contributors are not limited to one functionality—they can create anything with the same foundation. All modules are independently defined to ensure clean interactions with the system.
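Editor's sketch: the modular structure described above might be approximated as below, with each module deciding whether it handles a command and a dispatcher trying the modules in order. Everything here is hypothetical. `FakeLightClient` stands in for a real lighting API client, so no network call is made, and the class and method names are not taken from the project.

```ruby
# Stand-in for a lighting API client (assumption: a real module would
# talk to the Hue bridge here instead of recording calls).
class FakeLightClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def set_power(on)
    @calls << (on ? :on : :off)
  end
end

# A module that reacts to "turn on" / "turn off" in a command.
class LightsModule
  TRIGGER = /\bturn (on|off)\b/i

  def initialize(client)
    @client = client
  end

  def handles?(command)
    command.match?(TRIGGER)
  end

  def process(command)
    on = command.match(TRIGGER)[1].downcase == "on"
    @client.set_power(on)
    on ? "lights on" : "lights off"
  end
end

# Hands a command to the first module that claims it; nil if none does.
class Dispatcher
  def initialize(modules)
    @modules = modules
  end

  def dispatch(command)
    handler = @modules.find { |mod| mod.handles?(command) }
    handler&.process(command)
  end
end
```

Because the dispatcher only relies on `handles?` and `process`, a contributor could add a new module without touching the existing ones.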
00:06:53.760 As we progressed, the project's architecture refined itself. I prioritized documenting it thoroughly and facilitating an extensible design to encourage collaborative coding and contributions from others. The resulting code base is organized in different folders for easy navigation. We even have documentation detailing how to run this program on a Raspberry Pi and manage modules efficiently.
00:08:04.290 Each module follows a pattern established through a class structure, initiating with a wake word so the system is aware of what commands to activate. For each command received, the relevant module parses it and executes the designated action. For example, when processing Twitter commands, I simply extract the keywords, removing unnecessary segments, while passing the necessary strings to execute the tweet action seamlessly. Each module is encapsulated, allowing for frequent checks against active commands and managing their instances efficiently.
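Editor's sketch: the class pattern with a wake word, and the Twitter example of stripping the trigger phrase before posting, could be expressed along these lines. `BaseModule`, `TweetModule`, the `wake_words` macro, and the `poster` callable are hypothetical names, and the sketch assumes the command text arrives with "Alexa" already removed. The actual call to Twitter is replaced by whatever callable is passed in.

```ruby
# Base class: subclasses declare the phrases that wake them up.
class BaseModule
  # With arguments, stores the wake phrases; without, returns them.
  def self.wake_words(*phrases)
    @wake_words = phrases.map(&:downcase) unless phrases.empty?
    @wake_words || []
  end

  # Returns the text after the wake phrase, or nil if none matches.
  def payload(command)
    phrase = self.class.wake_words.find do |candidate|
      command.downcase.start_with?(candidate)
    end
    return nil unless phrase

    command[phrase.length..-1].strip
  end
end

# Posts whatever follows "tell the world".
class TweetModule < BaseModule
  wake_words "tell the world"

  # poster: any object responding to #call (assumption: a real module
  # would wrap a Twitter client here).
  def initialize(poster)
    @poster = poster
  end

  # Returns true if a tweet was handed to the poster, false otherwise.
  def process(command)
    text = payload(command)
    return false if text.nil? || text.empty?

    @poster.call(text)
    true
  end
end
```

Passing the poster in as a callable keeps the module easy to exercise without network access, for example with a lambda that appends to an array.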
00:08:49.249 In parallel, I worked on Google Calendar and Uber modules, making my architecture more comprehensive. These added functionalities offer unique versatility but also introduced challenges, demanding precise design considerations. Each time I design a new module, I devise how it communicates with the rest of the system while preserving the smoothness of command executions. A great bonus is that this gives developers a practical avenue for open-source contributions, enriching the overall platform.
00:09:48.630 During this journey, I found an opportunity to engage others keen on enhancing the existing framework. Through the project's convivial backdrop, other interested developers can collaborate to innovate, and I’m thrilled to witness continuous contributions rolling in as knowledge expands.
00:10:40.180 However, the most gratifying aspect remains seeing others become inspired and contribute their own ideas to the project. Collaborations with folks like Stephen Arcana demonstrate how our efforts can foster learning experiences, propelling new developers into greater depths of coding and interfacing applications through commands. As much as I aim to support someone in launching their career in technology, I’m also growing in parallel, building my skills and understanding in these areas.
00:11:41.000 As I draw near the end, I implore you to consider what we've built. I urge you to reach out, engage, and contribute to the further development of this project. We aim to establish an ecosystem where innovative thinking can thrive and evolve. Embracing open-source initiatives, we collectively pave the path towards simpler and significantly more engaging solutions, ultimately ushering transformative changes in our interaction with technology. Thank you for your time!
00:20:45.130 Thank you.