Home Automation with the Amazon Echo and Ruby

by Zachary Feldman

This video, titled "Home Automation with the Amazon Echo and Ruby" and presented by Zachary Feldman at GoRuCo 2015, explores the integration of the Amazon Echo with Ruby programming to enhance home automation experiences.

Zach begins by introducing the Amazon Echo, highlighting its advanced capabilities, particularly its impressive voice recognition technology which uses seven microphones for better command interpretation. He shares how he came to own the device and his initial enjoyment of its built-in functionalities, such as setting alarms and playing music, while also expressing a desire for further enhancements.

Key points discussed include:
- API Creation: Zach explains the motivation behind developing an API to extend the Echo's functionalities, showcasing how he began to create a proxy and an API to achieve this.
- Command Integration: He demonstrates how to issue an event command without physical interaction through the Echo, illustrating the practicality of voice commands in managing tasks.
- Scraping and Interaction: The use of Watir WebDriver to log into the Amazon Echo web application for monitoring verbal commands is explained, along with how to ensure user privacy through encrypted command requests.
- Handling Command Redundancy: Zach describes how he resolved issues of executing duplicate commands by implementing a monitoring system that filters out ghost commands.
- Modular Architecture: The project employs a modular structure built with Sinatra, allowing for easy contributions and extensions by other developers, with specific modules created for controlling devices like Hue lights, Google Calendar, and Uber.
- Collaborative Potential: He emphasizes the importance of documenting the project to foster collaboration, encouraging other developers to contribute to the open-source initiative to enhance functionality.
- Inspiration and Growth: Zach stresses the rewarding nature of engaging with others in the tech community, nurturing both personal growth and helping new developers thrive.

In conclusion, Zach urges the audience to engage with the project, highlighting the potential of open-source innovation to enrich our interactions with technology and automate everyday tasks more efficiently. The video successfully conveys how programming can empower users to create tailored home automation solutions using the Amazon Echo and Ruby.

00:00:14.340 Hello everyone! I'm here to give you a talk about home automation using the Amazon Echo and Ruby. I hope this presentation goes well since I’ve never tried this setup in front of 300 people before. To start, I wanted to create an API to help solve some of my first-world problems, because why would anyone want to get up off the couch? My name is Zach Feldman, and you can find me as @zachfeldman on GitHub and Twitter. I'm the co-founder and Chief Academic Officer at the New York Code and Design Academy, one of those new coding boot camps you’ve been hearing about. One of our students is actually in the audience, so shoutout to NYCDA! We offer classes on web development ranging from basic to advanced iOS development and UI/UX design. Check us out, and feel free to ask me any questions afterward. We're launching in Amsterdam this September, and I’m really excited about it.

00:00:26.800 So, the Amazon Echo is this device right next to me that resembles a garbage can, kind of like R2-D2 in a way. It has seven microphones, which is incredible. I can confidently say that no smartphone, Android models included, has this level of hardware. All other methods of speech recognition so far have been on inferior hardware. While I may not be an expert in parsing waveforms into text, I can guess that it's easier to recognize speech from seven microphones than from one or two. This is why the Echo is so amazing. I'm often sitting on my couch or even in my bathroom, and I can issue commands from over 20 feet away. When I speak to it, even without using the wake word, it understands my commands clearly due to the current generation of voice recognition technology.

00:01:11.229 When I first got my Echo around eight or nine months ago (don’t ask how I got it this quickly), I initially enjoyed its built-in functionality. For instance, I can say 'Alexa, set an alarm for 41 seconds from now,' which I think is fantastic. It's great for checking the International Space Station's location, asking for music, and getting weather updates, all with just my voice! Sometimes, I ask my Echo to do things it can't understand, and it’ll tell me so politely. It comes with outstanding built-in functionality, and while I was impressed upon unboxing, I also thought, 'Why can’t it do more? This seems a bit ridiculous. Surely I can make it control more things in my life!' That thought led to the development of enhanced features.

00:02:17.500 I created a kind of proxy and an API to enable these extended functionalities. I’ll go ahead and start it up now to demonstrate. Without touching my computer, I’m putting out a request to add an event to my calendar. I can say, 'Alexa, add an event, Breakfast at Tiffany’s at 5:30.' Just like that, if all goes according to plan, it should acknowledge the event despite my use of 'stop' in the command. This workaround is necessary, as not every function Amazon intended is accessible to us yet. If anyone from Amazon is present—sorry, but I figured out ways to extend the Echo's capabilities.

00:03:01.190 Let me show you another functionality. I can say, 'Alexa, tell the world Gotham Ruby Conference rocks.' It should trigger a tweet. This shows that the command phrase itself is usually recognized, even if the text that follows can come out garbled in a crowded room. It generally works better at home, as you might imagine. The system behind this is the Alexa Home project, which has a GitHub repository available for those interested in examining the code further. The primary concept is that when a new command is posted, it is sent to a server that receives and interprets these commands. A scraper picks up the commands, and the server maps each one to its respective action.

00:03:41.300 The scraper employs Watir WebDriver to log into the Amazon Echo web application, allowing it to monitor what commands you give it. Whenever a command is received, it generates a corresponding response, which is then sent back to the server for parsing. Using regular expressions, we can systematically navigate through the commands and ensure proper handling without compromising user privacy. I assure you that all requests sent via the Echo are encrypted.
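
To make the regular-expression step concrete, here is a minimal sketch of how a transcribed calendar command might be pulled apart with named captures. The pattern, the `parse_event` helper, and the shape of the returned hash are assumptions for illustration only; the talk does not show the project's actual expressions.

```ruby
# Hypothetical pattern for commands such as
# "Alexa, add an event, Breakfast at Tiffany's at 5:30".
# The greedy title capture backtracks to the LAST " at <time>" in the text,
# so a title that itself contains "at" is kept whole.
EVENT_PATTERN = /add an event,?\s+(?<title>.+)\s+at\s+(?<time>\d{1,2}(?::\d{2})?)\s*\z/i

# Returns a hash with :title and :time, or nil when the text is not
# an "add an event" command. (Helper name and return shape are assumptions.)
def parse_event(text)
  match = EVENT_PATTERN.match(text)
  return nil unless match

  { title: match[:title].strip, time: match[:time] }
end

event = parse_event("Alexa, add an event, Breakfast at Tiffany's at 5:30")
puts "#{event[:title]} @ #{event[:time]}" if event
```

Anchoring the time at the end of the string is a design choice for this sketch: it keeps the parser from mistaking words inside the title for the time.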

00:04:37.720 After experiencing some issues with executing duplicate commands, I had to add a mechanism to ensure that each command was processed only once. So we injected JavaScript into the web page that watches for the Ajax complete event, which signals that a new command has been added. This way, whenever a new command gets pushed to the Echo's history page, we can decode it and trigger the server to perform the desired action. It's crucial to avoid redundant actions, such as lights randomly turning off mid-conversation due to past commands being re-triggered. Our monitoring system employs pattern recognition, so it filters out so-called ghost commands before they execute.
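
One simple way to picture ghost-command filtering is sketched below. It swaps in a plainer technique than the pattern recognition mentioned in the talk: it remembers an identifier for every history entry it has already handled. The `CommandFilter` class and the assumption that each history entry carries a unique id are hypothetical.

```ruby
require "set"

# Remembers which history entries were already handled, so a re-scraped
# page cannot re-trigger an old command. (Hypothetical class; it assumes
# each entry on the history page has a unique id.)
class CommandFilter
  def initialize
    @seen = Set.new
  end

  # True only the first time a given entry id is offered.
  def fresh?(entry_id)
    # Set#add? returns nil when the element is already present.
    !@seen.add?(entry_id).nil?
  end
end

filter = CommandFilter.new
["cmd-1", "cmd-1", "cmd-2"].each do |id|
  puts "#{id}: #{filter.fresh?(id) ? 'run' : 'skip'}"
end
```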

00:05:35.860 To achieve this setup, I used my existing knowledge of Sinatra. Each of my modules communicates with the main application through their respective APIs. For instance, a specific module is created for controlling the Hue lights; simply scanning for the 'turn on' trigger activates the Hue API, effectively managing the lights around me. To keep things straightforward, I built a modular structure so other developers could contribute easily. Contributors are not limited to one functionality—they can create anything with the same foundation. All modules are independently defined to ensure clean interactions with the system.
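
As a rough sketch of that modular idea, the plain-Ruby example below shows a registry that hands every command to each registered module. The class names, the `process` method, and the returned hashes are all assumptions, and the real project wires this up through Sinatra rather than a bare object.

```ruby
# Hypothetical registry: modules are plain objects that respond to
# #process(command) and return nil when the command is not theirs.
class ModuleRegistry
  def initialize
    @modules = []
  end

  # Returns self so registrations can be chained.
  def register(mod)
    @modules << mod
    self
  end

  # Offers the command to every module and keeps the non-nil results.
  def dispatch(command)
    @modules.map { |mod| mod.process(command) }.compact
  end
end

# Hypothetical lights module. A real one would call the Hue API here;
# this sketch only reports which state it would request.
class HueModule
  def process(command)
    case command
    when /\bturn on\b/i  then { light: :on }
    when /\bturn off\b/i then { light: :off }
    end
  end
end

registry = ModuleRegistry.new.register(HueModule.new)
puts registry.dispatch("Alexa, turn on the lights").length
```

Because each module only has to answer `process`, a contributor could add a new capability without touching the registry itself.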

00:06:53.760 As we progressed, the project's architecture refined itself. I prioritized documenting it thoroughly and facilitating an extensible design to encourage collaborative coding and contributions from others. The resulting code base is organized in different folders for easy navigation. We even have documentation detailing how to run this program on a Raspberry Pi and manage modules efficiently.

00:08:04.290 Each module follows a pattern established through a class structure, initiating with a wake word so the system is aware of what commands to activate. For each command received, the relevant module parses it and executes the designated action. For example, when processing Twitter commands, I simply extract the keywords, removing unnecessary segments, while passing the necessary strings to execute the tweet action seamlessly. Each module is encapsulated, allowing for frequent checks against active commands and managing their instances efficiently.
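
The Twitter case described above could look roughly like the following. This is a sketch under assumptions: the `TwitterModule` class, its method names, and the choice of "tell the world" as the only wake phrase are illustrative, and the actual posting of the tweet is left out.

```ruby
# Hypothetical module showing the wake-phrase pattern: it claims commands
# that contain its phrase and extracts the text that follows it.
class TwitterModule
  WAKE_PHRASE = "tell the world".freeze

  def wake_words
    [WAKE_PHRASE]
  end

  # True when the command contains one of this module's wake phrases.
  def handles?(command)
    wake_words.any? { |phrase| command.downcase.include?(phrase) }
  end

  # Returns everything after the wake phrase, or nil if nothing usable
  # follows it. A real module would pass this string to a Twitter client.
  def tweet_text(command)
    index = command.downcase.index(WAKE_PHRASE)
    return nil unless index

    text = command[(index + WAKE_PHRASE.length)..-1].strip
    text.empty? ? nil : text
  end
end

mod = TwitterModule.new
command = "Alexa, tell the world Gotham Ruby Conference rocks"
puts mod.tweet_text(command) if mod.handles?(command)
```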

00:08:49.249 In parallel, I worked on Google Calendar and Uber modules, making my architecture more comprehensive. These added functionalities offer unique versatility but also introduced challenges, demanding precise design considerations. Each time I design a new module, I devise how it communicates with the rest of the system while preserving the smoothness of command executions. A great bonus is that this gives developers a practical avenue for open-source contributions, enriching the overall platform.

00:09:48.630 During this journey, I found an opportunity to engage others keen on enhancing the existing framework. Through the project's convivial backdrop, other interested developers can collaborate to innovate, and I’m thrilled to witness continuous contributions rolling in as knowledge expands.

00:10:40.180 However, the most gratifying aspect remains seeing others become inspired and contribute their own ideas to the ongoing projects. Collaborations with folks like Stephen Arcana demonstrate how our efforts can foster learning experiences, propelling new developers into greater depths of coding and interfacing applications through commands. As much as I aim to support someone in launching their career in technology, I’m also growing in parallel, building my skills and understanding in these areas.

00:11:41.000 As I draw near the end, I implore you to consider what we've built. I urge you to reach out, engage, and contribute to the further development of this project. We aim to establish an ecosystem where innovative thinking can thrive and evolve. Embracing open-source initiatives, we collectively pave the path towards simpler and significantly more engaging solutions, ultimately ushering transformative changes in our interaction with technology. Thank you for your time!

00:20:45.130 Thank you.