Building Asynchronous Communication Layer w XMPP, Ruby, Javascript

by Andrew Carter and Steve Jang

In this video presentation at RailsConf 2012, speakers Andrew Carter and Steve Jang discuss their implementation of an asynchronous communication layer built on XMPP for their device automation framework at Hulu. They illustrate how they leveraged existing tools like Strophe.js, XMPP4R, and ejabberd to integrate various living-room devices, adding programmability and automation.

Key points discussed in the presentation include:

Background and Context: The evolution of Hulu's platform from web-only to supporting multiple devices (over 60 from 17 manufacturers), highlighting challenges faced in device testing and content delivery.
Need for Automation: The speakers explained the complexity of streaming video and how manual testing was inefficient, leading them to create a more robust toolchain.
Introduction of XMPP: They determined that XMPP was an ideal solution for bi-directional communication due to its low latency and peer-to-peer capabilities, contrasting it with other methods they considered (like TCP connections or polling).
Components of the Architecture: The framework was designed with a clear separation of components:
- ejabberd: The chosen XMPP server which facilitates communication.
- XMPP4R: The Ruby library that handles server interaction on the Ruby side of their automation framework.
- Strophe.js: A JavaScript library used for communicating with the XMPP server from the client devices.
Command and Response Handling: The design enables commands to be sent to devices with immediate responses, enhancing testing efficiency. They demonstrated through pseudo-code how to create a synchronous API to facilitate easier test-writing.
Key Features and Advantages: The use of established components allowed the team to focus on their automation tasks without reinventing technologies, significantly speeding up the process and simplifying debugging.
Takeaways and Future Directions: The presenters concluded that using XMPP proved invaluable for their needs and opened doors for potential future applications in automated testing and device provisioning, advocating for its broader use beyond chat services.

The session reflects the practical challenges of developing on diverse platforms and highlights the utility of XMPP in streamlining communications in automated testing environments, which can reduce overhead and increase efficiency in software development workflows.

00:00:26.480 Hello, everybody. My name is Andrew Carter, and my colleague, Steve Jang, will be presenting with me today. We work in Seattle, at our new office, primarily on television devices like Roku, PlayStation, and Xbox. Today, we'll discuss a project that we built using mainly XMPP, Ruby, and JavaScript, allowing us to add programmability and automation to these television devices.

00:00:40.039 To start, let me share a brief history. In 2007, Hulu launched with the mission to help people find and enjoy premium content. We were web-only, targeting Safari, Firefox, and Internet Explorer. The site was built on Ruby on Rails, and we still use Rails today. Hulu started off strong, but viewers prefer watching content on their television screens. Fast forward to today: Hulu.com is still operational, and we've launched Hulu Plus to bring Hulu to mobile and living-room devices. We have also expanded into the international market, starting with Japan.

00:01:34.959 Currently, Hulu operates on over 60 devices from 17 different manufacturers. These devices present unique challenges that differ significantly from targeting web applications. We developed our platform across many frameworks, utilizing JavaScript both with and without THS, and many native UI components like iOS native on Apple devices and Android native on Android devices. Some platforms even rely on Flash, showcasing the complexity of these television devices' architectures.

00:02:25.000 Each platform comes with its own tools of varying quality, and streaming video poses numerous challenges of its own. Reproducible, predictable conditions are hard to achieve, and we depend heavily on the quality of network connections, which are tricky to simulate. Hardware differences, such as Roku versus PlayStation, add further challenges. Playback engines vary across platforms, and in some cases we may be pioneering the use of a particular video engine. For example, we often use HTTP Live Streaming, a protocol pioneered by Apple that has become a quasi-standard.

00:03:00.879 Despite the established usage, many devices are implementing their engines for the first time, and our approach of injecting dynamic ads into streams complicates things further. While most engines cater to traditional pay-per-view movies, switching contexts and dynamically introducing ads creates significant potential for issues in both our code and our partners' code. This complexity also makes testing time-consuming—it's not quite as fun as watching TV all day might sound!

00:04:12.599 We found that we needed a new toolchain capable of supporting these diverse platforms. Ideally, it should work with HTTP, be scriptable, and reliably measure quality of service, fostering reproducible conditions and measuring our impact. Consequently, we developed a project internally that we call 'Vendor,' which serves as the backbone of our automation framework. Now, Steve will walk you through the architecture of this framework, detailing the components we utilized to build it and sharing the lessons learned along the way.

00:05:00.320 Thanks, Andrew. I'm Steve. At our office, we really enjoy using Vendor, which will be the focus of our discussion here. The architecture of our automation framework looks like this: On the left side, we have a laptop running our certification suite, which is written in Ruby. On the right, we have a Roku device, which operates on a JavaScript-based engine.

00:05:22.280 The highlighted components are the focus of today's talk, and they are all standard, off-the-shelf pieces. Let's start with the XMPP session, a crucial part of our automation infrastructure. We chose ejabberd as our XMPP server. The certification test suite establishes an XMPP session with the device, which lets us send commands to it and receive its responses. Devices can also fire events on their own.

00:05:56.160 For instance, when an ad starts playing, the device fires an event to report the ad's progress. As the tests run, we collect these events, and at the end of the session we verify the interactions with the device and check the server logs to make sure all relevant server-side activity was recorded.
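A minimal Ruby sketch of that collect-then-verify step follows, assuming events arrive as simple names through the XMPP message callback and that the server log has already been reduced to a list of event names. `EventLog`, `in_order?`, `missing_from`, and the event names are hypothetical illustrations, not part of Vendor.

```ruby
# Hypothetical sketch: record device events as they arrive, verify at the end.
class EventLog
  Event = Struct.new(:name, :at)

  def initialize
    @events = []
    @mutex = Mutex.new # events may be recorded from the XMPP client's callback thread
  end

  # Called from the message callback whenever the device fires an event.
  def record(name)
    @mutex.synchronize { @events << Event.new(name, Time.now) }
  end

  def names
    @mutex.synchronize { @events.map(&:name) }
  end

  # True if +expected+ appears in this order, allowing other events in between.
  def in_order?(*expected)
    remaining = expected.dup
    names.each { |name| remaining.shift if name == remaining.first }
    remaining.empty?
  end

  # Device-reported events that do not appear in the server-side log
  # (assumed here to be already parsed into a list of event names).
  def missing_from(server_event_names)
    names - server_event_names
  end
end

log = EventLog.new
%w[ad_start ad_midpoint ad_end content_start].each { |e| log.record(e) }
log.in_order?("ad_start", "ad_end")                  # => true
log.missing_from(%w[ad_start ad_end content_start])  # => ["ad_midpoint"]
```

Deferring the checks to the end keeps the test body free of event-handling code; the log only has to be thread-safe because events arrive independently of the commands the test sends.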

00:07:20.480 Before settling on XMPP, we considered several ways to get the bidirectional messaging we needed. One option is to open a TCP connection directly from the script to the device and send messages over it. That approach is straightforward, but building a socket-based solution in JavaScript on these devices is cumbersome, and exposing ports on the device did not feel appropriate, even for testing.

00:08:02.960 Another approach is for the device to poll a shared server with short, periodic requests, which creates a two-way communication channel. However, a command then waits up to one polling interval before the device sees it, which delays its execution. The better solution turned out to be long polling: the device's request stays open until the server has something to send, so a command can be delivered immediately.
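To make the latency difference concrete, here is a minimal long-polling mailbox in Ruby, assuming an in-process object stands in for the shared server. `LongPollMailbox` and its `hold` parameter are hypothetical names; the point is that a waiting poll returns as soon as a command is pushed, rather than at the next polling interval.

```ruby
# Minimal in-process stand-in for a long-polling endpoint (hypothetical).
class LongPollMailbox
  def initialize
    @commands = []
    @mutex = Mutex.new
    @arrived = ConditionVariable.new
  end

  # Test-script side: queue a command and wake a waiting poll immediately.
  def push(command)
    @mutex.synchronize do
      @commands << command
      @arrived.signal
    end
  end

  # Device side: block until a command is available or +hold+ seconds elapse.
  # Returns the command, or nil on timeout (the device then polls again).
  def poll(hold:)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + hold
    @mutex.synchronize do
      while @commands.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return nil if remaining <= 0
        @arrived.wait(@mutex, remaining) # may wake early; the loop re-checks
      end
      @commands.shift
    end
  end
end

mailbox = LongPollMailbox.new
device = Thread.new { mailbox.poll(hold: 30) } # the device's open request
sleep 0.1
mailbox.push("play")
device.value # => "play", delivered as soon as it was pushed, not after 30 s
```

When no command arrives within `hold` seconds, `poll` returns `nil` and the device would immediately open a new request; that reconnect loop is what keeps delivery delay near zero without a fixed polling interval.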

00:09:01.080 We evaluated several long-polling implementations, including Comet and Nginx HTTP push modules, which support streamlined push operations. However, these target a client-server model, whereas our requirements call for a peer-to-peer relationship between the script and the device. Whichever of them we chose, we would have had to implement features such as security and command routing ourselves. That is manageable, but XMPP's design already provides both, which made it a natural fit for our scenario.

00:10:34.000 Allow me to provide more context about the components we utilized to construct our automation framework. As you may already know, XMPP stands for Extensible Messaging and Presence Protocol—a peer-to-peer communication protocol. It's commonly employed in applications like Google Talk and Facebook Chat, characterized by its asynchronous nature and XML-based structure.

00:11:50.400 To elaborate: when a client connects to the XMPP server, it opens an XML stream, a one-way flow of XML to the server, and the server opens its own stream back to the client. Messages are exchanged as XML fragments called stanzas. The server routes each stanza by its 'to' address, which lets clients send and receive messages with one another.
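As an illustration of what a stanza looks like on the wire, the Ruby sketch below builds a `<message/>` stanza by hand. The device JID and the JSON command payload are hypothetical; a real client would normally let its library build stanzas (XMPP4R, for example, builds them on REXML elements) rather than concatenating strings.

```ruby
# Escape the five XML special characters for attribute values and text.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
                '"' => "&quot;", "'" => "&apos;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Build a <message/> stanza addressed to a device's full JID. The server routes
# on 'to' and stamps the sender's own JID as 'from', so the client can omit it.
def message_stanza(to:, body:, type: "chat")
  %(<message to="#{xml_escape(to)}" type="#{xml_escape(type)}">) +
    %(<body>#{xml_escape(body)}</body></message>)
end

puts message_stanza(to: "roku-01@example.com/device", body: '{"cmd":"play"}')
```

Escaping matters here because the stanza is a fragment of one long XML stream: an unescaped `<` in a command body would corrupt the stream for the rest of the session.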

00:12:45.320 XMPP has built-in support for TLS and SSL, which simplifies the security work. And because it is a standard protocol, there is extensive support for presence information and role management, which is everything we need to build a network of devices that communicate with one another in our automation framework.

00:14:06.800 A decisive advantage is that XMPP can run over HTTP, so HTTP-based clients can participate in our framework. For that we adopted BOSH (Bidirectional-streams Over Synchronous HTTP), a long-polling transport defined by the XMPP Standards Foundation. It is well suited to constrained client environments because it uses HTTP lightly, without heavy overhead such as cookies or extra headers.
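The framing BOSH adds is small, which is part of why it suits constrained clients. Below is a Ruby sketch of the client side of that framing under stated assumptions: the element and attribute names follow XEP-0124 and XEP-0206, while the `BoshFramer` class, the domain, and the `wait`/`hold` values are illustrative. On the devices, Strophe.js does this work itself.

```ruby
require "securerandom"

# Sketch of BOSH request framing (XEP-0124, with the XMPP-specific attributes
# from XEP-0206). Each HTTP POST body is one <body/> wrapper, and each request's
# 'rid' must be exactly one greater than the previous request's.
class BoshFramer
  NS = "http://jabber.org/protocol/httpbind"

  attr_accessor :sid # taken from the connection manager's session-creation response

  def initialize(domain:, wait: 60, hold: 1, initial_rid: SecureRandom.random_number(2**32))
    @domain = domain
    @wait = wait # seconds the server may hold a request open with no data
    @hold = hold # requests the server may keep waiting at the same time
    @next_rid = initial_rid
  end

  # First request: asks the connection manager to start a new session.
  def session_request
    %(<body content="text/xml; charset=utf-8" hold="#{@hold}" rid="#{take_rid}" ) +
      %(to="#{@domain}" wait="#{@wait}" xml:lang="en" xmpp:version="1.0" ) +
      %(xmlns="#{NS}" xmlns:xmpp="urn:xmpp:xbosh"/>)
  end

  # Later requests: wrap already-serialized stanzas. An empty body is the bare
  # long poll that just waits for the server to deliver inbound stanzas.
  def wrap(*stanzas)
    raise "no session id yet" if sid.nil?
    %(<body rid="#{take_rid}" sid="#{sid}" xmlns="#{NS}">#{stanzas.join}</body>)
  end

  private

  # Return the current request id and advance it by one.
  def take_rid
    rid = @next_rid
    @next_rid += 1
    rid
  end
end
```

Because every request is an ordinary, complete HTTP POST with a self-contained body, nothing depends on chunked responses, which is what makes BOSH tolerant of buffering proxies.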

00:15:34.160 BOSH's compatibility with HTTP 1.0 is another benefit: it avoids chunked transfer coding, which is risky when intermediaries such as HTTP proxies buffer chunks and delay message delivery. Finally, for the server side of this transport layer we use ejabberd, an XMPP server with an impressive track record for robustness and scalability. You can set up an account at jabber.org, or run ejabberd yourself for your own chat needs.

00:16:26.160 For installation, we like to use Chef for automated deployments, and our team developed a Chef recipe that installs the ejabberd server. It isn't open sourced yet, but we hope to share it later. Nginx runs as root so that it can open port 80 for HTTP, and it proxies requests to ejabberd, which runs under a regular user account. That way ejabberd itself never needs root privileges.

00:17:43.360 Moving on, XMPP4R is the Ruby library that serves as the transport layer for our certification test suite. It is a Ruby implementation of an XMPP client with an asynchronous API: you register callbacks that fire when messages arrive from the other side. Importantly, it parses the incoming stream with a SAX-style streaming parser, which keeps processing efficient.

00:18:42.160 We also found several good alternative Ruby implementations of XMPP, which may be better suited to cases where you mainly push data from the server to the client rather than communicate bidirectionally. For our purposes, we needed a lightweight library without excessive overhead.

00:19:35.360 Next is Strophe.js, which matters because many of the devices we test are not conventional web browsers. They provide a JavaScript engine but lack the predictable behavior of a full-featured browser. Strophe is a lightweight library with minimal DOM-related processing, yet it is capable enough for our needs.

00:20:32.920 However, because some of our platforms lack full DOM support, we had to implement workarounds for DOM-related tasks. Strophe's small size is an advantage, but it also means that some features we need, such as certain XMPP protocol extensions, are missing. For example, we require devices to register themselves when they wake up, which called for a custom extension built with Strophe's plugin capability.

00:21:31.840 This handles the case where devices need to register automatically without extensive manual configuration. Strophe's plugin architecture lets us add small features like this without complicating the core library, which keeps things manageable.

00:22:45.920 The device sends a registration request to the server, which creates a user account for it. That account gives each device a unique identity that the test suite can address. The request is an XML fragment containing the username and password, as specified by XMPP's in-band registration extension (XEP-0077), and once registered the device can join the network.
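For illustration, here is the shape of such a request, built in Ruby. On the devices the equivalent stanza is produced by the Strophe plugin in JavaScript; this sketch only shows the wire format. The element names follow XEP-0077, while the `registration_iq` helper, the default `id`, and the sample credentials are hypothetical. ejabberd also has to have in-band registration enabled (its `mod_register` module) for such requests to succeed.

```ruby
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
                '"' => "&quot;", "'" => "&apos;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# XEP-0077 in-band registration request: an IQ 'set' carrying a query in the
# jabber:iq:register namespace. The server answers with an IQ 'result' (same id)
# on success, or an IQ 'error', e.g. a conflict if the username is already taken.
def registration_iq(username:, password:, id: "reg1")
  %(<iq type="set" id="#{xml_escape(id)}">) +
    %(<query xmlns="jabber:iq:register">) +
    %(<username>#{xml_escape(username)}</username>) +
    %(<password>#{xml_escape(password)}</password>) +
    %(</query></iq>)
end

puts registration_iq(username: "roku-01", password: "example-secret")
```

A username derived from something stable on the device, such as a serial number, would let a re-registering device reclaim the same identity; that naming scheme is an assumption, not something the talk specifies.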

00:23:39.040 With this architecture in place, XMPP covers the communication needs of our automation framework. Vendor, an internal gem, also lets us multicast actions across devices, which makes orchestrating several devices at once efficient.
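One simple way to fan a single action out to several devices from the test script is sketched below in Ruby. This is not necessarily how Vendor does it: the `multicast` helper and the JIDs are hypothetical, and the block stands in for whatever per-device command call the suite uses.

```ruby
# Fan one action out to several devices concurrently and collect each result,
# keyed by device JID. The block does the per-device work (for example a
# blocking command call); Thread#value re-raises any exception it hit.
def multicast(device_jids, &action)
  threads = device_jids.map do |jid|
    Thread.new(jid) { |target| [target, action.call(target)] }
  end
  threads.map(&:value).to_h
end

results = multicast(%w[roku-01@example.com ps3-01@example.com]) do |jid|
  "launched on #{jid}" # stand-in for a real command round trip
end
```

Running the per-device calls in parallel means the total time is roughly that of the slowest device rather than the sum over all devices.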

00:24:45.680 With Vendor, we aim to give developers who write certification tests a simple API that reads synchronously, so it's easy to follow. For example, when testing interactive commands, we preferred not to hand control to callbacks, which would bloat and complicate the tests.

00:25:42.000 Instead, we designed a straightforward command-response pattern: the test sends a command, such as loading a device, confirming a login, or activating the device before playing video content, and waits for the device's reply. This structure lets testers express each interaction directly and expect predictable outcomes from it.
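The sketch below shows one way to get that synchronous feel on top of an asynchronous transport: tag each command with an id, then block on a condition variable until the asynchronous callback delivers the reply carrying that id. The talk does not show Vendor's code, so `SyncCommandChannel`, the `(id:, command:)` transport signature, and the command names are hypothetical. In a real suite, `on_message` would be invoked from the XMPP client's message callback (XMPP4R's `add_message_callback` is one such hook) after parsing the reply.

```ruby
# Blocking command/response facade over an asynchronous transport (hypothetical).
class SyncCommandChannel
  class CommandTimeout < StandardError; end

  # +transport+ is any callable taking (id:, command:) that sends the command
  # without waiting; replies must be fed back through #on_message.
  def initialize(transport)
    @transport = transport
    @mutex = Mutex.new
    @replied = ConditionVariable.new
    @responses = {}
    @next_id = 0
  end

  # Send +command+ and block until the reply carrying the same id arrives.
  def call(command, timeout: 5)
    id = @mutex.synchronize { @next_id += 1 }
    @transport.call(id: id, command: command)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      until @responses.key?(id)
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        raise CommandTimeout, "no reply to #{command.inspect}" if remaining <= 0
        @replied.wait(@mutex, remaining)
      end
      @responses.delete(id)
    end
  end

  # Asynchronous side: register this as the incoming-message callback.
  def on_message(id:, result:)
    @mutex.synchronize do
      @responses[id] = result
      @replied.broadcast # wake all waiters; each re-checks for its own id
    end
  end
end

# A fake device that answers from another thread, as a real one would over XMPP.
channel = nil
fake_device = lambda do |id:, command:|
  Thread.new { sleep 0.05; channel.on_message(id: id, result: "ok:#{command}") }
end
channel = SyncCommandChannel.new(fake_device)

# The test then reads top to bottom, with no callbacks in sight.
%w[load_app login activate play_video].each { |cmd| channel.call(cmd) }
```

Correlating by id, rather than assuming the next incoming message is the answer, keeps replies from being confused with the unsolicited events the device fires during playback. The timeout turns a silent device into an explicit test failure instead of a hung suite.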

00:26:43.480 We also capture logs of events over XMPP, which give us valuable insight into the interactions between the device and the server and keep our certification suites straightforward code. Because every step is an explicit command, it is easier to make test coverage thorough and reliable.

00:27:48.080 Throughout our implementation, we have learned that leveraging existing off-the-shelf tools and enhancing them strategically allows us to address complex scenarios without reinventing the wheel. The modular nature of Vendor ensures that we can iterate and expand our capabilities further as new requirements present themselves.

00:28:40.000 As Andrew mentioned, using ready-made solutions let us focus on the problems we faced rather than building our technology from the ground up. XMPP's asynchronous capabilities have proven invaluable to the efficiency of our automation. As we extend our capabilities, we also value XMPP's flexibility, which can accommodate applications well beyond testing.

00:29:55.240 Building on this adaptable framework, we are working toward automated test grids: for example, deploying new application builds to a range of test devices and running the full test suite on them without human intervention, so nobody has to trigger extensive test scenarios by hand.

00:30:54.000 This is not just theoretical; it is a matter of extending our existing framework with additional features. The aim is for our devices to ‘self-provision’ and operate within a well-defined network, which would further enhance our testing capabilities.

00:31:44.480 To summarize, our work with XMPP suggests it is underused outside chat. This mature technology can serve not only for messaging but also for automating cross-platform interactions between services, and its asynchronous messaging opens up use cases well beyond traditional chat applications.

00:32:49.760 To reiterate: choosing existing technologies let us focus on our actual problems instead of inventing new solutions, which streamlined our workflow. We also built on existing testing frameworks, filling their gaps with our own custom assertions, which keeps the test suites easy to understand.

00:33:23.560 We also considered messaging systems such as AMQP, but they are heavier, and we concluded that XMPP serves our purposes without the unnecessary weight. Its lightweight, low-latency design strikes the right balance for our automation framework.

00:34:01.440 Tools like Strophe let us coordinate communication across devices effectively. The lack of DOM support on certain devices caused some complications, but we implemented workarounds for them. We're eager to explore XMPP's capabilities further.

00:34:34.080 As for future considerations, we are looking at XML handling in libraries like XMPP4R. REXML works, but it has performance issues that might limit scalability, so we're considering a faster XML processing implementation, which could unlock further potential for our framework.

00:35:53.560 In conclusion, we've seen a variety of use cases for the XMPP protocol. Beyond handling basic messaging, there lies potential for deploying automated testing across multiple platforms. Our experience underscores not only the capabilities of XMPP but also its adaptability to numerous real-world requirements.

00:36:41.600 So, if anyone has questions, we’re happy to delve deeper into any specifics from our exploration of XMPP, or provide clarity on our automation framework.

00:37:32.640 We're also keen to discuss other aspects of the project, such as tools we used like BOSH, how we managed connections to different devices, and how we set up ejabberd, which is central to our implementation.

00:38:17.440 So, if you're interested in diving into technical nuances, exploring various server-client integrations, or simply grasping core concepts surrounding our architecture, we welcome all inquiries.

00:39:00.320 Let’s keep the dialogue open. Whether you wish to share insights, seek opinions on frameworks or libraries, or gain feedback on testing methodologies, we’re all ears.

00:40:07.240 Thank you for your attention. We appreciate the opportunity to share our journey, and we hope it sparks inspiration for your own projects.