Now Hear This! Putting Real-Time Voice, Video and Text into Rails

Summarized using AI

Ben Klang • April 21, 2015 • Atlanta, GA

In his talk titled 'Now Hear This! Putting Real-Time Voice, Video, and Text into Rails,' Ben Klang explores the integration of contextual communication within applications using WebRTC technology. He emphasizes that while traditional communication methods like phone calls can be effective, they often lose vital context and do not integrate into the broader business processes. Klang introduces WebRTC as an emerging solution that allows for real-time communication directly within web applications without the need for plugins, utilizing device inputs like microphones and cameras.

Key points discussed in the talk include:

- Limitations of Traditional Communication: Phone calls fail to record discussions and provide a narrowband communication experience that excludes multimedia sharing.

- Introduction to WebRTC: WebRTC enables peer-to-peer connections, allowing seamless media exchange and high-quality audio/video without additional software. It leverages codecs such as Opus and video formats like H.264 and VP8, emphasizing security and encryption.

- Modernizing Communication Architecture: Klang contrasts the existing telecommunication infrastructure with WebRTC’s more flexible architecture that separates signaling from media and promotes high-performance connections.

- Adaptability and Fluidity in Apps: Essential design principles for communication tools include ensuring that applications can adapt to different devices and support fluid transitions between conversation types (text, voice, video).

- Contextual Communication: Embedding relevant information within conversations enhances the user experience, while transparent permissions and identity management maintain user trust.

- Practical Applications: Klang highlights three potential applications for WebRTC: a live anonymous matchmaking service, an instant response tool for tech support, and a patient services app that integrates medical records and secure authentication.

- WebRTC Demonstration: The session features a demo showcasing how WebRTC facilitates direct browser communication and speech recognition for smoother interactions.

In conclusion, Klang urges developers to explore the potential of WebRTC and how it can enrich user experiences by integrating real-time communication contextually into applications. Resources for learning more about WebRTC are made available, encouraging the adoption of this technology among developers.

By Ben Klang
When you want to talk to someone, where do you turn? Skype? Slack or HipChat? Maybe even an old-fashioned telephone? As great (or not) as these are, they all fail in one important way: context. As developers, why don’t we enable our users to communicate where they are doing everything else, right inside the browser or mobile app? The technology that makes contextual communication possible is evolving quickly, with building blocks like WebRTC, speech recognition and natural language processing. This talk is about how to apply those building blocks and bring contextual communication to your apps.

RailsConf 2015

00:00:12.320 Welcome to the session titled, 'Now Hear This! Putting Real-Time Voice, Video, and Text into Rails.'
00:00:17.760 My name is Ben Klang and I am proud to be from the city of Atlanta. I hope you all have enjoyed your time here in Atlanta so far.
00:00:24.000 You may know me through some of my open-source contributions. Quick show of hands: has anyone heard of Adhearsion? Awesome! How many of you have actually used it?
00:00:31.039 Great! I'm not going to talk about Adhearsion today, but I do want to mention it briefly because it is relevant to our discussion. Adhearsion is an open-source framework for voice applications, similar to Rails, but focused on communication.
00:00:43.680 I'm also the founder of a company called Mojo Lingo, based here in Atlanta. We work with voiceover patrons, helping them build scalable user interfaces.
00:00:55.440 Today, I want to discuss why the web is a lot like outer space—because on the web, no one can hear you scream.
00:01:06.960 Let me paint a scenario for you: You're working on your app and suddenly realize you need to speak with one of your customers. Most of you might pick up the phone to make that call.
00:01:24.080 The main problem with this approach is that any communication that happens via the phone is now outside of your business process. It isn't noted within your business application, and it typically isn't recorded. The fact that you made a call often doesn’t reflect in the work you're doing for your customers.
00:01:48.720 Furthermore, this communication is limited. You have a narrowband audio signal and can't easily share pictures or links, making for a frustrating and ineffective communication experience. Wouldn't it be cool if we could integrate communication directly into the application?
00:02:19.920 This leads us to WebRTC. By a show of hands, how many of you have heard of WebRTC? Great! The awareness is growing—what about those who’ve tried it?
00:02:31.519 For those unfamiliar, WebRTC (Web Real-Time Communication) is fundamentally about utilizing the microphone, speaker, and camera directly in a web application without needing any plugins. This means you can create real-time communication applications using these components seamlessly.
00:03:02.640 WebRTC also provides built-in functionality for establishing peer-to-peer connections between two or more users. This connectivity can be tricky due to things like firewalls, but WebRTC has mechanisms to help traverse these connection environments.
00:03:35.360 Additionally, it offers a common set of codecs to exchange high-definition media. Notably, Opus and G.711 are used for audio, and H.264 and VP8 for video. Opus is particularly remarkable, originating from significant research, including contributions from Skype.
00:04:05.200 Opus is efficient for transmitting voice while also capable of transmitting high-quality music, and importantly, it is royalty-free. H.264 and VP8 are competing standards for video transmission, with VP8 being fully open-source.
00:04:43.199 WebRTC's built-in standards ensure high-quality audio and video. It utilizes protocols like SDP (Session Description Protocol) to ensure that endpoints can exchange information about their media capabilities securely.
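
As an illustration of the capability exchange described here (not taken from the talk), the sketch below reads the codec names out of an SDP description by scanning its `a=rtpmap` lines. The sample SDP text and the `codecs_by_media` helper are invented for this example, and a real browser offer carries many more attributes. In SDP, G.711 mu-law appears under the name `PCMU`.

```python
import re
from collections import defaultdict

# Hypothetical, heavily trimmed SDP offer; a real one is much longer.
SAMPLE_SDP = """\
v=0
o=- 46117317 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111 0
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
m=video 9 UDP/TLS/RTP/SAVPF 96 102
a=rtpmap:96 VP8/90000
a=rtpmap:102 H264/90000
"""

# Matches lines such as "a=rtpmap:111 opus/48000/2".
RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/]+)/(\d+)")


def codecs_by_media(sdp: str) -> dict:
    """Group codec names under the m= section (audio, video) they belong to."""
    found = defaultdict(list)
    media = None
    for line in sdp.splitlines():
        if line.startswith("m="):
            media = line[2:].split()[0]  # "audio" or "video"
        elif media is not None:
            match = RTPMAP.match(line)
            if match:
                found[media].append(match.group(2))
    return dict(found)


print(codecs_by_media(SAMPLE_SDP))
```

Each endpoint can compare a list like this against its own supported codecs to find a common format before any media flows.
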
00:05:29.280 One of the key benefits of WebRTC is its focus on security; it employs encryption by default, which helps protect conversations from unauthorized access. With this technology, we can communicate without the fear of eavesdropping.
00:06:10.479 It’s important to remember that we are not just trying to replace the telephone with a web browser. We can do so much more. The web offers a rich palette of user interface possibilities, and WebRTC provides a means to enrich communications within those applications.
00:07:20.160 Next, let's consider the relevance of WebRTC. A chart from Dean Bubley projects the adoption growth of WebRTC and shows the number of browsers supporting it continuing to rise. Over a billion devices, including mobile devices, support WebRTC at this point.
00:07:44.800 I want to provide a quick background on how communications are set up today. Most people use their phones for calling; for instance, when Alice wants to call Bob, she dials his number. This experience relies heavily on the interconnected relationships between various carriers, allowing everyone to call everyone else.
00:08:20.480 While this system has its advantages, it comes with significant drawbacks. Innovations can be stifled due to the overhead of coordinating multiple companies, and the user experience is not particularly friendly, as people's identities become linked to a series of random numbers.
00:09:08.640 An alternative architecture, like that used by Skype, centralizes services. This allows for rapid innovation, resulting in features like video and high-definition calls. However, it still doesn’t fully enable businesses to integrate communications into their existing processes.
00:09:32.960 With WebRTC, we can build a more flexible communication architecture. The signaling happens through a web application, and the media is exchanged directly between endpoints, leading to greater performance and quality.
00:10:01.680 For example, if Alice initiates communication with Bob, she sends a request to a web service with session details that include her IP address and supported codecs. The server simply forwards this information to Bob, who generates a response in a similar format.
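
The Alice-and-Bob exchange above could be sketched roughly as follows. This is an in-memory stand-in rather than the speaker's implementation: `SignalingRelay`, its methods, and the placeholder session strings are all hypothetical, and a real deployment would sit behind HTTP or WebSocket endpoints. The point it illustrates is that the server only forwards session descriptions and never touches media.

```python
from collections import defaultdict, deque


class SignalingRelay:
    """In-memory stand-in for the web service that sits between Alice and Bob.

    It never inspects or carries media; it only queues messages per recipient.
    """

    def __init__(self):
        self._inboxes = defaultdict(deque)

    def send(self, sender: str, recipient: str, kind: str, sdp: str) -> None:
        # Forward the message untouched; the relay does not parse the SDP.
        self._inboxes[recipient].append({"from": sender, "type": kind, "sdp": sdp})

    def receive(self, user: str):
        """Return the next pending message for `user`, or None if there is none."""
        inbox = self._inboxes[user]
        return inbox.popleft() if inbox else None


relay = SignalingRelay()

# Alice describes her session (placeholder text, not real SDP) and sends an offer.
relay.send("alice", "bob", "offer", "alice-session-description")

# Bob picks up the offer and replies with an answer in the same format.
offer = relay.receive("bob")
relay.send("bob", offer["from"], "answer", "bob-session-description")

answer = relay.receive("alice")
print(answer["type"], "from", answer["from"])
```
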
00:10:50.080 Protocols such as ICE, STUN, and TURN help establish connections, even through firewalls. ICE determines which network interfaces and addresses are available, STUN helps discover external IP addresses, and TURN relays media when direct communication is impossible.
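
One way to picture how these three pieces fit together is the candidate priority formula recommended in the ICE specification (RFC 8445), which ranks direct local addresses above STUN-discovered ones and keeps TURN relays as the last resort. The sketch below is a simplification under stated assumptions: the addresses are made-up documentation addresses, `candidate_priority` is a hypothetical helper name, and real ICE goes on to test pairs of local and remote candidates rather than sorting a single list.

```python
# Recommended type preferences from the ICE specification (RFC 8445).
TYPE_PREFERENCE = {
    "host": 126,   # address of a local network interface
    "srflx": 100,  # server-reflexive: external address discovered via STUN
    "relay": 0,    # address allocated on a TURN relay
}


def candidate_priority(kind: str, local_preference: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[kind] << 24) + (local_preference << 8) + (256 - component_id)


# Illustrative candidates only; the addresses are placeholders.
candidates = [
    {"kind": "relay", "address": "203.0.113.7"},
    {"kind": "host", "address": "192.168.1.20"},
    {"kind": "srflx", "address": "198.51.100.4"},
]

ranked = sorted(candidates, key=lambda c: candidate_priority(c["kind"]), reverse=True)
for candidate in ranked:
    print(candidate["kind"], candidate["address"], candidate_priority(candidate["kind"]))
```
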
00:11:47.760 The architecture of WebRTC emphasizes the separation of signaling and media. This allows for efficient communication, even in scenarios where a direct line is not available. With built-in encryption, the media exchanged remains secure, regardless of whether it’s traversing a relay or a direct connection.
00:12:52.800 Regarding signaling, although I use web servers as an example, it’s crucial to note that any medium can facilitate these messages. For instance, we could also deploy this with XMPP or even use USB drives to transfer the session details.
00:13:05.760 Now, let's pivot to applications and what we can build with WebRTC. In my experience building these applications, I have identified two main tenets to consider when designing communication tools: adaptability and fluidity.
00:13:50.480 A modern voice application should take advantage of the various capabilities of devices. It should be fluid; users should be able to transition between different devices, timeframes, or users while maintaining the context of the conversation.
00:14:50.160 Furthermore, it should be contextual to the tasks or applications users are engaged in. Trustworthiness is essential—no one wants to communicate sensitive information if they cannot trust the system. Lastly, the application should provide references to conversations through persistent and shareable URLs.
00:15:46.560 Let’s explore adaptability: If Alice is using Firefox, she has a range of input options such as text or video chat, while other users might be using smartphones or other devices. It’s vital for apps to enable various forms of participation.
00:17:02.720 Fluidity means users should be able to start a conversation in one format, such as chat, and transition seamlessly to audio or video as needed. This keeps the conversation unified and maintains context across different communication styles.
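
A minimal sketch of this idea, assuming a single conversation record whose history outlives any change of mode. The `Conversation` class, its fields, and the sample messages are hypothetical and are not drawn from the talk or from any particular library.

```python
from dataclasses import dataclass, field

MODES = ("text", "voice", "video")


@dataclass
class Conversation:
    """One conversation whose history survives changes of mode."""

    participants: list
    mode: str = "text"
    history: list = field(default_factory=list)

    def say(self, who: str, content: str) -> None:
        # Every entry records the mode it happened in, so context is kept.
        self.history.append((self.mode, who, content))

    def switch_to(self, mode: str) -> None:
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        # Log the transition in the same history instead of starting a new session.
        self.history.append((self.mode, "system", f"switching to {mode}"))
        self.mode = mode


chat = Conversation(participants=["alice", "bob"])
chat.say("alice", "The dashboard looks wrong, can we talk?")
chat.switch_to("voice")  # same conversation, same history
chat.say("bob", "(spoken) Sure, walk me through it.")
chat.switch_to("video")

for entry in chat.history:
    print(entry)
```

Keeping the mode as an attribute of one conversation, rather than creating a separate object per call or chat, is what lets the text, voice, and video segments share a single context.
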
00:18:03.520 Contextual communications mean embedding relevant information directly into the conversation. An example could include notifications of waiting callers or recent sales data. It is vital to enrich the conversation with this kind of contextual information.
00:19:05.360 One way to ensure trustworthiness is to avoid surprising the user. Confidential conversations should remain private. Additionally, providing appropriate and clear permissions for features like mic or camera access is crucial to keeping the user in control. Transparency is key.
00:20:37.680 Integrating identity management into communications is essential. By using OAuth and social logins (such as Facebook or Twitter), we can enhance trust during communications. Each conversation should also be referenced accurately, allowing for URLs that lead to the specific conversational context.
00:21:25.760 I want to explore three example applications that illustrate these principles. First, envision a live, anonymous matchmaking service where users can connect over video while maintaining privacy—this could create safe environments for introductions without sharing explicit contact information.
00:22:48.160 Second, consider an instant response app for tech support or outages. If an issue occurs, this app could integrate various channels—chat, voice, and video—allowing users to discuss the problem while accessing real-time data from relevant monitoring tools.
00:23:11.200 Lastly, a patient services application could facilitate medical calls effectively. Users could call their medical professionals without needing to memorize numbers, and the app could automatically pull in relevant records for context. It would employ secure authentication to protect patient identity while ensuring sensitive information is readily accessible.
00:24:30.160 Now, let’s move into a demo of WebRTC in action. I created a simple nugget application designed to demonstrate my points. Here, you will see the integration of video feeds from two different browsers with a built-in speech recognition feature for new interactions.
00:25:32.400 WebRTC allows for direct communication between browsers. I’ve set this up such that one browser can relay information to another, and I can send voice commands to navigate and control a remote camera. This showcases how WebRTC integrates audio, video, and text seamlessly.
00:26:50.640 Below, you can observe the browser’s capabilities to turn inputs into understandable commands, allowing for intuitive interactions. The potential applications extend far beyond simple communication to include smooth user experiences and real-time feedback.
00:27:39.840 To wrap up, if you're looking for further resources on WebRTC, the official website offers a comprehensive collection of samples and documentation—many samples are live and directly demonstrate its capabilities.
00:28:38.720 Additionally, initiatives like the WebRTC Challenge seek to engage a million developers in using this technology by 2020, and I highly recommend looking into these developments. If any Ruby developers are interested in using WebRTC, explore libraries like Ruby Speech and others to enhance your applications.
00:29:41.280 Finally, I would love to open the floor to questions and discussions. What are your thoughts? Are there any uncertainties or points I can clarify about this technology or its applications?