WebRTC Change Communications Forever

by Greg Baugues

In the presentation titled 'WebRTC Change Communications Forever,' Greg Baugues, Developer Evangelist for Twilio, delves into the transformative potential of WebRTC (Web Real-Time Communication) technology, which facilitates native peer-to-peer communication directly within web browsers. The talk provides a historical context for the evolution of communication technologies, reflecting on advancements from the telegraph to the personal computer. Baugues emphasizes the implications of WebRTC across various applications. The key points include:

  • Evolution of Communication: The presentation begins with a nostalgic reference to EPCOT's depiction of past communication technologies and highlights that WebRTC represents the newest advancement in this long evolution.
  • What is WebRTC?: Baugues explains WebRTC as an open set of protocols that allows real-time communication via browsers without requiring external plugins. This provides developers with a simpler way to integrate voice, video, and data sharing into their applications.
  • APIs Offered by WebRTC: The talk introduces three primary JavaScript APIs within WebRTC: getUserMedia, RTCPeerConnection, and RTCDataChannel. These APIs enable media capture, connection establishment between browsers, and data exchange, allowing for innovative applications beyond traditional video communication.
  • Peer-to-Peer Advantage: WebRTC establishes secure, direct connections between browsers, which enhances data transfer speed and privacy by eliminating the need for a central server. This model opens opportunities for peer-to-peer applications like a distributed CDN and in-browser file sharing (similar to BitTorrent).
  • Real-World Applications: Baugues discusses how developers are leveraging WebRTC for various use cases beyond video conferencing, including gaming, collaborative tools, and customer service applications.
  • Challenges and Limitations: The presentation also addresses the current limitations of WebRTC, including uneven browser support (e.g., complete support in Chrome and Firefox, but limited support in iOS), the need for a separate signaling mechanism to establish connections, poor scaling beyond a handful of participants, and no built-in link to the telephone network.
  • Getting Started with WebRTC: Finally, Baugues encourages developers to explore building applications using WebRTC and highlights libraries like SimpleWebRTC for simplifying this process. He also promotes Twilio Client as an option for integrating communication capabilities in web applications, enabling both browser-to-browser and browser-to-telephone interactions.

In conclusion, Baugues advocates for the significant impact WebRTC will have on the future of communication, urging developers to utilize this technology to enhance their applications and foster better communication solutions.

00:00:17.039 My name is Greg, and I am the Developer Evangelist for Twilio. Just out of curiosity, how many of you have heard of Twilio before? That's awesome! So cool! How many of you have used Twilio before? That is also so cool! I'm the Developer Evangelist for Twilio, and I live here in Chicago. I'm incredibly happy to be here and to speak at RailsConf. I have programmed for a good chunk of my life, but I picked up Rails about three years ago and just started working for Twilio about two months ago. It's been absolutely incredible to be able to speak at RailsConf in the city I love for this company. It feels like a big accomplishment for me.
00:00:40.960 If you were to come up to me afterwards and say, 'Hey Greg, you just spoke at RailsConf in the city you love! What are you going to do next?' I'm going to Disney World! I'm going on Saturday night. I wish I could say I planned it that way, but my wife and I just happened to plan a vacation at Disney World, and it happened to be that we're flying out the exact day after RailsConf, so that was nice.
00:01:18.000 When I was growing up, I absolutely loved Disney World. It was an amazing place for me; it was probably the happiest place on earth for me. So much so that when I was in college, I took a semester off and went to pick up trash at the Magic Kingdom for $5.85 an hour. It was awesome! When I was younger, Epcot was my favorite. I didn't have much use for castles, but Epcot stands for 'Experimental Prototype Community of Tomorrow.' It is a vision of the future, built to inspire us about what is possible with technology, and how technology can change our lives, presenting new ideas and new ways to communicate.
00:02:16.400 The big golf ball at Epcot is actually a ride called 'Spaceship Earth.' When you go inside, you get into a car that looks like this. It's not a fast ride, nor is it particularly exciting; instead, it is a 20-minute tribute to the history of communication. You go through and see different scenes, like the monks copying manuscripts by hand, and all the figures are animatronics—except one who might not be moving and could be snoring. Then you see the printing press during the Industrial Revolution, and the advent of the telegraph, marking the first time we had real-time communications over great distances.
00:03:03.040 Next comes the advent of the telephone system, marking the first time we could have real-time voice communication over distances. Back when I was a kid in the mid to late 80s, the last scene I remember was the vision of the future: a boy in America talking on a video screen to a girl in Japan. That was what we saw as the future back then. The thing about Epcot is that it presents a vision of the future, and the future eventually arrives.
00:03:31.120 Every ten years or so, they shut down Spaceship Earth for a few weeks to update it. I just found this out—I haven't been there in about ten years—but I want to see this next week. They added a new scene depicting what Disney considers an important contribution to human communication, like the telephone or the telegraph. This new scene features Steve Wozniak sitting in his garage, tinkering on the first Apple computer. Disney believes that the personal computer's advent and the proliferation of software have had a significant effect on human communication, just like the telephone. At Twilio, we agree with that.
00:04:13.120 Our mission statement is to change communications forever. That's the short version. The longer version is to change communications forever by migrating this industry from its legacy hardware to its future in software. It’s quite a mouthful, but it’s easier to explain it by talking about other industries because we're seeing this trend everywhere. Marc Andreessen famously said that 'software is eating the world.' We're witnessing many industries, which used to depend on hardware, now becoming playgrounds for those who can write software: people like you.
00:04:50.240 What if I said, 'Change hosting forever by migrating this industry from its legacy hardware to its future in software?' Who might I be talking about? Heroku? Amazon with AWS? It used to be that if you wanted to host a website, you needed to know how to do this complex setup. Now, when I want to host a new website or create a server, I simply click that blue button up there. I don’t have to know anything about hardware.
00:05:13.120 These guys are DigitalOcean, and I bet you’ve heard of them. They're awesome! I just started using them a few months ago; they have a booth downstairs. What about this: 'Change payments forever'? Who might that be? Square? PayPal? Braintree? It used to be that if you wanted to process a bunch of credit cards, you needed one of these devices, and five years later, that would be the same device that rolled off the line. But today, Square spends so little on the actual hardware that they give it away.
00:05:54.560 They’re not hardware people—they’re software people who are changing the payments industry. Now, what about changing transportation forever? Who's doing that? Uber! How would you like to be the guy who owns the company that makes those taxis? It was just five years ago when life looked pretty good, and they didn't have to change much. They sold almost the same product over the last ten years, and then these guys came along. Uber doesn't make any hardware; they’re software people.
00:06:09.760 As software people, I believe there is no other skill set you can have that gives you more power to impact the world right now than the skill set you all have here today. That’s what we’re passionate about—changing communications forever by migrating from legacy hardware to the future in software. How do we use software? How do we empower all of you to change communications? There's a new technology called WebRTC that has emerged in the last three years. It wasn't invented by us, but we love it. We believe it is going to change how the world communicates.
00:06:51.360 Who here has heard of WebRTC? Because you all showed up, it wasn't completely uninteresting for you. So what is WebRTC? WebRTC enables real-time communication in the browser via open peer-to-peer protocols. That's a bit of a mouthful, but let’s break it down. Real-time communication in a browser is not a new thing; we've had it on the internet for a while. Skype provided real-time voice communication over the internet about ten years ago.
00:07:19.039 Now we have Google Hangouts, which runs in a browser but requires you to install a plug-in. Real-time communication in the browser has existed, but it always required plug-ins, especially Flash. And then you have FaceTime, which is a standalone app and doesn't happen in the browser at all. I’m guessing everyone here is a web developer, and I have never attempted to work with media before. If any of you have, you know it's a pain in the ass. You have to figure out how to get access to the camera and microphone, capture that data, and manage the coding involved. It's much more complicated than just building CRUD apps in Rails.
00:08:06.080 WebRTC gives us all of that for free. WebRTC is an open set of protocols, a toolset that lives in the browser, allowing you access to video and audio. It captures that information for you, wraps it, and allows you to send it off. This is amazing! Previously, using Skype, you had to install third-party apps, meaning if the other person didn’t have it or didn’t want to install it, you couldn’t communicate.
00:09:00.960 If you wanted to build something that took advantage of real-time communication, you needed to know how to build a standalone app, which was often out of reach for many web developers. Now, if you know how to build browser apps—especially with JavaScript—you can build apps that utilize real-time communication. WebRTC comprises three main JavaScript APIs: one is called getUserMedia, which captures media for us, allowing access to the camera and the microphone to start recording and processing.
00:09:55.440 There’s another one called RTCPeerConnection, which we’ll discuss in a moment; it establishes a connection for communication and sending data. Finally, there’s RTCDataChannel, which is super interesting. When people discuss WebRTC, video and voice communication typically come to the forefront. However, many enthusiasts are most intrigued by RTCDataChannel, which opens up numerous opportunities that haven't been fully explored yet.
00:10:47.200 With RTCPeerConnection, we're not bouncing messages through a server. Instead, we establish a direct connection between two browsers. Some groups have leveraged the data channel to create a peer CDN: visitors on your website download assets and then open direct connections with other visitors to serve that content to them.
00:11:32.000 Think of concepts like BitTorrent in the browser. WebRTC allows you to create peer-to-peer, secure, and encrypted connections between two browsers. This means it's fast because the data doesn't go through a server. You can have two people in Japan communicating directly without having their messages bounce off a server on the other side of the world. Additionally, all WebRTC transfers are encrypted, resolving many security concerns that people have today.
00:12:12.080 WebRTC is interoperable. For instance, if you use Skype, you can't communicate with someone on FaceTime or Google Hangouts. But WebRTC is an open protocol, meaning anyone with a browser can interact with your app. The fact that we're all using the same protocols means your app can communicate with another application, which is super cool. However, WebRTC is also a bit complicated, and it doesn't quite act like you'd expect. WebRTC does not handle signaling.
00:12:55.520 As a Rails guy, I don’t have comprehensive knowledge about network protocols. It might surprise you that the way these two machines communicate isn’t as simple as it seems. To establish their connection, they first need to exchange signaling messages with each other. A lot of information needs to be shared back and forth before communication can start. This is described using SDP, the Session Description Protocol, which answers questions like, 'Are you there?' and 'What kind of content are you sending me?' It also describes how the media is encoded, so the receiver knows how to interpret it.
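To make the "what kind of content are you sending me?" part more concrete, here is a minimal Ruby sketch that reads a small SDP description and lists the media sections and codecs it offers. The sample SDP text and the `media_sections` helper are made up for this illustration and are not from the talk; a real browser-generated description carries many more lines.

```ruby
# A shortened, hand-written SDP description (illustrative only).
SAMPLE_SDP = <<~SDP
  v=0
  o=- 46117317 2 IN IP4 127.0.0.1
  s=-
  t=0 0
  m=audio 9 UDP/TLS/RTP/SAVPF 111
  a=rtpmap:111 opus/48000/2
  m=video 9 UDP/TLS/RTP/SAVPF 96
  a=rtpmap:96 VP8/90000
SDP

# Every SDP line has the form "<type>=<value>".
# "m=" lines start a media section; "a=rtpmap:" lines name a codec
# for the most recent media section.
def media_sections(sdp)
  sections = []
  sdp.each_line(chomp: true) do |line|
    type, value = line.split("=", 2)
    case type
    when "m"
      kind, _port, protocol, *_formats = value.split(" ")
      sections << { kind: kind, protocol: protocol, codecs: [] }
    when "a"
      next if sections.empty?
      match = value.match(%r{\Artpmap:\d+ ([^/]+)})
      sections.last[:codecs] << match[1] if match
    end
  end
  sections
end

media_sections(SAMPLE_SDP).each do |section|
  puts "#{section[:kind]}: #{section[:codecs].join(', ')}"
end
```

Each browser sends a description like this to the other side over whatever signaling channel the application provides, and the two then settle on codecs that both support.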
00:13:57.760 More importantly, it determines where you are on the network. Computers usually don't connect directly to the internet; they sit behind routers that use NAT, so they only know their local IP address. There's a set of protocols to figure out where these computers are, so that a computer can learn its own address as seen from the internet; this is called ICE, or Interactive Connectivity Establishment. One of these protocols is STUN, which helps with traversing the NAT. The concept behind this is simple: a computer pings a STUN server, which replies with the computer's public IP address and port, allowing direct connection setup.
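As a rough sketch of what that "ping" looks like on the wire, the Ruby below builds a STUN Binding Request header and decodes the XOR-MAPPED-ADDRESS value that a server sends back, following the message layout in RFC 5389. The helper names are invented for this example, only IPv4 is handled, and the actual UDP exchange with a server is left out.

```ruby
require "securerandom"

MAGIC_COOKIE    = 0x2112A442 # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001     # STUN message type for a Binding Request

# A Binding Request with no attributes is just the 20-byte header:
# type (2 bytes), body length (2 bytes), magic cookie (4 bytes),
# transaction ID (12 bytes).
def stun_binding_request(transaction_id = SecureRandom.random_bytes(12))
  [BINDING_REQUEST, 0, MAGIC_COOKIE].pack("nnN") + transaction_id
end

# Decodes the value of an XOR-MAPPED-ADDRESS attribute (IPv4 only).
# The port is XORed with the top 16 bits of the magic cookie and the
# address with the whole cookie.
def decode_xor_mapped_address(value)
  _reserved, family, x_port, x_address = value.unpack("CCnN")
  raise ArgumentError, "only IPv4 is handled in this sketch" unless family == 0x01

  port = x_port ^ (MAGIC_COOKIE >> 16)
  ip   = [x_address ^ MAGIC_COOKIE].pack("N").unpack("C4").join(".")
  [ip, port]
end

request = stun_binding_request
puts "Binding Request is #{request.bytesize} bytes"
```

The address and port that come back are what the browser can advertise to the other peer as a candidate for a direct connection.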
00:14:52.480 If this doesn't work, there’s a protocol called TURN that can relay the traffic through a server instead. To be honest, I’m not overly interested in this aspect, but it’s important to note that WebRTC doesn’t handle signaling, and that can be a good thing. The reason is that there's a lot of debate about which signaling protocol is best. WebRTC is open-ended on how you can implement this based on your app.
00:16:07.840 WebRTC doesn’t scale well with multiple users either—it supports connections, but it was originally designed for one-to-one communication. The problem arises when adding additional participants, as every time someone new joins, a direct connection forms with every other participant, quickly increasing the number of connections and causing slowdowns.
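The growth described here is easy to put numbers on. In a full mesh, every pair of participants shares one direct connection, so n participants need n × (n − 1) / 2 connections in total, and each participant sends its media to n − 1 others. A small Ruby sketch, with helper names chosen for this example:

```ruby
# Total direct connections in a full mesh of n participants.
def mesh_connections(participants)
  participants * (participants - 1) / 2
end

# Each participant uploads its own stream once per other participant.
def uploads_per_participant(participants)
  participants - 1
end

[2, 3, 4, 6, 10].each do |n|
  puts "#{n} participants: #{mesh_connections(n)} connections, " \
       "#{uploads_per_participant(n)} uploads each"
end
```

Four participants already need 6 connections, and ten need 45, with every browser encoding and uploading its video nine times over.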
00:16:38.080 For example, on Talky.io, you can support three or four participants for video chat. Yet adding more people results in performance issues, and the experience slows down considerably. Additionally, WebRTC doesn't connect to the telephone network. Regardless of how cool it is for certain apps, businesses still rely heavily on the telephone network. If you're developing business applications, there's a good chance you’ll need to integrate with telephones as well.
00:17:37.840 If these downsides don’t deter you and you're excited to get started, there’s a library called SimpleWebRTC built by our friends at &yet. Henrik, who previously gave an awesome talk at CascadiaJS, developed it. It’s a collection of JavaScript tools that streamlines much of the work. There’s even a simple signaling server built in Node.js that helps resolve these problems.
00:18:45.139 He standardized the implementation across different browsers, as the way you implement something in Firefox is different from how you would do it in Chrome. His library serves as a wrapper to standardize these protocols. Of course, I work for Twilio, and it might surprise you that we have solutions that leverage some of these capabilities.
00:19:35.040 We offer Twilio Client, which enables communication in the browser. We also depend on WebRTC for some of our solutions—it’s not a maximum WebRTC solution, but it allows you to solve many of the challenges that come with it. If you decide to use Twilio Client, there’s a strong chance it can simplify communication using WebRTC.
00:20:39.760 Let me show you a quick demonstration of what it looks like to build an app that uses Twilio Client for browser-to-browser communication or telephone communication. I'll demonstrate this with a Twilio dashboard, where I’ll create a new app and call it 'My RailsConf Phone.' I also need to provide a URL for Twilio to direct requests for what to do once I try to make an outbound call.
00:21:20.600 Once I create my app, I'm given a unique identifier. I'm going to build this using Sinatra. Sinatra is super stripped down; it’s similar to Rails, except that you define your routes inline, right next to the code that handles them. It allows quick development, so I’ll require Sinatra and the Twilio Ruby library.
00:22:09.440 Everybody here is a Ruby programmer, but just in case, we also have libraries for Python, Node, and even .NET. After requiring the libraries, I'll also set up some credentials for myself since I don’t want to share confidential information. This setup will work similarly to the root or index in Rails.
00:23:09.600 Next, I need to send a capability token to Twilio, allowing this client to interact on my account's behalf. I’ll create that capability and pass in my authentication information. Once I have my capability token, I’ll specify the capabilities I’m providing to the client. Here, I’d specify the ability to establish outgoing connections and give Twilio my application ID.
00:23:54.080 It's necessary to provide a unique name for Twilio to identify the client. After setting this up, I can generate my token and use ERB to embed that token into the HTML and JavaScript that I prepared ahead of time. Following the quick start guide, you can replicate this entire setup in about 15 to 20 minutes.
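The general idea behind a token like this can be sketched in plain Ruby: the server signs a short list of claims with a secret that never leaves the server, and the browser only ever sees the signed, expiring result. The sketch below uses a generic HMAC-SHA256 signed token. The claim names, the placeholder IDs, and the `sign_token` and `token_valid?` helpers are assumptions made for illustration; the real token in the demo is produced by Twilio's Ruby library, and its exact format may differ.

```ruby
require "json"
require "openssl"

# URL-safe Base64 without padding.
def base64url(data)
  [data].pack("m0").tr("+/", "-_").delete("=")
end

# Reverses base64url by restoring the padding first.
def base64url_decode(text)
  padded = text + "=" * ((4 - text.length % 4) % 4)
  padded.tr("-_", "+/").unpack1("m0")
end

def hmac_sha256(secret, data)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), secret, data)
end

# Builds "header.claims.signature", signed with the server-side secret.
def sign_token(claims, secret)
  header = { "alg" => "HS256", "typ" => "JWT" }
  signing_input = [header, claims].map { |part| base64url(JSON.generate(part)) }.join(".")
  "#{signing_input}.#{base64url(hmac_sha256(secret, signing_input))}"
end

# Recomputes the signature and checks the expiry claim.
# (A production check would use a constant-time comparison.)
def token_valid?(token, secret, now: Time.now.to_i)
  signing_input, _dot, signature = token.rpartition(".")
  return false unless base64url(hmac_sha256(secret, signing_input)) == signature

  claims = JSON.parse(base64url_decode(signing_input.split(".").last))
  claims["exp"] > now
end

# Hypothetical claims: who issued the token, when it expires,
# and what the browser client is allowed to do.
claims = {
  "iss"   => "ACCOUNT_ID_PLACEHOLDER",
  "exp"   => Time.now.to_i + 3600,
  "allow" => { "outgoing_app" => "APP_ID_PLACEHOLDER", "client_name" => "greg" }
}
token = sign_token(claims, "SECRET_PLACEHOLDER")
puts token_valid?(token, "SECRET_PLACEHOLDER")
```

Because only the server knows the secret, the page can embed the token in its JavaScript without exposing the account credentials themselves.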
00:24:56.640 Here’s what the final product looks like. Even though I’m not much of a designer, the application consists of merely around 15 lines of HTML and 15 lines of JavaScript, which is all accessible in Twilio’s documentation. When I enter a phone number and hit the call button, Twilio sends that number and the capability token to their system.
00:25:38.880 Then, Twilio checks if I have permission to interact with that application. If everything looks good, Twilio goes searching for instructions on how to process that information. I’ve set up at this very URL the instructions for what to do when a call is placed.
00:26:14.940 Now, I need two phone numbers to make the call—'from' and 'to.' Twilio won’t allow arbitrary numbers as your caller ID for valid reasons. If you're using Twilio, you can purchase phone numbers for about a dollar, or you might use your own verified number.
00:27:05.840 Let me use my mobile number for the outgoing call and specify the destination number I want to call. Twilio, theoretically, would take care of reaching that destination, and the number will arrive as parameters.
00:27:25.920 My instructions to Twilio are straightforward; they consist of five lines. Whenever Twilio sends me an HTTP request, I return an XML response directing it to dial from this number to the destination number that I specified earlier. Now that I have everything set up, I’ll reload the app and open a soft phone window to initiate the call.
00:28:12.520 I’m calling my wife's phone right now because she’s not in on the details for this demonstration. You can see Chrome issuing a warning asking for permissions. This would have been handled via Flash before, but now it’s a native Chrome dialogue.
00:29:14.240 So, with that permission given, now I’m connected and can speak. I just want to show how simple it is to establish browser-to-phone communication. It’s incredible how straightforward it has become, and I appreciate the applause for that success. I only started doing this just two months ago, and live coding can be nerve-wracking!
00:30:23.840 That’s a very basic app built using Sinatra. But beyond that functionality, there’s so much more you can do with Twilio. For example, you can transcribe calls, manage conference calls with 40-100 participants, and even build call centers with numbers you purchased for a dollar.
00:31:10.120 This app called Zest Phone was built by a company out in California—they're a finance company using big data to improve loan decisions. They open-sourced the entire codebase, which is built on Rails. This codebase provides a comprehensive example of how to implement advanced Twilio functionality such as placing participants on hold or enabling supervisors to listen in on calls without being heard. You can Google 'Zest Phone' to see how it works.
00:31:59.280 You can easily integrate this call functionality into virtually any app. It's merely an add-on feature you can apply to systems needing to connect with callers, which opens up a world of possibilities. I'm excited about the way software is enabling those of us, who traditionally struggled with telecom, to make significant impacts without the need for extensive knowledge or initial investments.
00:32:44.560 It used to cost a hundred thousand dollars to set up a call center but now, with a free Twilio account, you can do it for just a couple of dollars. So, I encourage you all to explore WebRTC, Twilio Client, and harness the skills you have to help change the world. Thank you very much!