State of the Art Telephony with Ruby
Summarized using AI

by Ben Klang

In this presentation, "State of the Art Telephony with Ruby," given at LoneStarRuby Conf 2011, Ben Klang traces how telephony applications have evolved over the last decade and highlights Ruby's role in that change. He introduces Adhearsion, an open-source framework for voice application development that he likens to Rails for web applications. The talk covers the following key points:

  • Adhearsion Overview: Adhearsion is a full toolkit for building voice applications. It provides thread management for concurrent calls, an eventing subsystem, a plugin (component) architecture, and daemonization.
  • Simplicity in Development: It abstracts telephony tasks into simple, readable commands written in plain, unit-testable Ruby. It can also load an existing Rails application's models and helpers, so developers can reuse that code without a steep learning curve.
  • Advanced Features: The framework provides higher-level constructs such as IVR menus, queues, and conferences, along with text-to-speech (with SSML generated by the companion Ruby Speech library), basic speech recognition, and call progress and answering-machine detection.
  • Integration with Telephony Platforms: Adhearsion works with Asterisk and Tropo. Tropo, a cloud-based service, is described as quick to set up and includes text-to-speech and speech recognition. Asterisk, an open-source platform you run on your own servers, is widely deployed and ships with a rich library of sound prompts, but it has a steep learning curve.
  • Call Architecture: A call is routed through either Asterisk or Tropo into Adhearsion, where the application code runs. From there it can reach backends such as SQL databases (via ActiveRecord), LDAP, XMPP, and REST services.
  • Community and Resources: Klang stresses the importance of the community, including the IRC channel and the ahnhub.com component repository. He announces the release of Adhearsion 1.2 with improved text-to-speech support and invites users to contribute.
  • Conclusion and Call to Action: He promotes a free Adhearsion conference in San Francisco in October for further learning and networking. Overall, the presentation shows how Ruby and Adhearsion can simplify telephony development, and it advocates wider adoption and collaboration within the developer community.
00:00:00.539 (Laughter)
00:00:20.000 My name is Ben Klang. I'm the founder of Mojo Lingo, a company based in Atlanta, Georgia. We build voice applications, and I'm here today to talk to you about state-of-the-art telephony with Ruby.
00:00:26.939 First, I just want to ask: who in here has actually tried to build a voice application? Cool! Okay, and who has taken one into production?
00:00:35.640 Awesome! So, my next question is: who thinks this is true? Using Astricon? That's true, yeah. I don't want to freak you out, so I'm going to show you some Ruby code, which is the comfort zone: easy and free. You can see we have really easy methods, and there's a little ActiveRecord in there. I want to make clear that this talk is not going to be a deep dive into the internals of something like Asterisk.
00:01:02.940 I'm the maintainer of the Adhearsion project. Has anyone here heard of Adhearsion? Cool! Adhearsion is a voice application development framework, really the only one of its kind in the open-source world; it's to voice applications what Rails is to web applications. It's more than just an API; it gives you a lot of functionality. It includes thread management for all the different calls coming through, an eventing subsystem, a plugin system to add new functionality, and it handles daemonization.
00:01:27.540 So it's actually a full kit for writing voice applications. It's also completely independent of Rails: it can run standalone, or it can integrate with Rails. Since this is a Ruby conference, I'm guessing many of you have written Rails apps and have a lot invested in models and helpers. Adhearsion can load your Rails application, including all of those helpers and models, so you can reuse a ton of code right off the bat. It integrates really nicely.
00:01:59.820 What this means is Adhearsion is voice applications done in the Ruby way. I want to talk for a minute about what Adhearsion’s features are.
00:02:03.780 On that third slide, you can see the beginning of our DSL (domain-specific language). Most of the telephony work is abstracted into simple commands like 'answer,' 'hang up,' 'speak,' and 'play a recording,' which are very straightforward and easy to follow when you're reading the code. It is all native Ruby code; there's no magic here. It's all classes and methods, just like you're used to. You can unit test it and use all the same tools you would use to develop a web application.
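To make the "plain Ruby you can unit test" point concrete, here is a minimal sketch. It does not use Adhearsion's actual API: `FakeCall` and `greet_caller` are hypothetical names, and the command methods simply mirror the verbs mentioned above. Because the call-handling logic only sends messages to a call object, a test double can record those messages with no telephony platform involved.

```ruby
# Hypothetical test double standing in for a live call; it records each
# command instead of talking to Asterisk or Tropo. Not Adhearsion's API.
class FakeCall
  attr_reader :log

  def initialize
    @log = []
  end

  def answer
    @log << [:answer]
  end

  def speak(text)
    @log << [:speak, text]
  end

  def play(recording)
    @log << [:play, recording]
  end

  def hangup
    @log << [:hangup]
  end
end

# Hypothetical call-handling logic written against those command names.
def greet_caller(call, name)
  call.answer
  call.speak("Hello, #{name}, thanks for calling.")
  call.play("goodbye")
  call.hangup
end

call = FakeCall.new
greet_caller(call, "Ada")
p call.log
# => [[:answer], [:speak, "Hello, Ada, thanks for calling."], [:play, "goodbye"], [:hangup]]
```

In a real test suite the final `p` would be an assertion in Minitest or RSpec; the design point is only that the logic depends on an object's interface, not on a live call.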
00:02:37.440 Adhearsion also has a powerful eventing system. In addition to the procedural code you write to interact with your caller, you get a stream of events from the telephony platform, such as when a new channel is established, when a key is pressed on the keypad, or when a person joins or leaves a conference. We'll do a demo of how that looks shortly.
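The event stream described above follows a publish/subscribe pattern. The sketch below shows that pattern in plain Ruby. `EventBus` and the event names (`:dtmf`, `:conference_join`, `:new_channel`) are hypothetical and are not Adhearsion's eventing API.

```ruby
# Minimal publish/subscribe dispatcher; a hypothetical stand-in for a
# telephony event stream, not Adhearsion's eventing API.
class EventBus
  def initialize
    @handlers = Hash.new { |hash, type| hash[type] = [] }
  end

  # Register a block to run whenever an event of +type+ is published.
  def on(type, &handler)
    @handlers[type] << handler
  end

  # Deliver +payload+ to every handler for +type+; returns the number of handlers run.
  def publish(type, payload = {})
    handlers = @handlers.fetch(type, [])
    handlers.each { |handler| handler.call(payload) }
    handlers.size
  end
end

bus = EventBus.new
seen = []
bus.on(:dtmf) { |event| seen << "pressed #{event[:digit]}" }
bus.on(:conference_join) { |event| seen << "#{event[:caller]} joined" }

bus.publish(:dtmf, { digit: "5" })
bus.publish(:conference_join, { caller: "alice" })
bus.publish(:new_channel) # no subscribers, so nothing happens
p seen # => ["pressed 5", "alice joined"]
```

Handlers run independently of the procedural call flow, which is what lets event-driven code (logging, dashboards, conference monitoring) sit alongside the code that talks to the caller.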
00:02:59.790 It also has really advanced voice features. The DSL includes high-level constructs that wrap up common patterns in voice application design, like IVR (interactive voice response) menus. In an IVR menu, you might play a prompt and then offer three or four options for the caller to choose from. If the caller's input doesn't match any option, you can give them a chance to try again, perhaps by playing a message like 'Please press two for more information.'
00:03:44.880 So the menu abstracts that. Instead of writing all that logic yourself, you set up a menu and define which piece of code corresponds to each option. Adhearsion also supports queues and conferences: you can put a caller into a queue, or join them to a conference, with simple Ruby method calls. Another feature is text-to-speech, which lets us render speech through various speech engines.
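The menu logic just described (a prompt, digits mapped to code, and a retry on invalid input) can be sketched in plain Ruby. This is not Adhearsion's `menu` command: `IvrMenu` is hypothetical, caller key presses are simulated with an array of digit strings, and a `say` lambda stands in for speech output.

```ruby
# Hypothetical IVR menu sketch (not Adhearsion's menu API): digits map to
# handlers, and invalid or missing input re-prompts up to +max_tries+ times.
class IvrMenu
  def initialize(prompt:, max_tries: 3)
    @prompt = prompt
    @max_tries = max_tries
    @options = {}
  end

  # Associate a keypad digit with the code to run when it is pressed.
  def option(digit, &handler)
    @options[digit] = handler
  end

  # +inputs+ simulates the caller's key presses (nil means no input);
  # +say+ receives every prompt that would be spoken to the caller.
  def run(inputs, say:)
    @max_tries.times do
      say.call(@prompt)
      handler = @options[inputs.shift]
      return handler.call if handler

      say.call("Sorry, that is not a valid option.")
    end
    :failed
  end
end

spoken = []
menu = IvrMenu.new(prompt: "Press 1 for sales or 2 for support.")
menu.option("1") { :sales }
menu.option("2") { :support }

result = menu.run(["9", "2"], say: ->(text) { spoken << text })
p result      # => :support
p spoken.size # => 3 (prompt, error message, prompt again)
```

Keeping the retry loop inside the menu object is the point of the abstraction: each option's handler contains only business logic, and the re-prompting rules live in one place.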
00:04:57.179 We integrate a library called Ruby Speech, which generates SSML (Speech Synthesis Markup Language). You don't have to use SSML if you don't want to, but if you do, it gives you a ton of control over things like speaking rate, intonation, and voice characteristics. We also have some basic support for speech recognition, which is usually provided by the telephony platform. You can give it GRXML (the XML form of the Speech Recognition Grammar Specification) to describe the kinds of answers you expect from callers. For example, if you ask a caller for their favorite color, you expect an actual color back instead of something like 'puppy dog.'
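To show what SSML and GRXML look like, here is a hand-rolled sketch that builds both as strings. It deliberately does not use the Ruby Speech library, whose API isn't shown in the talk, and the helper names are hypothetical. The element names and namespaces follow the W3C SSML 1.0 and SRGS 1.0 specifications.

```ruby
# Hand-rolled markup sketch; the helper names are hypothetical and this is
# not the Ruby Speech API.

# Escape the characters that are special in XML text and attribute values.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
end

# SSML: speak +text+ at a given prosody rate ("slow", "fast", "80%", ...).
def ssml_speak(text, rate: "medium")
  %(<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">) +
    %(<prosody rate="#{xml_escape(rate)}">#{xml_escape(text)}</prosody></speak>)
end

# GRXML (the XML form of SRGS): accept exactly one of the listed phrases.
def grxml_choices(rule_id, choices)
  items = choices.map { |choice| "<item>#{xml_escape(choice)}</item>" }.join
  %(<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" ) +
    %(xml:lang="en-US" mode="voice" root="#{xml_escape(rule_id)}">) +
    %(<rule id="#{xml_escape(rule_id)}"><one-of>#{items}</one-of></rule></grammar>)
end

puts ssml_speak("Press 1 for sales & service.", rate: "slow")
puts grxml_choices("color", %w[red green blue])
```

A grammar like the `color` one above is how the "puppy dog" problem is avoided: the recognizer only reports matches against the listed items.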
00:06:07.680 The Ruby Speech library, a companion project to Adhearsion, provides a Ruby way of generating both SSML and GRXML, making it easy to work with these standards in a Ruby-native manner. Adhearsion also supports call progress and answering-machine detection. Call progress is essential: when you're placing an outbound call, you want to know whether it's ringing, answered, or hung up. You'll also want to know whether an answering machine picked up, and Adhearsion lets you take these factors into account and make decisions as the call progresses.
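The decisions described above amount to mapping call-progress and machine-detection results to actions. Here is a minimal sketch of that mapping. The status symbols and action names are assumptions standing in for whatever results the platform actually reports. They are not Adhearsion constants.

```ruby
# Hypothetical decision table for an outbound call. The status symbols stand
# in for whatever call-progress and machine-detection results the platform
# reports; they are not Adhearsion constants.
def outbound_action(status, machine_detected: false)
  case status
  when :ringing then :keep_waiting
  when :busy, :no_answer then :retry_later
  when :answered
    machine_detected ? :leave_voicemail : :connect_to_agent
  when :hung_up then :log_and_close
  else
    raise ArgumentError, "unknown call status: #{status.inspect}"
  end
end

p outbound_action(:answered)                         # => :connect_to_agent
p outbound_action(:answered, machine_detected: true) # => :leave_voicemail
p outbound_action(:busy)                             # => :retry_later
```

Raising on an unknown status, rather than silently ignoring it, makes it obvious when the platform reports a state the application hasn't planned for.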
00:06:38.040 Let me talk for a minute about the architecture of Adhearsion. Everything starts with a phone call, and that call has to go somewhere, typically through Asterisk. So, how many of you here have used Asterisk before? It's open source, free, and you can run it on your own server. Many organizations are already using Asterisk, so if you're looking to add voice capabilities, there's a good chance you already have Asterisk somewhere in your infrastructure.
00:07:11.220 Although it has a steep learning curve, Asterisk is widely deployed. It comes with an eventing system that we'll discuss further, but it can be tricky to set up. Distributions like AsteriskNOW and FreePBX make it easier, and Adhearsion is compatible with those; there are some articles on my blog about integrating them. Another downside is that Asterisk does not ship with text-to-speech or speech recognition out of the box; those features require open-source add-on packages or commercial licenses.
00:08:06.480 As an alternative, Adhearsion also connects with Tropo, which allows for similar functionality. The beauty of this is that if you've written an app against Asterisk using AGI (Asterisk Gateway Interface), chances are it will port to Tropo with little to no code changes. The advantage of Tropo is it is incredibly simple to set up; you can sign up for an account and allocate a phone number, and within five minutes you can start handling calls.
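AGI itself is a simple line-oriented text protocol. The application sends commands such as `ANSWER` or `STREAM FILE` and reads replies of the form `200 result=<n>`, sometimes followed by data in parentheses. That shared protocol is presumably what lets an AGI-based app move to Tropo's AGI integration with few changes. The sketch below only formats commands and parses replies. The socket or stdin/stdout transport is omitted, and the helper names are hypothetical.

```ruby
# Hypothetical helpers for AGI's text protocol; transport (FastAGI socket or
# stdin/stdout) is left out so the formatting and parsing can be shown alone.
AgiReply = Struct.new(:code, :result, :data)

# Build a command line; empty or space-containing arguments are double-quoted.
def agi_command(name, *args)
  quoted = args.map do |arg|
    text = arg.to_s
    text.empty? || text.include?(" ") ? %("#{text}") : text
  end
  [name, *quoted].join(" ")
end

# Parse replies such as "200 result=1", "200 result=0 (timeout)",
# or "510 Invalid or unknown command".
def parse_agi_reply(line)
  match = line.strip.match(/\A(\d{3})(?: result=(-?\d+))?(?: \((.*?)\))?/)
  raise ArgumentError, "unparseable AGI reply: #{line.inspect}" unless match

  AgiReply.new(match[1].to_i, match[2]&.to_i, match[3])
end

puts agi_command("ANSWER")                         # ANSWER
puts agi_command("STREAM FILE", "hello-world", "") # STREAM FILE hello-world ""
p parse_agi_reply("200 result=0 (timeout)\n").to_a # => [200, 0, "timeout"]
```

Separating formatting and parsing from I/O keeps the protocol logic testable. This reflects the same idea as the talk's point that application code shouldn't care which platform sits on the other end.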
00:08:29.760 Moreover, Tropo offers great text-to-speech and speech recognition out of the box without additional fees. If you're not familiar, Tropo is a cloud-based service, which means you don’t have to manage servers yourself. Tropo also supports SMS and international calls directly, whereas Asterisk requires third-party providers for SMS functionality. However, the downside to using Tropo is that being cloud-based raises concerns about privacy and potential downtime.
00:09:50.460 Also, Tropo does not come with a library of sounds and prompts, whereas Asterisk has a rich library that is easy to use. As a side note, Jason presented Adhearsion with Asterisk here a year ago, and he wrote the first version of the integration between Tropo and Adhearsion on the flight here. Since then, we've improved the integration significantly, reaching nearly 100% coverage of the AGI spec, and we now have production customers running on it.
00:10:54.720 Now, regarding the call flow: after a call comes in, it routes through either Asterisk or Tropo into Adhearsion, where your application code runs. Adhearsion can talk to various backends, including any SQL database supported by ActiveRecord, LDAP through ActiveLDAP, and XMPP for job reporting. If the data you need isn't in one of these sources, almost anything else can be reached with a REST call. The best part is that if you need more functionality, you can just install the appropriate gem and start using that code.
00:12:07.500 In honor of LoneStarRuby Conf, I’m excited to announce that we are releasing Adhearsion 1.2 today with some cool new features. This release is a result of partnering with two of our clients to greatly improve the text-to-speech support, and we are really happy with how it looks. There will be a blog post on my website discussing some of the new features, and of course, there will be formal release announcements as well.
00:12:58.740 The dial plan is the file in Adhearsion that's consulted first when a call comes in. It defines the basic rules about what to say, when to say it, and which parts of your code to run. The Adhearsion console is much like the Rails console: it gives you an environment loaded with all of your models, helpers, and components so you can easily test code. Components in Adhearsion are a plugin framework that lets you add, reuse, and share code. Since version 1.0, released almost a year ago, you can also install components as gems.
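The dial plan's routing role can be sketched as a lookup from the context a call arrives in to the code that should handle it. This is not Adhearsion's actual `dialplan.rb` syntax. `DialPlan`, the context names, and the fallback to a `:default` context are assumptions for illustration only.

```ruby
# Hypothetical dial-plan-style router (not Adhearsion's dialplan.rb syntax):
# each named context maps to a block, and an incoming call is dispatched to
# the context it arrived in, falling back to :default.
class DialPlan
  def initialize
    @contexts = {}
  end

  # Define the code that handles calls arriving in context +name+.
  def context(name, &block)
    @contexts[name] = block
  end

  # Dispatch a call; raises KeyError if neither the context nor :default exists.
  def route(call_context, caller_id)
    handler = @contexts.fetch(call_context) { @contexts.fetch(:default) }
    handler.call(caller_id)
  end
end

plan = DialPlan.new
plan.context(:support) { |caller_id| "Support line for #{caller_id}" }
plan.context(:default) { |caller_id| "Main menu for #{caller_id}" }

p plan.route(:support, "+14045550100") # => "Support line for +14045550100"
p plan.route(:sales, "+14045550100")   # => "Main menu for +14045550100"
```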
00:14:41.880 One example is the component formerly known as Hoptoad, now called Airbrake, which you can install from RubyGems. You drop a YAML config file into the directory, and any exceptions your Adhearsion application encounters are reported to Airbrake. Components are an excellent source of useful code, and I'm a big fan of them.
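The Airbrake example combines two ideas: configuration read from a YAML file and a hook that reports exceptions. Here is a minimal sketch of that pattern in plain Ruby. `ErrorReportingComponent`, the `enabled`/`environment` keys, and the reporter lambda are hypothetical. A real component would send the report to the Airbrake service rather than to an in-memory array.

```ruby
require "yaml"

# Hypothetical component: reads its settings from YAML and reports any
# exception raised by guarded call-handling code. Not the Airbrake gem's API.
class ErrorReportingComponent
  def initialize(yaml_text, reporter)
    config = YAML.safe_load(yaml_text) || {}
    @enabled = config.fetch("enabled", true)
    @environment = config.fetch("environment", "development")
    @reporter = reporter
  end

  # Run the block; if it raises, report the error (when enabled) and re-raise it.
  def guard
    yield
  rescue StandardError => e
    if @enabled
      @reporter.call({ environment: @environment, error: e.class.name, message: e.message })
    end
    raise
  end
end

reports = []
component = ErrorReportingComponent.new(<<~YAML, ->(report) { reports << report })
  enabled: true
  environment: production
YAML

begin
  component.guard { raise ArgumentError, "unknown extension 999" }
rescue ArgumentError
  # The exception still propagates to the caller after being reported.
end
p reports.size # => 1
```

Re-raising after reporting keeps the component transparent: the application's own error handling behaves exactly as it would without the component installed.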
00:14:57.500 We also have a website, ahnhub.com, that serves as a repository for community-contributed components. The collection is growing, and we are always eager for new contributions. All of this is linked from our main website, which I'll show at the end of the talk.
00:15:05.339 I hope you found this hugely interesting. I'm passionate about this stuff; I think it's the coolest thing since sliced bread! If you feel the same way, we have a conference in October in San Francisco, sponsored by Voxeo Labs. It's free; you just need to get there. It will feature two days of presentations by Adhearsion developers and by companies using Adhearsion in fascinating ways. We'll also set aside time for rapid programming for those who want to prototype and explore how it works. Last year was a great experience, and I'm looking forward to this year's.
00:14:50.000 One of the great things about Adhearsion is the strength of its community. We maintain an active IRC channel, and of course we hold conferences for in-person interaction. It's an exciting project with plenty of opportunities for growth and contribution ahead. That's all I have for you today; I'd appreciate any feedback on the speaker rating page. The Adhearsion website also links to documentation, API docs, wiki docs, and my website, which has screencasts on configuring Adhearsion with Asterisk. We also have a desk inclusion console set up and a multitude of resources available. Please feel free to reach out to me with any questions.
00:15:47.340 Thank you!