
Building Serverless Ruby Bots

by Damir Svrtan

In the presentation titled "Building Serverless Ruby Bots" by Damir Svrtan at RubyConf 2018, the focus is on creating serverless Ruby bots using AWS Lambda, which did not officially support Ruby at the time. The speaker shares his personal experience of needing an apartment in Zagreb, Croatia, and how he built a bot to scrape listings automatically. Key points include:

- Motivation for Building a Bot: Damir faced challenges in finding an apartment quickly due to high demand. He created a simple bot that checks an online classifieds site every five minutes for new listings and sends an email alert.

- Transition to Serverless: To ensure the bot runs consistently without relying on his personal computer, he explored serverless solutions. AWS Lambda was chosen due to its capabilities in executing code without managing servers.

- Ruby Support Issues: Although AWS Lambda supported several programming languages, Ruby was not one of them at the time of the talk, prompting the exploration of alternative ways to run Ruby, including Traveling Ruby, MRuby, and JRuby.

- Implementations: Each implementation faced distinct hurdles, such as missing dependencies or coding constraints.

  - Traveling Ruby: This allows CLI applications to run without the end-user needing to install Ruby. Damir discussed the process of preparing a Ruby package for deployment.

  - MRuby: A lightweight, embeddable Ruby interpreter aimed at microcontroller and IoT applications, with challenges due to lack of standard library support.

  - JRuby: Runs on the Java Virtual Machine, providing two-way access between Ruby and Java, and leveraging Java's environment.

- Packaging for AWS Lambda: He detailed the process of wrapping Ruby scripts for Lambda and explained the necessary steps to package and upload the bot's code.

- Performance Metrics: The speaker compared the code size, memory consumption, and execution speed of the three Ruby versions, noting that MRuby had surprising metrics and that warm-up times impacted JRuby performance.

- Further Considerations: Other serverless options and projects were mentioned, like Ruby Packer and FaaStRuby, highlighting the need for Ruby support across platforms.

The conclusion emphasizes the experimentation with different Ruby implementations and the hope that broader support for Ruby will increase through efforts like a petition for serverless Ruby initiatives. Damir's knowledge and professional experience, including his current role at Netflix, underscore the presentation’s relevance to developers navigating serverless options for Ruby applications.

00:00:15.379 Hello everybody, thank you for coming to my presentation today. I want to talk about building serverless Ruby bots. Some time ago, I was looking for an apartment to rent in the center of Zagreb. For those of you who don't know, Zagreb is the capital city of Croatia, which is located in Central Europe.
00:00:36.090 If you want to rent apartments in Croatia, you check out listings on an online classifieds platform that's called Njuškalo. Njuškalo is basically the same thing as Craigslist here in the US. I had a problem while trying to find an apartment: I had a hard time forcing myself to check the classifieds multiple times a day. You probably all know this, but good apartments usually get rented out within an hour or two.
00:01:10.650 To tackle my problem, I decided to build a small bot that would scrape the webpage with all the listings every five minutes and send me an email when a new apartment became available. This technique was actually very successful because I was always the first to call the advertisers, and I managed to find an apartment fairly quickly.
00:01:39.210 The technical background of the bot was really simple. I used a couple of gems: HTTParty to fetch the content of the webpage and the Yajl gem, which is an HTML parser really similar to Nokogiri but better. Then I used the Action Mailer gem to send myself an email. I wrapped it all in a rake task and invoked that rake task with the Whenever gem. The script ran on my local computer, and that was good enough for me.
00:02:03.149 However, if I’m not on my computer, I rarely check my emails. After some time, I decided to reuse the script and wanted to run it consistently every five minutes, no matter if my computer was on or off. Therefore, I had to move it from my personal computer to a server.
00:02:23.260 However, I didn't fancy the idea of renting a server just to run a simple Ruby script. Enter serverless! I knew at that point that the ideal infrastructure for what I needed would be serverless—a place where I could just deploy my function and run it periodically without managing servers.
00:02:34.530 Thankfully, Amazon, Microsoft, Google, IBM, and many other vendors provide the infrastructure for this. It's usually referred to as serverless or Function as a Service (FaaS). I wanted to try out the most useful and pioneering service in this area, so I tried AWS Lambda.
00:02:58.689 AWS Lambda lets you run code without provisioning or managing servers. You upload your code in a zip file, set some triggers to run it, and that's it! You're up and running in minutes. AWS Lambda executes your code only when needed, scales automatically, and you pay only for the compute time you consume.
00:03:40.229 Lambda supports multiple languages such as Python, Java, Node.js, and C#, but there is one problem: it doesn't support Ruby! The AWS team asked the public more than three years ago on Twitter whether they should first support Ruby or Go, and unfortunately for us, the vote was 54% for Go and 46% for Ruby.
00:04:11.560 More than two years later, this tweet came out announcing Go support. It took them a while—basically over two years—and I would guess that Ruby support isn't near at all. There have been some rumors about it, but no official announcements have been made, which leaves us without Ruby support.
00:04:51.010 I wanted to stick with Ruby and try it out on Lambda. Although there's no official support for Ruby on AWS Lambda, you can run JavaScript, Java, or Python and invoke shell commands. Since we can invoke shell commands, this means we can wrap our Ruby script, package it, and run it on AWS Lambda.
00:05:32.750 There are more than three ways to package Ruby and run it on AWS Lambda, but I'll cover three of them: Traveling Ruby, MRuby, and JRuby. Let’s introduce each of them.
00:06:19.920 Traveling Ruby is a project that tackles the problem of distributing CLI apps. If you ever wanted to distribute a CLI built in Ruby, your end users would have to have Ruby installed, or even worse, have a Ruby version manager installed. They would also need to install Bundler and all the gems. Traveling Ruby allows Ruby app developers to distribute a single package to end-users without requiring them to install Ruby or any gems.
00:07:44.960 MRuby is something totally different. It's an interpreter for the Ruby programming language with the intent of being lightweight and easily embeddable for various microcontrollers and IoT devices. It's created by Matz, and it's designed to provide various ways to run Ruby code packaged as an executable binary that can be run on any Linux distro, OS X, or Windows.
00:08:29.160 Unlike our canonical Ruby implementation, JRuby runs on the Java Virtual Machine (JVM). It provides a way for us to embed Ruby into any Java application and allows two-way access between Java and Ruby code. Since AWS Lambda supports Java, it's possible to run JRuby on it.
00:09:14.200 I wanted to try all the Ruby implementations to check how they perform. Since we have three totally different implementations, I decided to write a minimalistic bot that has all of its code in a single file with minimum dependencies, basically depending only on the standard library.
00:09:50.750 The first thing I had to do was shrink the codebase. I had a bunch of gems that I needed to get rid of. I removed HTTParty, since Net::HTTP was good enough. I also removed Action Mailer in favor of using Mailgun's HTTP API. I discarded Whenever and Rake because Lambda has its own scheduler. The only gem I couldn't avoid was Nokogiri, as the webpage I was scraping lacked a JSON API.
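
As an illustration of the gem-free approach described above, here is a minimal sketch of fetching a page and sending mail through Mailgun's HTTP API using only the standard library. This is not the speaker's code: the helper names (`fetch_page`, `build_mailgun_request`, `deliver`), the sender address, and the example domain are assumptions made for this sketch.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetches the raw HTML of a listings page.
def fetch_page(url)
  Net::HTTP.get(URI(url))
end

# Hypothetical helper: builds (but does not send) a request for
# Mailgun's "messages" endpoint, which accepts form-encoded fields
# and HTTP basic auth with the user name "api".
def build_mailgun_request(domain:, api_key:, to:, subject:, text:)
  uri = URI("https://api.mailgun.net/v3/#{domain}/messages")
  request = Net::HTTP::Post.new(uri)
  request.basic_auth('api', api_key)
  request.set_form_data(
    'from'    => "Apartment Bot <bot@#{domain}>", # assumed sender address
    'to'      => to,
    'subject' => subject,
    'text'    => text
  )
  [uri, request]
end

# Sends a prepared request over HTTPS and returns the response object.
def deliver(uri, request)
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
end
```

Building the request separately from sending it keeps the network call in one place, so the rest of the bot can be exercised without touching Mailgun.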
00:10:45.150 So, I went on to find an HTML parser for MRuby. I eventually posted a question on the MRuby official repository asking if there was an HTML parser, and Matz replied. He gave me three suggestions for my problem: one, use regular expressions (which I avoided); two, try to port a simple Ruby HTML parser; or three, write a wrapper over LibXML, which was the hardest option.
00:11:40.000 I looked at the HTML structure of the listings and noticed that each has a time element with a datetime attribute. Given that, I decided to go with regular expressions to parse it. Now let’s look at the Ruby bot code, which is quite simple.
00:12:38.000 I created a simple class called NewApartments that takes a URL to scrape. This URL would include parameters such as the desired region, square footage, and price. It would then check with each HTTP request if there were any new apartments published in the last five minutes and send me an email if there were.
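
The class described above might be sketched as follows. The talk only names the class and its URL argument, so the rest is assumed for illustration: the method names, the `window_seconds` and `fetcher` parameters, and the exact regular expression. The sketch targets standard Ruby; under MRuby, `Net::HTTP` and `Time.parse` would need replacements.

```ruby
require 'net/http'
require 'time'
require 'uri'

class NewApartments
  # Captures the datetime attribute of each <time> element, e.g.
  # <time datetime="2018-11-13T10:02:00+01:00">
  DATETIME_PATTERN = /<time[^>]*\bdatetime="([^"]+)"/

  # fetcher is an assumed injection point so the class can be tried
  # without hitting the network; by default it uses Net::HTTP.
  def initialize(url, window_seconds: 300, fetcher: nil)
    @url = url
    @window_seconds = window_seconds
    @fetcher = fetcher || ->(target) { Net::HTTP.get(URI(target)) }
  end

  # Returns the publication times that fall within the window ending at now.
  def recent(now = Time.now)
    html = @fetcher.call(@url)
    html.scan(DATETIME_PATTERN)
        .flatten
        .map { |value| Time.parse(value) }
        .select { |published_at| (0..@window_seconds).cover?(now - published_at) }
  end

  # True when at least one listing was published within the window.
  def any_new?(now = Time.now)
    !recent(now).empty?
  end
end
```

A scheduled run would then amount to something like `send_email if NewApartments.new(url).any_new?`, where `send_email` stands in for whatever notification code the bot uses.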
00:13:15.000 The implementation with Traveling Ruby requires downloading runtimes first for each platform you're going to run your bot on. I had to download runtimes for macOS and for Linux, the latter for running on AWS Lambda. You can easily curl them down from an S3 repository provided by Phusion.
00:14:06.000 Once downloaded, unzip and prepare your boilerplate bot code. The structure includes a bin folder inside the runtimes directory, along with a Ruby script for execution. If I copy over my bot code to a file called main.rb, the structure would look something like that.
00:14:42.000 If I wanted to execute the bot with the Ruby runtime installed on my machine, I would simply do that. If I wanted to invoke it with the Ruby pulled from the S3, I would do that too. Everything worked perfectly fine right from the beginning.
00:15:30.000 To upload the bot to AWS Lambda, you need to create a wrapper, since Lambda does not run shell commands directly. A wrapper like this can be found online; it creates a child process to execute the shell command. After that, package everything into a zip file to upload to AWS Lambda.
00:16:16.000 You need to package three things: the runtime, the source code (the main.rb file), and the JavaScript wrapper. After that, you go to the AWS Lambda panel and follow five easy steps to configure your serverless function. You must name it, select a runtime, and choose a trigger.
00:17:03.000 There are various triggers to choose from, like the API Gateway or CloudWatch Events Scheduler, which allows us to run the bot periodically. Then, you create a rule for the trigger and upload the zip of the bot code last.
00:17:45.000 The final step is setting a handler. The handler tells Lambda what file to look into, which method to invoke, and that's it! Now we can harvest the benefits of our Ruby bot and receive emails whenever a new apartment becomes available.
00:18:39.000 The implementation with MRuby was interesting as well. There are two ways to build apps with MRuby: you can download the MRuby source directly from GitHub, or use the mruby-cli platform for building native command line applications.
00:19:22.000 I decided to build from the MRuby source instead of using the CLI. After cloning the MRuby repository, I found an empty bin folder, a build config file that acts like a gem spec, and a minirake file. Once I invoked minirake, the bin folder was no longer empty: it contained the mruby executable along with mirb.
00:20:17.000 If I wanted to create a simple hello world file and invoke it using MRuby, I would do bin/mruby hello_world.rb. That worked, but when I copied my bot code over to main.rb, I encountered an 'undefined method require' error.
00:21:13.000 MRuby doesn’t support the require method because it works differently than standard Ruby. As mentioned earlier, MRuby is designed to be lightweight and doesn't come packaged with everything included. Thus, I had to find replacements for several libraries like Net::HTTP.
00:21:56.000 MRuby lacks support for environment variables or regular expressions out of the box, so I had to add these dependencies to my build config file. After some tweaks and recompilation, I finally got it working.
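
For illustration, the build configuration changes described above might look roughly like the following build_config.rb fragment. The talk does not name the gems that were added, so mruby-env, mruby-onig-regexp, and mruby-simplehttp are assumptions, chosen only as plausible examples of mgems covering environment variables, regular expressions, and HTTP requests.

```ruby
# build_config.rb (fragment): a sketch, not the speaker's actual file.
MRuby::Build.new do |conf|
  toolchain :gcc

  # Core gems bundled with the mruby source tree.
  conf.gembox 'default'

  # Assumed third-party mgems; the talk does not say which ones were used.
  conf.gem mgem: 'mruby-env'          # ENV access
  conf.gem mgem: 'mruby-onig-regexp'  # Regexp support
  conf.gem mgem: 'mruby-simplehttp'   # HTTP client standing in for Net::HTTP
end
```

Because the gems are compiled into the interpreter rather than loaded with require, the build has to be re-run after every change to this file.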
00:22:37.000 Now, executing the bot would send an email, which was great. The next challenge was uploading it to AWS Lambda, which required cross-compilation since I was running on macOS, while Lambda runs on Linux. I compiled it on a Linux machine and zipped it just like I did with Traveling Ruby.
00:23:42.000 Now let's discuss the JRuby implementation. The easiest way to get started with JRuby is to use the AWS Lambda JRuby repository, which has a prepackaged formula for building Ruby functions on AWS Lambda. Once you clone it, you'll find various components, including a lib folder with the JRuby jar file.
00:24:41.000 You also have a source folder for your main.rb file and a Gradle task that compiles everything into a zip file. After pasting my code into main.rb, I built the project with a Gradle command, which produced a zip file of the bot.
00:25:31.000 However, I faced some issues when compiling on my local machine, possibly due to different Java versions. Therefore, I built everything on a Linux machine and uploaded the resulting zip the same way as the previous bots.
00:26:18.000 The only difference when uploading the JRuby bot is selecting the Java 8 runtime instead of Node.js. Despite the lack of official Ruby support, we managed to run three different implementations of Ruby on AWS Lambda, and all of them worked!
00:27:11.000 Now, let's talk about metrics: code size, memory consumption, and speed. After that, I'll explain Lambda's pricing. As expected, MRuby produced the smallest package. All three implementations stayed within Lambda's limit on code size per function, which is 50 megabytes.
00:28:06.000 I was surprised that MRuby had a larger memory footprint in my tests, while I expected JRuby's to be highest. Interestingly, even a simple program like 'Hello World' can take up to 2.5 seconds to execute on JRuby due to Java's long warm-up time.
00:29:01.000 Cold starts occur when your code is triggered for the first time; the cloud provider initializes a new container, which takes time to warm up. Subsequent calls reuse the same container unless about four hours pass. After warming up, Traveling Ruby and MRuby have comparable execution times, whereas JRuby performs significantly better, executing in approximately 1.2 seconds.
00:30:06.000 Why do these metrics matter? AWS Lambda's pricing operates on a two-tier system: the number of requests and the sum of execution durations expressed in gigabyte-seconds. You get 1 million requests and 400,000 gigabyte-seconds per month on the free tier.
00:31:06.000 So, if you ran the bot every minute, JRuby would outperform MRuby and Traveling Ruby on subsequent requests. However, it uses more memory, so operational costs might equalize in the end.
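
The trade-off between speed and memory can be made concrete with a small calculation. The sketch below uses the free-tier figures quoted above; the memory sizes and the 2.4-second duration are hypothetical numbers picked for illustration, not measurements from the talk, and the calculation ignores any rounding of billed duration.

```ruby
FREE_TIER_REQUESTS   = 1_000_000
FREE_TIER_GB_SECONDS = 400_000

# GB-seconds = invocations * duration in seconds * memory in GB.
def gb_seconds(invocations:, duration_seconds:, memory_mb:)
  invocations * duration_seconds * (memory_mb / 1024.0)
end

# True when both the request count and the compute usage fit the free tier.
def within_free_tier?(invocations:, duration_seconds:, memory_mb:)
  usage = gb_seconds(invocations: invocations,
                     duration_seconds: duration_seconds,
                     memory_mb: memory_mb)
  invocations <= FREE_TIER_REQUESTS && usage <= FREE_TIER_GB_SECONDS
end

# Running every five minutes for a 30-day month.
invocations_per_month = 12 * 24 * 30

# Hypothetical profiles: a faster function with more memory versus a
# slower function with less memory.
fast_but_heavy = gb_seconds(invocations: invocations_per_month,
                            duration_seconds: 1.2, memory_mb: 512)
slow_but_light = gb_seconds(invocations: invocations_per_month,
                            duration_seconds: 2.4, memory_mb: 256)

puts format('fast/heavy: %.1f GB-s, slow/light: %.1f GB-s',
            fast_but_heavy, slow_but_light)
```

With these assumed numbers, halving the duration while doubling the memory yields the same GB-seconds, which is the sense in which costs can equalize. Both profiles also stay far below the free-tier allowance at this invocation rate.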
00:32:01.000 Choosing the right Ruby implementation for the job can be difficult. Traveling Ruby offers low memory consumption but lacks adequate support, while MRuby is lightweight yet has limited library support.
00:32:38.000 JRuby provides simple packaging but can be challenging to configure properly depending on the environment. This can lead to increased debugging efforts, especially when encountering Java errors.
00:33:27.000 Now, let's mention some alternatives. Ruby Packer compiles an entire application into a single executable, and Ruby Snap is an app packaging and distribution project that is still in progress. As for serverless platforms, Azure Functions has broader language support; however, it lacks the integrations that AWS Lambda provides.
00:34:34.000 Google Cloud Functions support only Node.js and Python, while other options involve running Ruby with Docker on Kubernetes, which is less straightforward. Recently, Apache OpenWhisk announced Ruby support, which is a promising development. IBM Cloud Functions are actually built on Apache OpenWhisk, making them the first cloud provider to officially support Ruby.
00:35:25.000 IBM Cloud Functions even include an inline editor with syntax highlighting, although improvements are still required. Additionally, Serverless Framework has become a popular framework for applications targeting AWS Lambda and Apache OpenWhisk.
00:36:18.000 Lastly, FaaStRuby (faastruby.io) is a hobby project that allows you to quickly generate Ruby serverless functions and deploy them on DigitalOcean. While this project might lack some features, it shows potential as a proof of concept.
00:37:09.000 In conclusion, this has been a fun experiment for me to test multiple Ruby flavors. I am glad to share that you can run Ruby on various serverless providers, regardless of official support.
00:37:45.000 I believe we are still far from perfect integration, but hopefully developments from providers like IBM will motivate others to support Ruby as well. You can support this initiative by signing a petition on serverlessruby.org, which currently has over 1,500 signatures.
00:38:35.000 Thank you all for your attention. My name is Damir Svrtan, and as you might have gathered, I come from Croatia. I lived there until six months ago, and I now work for Netflix in the San Francisco Bay Area. Netflix is currently hiring Ruby developers across multiple positions. If you're interested, please feel free to catch me after the talk or check this URL.
00:39:12.000 Thank you, everyone!