Ruby off the Rails: Building a Distributed System in Ruby

by Matthew Kocher

In the video titled "Ruby off the Rails: Building a Distributed System in Ruby," Matthew Kocher discusses how to build a distributed system using Ruby, specifically focusing on Cloud Foundry, a Platform as a Service (PaaS) solution. He shares insights from his experience working with Cloud Foundry at Pivotal Labs over the past six months.

The main topics covered in the talk include:

- Traditional vs. Modern Deployment: Kocher contrasts the manual steps involved in traditional application deployment with the streamlined process offered by PaaS solutions like Cloud Foundry, which greatly simplifies the developer experience.

- Overview of Cloud Foundry: He explains that Cloud Foundry is a large and complex distributed system predominantly written in Ruby. It mimics local development environments and allows developers to deploy code rapidly.

- Workflow Components: Kocher delves into various components of Cloud Foundry, such as the CF gem for developer interaction, the Cloud Controller (a centralized API), and the Droplet Execution Agent (DEA), which is responsible for executing user code securely.

- Build Packs: He discusses the use of build packs to transform user code into executable droplets, allowing applications to run in a cloud environment.

- Resource Handling and Load Balancing: The talk highlights how the system handles resource allocation dynamically, with updates to algorithms that manage load balancing between DEAs based on performance metrics.

- Health Management: Kocher emphasizes the role of the Health Manager in monitoring system events, ensuring application availability, and addressing issues such as outdated app instances.

- Service Management: He raises important points regarding service management within Cloud Foundry, acknowledging that while it supports key services like PostgreSQL and RabbitMQ, the service offerings are not production-ready yet.

Throughout the presentation, Kocher encourages audience involvement in these open-source projects and shares that there are many opportunities for developers in this field. He wraps up by directing viewers to his slides, online resources, and ways to connect with him post-conference.

Key takeaways from the session include:

- The architectural considerations when deploying Ruby applications at scale.

- The importance of community collaboration in open-source environments.

- The potential for Ruby to power various components in distributed systems beyond traditional web applications.

00:00:20.960 Great.
00:00:22.400 Cool. Uh, so I'm a developer at Pivotal Labs. I've been working on Cloud Foundry for the last six months or so.
00:00:26.439 It's been a really fun system to work on. It's a large distributed system written in Ruby that's a Platform as a Service.
00:00:30.759 I'll get into a little bit more about what it is as the talk goes on.
00:00:33.040 I wanted to go over sort of the bad old days of what we talked about yesterday, which was how we deploy applications. If we're not using a PaaS, we start by writing the code, which is what we've been spending the last two days talking about.
00:00:44.160 Then we have all of these other manual steps that we talked about yesterday: provisioning servers, installing the runtime, installing databases, setting up DNS, and setting up load balancers. All of this stuff is something that some developers really love getting into, while others would much rather not have to deal with it at all.
00:01:06.799 So, the new workflow that we're looking for is quite different. A lot of times I've worked at small startups where we’ve used Heroku. On the first day we deploy, it only takes about 30 minutes to set up. Then, if you are working for a large company that's still using the old process, it can feel like their data center is stuck in the 1950s. You file a ticket, and the operator will get back to you when it's ready.
00:01:25.520 So now we have a new plan for deployment, which is: steal the underpants, deploy code and profit! I think there’s one step in here that we don’t need, but I haven’t been able to figure out which one it is. What I like to say is, 'There’s an app for that,' and it’s Cloud Foundry.
00:01:48.600 What Cloud Foundry does is create a system that mimics what you see in a local development environment, making it very easy to deploy applications. You want to be able to give it code, and you want to see your application running on the internet in a matter of minutes. It’s entirely open-source, and the vast majority of it is written in Ruby.
00:02:14.680 A lot of developers get started with Ruby on Rails, and they’ve seen web applications written in Ruby and used a couple of gems. However, this system looks entirely different; it has all sorts of Ruby daemons, most of them written with EventMachine, and communicates over a message bus. There is very little UI aside from monitoring endpoints.
00:02:34.120 It’s really fun to go through the codebase. It took me probably a week before I had any idea what I was looking at when I first started working with it. I sat down at my computer and thought, 'Let’s start setting it up,' and a week later, I came back and was like, 'Oh, I've got it set up now.' So, my goal with this talk is to give you an idea of what components you have to set up and how they fit together.
00:03:05.159 I'm not going to cover all the edge cases because we would be here for a week. The first part of Cloud Foundry that you encounter is a simple gem. We recently renamed this to CF because it used to be called VMC, and we didn’t know what VMC stood for, so we figured CF made a lot more sense.
00:03:21.599 This is probably the clearest Ruby code in the system. As Ruby developers, we’re used to dealing with gems; we know what they look like. You run 'gem install,' you download it, and you run CF, which tells you what you need to do to interact with the system. Once you enter commands, it eventually assembles your commands into HTTP requests that it sends up to Cloud Foundry.
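A minimal sketch of how a command-line client might turn a command into an HTTP request. The `build_request` helper, the `/apps` path, the payload keys, and the `api.example.com` target are all hypothetical; they are not the actual CF gem internals or the Cloud Controller API. The request is only built here, never sent.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical mapping from a CLI command to an HTTP request.
# Paths and payload keys are illustrative assumptions only.
def build_request(command, args = {}, target: "https://api.example.com")
  case command
  when "push"
    request = Net::HTTP::Post.new(URI.join(target, "/apps"))
    request["Content-Type"] = "application/json"
    request.body = JSON.generate({ name: args.fetch(:name),
                                   instances: args.fetch(:instances, 1) })
    request
  when "apps"
    Net::HTTP::Get.new(URI.join(target, "/apps"))
  else
    raise ArgumentError, "unknown command: #{command}"
  end
end

request = build_request("push", { name: "hello-world", instances: 2 })
puts "#{request.method} #{request.path}"
puts request.body
```

Keeping request construction separate from sending makes the command layer easy to test without a running server.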
00:03:43.879 It's pretty easy to work with. We’ve recently rewritten a lot of this, and while there’s still a lot of refactoring to do, it's getting better. So, this is the diagram we have right now: we have some developers using the CF gem, but this doesn't get you very far when you're trying to deploy an application in the cloud. You can enter commands, but eventually nothing actually happens.
00:04:15.360 We need to add something to this. The component we add is sort of the central brain of Cloud Foundry, which is an API that we call the Cloud Controller. This is a Sinatra application that is also written using EventMachine and sits on the message bus so it can communicate with the rest of the distributed system and also talk to the outside world.
00:04:38.639 The Cloud Controller's main responsibility is maintaining the desired state for what the system should look like at any given time. If you register with the Cloud Controller, it should know that your user exists. If you create an application, it knows how many instances of that application there are and how many URLs you want mapped to that application.
00:05:05.360 Once it knows the state of what you want, the Cloud Controller takes action to keep the state of the system in sync with what you've told it to do. The Cloud Controller was originally written in Rails, but it was recently rewritten in Sinatra with the Sequel gem. It’s a little cleaner this way, but there wasn't really a good reason to choose Sinatra over Rails other than it's just an API.
00:05:37.600 I prefer using Rails for APIs, but it shows that it doesn't really matter which framework you use; most of the code ends up looking the same. So now we have a system where users out in the world can use their CF gem to communicate with the Cloud Controller, which is running in the cloud.
00:06:08.360 When users send an HTTP request to the Cloud Controller, they might say, 'I've got my Hello World app' and they'll provide the code and the URL they'd like to use. For instance, if they go to helloworld.com or cloudfoundry.com, they should be able to see their app.
00:06:35.760 The CF gem uploads the code to the Cloud Controller. Now, the Cloud Controller has this code and is ready to go, except that we aren't ready yet because we have code but what we want, in our terminology, is a droplet. As many people know, you can't just download code and run it, unless you're strictly using the Ruby standard library. Most of you are using gems, as we established earlier, and most of you want to use a specific version of Ruby, not just whatever one we happen to have installed on the system.
00:07:07.520 What we really want to do is take any code uploaded into the system and transform it into something that we can run. What we have is a staging process where we take the code and we run what's known as a build pack, which is essentially a transformation function. We had kept separate implementations of what Heroku has for their build packs. However, we took a step back a few months ago and concluded that it didn’t make sense to maintain two open-source projects doing the exact same thing.
00:07:47.720 So we decided to use Heroku’s build packs. Of note, we didn't like how they handled Java, so we chose to implement Java differently. For Rails, their approach works perfectly fine—it’s exactly what people expect when deploying a Rails app. So, we take your code, run the build pack, which is a transformation function, and end up with a droplet.
00:08:37.080 The goal of a droplet is that no matter what you uploaded, whether it’s a Java app, a Play app, a Scala app, or a Go app, you’ll end up with something that you can unpack and call 'start' on, giving it a port to run on. This abstraction allows us to treat it like an executable. In essence, you have something that looks very much like a binary-compatible file that you can run anywhere.
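The idea of a build pack as a transformation function can be sketched as follows. This is an illustration under stated assumptions, not how the real build packs are implemented: the `Droplet` struct, the `BUILD_PACKS` table, the detection rules, and the start commands are all hypothetical.

```ruby
# In this sketch a droplet is just a start command that accepts a port.
Droplet = Struct.new(:start_command) do
  def command_for(port)
    format(start_command, port: port)
  end
end

# Hypothetical build packs: each one detects whether it applies to the
# uploaded files and, if so, compiles them into a Droplet.
BUILD_PACKS = [
  { name: "ruby",
    detect: ->(files) { files.include?("Gemfile") },
    compile: ->(_files) { Droplet.new("bundle exec rackup -p %<port>d") } },
  { name: "go",
    detect: ->(files) { files.any? { |f| f.end_with?(".go") } },
    compile: ->(_files) { Droplet.new("./app -port %<port>d") } }
].freeze

# Staging: pick the first build pack that recognises the code and run it.
def stage(files)
  pack = BUILD_PACKS.find { |bp| bp[:detect].call(files) }
  raise ArgumentError, "no build pack matches the uploaded code" unless pack

  pack[:compile].call(files)
end

droplet = stage(["Gemfile", "config.ru"])
puts droplet.command_for(8080)
```

Whatever the input language, the caller only ever sees the same interface: a droplet and a port.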
00:09:25.200 The challenge of this staging process is that we need a place to run it—we need to be able to run code somewhere, and the user’s code could theoretically do anything. This is quite daunting; taking user code and executing it on your system. Thus, we’ve ended up with a component we call the DEA, or Droplet Execution Agent.
00:10:08.160 The DEA is another Ruby daemon, written with EventMachine, that listens on the message bus. It has no user interface but runs on a server, along with another component we created called Warden. Warden is similar to LXC or, if you’ve been following recent tech news, Docker.
00:10:41.799 Warden uses similar technologies and essentially allows us to launch processes in a highly isolated environment, where the processes are jailed so they run independently of each other but share the same kernel, making them lightweight. This means we can quickly spin them up in just a couple of seconds while ensuring a secure environment where the user cannot escape their container unless there’s a kernel vulnerability.
00:11:36.440 So, once we have this setup, our Cloud Controller sends a message to the DEA saying, 'Here’s some application code; can you go and stage it for me?' The DEA will spin up a Warden container. The Cloud Controller communicates over the message bus, so it pings another box to download the code.
00:12:19.920 Once the code is downloaded, it's put into the Warden container, and it starts running the build pack associated with that application, transforming it into a staged droplet. Once this process is complete, the DEA responds back to the Cloud Controller with 'I took that code you gave me, and I made it into something you can run in the system; here’s the URL for it.'
00:12:54.000 The Cloud Controller will then proceed to find somewhere to run this app and tell the DEA to run the application. This is an interesting problem because you have to locate the appropriate resources in a large system. We’ve implemented several algorithms to select where in the system to deploy the app. The first algorithm used a time delay to assess the load on each DEA in the system.
00:13:40.240 We broadcast a request to the cluster asking, 'Who can run this app for me?' Each DEA would calculate a number based on their load and respond with that after sleeping for a duration. This approach quickly became unreliable due to network delays, as the ones closest to the Cloud Controller would become overloaded. More distant DEAs would not have to perform any work.
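A small deterministic model of why the delay-based scheme breaks down. It assumes the sleep was proportional to load and that the first reply to arrive won; the talk implies this but does not spell it out. The `Dea` struct, the delay constant, and the latency figures are hypothetical.

```ruby
Dea = Struct.new(:id, :load, :network_latency_ms)

# Hypothetical: each unit of load adds 10 ms of sleep before replying.
DELAY_PER_LOAD_UNIT_MS = 10

# When the reply reaches the Cloud Controller: sleep time plus network latency.
def arrival_time_ms(dea)
  dea.load * DELAY_PER_LOAD_UNIT_MS + dea.network_latency_ms
end

# The first reply to arrive wins the app.
def first_responder(deas)
  deas.min_by { |dea| arrival_time_ms(dea) }
end

deas = [
  Dea.new("near-busy", 5, 1),  # replies after 5 * 10 + 1  = 51 ms
  Dea.new("far-idle", 1, 80)   # replies after 1 * 10 + 80 = 90 ms
]
puts first_responder(deas).id
```

In this model the busy but nearby DEA wins even though the distant one is nearly idle, which is the overload pattern described above.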
00:14:12.040 Recently, we updated the selection algorithm so that the DEAs regularly broadcast their load stats to the Cloud Controller, which maintains an in-memory table of all DEAs and their loads. When the Cloud Controller goes to start up an app, it checks this table to find the DEA with the least load and sends it a direct message to start the application.
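The in-memory table approach might look roughly like this. The `DeaPool` class and its method names are hypothetical, and ignoring advertisements older than a cutoff is an added assumption that the talk does not mention.

```ruby
# Hypothetical in-memory table of DEA load advertisements.
class DeaPool
  def initialize(stale_after: 10)
    @adverts = {}
    @stale_after = stale_after # seconds; an assumption, not from the talk
  end

  # Called whenever a DEA broadcasts its current load.
  def record_advertisement(dea_id, load, now: Time.now)
    @adverts[dea_id] = { load: load, seen_at: now }
  end

  # Returns the id of the least-loaded DEA heard from recently, or nil.
  def least_loaded(now: Time.now)
    fresh = @adverts.select { |_id, ad| now - ad[:seen_at] <= @stale_after }
    id, _ad = fresh.min_by { |_id, ad| ad[:load] }
    id
  end
end

pool = DeaPool.new
pool.record_advertisement("dea-1", 0.8)
pool.record_advertisement("dea-2", 0.3)
pool.record_advertisement("dea-3", 0.5)
puts pool.least_loaded
```

Because the decision is made from data the Cloud Controller already holds, network latency no longer influences which DEA is chosen.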
00:14:52.680 Additionally, the Cloud Controller passes an environment hash to initiate the app, which includes necessary configurations, such as database details. This way, the configuration process becomes very similar to launching a process on your local machine, only it's done across any number of nodes.
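Passing configuration through an environment hash is the same mechanism Ruby exposes locally via `Process.spawn`. The sketch below uses hypothetical variable names (`PORT`, `DATABASE_URL`) and a hypothetical `start_app` helper; it does not reflect the actual keys the Cloud Controller sends.

```ruby
require "rbconfig"

# Hypothetical environment hash; real key names may differ.
env = {
  "PORT" => "61001",
  "DATABASE_URL" => "postgres://user:secret@db.example.com:5432/app"
}

# Starts a command with the given environment and returns what it printed.
def start_app(env, *command)
  reader, writer = IO.pipe
  pid = Process.spawn(env, *command, out: writer)
  writer.close # the parent keeps only the read end
  output = reader.read
  reader.close
  Process.wait(pid)
  output
end

# A stand-in "app": a child Ruby process that reads its port from the environment.
puts start_app(env, RbConfig.ruby, "-e", 'puts "listening on #{ENV["PORT"]}"')
```

The app itself never needs to know where its configuration came from; it reads the environment exactly as it would on a developer's machine.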
00:15:19.880 Once the start command is sent to the DEA, we’ve set the system in motion. The DEA’s job is to start listening on a specified port for connections. We don’t forward traffic until the application is ready and actively listening, meaning we wait for it to signal that it's operational before routing any requests its way.
00:16:05.960 When it starts listening, it sends a message to all routers in the system, updating them with its IP address and the port it’s running on. The DEA also sets up NAT forwarding rules since the DEA—unlike the containers running your app—has a public IP address. So, when routers receive this request, they can direct incoming connections to the DEA’s public IP address.
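The "wait until the app is listening, then register it" step can be sketched as below. A plain Ruby hash stands in for the router table, and the helper names, retry counts, and URL are hypothetical. In the real system the registration travels over the message bus, which is not modelled here.

```ruby
require "socket"

# True if something accepts TCP connections on host:port.
def port_open?(host, port)
  Socket.tcp(host, port, connect_timeout: 0.5) { true }
rescue SystemCallError, IOError, SocketError
  false
end

# Poll until the app is listening, or give up after a number of attempts.
def wait_until_listening(host, port, attempts: 20, interval: 0.1)
  attempts.times do
    return true if port_open?(host, port)

    sleep interval
  end
  false
end

# Hypothetical router table: URL => list of "ip:port" backends.
def register_route(routes, url, host, port)
  (routes[url] ||= []) << "#{host}:#{port}"
end

# Only register the route once the app is actually accepting connections.
def start_and_register(routes, url, host, port)
  return false unless wait_until_listening(host, port)

  register_route(routes, url, host, port)
  true
end

server = TCPServer.new("127.0.0.1", 0) # stand-in for a running app
port = server.addr[1]
routes = {}
puts start_and_register(routes, "hello.example.com", "127.0.0.1", port)
puts routes.inspect
server.close
```

Registering only after a successful connection means the routers never send traffic to an instance that is still booting.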
00:16:49.840 At this point, we finally have an application running and accessible in the cloud. However, while this is all great, there are challenges that arise as things don't always go perfectly. For example, you will eventually encounter issues with AWS, or any other problematic scenario in your infrastructure that causes a service outage.
00:17:28.680 Should a node in your distributed system disappear—which can happen frequently in a large system—the remaining nodes must also be accounted for. This situation introduces the need for a component we call the Health Manager. In a perfect world, the Health Manager stays idle and does nothing, but this is seldom the reality.
00:18:05.760 The Health Manager listens on the message bus for all events happening in the system. It then builds a profile of the current state and checks with the Cloud Controller to ensure that everything is functioning as intended. By reconciling two states, it can determine whether additional instances need to be started—if one is down or if new instances need to be scaled up.
00:18:40.840 There are many edge cases to consider that can complicate this process. If a user deploys a new version of their app, for instance, only three out of four app instances may be upgraded, leaving one behind. Thus, the Health Manager’s role extends to monitoring version consistency and making necessary adjustments.
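A minimal sketch of the reconciliation described above: compare the desired state with what is actually running and emit corrective actions. The structs, the `reconcile` function, and the action tuples are hypothetical and not the Health Manager's real data model.

```ruby
DesiredApp = Struct.new(:name, :version, :instances)
RunningInstance = Struct.new(:app_name, :version, :index)

# Returns a list of [action, app_name, index, version] tuples that would
# bring the running state in line with the desired state.
def reconcile(desired_apps, running)
  actions = []
  desired_apps.each do |app|
    mine = running.select { |i| i.app_name == app.name }
    current, stale = mine.partition { |i| i.version == app.version }

    # Instances left behind on an old version must be stopped.
    stale.each { |i| actions << [:stop, app.name, i.index, i.version] }

    # Any missing index on the desired version must be started.
    present = current.map(&:index)
    (0...app.instances).each do |index|
      actions << [:start, app.name, index, app.version] unless present.include?(index)
    end

    # Instances beyond the desired count are surplus.
    current.each do |i|
      actions << [:stop, app.name, i.index, i.version] if i.index >= app.instances
    end
  end
  actions
end

desired = [DesiredApp.new("hello", "v2", 4)]
running = [
  RunningInstance.new("hello", "v2", 0),
  RunningInstance.new("hello", "v2", 1),
  RunningInstance.new("hello", "v2", 2),
  RunningInstance.new("hello", "v1", 3) # the instance that was left behind
]
reconcile(desired, running).each { |action| p action }
```

With three of four instances upgraded, the result is one stop for the outdated instance and one start for its replacement.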
00:19:23.920 The Health Manager’s functionalities were thoroughly documented recently, with our product team ensuring that we illustrate how all components connect to make the system operational. This effort to document our code base better is critical as we enhance our services over time.
00:20:03.240 One aspect that I haven't covered substantially is the services being managed by Cloud Foundry. As it stands, while we have systems to run application code, we lack the persistence layer, which is key from a developer's perspective.
00:20:24.720 We deploy several services including PostgreSQL, MySQL, RabbitMQ, and Redis. However, these are not production-ready services at this time. Users often end up using managed third-party database services as it’s common to prefer managing critical data independently.
00:20:58.320 Despite these limitations, we provide developers with a workflow enabling the provisioning of service instances, preparing them for deployment. As we continue to enhance reliability and service management, we're developing systems for seamless management of database instances.
00:21:38.480 For me, this is an exciting area to work in. I enjoy being a developer and likewise have a passion for operations, where the two disciplines meet under the umbrella of effective project implementation. The various components in Cloud Foundry utilize Ruby in ways that differ markedly from traditional Ruby applications.
00:22:10.360 There is much to explore, and I encourage everyone to get involved with these projects and the community around them. Feel free to check out the mailing list, and connect with discussions and resources available online. I'll also be around tomorrow after the conference at the Pivotal Labs meetup.
00:22:56.840 You can find my slides online, and you can also reach out to me on Twitter. Thank you very much.