Docker isn’t just for deployment

00:00:13 Okay, so from the outside, in terms of how it's used, Docker looks quite a lot like a normal virtual machine. If we think about a virtual machine, it allows us to take one host and partition it into multiple smaller hosts. Each of these virtual machines can run different packages and dependencies. We can then run processes in each of those VMs, and those processes are unaware that they are running in a VM. They don't know anything about processes running in other VMs on the same host and are unaware of processes running on the host itself. In this respect, you can say Docker is actually quite similar. We can create a Docker image that defines a base OS, such as a basic Linux OS. We can define packages, application code, and configuration changes. Then we create a container based on this image and run a process in it, which also doesn't know that it's running in a container.

00:01:02 This process is unaware of other containers on the same host and has no knowledge of the other processes running on the host itself. However, there are some significant differences between Docker and traditional VMs. In traditional VMs, we virtualize the physical hardware and run a complete instance of the operating system, including its own kernel. For example, if an Ubuntu VM takes 500 MB of RAM for its kernel and base components, it will use that 500 MB for the OS plus whatever RAM is needed for the processes we are running. Likewise, if a VM normally takes 20 or 30 seconds to boot, starting our process means waiting for that boot plus however long our own process takes to start.

00:01:43 Docker works differently. When you run a process in a Docker container, you are actually running that process on the host itself, sharing the host's kernel. Therefore, there is almost no resource overhead when running a process within a Docker container. Because we're not starting a new kernel or booting a complete OS, there is also almost no overhead in start time. So if a Unicorn web server takes 10 seconds to start locally, it will take about 10 seconds to start in Docker. This is a simplification, but we can think of Docker as giving us many of the benefits of a VM without the resource overhead. For the purposes of this talk, that is what it looks like from the outside.

00:02:45 Because of this, Docker has gained a lot of attention for deployment. As we heard in the last talk, we can run containers in development and then run identical containers in production, which gives us confidence that we will see similar behavior in both environments. However, to me, that is not the most exciting aspect of Docker.

00:03:05 What excites me most is that because of the way Docker interfaces with containers, we can easily build features around containerization. Rather than having containers merely as a means to deploy existing features, they can become integral parts of new features. This is particularly meaningful in Ruby, where certain functionalities would have been much harder to implement before Docker.

00:03:34 So, I'm Ben, by the way. I'm from a company called Make It With Code, where we teach people Ruby. We discovered early on that one of the main reasons beginners quit using Ruby as their first language isn't the language itself but problems setting up a development environment. Many beginners struggle with installing RVM or rbenv, run into issues with system Ruby and path variables, and often give up without ever writing a line of code, which is a real shame. They miss out on discovering how beginner-friendly Ruby actually is, so we aimed to bypass this completely.

00:04:08 We sought to provide a complete browser-based development environment for our students. This included a live terminal, a file browser, and a text editor. We were fortunate because, at that time, the open-source Codebox project was announced. Codebox is a Node.js-based application designed specifically to provide a browser-based development environment, similar to services like Cloud9 that you may have encountered. We started off with groups of about 10 students each week, and we would use Chef Solo to spin up a new VM for that group. Each student would get a Unix user, and we would run an instance of this Node.js app under each user account.

00:05:03 There was a lot of logic in the setup that allowed our front-end proxy to send traffic to these unique development environments corresponding to each Node.js instance. This worked exceptionally well for our students. We saw engagement levels rise, as they progressed much further in learning Ruby without having to worry about getting Ruby set up initially. However, this approach had significant drawbacks on the business side. Notably, provisioning a new VM for each group was a manual task, which meant users had to wait for a cohort to begin instead of being able to start immediately.

00:06:00 Additionally, this was an inefficient use of resources. We could accommodate about 10 students per 2GB VM, and most of these students would only use it for about 5 to 10 hours each month. The Node.js app continued to run all the time in the meantime, which led to escalating costs. This situation made it impossible for us to offer free trials or lessons since we could not afford to provision environments for people who were not guaranteed to pay for the course.

00:06:39 So, we started exploring Docker. I had experimented with Docker earlier and was quite impressed. Like most, my introduction to Docker began with the command line. Following the Docker tutorial typically provides the foundation for starting and running containers. Our first iteration from within our Rails app utilized the Docker command line. Don't worry too much about the exact details of that command, but essentially, when a user signed up for our Rails app, we would kick off a Sidekiq job that constructed a Docker run command.

00:07:30 This command included specifications for the base image we were using, the ports we needed access to, and folders from our shared file system to be mounted into the container. The Sidekiq job would then execute a shell command, invoking SSH on a node in our Docker cluster. Anyone familiar with Docker may find this approach somewhat amusing since it is admittedly rather ridiculous. Docker offers a complete HTTP API, which is a much cleaner interface.
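As a rough illustration of that first iteration, here is a minimal sketch of how a job might assemble a `docker run` command string. The helper name, image name, ports, paths, and container command are all placeholders rather than the values from the talk; only the `docker run` flags (`-d`, `--name`, `-p`, `-v`) are real.

```ruby
require "shellwords"

# Hypothetical helper: builds a `docker run` command line.
# Image, ports, paths, and command below are illustrative placeholders.
def docker_run_command(image:, name:, ports: {}, volumes: {}, command: [])
  args = ["docker", "run", "-d", "--name", name]
  ports.each   { |host, container| args += ["-p", "#{host}:#{container}"] }
  volumes.each { |host, container| args += ["-v", "#{host}:#{container}"] }
  args << image
  args.concat(command)
  # Escape every argument so user-derived values cannot break the shell line.
  Shellwords.join(args)
end

cmd = docker_run_command(
  image: "example/codebox",
  name: "user-42",
  ports: { 8001 => 8000 },
  volumes: { "/mnt/shared/user-42" => "/workspace" },
  command: ["codebox", "run", "/workspace"]
)
puts cmd
```

In the setup described here, a string like this would then be handed to SSH and run on a Docker node, which is exactly the part that a proper API call replaces.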

00:08:16 Anything you can accomplish through the Docker CLI can also be achieved via the API. For instance, to create a container, you can perform a simple POST request to an endpoint exposed by the Docker daemon. In that POST request, we included the same information that was present in the Docker run command we just described. We specified the image we wanted to build from, which in this case was a custom Codebox image, along with details regarding the volumes and ports to be mapped.

00:09:18 Finally, we specified the command to be executed when starting this container. This approach allowed us to move away from shelling out and using regex to parse terminal responses, which was a rather flaky process. By using the API, we received response data in JSON format, which we could easily manipulate in Ruby and examine the results of the commands. Naturally, as this is Ruby, there's a gem available to facilitate working with Docker.
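To make the API version concrete, here is a sketch that builds, but does not send, a container creation request using only the Ruby standard library. The field names follow the current Docker Engine API as I understand it (`POST /containers/create`), which may differ from the API version in use at the time of the talk. The host, image, paths, and the sample response are made-up placeholders.

```ruby
require "json"
require "net/http"
require "uri"

# Builds a request for the Docker Engine API's container creation endpoint.
# Nothing is sent over the network here.
def build_create_request(base_url:, name:, image:, command:, host_port:, container_port:, bind:)
  uri = URI.join(base_url, "/containers/create")
  uri.query = URI.encode_www_form(name: name)

  port_key = "#{container_port}/tcp"
  body = {
    "Image" => image,
    "Cmd" => command,
    "ExposedPorts" => { port_key => {} },
    "HostConfig" => {
      "Binds" => [bind],
      "PortBindings" => { port_key => [{ "HostPort" => host_port.to_s }] }
    }
  }

  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(body)
  request
end

request = build_create_request(
  base_url: "http://docker-host.example:2375", # placeholder daemon address
  name: "user-42",
  image: "example/codebox",
  command: ["codebox", "run", "/workspace"],
  host_port: 8001,
  container_port: 8000,
  bind: "/mnt/shared/user-42:/workspace"
)

# A successful response is JSON, so no regex parsing is needed.
# This sample body is invented to show the shape, not captured output.
sample_response = '{"Id":"e90e34656806","Warnings":[]}'
container_id = JSON.parse(sample_response).fetch("Id")
puts "#{request.method} #{request.path} -> #{container_id}"
```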

00:10:05 I highly recommend this gem if you're interested in automating Docker actions. This is the same process as before, but now I'm passing a standard Ruby hash to the gem's container creation method. If that succeeds, I get a container object back, which lets me perform further actions such as starting or stopping the container and checking its status, all directly on that Ruby object. This is much more usable than our original command-line approach.

00:11:03 It minimizes the context-switching cost, because we work within a standard Ruby API without worrying about direct HTTP calls, and we get nice Ruby objects back to manipulate. However, it's still not perfect because Docker's architecture means there are actually three API calls required to go from nothing to a running container. First, we need to create an image if that image doesn't already exist on the Docker host. You can think of an image as a class definition, which outlines the OS, files, and the packages to use.

00:12:21 The next API call creates a container from that image, similar to creating an instance of a class. At this point, we specify directories for potential external mounting, as well as the ports that need to be exposed. Finally, we make a third API call to actually start the container we just created, specifying necessary parameters, such as the volume directory from our GlusterFS file system to be mounted into that container at a specific point and mapping the relevant ports. However, this process means we still think in terms of Docker's workflow rather than our application’s business logic, resulting in higher switching costs when transitioning between our Rails application and containerized components.
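The three-step sequence can be sketched against an in-memory stand-in for a Docker host. This is not the Docker gem: `FakeDockerHost`, `launch`, and every method name below are hypothetical, and the image, path, and ports are placeholders. The point is only the order of the calls and how much Docker workflow leaks into the calling code.

```ruby
# In-memory stand-in for a Docker host, used only to show the call order.
class FakeDockerHost
  attr_reader :calls

  def initialize
    @images = []
    @containers = {}
    @calls = []
  end

  def image_exists?(image)
    @images.include?(image)
  end

  def pull_image(image)
    @calls << :pull_image
    @images << image
  end

  def create_container(name:, image:)
    raise ArgumentError, "unknown image #{image}" unless image_exists?(image)

    @calls << :create_container
    @containers[name] = { image: image, running: false }
  end

  def start_container(name, binds:, port_bindings:)
    @calls << :start_container
    @containers.fetch(name).merge!(running: true, binds: binds, port_bindings: port_bindings)
  end

  def running?(name)
    @containers.fetch(name)[:running]
  end
end

def launch(host, name:, image:, binds:, port_bindings:)
  # 1. Make sure the image (the "class definition") is present.
  host.pull_image(image) unless host.image_exists?(image)
  # 2. Create a container (the "instance") from that image.
  host.create_container(name: name, image: image)
  # 3. Start it, supplying the mounts and port mappings.
  host.start_container(name, binds: binds, port_bindings: port_bindings)
end

host = FakeDockerHost.new
launch(host, name: "user-42", image: "example/codebox",
             binds: ["/mnt/gluster/user-42:/workspace"],
             port_bindings: { 8000 => 8001 })
p host.calls
```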

00:13:50 The brilliance of this API and gem lies in their ease of abstraction, allowing us to reason about containers differently. We aimed to avoid thinking about creating images, converting them into containers, and subsequently starting them. Instead, we wanted to reason that a container should have specific properties for each user. Our approach was akin to the way Active Record abstracts database interactions. In other words, we don't concern ourselves with the underlying database mechanisms; we want to ensure a record exists with specified properties and retrieve it.

00:14:54 Thanks to the Docker API and this gem, building this abstraction was straightforward. We whimsically named this abstraction 'DAA'. With it, we use a standard Ruby hash to define the properties a container should have: the base image, port mappings, mounted volumes, and environment variables, since containers are often configured through environment variables. Once we have built this hash, with values generated automatically from our user object, we pass it to DAA to deploy the container. DAA checks whether the container has already been created, starts it if it has, and creates and then starts it if it hasn't.

00:15:53 If the container is already running, it simply does nothing and returns the existing container. This means when working with containers within the app, the complexities of the infrastructure and traditional Docker workflows are abstracted away. The advantage of this is that our containerized infrastructure now functions similarly to an HTTP API, enabling us to treat and interact with it as we would with the GitHub or Twitter APIs. In the same way we wrap third-party APIs within abstractions aligning them to our business logic, we can do the same with infrastructure.
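Here is a minimal sketch of that "ensure it exists and is running" idea. It is not the actual gem's API: `ContainerDeployer`, `InMemoryBackend`, `container_spec_for`, and the spec keys are hypothetical names chosen for illustration, and the backend is an in-memory fake so the example runs without Docker. Unlike the behavior described in the talk, this sketch returns a symbol naming the action taken rather than a container object.

```ruby
# Declarative wrapper: callers describe the container they want,
# and the deployer works out which Docker steps are needed.
class ContainerDeployer
  def initialize(backend)
    @backend = backend
  end

  def deploy(spec)
    name = spec.fetch(:name)
    case @backend.state(name)
    when :running
      :already_running          # nothing to do
    when :stopped
      @backend.start(name)
      :started
    else
      @backend.create(spec)
      @backend.start(name)
      :created_and_started
    end
  end
end

# In-memory fake standing in for calls to the Docker API.
class InMemoryBackend
  def initialize
    @containers = {}
  end

  def state(name)
    @containers.dig(name, :state) || :missing
  end

  def create(spec)
    @containers[spec.fetch(:name)] = { spec: spec, state: :stopped }
  end

  def start(name)
    @containers.fetch(name)[:state] = :running
  end

  def stop(name)
    @containers.fetch(name)[:state] = :stopped
  end
end

# Builds the desired properties from a user id; all values are placeholders.
def container_spec_for(user_id)
  {
    name: "user-#{user_id}",
    image: "example/codebox",
    ports: { 8000 => 8000 + user_id },
    volumes: { "/mnt/shared/user-#{user_id}" => "/workspace" },
    env: { "USER_ID" => user_id.to_s }
  }
end

deployer = ContainerDeployer.new(InMemoryBackend.new)
spec = container_spec_for(42)
first = deployer.deploy(spec)
second = deployer.deploy(spec)
puts first, second
```

Because `deploy` is idempotent, application code can call it on every sign-up or login without tracking whether the container already exists.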

00:16:43 The result is that our application has become much easier to reason about. A new person coming into the application doesn't need an in-depth understanding of Docker terminology, the workflow of creating images, or how to map folders before starting a container. They just need a reasonable understanding of the abstractions we have built, allowing them to dive into application development. The outcome has been incredibly positive.

00:17:39 When a user signs up for our Rails application, it triggers a Sidekiq job that uses DAA to make sure the user's container exists and is running. When the job completes, we update our front-end proxies to route the user to their own Node.js app whenever they access their development environment. One of the biggest business benefits of this system is that we can run a cron job that checks when each container was last accessed. If a container hasn't been used for, say, half an hour, we can stop it.
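The selection logic for such a cron job can be sketched in a few lines. How last-access times are recorded is not described in the talk, so this example assumes a simple hash of container name to timestamp; the function name and sample data are hypothetical.

```ruby
IDLE_LIMIT_SECONDS = 30 * 60 # half an hour

# Returns the names of containers not accessed within the limit.
# `last_access` maps container name => Time of last access (an assumption).
def idle_containers(last_access, now: Time.now, limit: IDLE_LIMIT_SECONDS)
  last_access.select { |_name, accessed_at| now - accessed_at >= limit }.keys
end

now = Time.utc(2015, 1, 1, 12, 0, 0)
last_access = {
  "user-1" => now - 5 * 60,   # active five minutes ago
  "user-2" => now - 45 * 60,  # idle for 45 minutes
  "user-3" => now - 30 * 60   # idle for exactly half an hour
}

to_stop = idle_containers(last_access, now: now)
p to_stop
```

Each returned name would then be passed to whatever stops the container. Since starting it again is handled by the same ensure-running call, stopping is safe to do aggressively.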

00:18:40 When the user accesses it again, we can use DAA and the Docker API to detect that it is not running, restart it, and then route the user's traffic to it. This has allowed us to scale our density from approximately 10 users per 2GB node to at least 500 users per 2GB node, possibly higher, although we have yet to test beyond that. It also lets us offer free trials, since we no longer pay to keep idle environments running.

00:19:33 I mentioned earlier that I wouldn't focus much on traditional deployment since I didn't find it the most exciting aspect of Docker, yet you could argue I inadvertently touched upon deploying Node.js applications at runtime. There are several scenarios where we have found success with Docker that don’t relate to deployment.

00:19:47 For instance, we had a scenario with a proprietary dataset that we couldn't share with third parties but needed to allow them permission to build analysis tools in C. We enabled them to write their code in C, then injected that into a container, built it, and processed the data, generating summaries to provide back to them. In another instance, Docker has excelled in creating language playgrounds. After a new programming language launches, it often doesn't take long before someone sets up a playground where users can execute code snippets and see results server-side.

00:20:45 These setups are incredibly valuable for educational purposes and allow individuals to interact with the language without going through the entire setup process. We have used Docker extensively for these scenarios, creating simple Ruby objects that handle code execution workflows, building suitable containers for each language, capturing output, and returning results to a Rails API. These implementations simplify the overarching process, allowing the team to concentrate less on containerization intricacies.
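One way such a "simple Ruby object" might look is sketched below. The class, the language table, and the result shape are all hypothetical, and the part that would actually run a container is injected as a callable so the example runs without Docker; here it is replaced by a fake.

```ruby
# Hypothetical mapping from language name to image and command.
LANGUAGES = {
  "ruby"   => { image: "ruby:3.3",    command: ["ruby", "/code/main.rb"] },
  "python" => { image: "python:3.12", command: ["python", "/code/main.py"] }
}.freeze

Result = Struct.new(:stdout, :exit_code, keyword_init: true)

# Hides the container workflow behind a single `run` method.
class SnippetRunner
  # container_runner: any callable taking image:, command:, source:
  # and returning an object with stdout and exit_code.
  def initialize(container_runner)
    @container_runner = container_runner
  end

  def run(language:, source:)
    config = LANGUAGES.fetch(language) do
      raise ArgumentError, "unsupported language: #{language}"
    end
    result = @container_runner.call(image: config[:image],
                                    command: config[:command],
                                    source: source)
    # Plain hash that an API controller could render as JSON.
    { language: language, output: result.stdout, success: result.exit_code.zero? }
  end
end

# Fake runner standing in for "create container, inject code, capture output".
fake_runner = lambda do |image:, command:, source:|
  Result.new(stdout: "ran #{source.bytesize} bytes on #{image}\n", exit_code: 0)
end

response = SnippetRunner.new(fake_runner).run(language: "ruby", source: "puts 1")
p response
```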

00:21:36 To summarize briefly, I wanted to demonstrate that Docker's feature-rich HTTP API allows for the straightforward creation of abstractions over containerized infrastructure. This means we can reason about that infrastructure similarly to how we regard the APIs that make up the rest of our application. If you'd like to explore this further, I've provided a link on my blog containing these slides and numerous resource links. If you are completely new to Docker, I highly recommend checking out the interactive Docker tutorial, which offers a web-based overview of the command line to familiarize you with terminology.

00:22:23 The Docker gem is excellent, and after the tutorial you should be well equipped to start using it. The DAA gem, which is our abstraction, is open source on GitHub; feel free to explore it as an example of building your own abstractions. One amusing side effect is that, since DAA operates entirely on Ruby hashes, we can also implement YAML file-based deployments, similar to tools like Fig or Docker Compose.

00:22:55 You can easily define your Rails container, Postgres container, and Redis container in a YAML file, with DAA orchestrating those deployments in development or across multiple hosts in production. Thank you very much for listening. Are there any questions?
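Because the abstraction takes plain hashes, turning a YAML file into container definitions is mostly a parsing step. The sketch below assumes a made-up file layout and key names; it is not the gem's actual format, and the image names and URLs are placeholders.

```ruby
require "yaml"

# Hypothetical manifest; in practice this would live in a file.
MANIFEST = <<~YAML
  rails:
    image: example/rails-app
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgres://postgres@postgres/app
      REDIS_URL: redis://redis:6379/0
  postgres:
    image: postgres
  redis:
    image: redis
YAML

# Converts the manifest into one hash per container, filling in defaults.
def container_specs(yaml_text)
  YAML.safe_load(yaml_text).map do |name, settings|
    {
      "name" => name,
      "image" => settings.fetch("image"),
      "ports" => settings.fetch("ports", []),
      "environment" => settings.fetch("environment", {})
    }
  end
end

specs = container_specs(MANIFEST)
specs.each { |spec| puts "#{spec['name']}: #{spec['image']}" }
```

Each resulting hash could then be passed to the same deploy call used elsewhere in the application.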