00:00:00.260
Hello, good afternoon everyone. Thank you for attending my talk today.
00:00:05.700
Today, I'm going to talk about building a container platform in the Ruby ecosystem.
00:00:11.190
I hope I have the kanji correct because I kind of used Google Translate for this.
00:00:20.900
My name is Gio, and I come from Jakarta.
00:00:26.519
If you don't know, Jakarta is located in Indonesia. A lot of people are more familiar with Bali, but actually, the capital city of Indonesia is Jakarta.
00:00:39.000
It took about seven hours to travel from Jakarta to Tokyo, and then I had to change planes for another two hours from Tokyo to Fukuoka.
00:00:51.899
I enjoy traveling, but most likely, when I go back to Tokyo, I will bring some concepts that I want to explore.
00:01:08.240
I work at a company called Gojek. Is there anyone here who knows about Gojek?
00:01:15.090
A few people, sure! Gojek is quite a hot topic in Southeast Asia. So what is it exactly?
00:01:23.729
I will summarize it, but it's quite hard because we provide a lot of services to our customers in Southeast Asia.
00:01:37.439
Some people might say that we are the Uber of Southeast Asia, but it's actually quite different.
00:01:49.340
Uber focuses primarily on ride-sharing, while we provide a lot more services.
00:02:02.399
As you can see, we operate in at least four or five countries including Thailand, Indonesia, Singapore, and Vietnam.
00:02:10.110
We provide a variety of services beyond ride-sharing, including some interesting use cases like GoMassage, which allows you to order a massage to your home.
00:02:27.200
If you want to buy tickets to a movie, you can use our platform as well. We call ourselves a super app.
00:02:39.650
What does that mean? We provide a wide range of apps within a single application.
00:02:54.530
If you open our app, you will see all these services. At Gojek, we love using Ruby.
00:03:01.150
In Gojek, we use at least five programming languages. Ruby is one of them, alongside Clojure, Go, Java, and others.
00:03:15.560
Actually, we use JRuby, not Ruby, but the syntax is the same. We prefer using JRuby because we already use other languages on the JVM platform.
00:03:26.959
This makes it easier to share libraries across those languages. In Jakarta, I also organize Ruby meetups.
00:03:37.909
We hold Ruby meetups almost every month. The Ruby community in Jakarta is quite established, with our first mailing list dating back to 2001.
00:03:49.819
If I remember correctly, 2001 was the year when we had the official documentation translated into Indonesian.
00:04:05.739
DHH met someone at a conference that year and decided to use Ruby for Rails.
00:04:20.239
We've already held several meetups and we hosted a conference two years ago at the Gojek office in Jakarta.
00:04:39.669
We will have another one soon, in September. The CFP is already open, so if any of you want to submit a proposal, please do.
00:04:54.630
Now, moving on to my main topic, I will discuss building a container platform in the Ruby ecosystem.
00:05:02.590
What is it exactly? This is my abstract, but I want to highlight a few sentences.
00:05:20.050
The first point is that I'm trying to build a container platform where almost everything will be in Ruby or MRuby.
00:05:25.449
I will talk about the current situation and whether this ecosystem can support my efforts.
00:05:32.650
For some of you who are not yet familiar with the concepts of container platforms or orchestrators, I will also explain a bit about the generic architecture behind it.
00:05:48.880
This project started early last year due to the rapid growth we are experiencing in my company.
00:06:00.490
There has been a lot of fragmentation because each team has been moving in different directions.
00:06:08.020
We previously had more than 20 services, each maintained by different people.
00:06:18.580
To achieve rapid growth, we allowed teams to experiment independently, which led to this fragmentation in terms of platforms and support services.
00:06:31.710
Therefore, we decided to converge into a centralized platform, consolidating common tools and services.
00:06:45.280
There are a couple of principles and considerations that we think are important for this project.
00:06:50.650
The first is that it must be open source. We decided to use existing open-source projects or contribute to them.
00:07:02.790
Almost everything we create for this undertaking is open source.
00:07:14.470
We want it to be seamless—we already have many services in production and we don't want anything to break during the transition.
00:07:21.600
At that time, we were mostly using VMs, so we wanted to start exploring containers.
00:07:34.090
Moving a significant number of services to containers is challenging since we already have a large ecosystem on VMs, which includes several thousand production VMs.
00:07:54.520
Additionally, we need to support a hybrid infrastructure.
00:08:00.130
We cannot rely solely on cloud solutions. One of our services, GoPay, is a payment platform in a highly regulated industry.
00:08:13.270
One requirement is to host services within the country. Therefore, anything we build must support hybrid infrastructure.
00:08:25.420
Another project I was involved in was logging.
00:08:31.300
Initially, we allowed teams to choose any logging tools they wanted, such as Elasticsearch or Stackdriver, depending on their hosting.
00:08:45.000
However, we needed a unified platform, which is why we developed what we call Barito Log.
00:08:59.020
Barito Log is an Elasticsearch-based logging platform, but we intend for it to be flexible enough to switch search providers.
00:09:11.560
For example, we found another project called Loki, which is also a CNCF project, and we aim to make the full-text search backend interchangeable.
00:09:30.520
Additionally, we decided to use LXD to handle components.
00:09:39.260
For those who don't know, LXC, the technology that LXD is built on, was initially used by Docker as its building block.
00:10:06.220
Things are different now, as Docker has created its own container runtime, but in its initial implementations, LXC served as the low-level container runtime.
00:10:17.709
We decided to use LXD for two main reasons. First, it is a drop-in VM replacement, which is easy to use if you are accustomed to VMs.
00:10:32.230
Also, we utilize Chef for almost all of our infrastructure provisioning. We have many existing cookbooks, allowing us to minimize effort.
00:10:46.200
However, we need a system to manage these LXD containers, as manually handling many containers can become tedious.
00:11:01.900
Thus, we created a platform called Pathfinder. It can be found on GitHub.
00:11:15.280
It is a container platform written in Ruby and Go.
00:11:39.190
To clarify what a container platform is, it sometimes gets referred to as an orchestrator or manager.
00:11:54.120
To provide a clearer explanation, I found a neat online resource that uses a limited vocabulary of the most common words.
00:12:08.610
This tool allows you to define complex ideas while adhering to the constraints of a simple vocabulary, which I find quite useful.
00:12:29.980
However, I did break the rule with the word 'software,' which is not considered a common term.
00:12:42.600
If I were to define a container platform, I would say it is software that allows us to manage many computers as if they were a single big computer.
00:12:56.650
It assigns jobs to these computers so that developers deploying anything do not have to think about where to assign containers or networking.
00:13:11.930
The architecture abstracts CPU, RAM, and storage, so you're presented with a unified resource view.
00:13:28.000
We also need to abstract the network aspect, as containers need to communicate with each other.
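To make the "single big computer" idea concrete, here is a minimal sketch of a unified resource view. The `Node` and `ClusterView` names and their fields are illustrative assumptions, not part of Pathfinder.

```ruby
# Hypothetical node record; the field names are illustrative only.
Node = Struct.new(:name, :cpu_cores, :ram_mb, keyword_init: true)

# Presents many nodes as if they were one big computer.
class ClusterView
  def initialize(nodes)
    @nodes = nodes
  end

  # Total CPU cores across every node in the cluster.
  def total_cpu_cores
    @nodes.sum(&:cpu_cores)
  end

  # Total RAM across every node in the cluster, in megabytes.
  def total_ram_mb
    @nodes.sum(&:ram_mb)
  end
end

cluster = ClusterView.new([
  Node.new(name: "node-1", cpu_cores: 4, ram_mb: 8192),
  Node.new(name: "node-2", cpu_cores: 8, ram_mb: 16_384)
])
```

A developer deploying to this cluster would only see the totals, not the individual machines behind them.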
00:13:37.190
Now, I will detail the generic architecture of a container platform.
00:13:43.580
I will describe four key components. The first one is the state server, which I refer to as the 'consciousness' of the system.
00:14:16.960
It essentially stores the state of your containers, nodes, and everything else.
00:14:28.060
Next, we have a scheduler, which collects information from the state server and assigns homes for unscheduled containers.
00:14:47.680
The scheduler serves as the system's brain, pinpointing which node or computer a container should be spawned on.
00:15:07.550
Last but not least, there is an agent installed on all nodes that constantly queries the state server.
00:15:33.060
The agent checks whether any containers have been scheduled to its particular node but not yet started.
00:15:50.020
If there are, it will spawn those containers.
00:16:05.730
The agent communicates with the container runtime, which can be any of various runtimes such as Docker, containerd, LXD, or others.
00:16:26.050
To help me remember the key components, I created a mnemonic for the main subject here.
00:16:46.040
In Pathfinder, we use Go for the CLI and agent, while the state server and scheduler are written in Ruby, specifically with Rails.
00:17:11.260
Currently, the runtime supports LXD but can easily be adapted for other runtimes, such as Docker or containerd.
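One way to keep the agent independent of a specific runtime is an adapter interface. The sketch below is an assumption about how that could look, not Pathfinder's actual code. `RuntimeAdapter`, `FakeRuntime`, and `Agent` are hypothetical names, and the fake runtime never talks to a real daemon.

```ruby
# Common interface the agent codes against; each runtime gets its own adapter.
class RuntimeAdapter
  def create(name, image)
    raise NotImplementedError, "#{self.class} must implement create"
  end

  def list
    raise NotImplementedError, "#{self.class} must implement list"
  end
end

# In-memory stand-in for a real runtime such as LXD. It only records
# what the agent asked for.
class FakeRuntime < RuntimeAdapter
  def initialize
    @containers = {}
  end

  def create(name, image)
    @containers[name] = image
    name
  end

  def list
    @containers.keys
  end
end

# The agent depends only on the adapter interface, not on a specific runtime.
class Agent
  def initialize(runtime)
    @runtime = runtime
  end

  def spawn(name, image)
    @runtime.create(name, image)
  end
end
```

Supporting another runtime would then mean writing one more adapter subclass while leaving the agent unchanged.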
00:17:24.000
Next, I will explain how the self-registration process works.
00:17:37.360
When adding a new worker node, the agent must be installed on that node.
00:17:50.160
Starting the agent triggers the self-registration process, where it contacts the state server.
00:18:04.680
The state server responds with a token or secret for secure communication. It also saves the agent's information in the database.
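The self-registration handshake can be sketched in memory as follows. This is a simplified illustration under assumptions: the class and method names are hypothetical, and a real state server would expose this over HTTP and persist the node in a database.

```ruby
require "securerandom"

# In-memory stand-in for the state server's node registry.
class StateServer
  def initialize
    @nodes = {} # hostname => { token:, ip: }
  end

  # Called by an agent on startup. Saves the node and returns a secret token.
  def register(hostname, ip)
    token = SecureRandom.hex(16)
    @nodes[hostname] = { token: token, ip: ip }
    token
  end

  # Later requests from the agent must present the token it was given.
  def authorized?(hostname, token)
    node = @nodes[hostname]
    !node.nil? && node[:token] == token
  end

  def node_names
    @nodes.keys
  end
end

class Agent
  attr_reader :token

  def initialize(hostname, ip, server)
    @hostname = hostname
    @ip = ip
    @server = server
  end

  # Starting the agent triggers self-registration.
  def start
    @token = @server.register(@hostname, @ip)
  end
end
```

After `Agent#start`, the node appears in `StateServer#node_names`, which is roughly what a node listing in the CLI would read from.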
00:18:31.020
After the self-registration process, if you run the command line interface and type 'get nodes,' you will see all nodes listed.
00:18:55.170
Initially, since no containers have been created, the container list will be empty.
00:19:08.950
Let's create a new container by specifying an image. The container is created within the state server, but nothing is scheduled yet.
00:19:23.950
It is now the scheduler's responsibility to check unscheduled containers and see which worker has the least utilization.
00:19:43.020
The scheduler marks the container with the node number, changing its status from pending to scheduled.
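A least-utilization scheduling pass might look like the sketch below. It assumes capacity is counted in container slots, which is a simplification chosen for illustration. The struct and class names are hypothetical.

```ruby
Node = Struct.new(:id, :capacity, :used, keyword_init: true) do
  # Fraction of this node's slots that are already taken.
  def utilization
    used.to_f / capacity
  end
end

Container = Struct.new(:name, :status, :node_id, keyword_init: true)

class Scheduler
  def initialize(nodes, containers)
    @nodes = nodes
    @containers = containers
  end

  # Assigns every pending container to the node with the lowest utilization.
  # Stops early when even the least-utilized node is full.
  def run
    @containers.select { |c| c.status == :pending }.each do |container|
      node = @nodes.min_by(&:utilization)
      break if node.nil? || node.used >= node.capacity

      container.node_id = node.id
      container.status = :scheduled
      node.used += 1
    end
  end
end
```

Containers that cannot be placed stay in the pending state, so a later pass can pick them up once resources free up.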
00:20:03.440
After this process, the agent will also check information from the state server.
00:20:18.350
If a container scheduled to its node has yet to be started, the agent will push the information to the runtime.
00:20:33.790
The runtime then creates the container in the worker node, and if you check the CLI, you will see that it is already provisioned with its own IP.
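The agent's side of this flow can be sketched as a single polling pass. Everything here is in memory and hypothetical: the status names follow the talk (pending, scheduled, provisioned), while the classes and the address range are assumptions for illustration.

```ruby
Container = Struct.new(:name, :image, :status, :node_id, :ip, keyword_init: true)

# In-memory stand-in for the state server's container table.
class StateServer
  def initialize(containers)
    @containers = containers
  end

  # Containers assigned to this node that have not been started yet.
  def scheduled_for(node_id)
    @containers.select { |c| c.node_id == node_id && c.status == :scheduled }
  end

  def mark_provisioned(container, ip)
    container.status = :provisioned
    container.ip = ip
  end
end

# Fake runtime that hands out sequential addresses instead of starting anything.
class FakeRuntime
  def initialize
    @next_host = 2
  end

  def create(_name, _image)
    ip = "10.0.3.#{@next_host}"
    @next_host += 1
    ip
  end
end

class Agent
  def initialize(node_id, server, runtime)
    @node_id = node_id
    @server = server
    @runtime = runtime
  end

  # One polling pass: start everything scheduled to this node and report back.
  def reconcile
    @server.scheduled_for(@node_id).each do |container|
      ip = @runtime.create(container.name, container.image)
      @server.mark_provisioned(container, ip)
    end
  end
end
```

A real agent would call `reconcile` on a timer, and the runtime adapter would return the address assigned by the actual container daemon.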
00:20:58.520
The system also has a rescheduling feature, which allows moving a container from one node to another.
00:21:14.170
If you destroy a provisioned container, the scheduler will place it on another node as long as there are adequate resources.
00:21:26.110
As of now, we have around 50 active worker nodes. It is a relatively small scale, but we have been live for six months.
00:21:51.690
Currently, this infrastructure serves our logging project, but other departments may soon start using it.
00:22:06.230
As Gojek expands into other Southeast Asian markets, we expect traffic and usage to significantly increase.
00:22:20.680
Now, let's discuss Pathfinder. Initially, we decided to use Go because it has libraries for communicating with LXD and can compile to a single binary.
00:22:37.430
However, I am experimenting with replacing all components with Ruby, particularly with MRuby.
00:22:51.740
MRuby is interesting for several reasons. First, it is straightforward to create executable files.
00:23:06.050
Second, it has a low footprint, which is important for the agent running on nodes.
00:23:20.240
Lastly, switching to an all-Ruby setup could streamline our project offerings.
00:23:38.410
We have a lot of Ruby developers at Gojek—more than Go developers.
00:23:50.720
However, a significant challenge remains with the Ruby ecosystem, which currently lacks some of the libraries that Go offers.
00:24:05.350
For example, we need a robust agent that interacts with the state server and the local container runtime.
00:24:16.910
The agent must handle self-registration, querying the state server, and communicating with the container daemon.
00:24:30.680
It is also responsible for sending back metrics.
00:24:42.750
We have a simple library to gather the metrics we need, focusing primarily on CPU and RAM.
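As an illustration of the RAM side of metric gathering, the sketch below parses text in the Linux `/proc/meminfo` format. It assumes the nodes run Linux and is not the library mentioned in the talk. The function names are hypothetical, and CPU metrics are left out.

```ruby
# Parses text in the /proc/meminfo format ("Key:   value kB") into a hash
# of integer kilobyte values.
def parse_meminfo(text)
  text.each_line.each_with_object({}) do |line, info|
    key, value = line.split(":", 2)
    next if value.nil? # skip lines without a "key: value" shape

    info[key.strip] = value.to_i
  end
end

# Percentage of RAM in use, based on MemTotal and MemAvailable.
def ram_used_percent(text)
  info = parse_meminfo(text)
  total = info.fetch("MemTotal")
  available = info.fetch("MemAvailable")
  ((total - available) * 100.0 / total).round(1)
end

SAMPLE_MEMINFO = <<~MEMINFO
  MemTotal:       8000000 kB
  MemFree:        1000000 kB
  MemAvailable:   6000000 kB
MEMINFO
```

On a Linux node, the same function could be fed the live file with `ram_used_percent(File.read("/proc/meminfo"))`.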
00:24:58.620
Creating self-contained executables in Ruby is straightforward. There are libraries that facilitate this.
00:25:12.840
Next, let's look at how to write the CLI in Ruby as well.
00:25:24.280
The CLI also needs to interact with the state server. We have created libraries to handle this communication.
00:25:36.330
Over the past couple of years, I started using Go and discovered a cool library called Cobra for structuring CLI applications.
00:25:57.140
I implemented a similar structure for our Ruby CLI.
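A Cobra-like structure boils down to a tree of commands, each with optional subcommands. The sketch below shows that idea in plain Ruby. It is an assumption about the general shape, not the actual mini-framework, and the `Command` class and the `cli` root name are hypothetical.

```ruby
class Command
  attr_reader :name

  def initialize(name, &action)
    @name = name
    @action = action
    @subcommands = {}
  end

  # Registers a subcommand under this command.
  def add(command)
    @subcommands[command.name] = command
    self
  end

  # Walks the argument list down the tree, then runs the deepest match.
  def run(args)
    sub = @subcommands[args.first]
    return sub.run(args.drop(1)) if sub
    raise ArgumentError, "unknown command: #{args.join(' ')}" unless @action

    @action.call(args)
  end
end

root = Command.new("cli")
get = Command.new("get")
get.add(Command.new("nodes") { |_args| "listing nodes" })
get.add(Command.new("containers") { |_args| "listing containers" })
root.add(get)
```

Calling `root.run(%w[get nodes])` dispatches to the `nodes` action, and an entry point would typically pass `ARGV` instead.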
00:26:09.840
The state server is currently written in Rails, which is substantial but meets our current needs.
00:26:20.640
Since this project must progress quickly, we utilized Rails, but it could be rewritten in something lighter in the future.
00:26:36.480
We also have the scheduler already implemented in Ruby.
00:26:49.460
As of now, I have tested this Ruby infrastructure and may incorporate components into our production setup.
00:27:03.970
We may replace our staging environment's agent with our MRuby version once it's ready for testing.
00:27:17.290
We've extracted a library in Ruby for interacting with the LXD daemon, and everything is running well.
00:27:30.590
We also created a mini-framework for structuring the CLI.
00:27:43.870
Two components of our platform that were previously written in Go are now written in Ruby.
00:27:55.620
It will be interesting to improve those components, for instance, to enhance the scheduler.
00:28:06.400
Lastly, I am focusing on using a fair scheduling algorithm and found some libraries in Ruby that could help.
00:28:20.140
To wrap up, while using MRuby, I noticed some differences from standard Ruby.
00:28:30.840
We don't have a Gemfile in MRuby; instead, gems are declared in the build configuration.
00:28:45.050
When changing a gem, we must recompile everything, which is beneficial when creating a single executable.
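For readers who have not seen it, mruby declares its gems in a build configuration file that is evaluated when the interpreter is compiled. The fragment below is a sketch of that file. The `MRuby::Build`, `gembox`, and `gem` calls are mruby's build DSL, while the repository and path shown are placeholders, not real dependencies of Pathfinder.

```ruby
# build_config.rb (sketch; the two non-core gem entries are placeholders)
MRuby::Build.new do |conf|
  conf.toolchain :gcc

  # Core gems that ship with mruby.
  conf.gembox "default"

  # Third-party gems are compiled in at build time, not loaded at run time.
  conf.gem github: "example/mruby-hypothetical-http" # placeholder repository
  conf.gem "../mrbgems/hypothetical-agent"           # placeholder local path
end
```

Because every gem is linked into the binary, adding or changing an entry here means rebuilding the whole executable.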
00:29:01.210
To my surprise, MRuby has enough libraries for our project requirements.
00:29:12.160
Building MRuby executables is simple, and our tests confirm they are smaller than Go equivalents.
00:29:25.100
Additionally, I faced some challenges finding mocking libraries for interfacing with API servers.
00:29:39.510
As a workaround, we created a simple mock server for testing.
00:29:51.450
Another challenge was handling concurrency, as I had difficulty finding threading support in MRuby.
00:30:07.460
I may experiment with fibers for concurrency while sending metrics or creating containers.
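Cooperative concurrency with fibers could look like the sketch below, where two tasks take turns instead of running on threads. It runs on standard Ruby. In mruby, fibers come from the mruby-fiber gem, and whether this pattern fits the agent is still an open question. The task names are illustrative.

```ruby
# Each task is a Fiber that does one small step and then yields control.
def make_task(name, steps, log)
  Fiber.new do
    steps.times do |i|
      log << "#{name} step #{i + 1}"
      Fiber.yield :pending
    end
    :done
  end
end

# Round-robin over the fibers until every one has finished.
def run_all(tasks)
  until tasks.empty?
    tasks = tasks.reject { |task| task.resume == :done }
  end
end

log = []
run_all([
  make_task("send_metrics", 2, log),
  make_task("create_container", 1, log)
])
```

The log ends up interleaved, which shows that neither task blocks the other while it waits for its next turn.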
00:30:21.520
Also, implementing graceful shutdown is crucial and needs attention.
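One common shape for graceful shutdown is a flag that a signal handler flips and the main loop checks between units of work. The sketch below shows that pattern in standard Ruby. `AgentLoop` is a hypothetical name, and signal handling in mruby may need a separate gem.

```ruby
class AgentLoop
  attr_reader :iterations, :cleaned_up

  def initialize
    @running = true
    @iterations = 0
    @cleaned_up = false
  end

  # Safe to call from a signal handler: it only flips a flag.
  def stop
    @running = false
  end

  # Ask the loop to finish its current pass when INT or TERM arrives.
  def install_signal_handlers
    %w[INT TERM].each { |sig| Signal.trap(sig) { stop } }
  end

  # Runs one unit of work per pass and always cleans up on the way out.
  def run
    while @running
      @iterations += 1
      yield self
    end
  ensure
    @cleaned_up = true
  end
end
```

The work in progress is never interrupted mid-pass. The loop simply declines to start the next one, and the `ensure` block is where an agent could flush metrics or deregister.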
00:30:34.550
Documentation on MRuby is lacking, especially in English, but we can work on improving it together.
00:30:49.120
Thank you very much for your time. Do we have any questions?
00:30:58.829
(Audience questions and discussion)