Talks

ØMQ - A way towards fully distributed architectures

This video was recorded on http://wrocloverb.com. You should follow us at https://twitter.com/wrocloverb. See you next year!

In this talk Martin shows basic ØMQ concepts and tells how Rubyists can take advantage of them.

wroc_love.rb 2012

00:00:12.160 Hello, my name is Martin Sustrik. I am the original designer of ØMQ, and I wrote most of the code. Today, I’m going to speak about it.
00:00:24.240 Originally, I wanted to start with some background on distributed computing and messaging. However, Chris did a good job introducing it, so I will skip that. But I would like to add one point: ten years ago, it made a lot of sense to write monolithic applications because what you had was just a single computer at home.
00:00:35.120 Today, you have your computer, your laptop, your mobile phone, your girlfriend’s computer, her mobile phone, and a lot of processors all over the place. And if you are deploying applications, you can easily get a clustered environment on Amazon for a few dollars—like 100 nodes without any problem.
00:01:02.719 Even if you are writing an application for a single computer, you have dual-core laptops, quad-core desktops, and 8 or 16 cores in servers. Therefore, you still have to distribute the work across different cores on the machine. What we are doing today should be distributed.
00:01:27.680 ØMQ is a tool that helps you distribute work. This talk is going to be quite technical; it won’t cover the philosophy presented by Chris. Instead, I’ll focus on the technical parts. One thing to note is that we are forking ØMQ at the moment due to some trademark issues, so in the future, you might hear about Crossroads rather than ØMQ.
00:01:41.840 From your perspective as developers, there is no difference. Additionally, I want to mention that I am not a Ruby programmer. If you have any questions about the Ruby API afterwards, please ask Lawrence, who is over there. He's maintaining one of the Ruby bindings, and he's more familiar with the Ruby API. I do have Ruby examples in the presentation, but I caution you that I'm not a Ruby programmer, so there may be some syntax errors.
00:02:19.280 Now, the very basic thing you can do with distributed applications is something like this: you have Application A and Application B. Application A says 'Hello,' and Application B replies 'World.' So, how would you implement that using ØMQ? Let’s go to the code. This is the application.
00:02:46.399 Let's think of one application as a client and the other as a server. The client sends 'Hello,' and the server replies 'World.' This is the client code, and as you can see, it's quite simple. ØMQ is a socket library, and you are probably familiar with standard POSIX BSD sockets, so it's very similar.
00:03:10.080 You just create a socket, and there is a created context which acts as shared state. You don't have to care about it too much. Then, you create a socket and specify the type, which means 'REQ' for requester. A requester is someone who sends requests and gets replies.
00:03:39.119 Next, you connect your sockets to the server, send 'Hello,' and get the reply, printing it to the console. That’s it—just around seven lines of code. On the server side, it's basically the same thing. The only difference here is that the socket type is 'REP,' meaning replier. A replier receives requests and sends replies.
00:04:10.799 Instead of connecting to some remote endpoint, you bind it, meaning you create a server. In this case, you bind to port 5559, and the server runs continuously. There's an infinite loop; it just waits for requests from the client and replies with 'World.' It's straightforward.
00:04:47.360 The important thing to note here is that this code works for both applications A and B. It also works in a topology like this, and there’s no change to the code required. The code you've just seen can handle many clients, meaning the server can deal with tens of thousands of clients in parallel without needing any additional code.
00:05:21.919 ØMQ handles the establishment of connections for you, including reconnections. For example, if the network fails and then gets restarted, the connection will re-establish itself automatically. You don’t have to worry about that.
00:05:49.280 Moreover, if the server fails, the clients keep running, and once you restart the server, it reconnects, and everything continues to operate smoothly. You can even start a client first and then start a server, and it will still work.
00:06:06.960 Now, what if you want just one client to communicate with many servers? This scenario might arise in load balancing situations. For instance, you may have numerous web service clients, and a single server is not enough to handle all the requests. Thus, you want to start several instances of the server to distribute the load.
00:06:59.039 In this case, the client connects to several servers. The first request goes to the first server, the second request goes to the second server, and the third request returns to the first server, and so on. This is simply a round-robin approach.
00:07:27.679 Once again, you don’t need to change the code for this—just run it, and it works. An interesting feature of ØMQ sockets is that you can connect multiple times as a single socket. A socket can handle many connections.
00:08:08.000 So, you can create a socket and connect it to different servers, or even bind it and connect to both the client and the servers. It's quite flexible. The application's logic remains very simple—just a loop to send and receive.
00:08:42.719 So, what if you want to distribute the same data to many clients? This situation often arises in contexts such as distributing stock prices to clients, weather updates, or game state information.
00:09:02.160 Here’s an example: once again, you create a socket, but this time you use a different socket type: 'PUB,' which means publisher. This is the socket that sends the same message to everyone. The previous sockets we examined facilitate load balancing, but this one is meant for distribution.
00:09:48.880 You bind the socket to a TCP port and send a 'Hello' message every second. On the client side, it’s very similar. The socket type is 'SUB,' meaning subscriber, and you connect to the server.
00:10:06.960 You’ll also see that ØMQ performs name resolving for you, so you don’t need to fill in all the complex structures like you would with BSD sockets. You simply provide the server name and the port, and it gets resolved. Then, in a loop, you receive the messages and print them to the console.
00:10:29.600 One special feature of subscriber sockets is built-in filtering. In some cases, you don’t want to receive all messages, just a subset. For instance, when distributing stock prices, a client might only be interested in Cisco's stock and want to filter for only those messages.
00:11:06.000 That’s where the ØMQ SUBSCRIBE socket option comes into play. In this instance, you subscribe to all messages by saying everything that starts with an empty string is something you want to receive. However, you could subscribe for messages starting with 'ABC' to receive only those.
00:11:40.480 New versions of ØMQ also support subscription forwarding, meaning when you filter messages, it doesn’t happen at the client. Instead, the subscription is sent to the publisher, who filters the messages. This is efficient because if you have a huge load of messages, only the relevant ones are sent over the network.
00:12:08.640 Now, let’s consider the same socket types but in reverse: many publishers and just one subscriber. This is a classic logging scenario. You may have many applications each sending logs to a central server, which then displays or stores them in a database.
00:12:32.000 Once again, you don’t have to change any code—what you’ve seen before will work just as well.
00:13:12.640 These are the two basic patterns: request-reply and publish-subscribe. There are other patterns which I will not dive into now since the talk is only 30 minutes long. You can look them up in the guide and documentation.
00:13:42.320 So, having understood this infrastructure, how do we actually use it? This is an example of a simple multiplayer game—a text adventure game.
00:14:03.680 For a multiplayer game, the first requirement is to send commands to the server and receive responses to determine the outcomes of actions. In this case, the client represents the player sending actions to the server, identified by their name.
00:14:30.960 The actions could be sent in JSON format, but ØMQ is agnostic to the format. The client sends an action, the server processes it, and sends back a reply indicating what happened. As before, the code is quite simple—just seven lines for both the client and server.
00:14:59.760 In a multiplayer game, you also want to distribute the game state to all clients so that each client is aware of changes in the game world. For example, you might distribute the position of players in the game so that all clients are updated.
00:15:31.680 You can use PUB-SUB sockets for this distribution, as the publisher sends the same data to everyone. So, let’s imagine we combine the previous examples into one application.
00:15:56.159 The server has two sockets: one for replies (REPL) and one for publishing (PUB). The REPL socket is used to receive user actions and send the outcomes back to clients, while the PUB socket distributes the game state.
00:16:24.800 In this simple example with two clients, each has a request socket to send actions to the server and receive replies, while also having a sub socket to get updates on the game state.
00:16:52.000 For instance, if the first client sends an action to the server via the REPL socket and the server determines that the game state has changed, it publishes that change through the PUB socket, updating all clients.
00:17:26.880 This demonstrates the basic idea of how to build applications using ØMQ, focusing on two main patterns: request-reply and publish-subscribe. However, there are a few more advanced patterns like push-pull, which are used in more complex contexts for distributing workloads across multiple servers.
00:17:57.920 Now, imagine you have developed your multiplayer game and are providing it as a web service. At some point, you might encounter scaling problems as your user base grows, which can put too much load on a single server.
00:18:34.960 For example, the processing time for the game state could increase, causing slow response times for users. In such scenarios, scaling up your architecture becomes essential.
00:18:56.640 In this example, envision that your data center contains the primary server where all clients connect. You can employ a request socket within the server to connect to different worker machines. When clients send requests to the main server, it can balance the load by distributing those requests to the available worker nodes.
00:19:26.879 This simple adjustment keeps the code minimal—once again, just a few lines—and once implemented, if performance starts to lag, you just start more REPL sockets without requiring any restarts or reconfigurations.
00:20:11.440 This functionality extends to cloud environments; imagine simply spinning up a few more instances on Amazon, and your architecture scales effortlessly.
00:20:39.120 Another problem you might encounter is regional performance. Suppose your game server is operating in Wrocław but suddenly experiences a large influx of users from New Zealand. As we know, New Zealand has poor connectivity.
00:20:59.440 Users typically have to rely on slow satellite internet or limited bandwidth undersea cables, resulting in a situation where multiple clients are simultaneously sending the same data, effectively overloading the available bandwidth.
00:21:31.680 To address this, you can set up a server in Wellington that runs a very simple application with two sockets—one PUB and one SUB. The server reads messages from the SUB socket and publishes them via the PUB socket.
00:22:11.680 This setup allows the server in Wellington to connect to your data center in Wrocław, receiving messages only once. The server then distributes the messages locally to clients in Wellington, Auckland, and Christchurch, dramatically reducing bandwidth use.
00:22:38.400 These features are just some examples of what ØMQ can do. ØMQ is built to be fast and efficient, with the ability to achieve low latency and high throughput if you have the right hardware.
00:23:03.680 For instance, latency can be reduced to around 20 microseconds, with throughput potentially reaching up to six million messages per second. However, when used from Ruby, the performance will be limited by the Ruby bindings and the performance characteristics of the Ruby language itself.
00:23:52.000 This is often demonstrated where Ruby can handle around a million messages, possibly up to 200,000 under optimal conditions. But you also have garbage collection overhead to consider.
00:24:25.760 Now, while I’m not well-versed in Ruby, one relevant example would be the Python binding, which was developed for scientific computing. In cases where Python’s global interpreter lock prevents concurrent scaling, users run multiple Python processes and leverage ØMQ as the glue between those processes.
00:25:07.840 If you are in a similar position using Ruby (especially with MRI), ØMQ can also be beneficial as a communication method between multiple processes.
00:25:50.080 Furthermore, ØMQ supports various underlying transports, not just TCP. Apart from TCP, you can also use IPC (Inter-process communication), which allows communication between processes on UNIX systems using UNIX domain sockets.
00:26:19.040 There are also transport options for threads, and multicast protocols like PGM can be utilized for many computers to receive a single packet simultaneously.
00:26:56.880 ØMQ is frequently used as a backend for web applications. On the frontend, HTTP may be used, while ØMQ serves as an efficient backend for processing requests. For instance, several projects like Mongrel2 and NGINX integrate ØMQ for backend operations.
00:27:41.440 The licensing for ØMQ is LGPL, meaning you can link it with proprietary applications. If you modify ØMQ itself, you are required to contribute those modifications back.
00:28:07.760 Several notable users of ØMQ include CERN for scientific computing tasks, Los Alamos Laboratories, GitHub, Spotify, and Twitter for processing large-scale data traffic.
00:28:51.760 For more information, you can visit ØMQ's official site, and as mentioned earlier, we are intending to fork it, so look for Crossroads.io in the near future.
00:29:05.680 At this moment, I will open the floor for questions.
00:29:19.680 This is not a persistent queue. If you send a message and the second process is down, the message is lost. It’s similar to a networking stack; if one client dies, the message cannot be retrieved.
00:29:46.880 If you have any more questions, feel free to ask. I may not be a Ruby expert, but I’m here to help.
00:30:24.799 Regarding potential performance issues, there is an interest in incorporating ØMQ into Linux kernel in the long term. Although patches already exist, complete integration is likely a decade away.
00:31:21.920 In conclusion, ØMQ is a versatile solution that supports various programming languages and environments. Its ability to efficiently distribute workloads makes it a valuable asset for both small and large-scale applications.