How to Hijack, Proxy and Smuggle Sockets with Rack/Ruby

by Dávid Halász

In this talk at Ruby Unconf 2019, Dávid Halász explores the intricate workings of hijacking, proxying, and smuggling sockets using Rack and Ruby. The presentation is both educational and practical, highlighting techniques crucial for managing network connections effectively.

The key points discussed include:

Introduction to Socket Management: Halász starts with an overview of socket management and the significance of non-blocking I/O for efficient data handling in Ruby applications.
Live Demonstration: He demonstrates a live Vagrant setup that integrates various components necessary for remote desktop sessions, showing the end-to-end system in action.
Understanding Proxies: The talk delves into how proxies work, explaining the necessity of non-blocking read/write operations and how the IO.select method can help with managing multiple sockets effectively.
Blocking vs. Non-blocking I/O: Halász explains the downsides of blocking I/O, particularly in high-connection scenarios, leading to CPU inefficiencies.
Dynamic Sockets Handling: The solution presented involves dynamic management of socket readiness to avoid deadlocks, enhancing the performance of socket interaction.
WebSocket Integration: The transition from standard HTTP requests to WebSockets is covered, showing how Rack can manage long-lived connections more effectively using socket hijacking.
HTTP to VNC: A unique approach is introduced which allows HTTP connections to be upgraded seamlessly to VNC sessions, facilitating smoother remote control and virtualization experiences.
Real-world Applications: Halász discusses the motivations behind developing this protocol, citing the limitations of existing remote console experiences and potential improvements.
C Code Integration: He also highlights the implementation of C code to enhance socket registration and readiness monitoring, particularly for environments requiring high performance.
Conclusion: The presentation wraps up with the operational demo showcasing a successful VNC session tunneled through the HTTP connection, followed by an invitation for questions.

In conclusion, Halász emphasizes the importance of these techniques in building efficient network-related applications in Ruby, especially in cloud management scenarios. The session fosters a deeper understanding of how to manage connections effectively in different networking contexts while also promoting community engagement through follow-up discussions.

00:00:02.620 Hi everyone, welcome to the next, and actually the last talk for today. No pressure here; I’m Dávid Halász, and I will be talking about Rack and Ruby.

00:00:06.460 So, I guess the topic is socket hijacking, proxying, and smuggling with Rack and Ruby. Give a warm welcome to myself! Hello! Okay, I can hear myself great. So, has anyone seen this talk already? The people who have seen this talk, did they vote for it? That's a bit weird. Anyway, as I said, I'm Dávid, and today I'm going to discuss how to hijack, proxy, and smuggle sockets using Rack and Ruby.

00:00:22.300 Yesterday, I realized Hamburg is great. This is my first time here. I recently learned that an Unconf is like a BarCamp, and someone explained to me that a BarCamp is like an Unconf. I also learned about fishbowl discussions. I talked about something similar and had a nice discussion during the first talk with one of the organizers.

00:00:44.020 I also like to live dangerously, so I prepared a live demo that I will run just now. Is Vagrant visible? Okay, I’m going to start a Vagrant box; it will download in the background. So please no torrenting or anything else that could make this talk longer. In the end, everything will make sense, so let's start it!

00:01:18.729 Sorry! As I was saying, I like to live dangerously, and it's really challenging to do a talk after such a party yesterday. I feel like I borrowed too much happiness from the future and now I’m paying the interest. Today, I'm a little uncomfortable because of that. You know how this lifestyle goes, right?

00:01:22.300 So, I'm from the Hungarian-speaking part of Slovakia. I came here from the Czech Republic, from a town called Brno, which is famous for beer, MotoGP, and a genetic experiment conducted 150 years ago. Not this one, but Gregor Johann Mendel did the famous pea experiment in Brno.

00:01:37.340 I work for a small company called Red Hat. This is the new logo; we deal with a project called ManageIQ, which is an application to manage your clouds. If you have a cloud, we can manage it. And of course, it's open-source. One of the things I'm working on is called Remote Desktop Sessions. If you don't know what Remote Desktop Sessions are, you've probably used Windows, right? There's a feature in Windows where you can connect to your computer and take over control. If you enter the right password, you gain remote control of the desktop.

00:02:00.030 We are working on in-browser remote consoles, which are like Remote Desktop Sessions but inside the browser. Surprisingly, they look like this: we have an HTML canvas that connects to your remote endpoint. Inside the browser window, it is a full-featured desktop environment. In our project, it looks like this. You’ll see a button to access the VM console, which opens a pop-up window where you can see your desktop.

00:02:20.200 If we look at the architecture of this setup, we have a virtual machine somewhere, and we manage clouds, so obviously it runs on a hypervisor. This hypervisor exposes a Remote Desktop Protocol. Let’s just say VNC, though there are others. For simplification, I will just talk about one; we support about five or six protocols. However, the browser doesn't really speak VNC directly.

00:02:32.000 The closest thing to VNC is maybe WebSockets, which allow bi-directional communication. We need something in the middle, a bridge that connects those two endpoints, which I will refer to as the proxy. Thus, we enter the first chapter of the talk: the proxy.

00:02:44.000 If we look into this proxy closely, we'll notice that it consists of many components like fibers, sockets, concurrency, and networking. We will cover all these aspects. From a different perspective, you have two endpoints: one is the WebSocket, and the other is VNC. You need to read data from the WebSocket endpoint and write data to the VNC endpoint, translating between the two 'languages.'

00:03:04.000 So, let's discuss these endpoints. Essentially, they are sockets. I made a joke about this in Australia, and they didn't quite get it. However, when you use Ruby, you write sockets like this: you connect to the website on port 80, which is HTTP, and then send a request by writing and reading back what you get, including headers and some payload. This represents an HTTP request.
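A minimal sketch of the kind of request described above, using Ruby's standard `socket` library. So that it runs offline, it talks to a throwaway local server instead of a real website on port 80; that local server, its port, and its canned response are assumptions made for illustration only.

```ruby
require 'socket'

# Throwaway local server standing in for "the website" (illustration only).
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

server_thread = Thread.new do
  client = server.accept
  # Consume the request up to the blank line that ends the headers.
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  body = 'Hello'
  client.write("HTTP/1.1 200 OK\r\n" \
               "Content-Length: #{body.bytesize}\r\n" \
               "Connection: close\r\n\r\n#{body}")
  client.close
end

# The client side: connect, write the request, read back the response.
socket = TCPSocket.new('127.0.0.1', port)
socket.write("GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
response = socket.read # blocks until the server closes the connection
socket.close

server_thread.join
server.close

headers, payload = response.split("\r\n\r\n", 2)
puts headers
puts payload
```

The same `write`, `read`, and `close` calls are what you would use on a `File`, which is the point of the comparison that follows.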

00:03:28.000 If you reflect on the methods of reading, writing, and closing a connection, what comes to mind? Other than sockets? Yes, files! Sockets are basically like files but for networking, with a few twists that we will cover later.

00:03:42.000 When you write to a socket, you are essentially writing to a buffer. When time passes, and the network or the operating system takes this data, it forms packets and sends them over the network card. Do you know what happens if the buffer gets full and you try to continue writing? Some faces here suggest that you don’t know. When you write to a full buffer, this is what you call blocking behavior.

00:04:02.000 Your write operation will wait until there's space in that buffer for writing. This can take even minutes, which is not ideal because you could be doing useful work instead of waiting. Reading works similarly, but I was lazy and didn’t create animations for that. So you will have to use your imagination: on the opposite end, you wait for something to appear in the buffer. If there's nothing there, you'll be stuck waiting.

00:04:31.000 This process is called 'blocking transmission.' If you want to implement a proxy with blocking transmission, you need to create two sockets, one for the VNC and another for the WebSocket, and you need to read from both, translating with some method in the middle before writing to the other side. The two loops are symmetric, and each one has to keep running until one of the connections ends.
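The two symmetric loops can be sketched roughly as follows. This is an illustration, not the code from the slides: `pump`, `blocking_proxy`, and the identity `translate` lambda are hypothetical names, and `UNIXSocket.pair` stands in for the real WebSocket and VNC connections.

```ruby
require 'socket'

# Copy data from `source` to `target` until `source` is closed.
# `translate` is a placeholder for the WebSocket <-> VNC conversion.
def pump(source, target, translate = ->(data) { data })
  loop do
    data = source.readpartial(4096)     # blocks until data arrives
    target.write(translate.call(data))  # blocks while the buffer is full
  end
rescue EOFError, IOError, Errno::EPIPE, Errno::ECONNRESET
  # One side went away; stop copying in this direction.
ensure
  begin
    target.close_write
  rescue IOError, SystemCallError
    # target was already closed
  end
end

# One thread per direction, so every proxied connection costs two threads.
def blocking_proxy(a, b)
  [Thread.new { pump(a, b) }, Thread.new { pump(b, a) }]
end

# Demo with in-process socket pairs standing in for the two endpoints.
ws_client, ws_proxy = UNIXSocket.pair
vnc_proxy, vnc_server = UNIXSocket.pair
threads = blocking_proxy(ws_proxy, vnc_proxy)

ws_client.write('ping')
from_ws = vnc_server.read(4)
vnc_server.write('pong')
from_vnc = ws_client.read(4)

ws_client.close
vnc_server.close
threads.each(&:join)
```

Each proxied connection needs its own pair of threads here, which is where the thread count discussed next comes from.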

00:05:02.000 This becomes problematic because if you have, say, 500 connections, you need 1000 threads. Ruby isn’t really equipped to handle that efficiently. Fortunately, there's a solution called non-blocking I/O. Instead of the standard read/write, you would use non-blocking read and write methods, which do not wait if there's nothing in the buffer.

00:05:25.000 If there's nothing to read, you will get an error. Similarly, if you try to write when there's no space in the buffer, you'll encounter an error. The waiting itself is separated out into a readiness test, a call to IO.select, which takes three arrays of sockets and a timeout as input.

00:05:47.000 You can provide arrays for sockets that are meant for reading, writing, and error checking. This select method will wait until something becomes ready. The timeout can be set to infinite, but in practice, I just set it to one second while using an endless loop.
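A small sketch of both pieces, assuming an in-process `UNIXSocket.pair` as the connection. `read_nonblock` raises `IO::WaitReadable` when the buffer is empty, and `IO.select` returns `nil` on timeout or three arrays of ready sockets.

```ruby
require 'socket'

reader, writer = UNIXSocket.pair

# Nothing has been written yet, so a non-blocking read cannot succeed.
first_attempt =
  begin
    reader.read_nonblock(1024)
  rescue IO::WaitReadable
    :would_block
  end

writer.write('hello')

# Arrays for reading, writing and error checking, plus a one-second timeout.
readable, _writable, _errored = IO.select([reader], nil, nil, 1)

# IO.select returned because `reader` has data, so this read will not fail.
data = readable.first.read_nonblock(1024) if readable

puts first_attempt
puts data
```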

00:06:09.000 Here’s an example of using IO.select where I implement a very simple class. In this class, I have a background thread with an event loop method that tests sockets for readiness. When something is ready, you iterate through the ready sockets and read the data from them.
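A class along those lines might look like the sketch below. The class name, its methods, and the queue used to hand data to the caller are all hypothetical; only the overall shape (a background thread, an endless loop around `IO.select` with a timeout, and a read from each ready socket) follows the description.

```ruby
require 'socket'

class SocketReader
  attr_reader :received

  def initialize(timeout: 1)
    @timeout  = timeout
    @sockets  = []
    @lock     = Mutex.new
    @received = Queue.new # [socket, data] pairs for whoever consumes them
    @running  = true
    @thread   = Thread.new { event_loop }
  end

  def register(socket)
    @lock.synchronize { @sockets << socket }
  end

  def stop
    @running = false
    @thread.join
  end

  private

  def event_loop
    while @running
      watched = @lock.synchronize { @sockets.dup }
      # Returns nil on timeout, so the loop re-checks @running regularly.
      readable, = IO.select(watched, nil, nil, @timeout)
      next unless readable

      readable.each do |socket|
        data = socket.read_nonblock(4096, exception: false)
        case data
        when :wait_readable then next
        when nil then @lock.synchronize { @sockets.delete(socket) } # EOF
        else @received << [socket, data]
        end
      end
    end
  end
end

# Usage: register one end of a pair and write to the other.
a, b = UNIXSocket.pair
reader = SocketReader.new(timeout: 0.1)
reader.register(a)
b.write('frame')
socket, data = reader.received.pop # waits until the loop has read something
reader.stop
```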

00:06:23.000 Does this make sense? This process works well for one-directional transfers, but if you want to read from one socket and write to another, you encounter dependency issues. Both sockets need to be ready, which can complicate things, as it’s not easy to handle with a single IO.select.

00:06:39.000 Let's say we have two sockets, socket A and socket B, mapped together, one being the VNC socket and the other the WebSocket. I have some methods for translation, but here's what happens when we use IO.select. You iterate through the ready sockets and try to perform the translation, but you can end up in a kind of deadlock if socket A is ready for reading while socket B isn't ready for writing: the data has nowhere to go, yet the next IO.select returns immediately because socket A is still readable.

00:07:07.000 You may end up in this loop where your process is doing nothing but consuming CPU cycles, which is very inefficient. We use this approach in production, so I explored alternative solutions, such as quitting my job or not using Ruby anymore. It consumed a year of thinking on how to solve this effectively.

00:07:27.000 One option was using threads with blocking I/O, but that's not effective due to the Global VM Lock. You don't really want to run 500 or 1000 threads for half that number of connections. Libraries for asynchronous I/O such as Celluloid and EventMachine are great, but unfortunately, we are using Postgres in asynchronous mode, which causes failures with these libraries.

00:07:48.000 The async library is awesome, and I really recommend it, but when I was trying to solve this particular problem, it didn't exist. We wouldn't be having this talk if it had; I would have just used it instead. Also, I considered the promise of rainbow fibers in Ruby, where fibers would automatically yield when calling I/O wait, similar to how Crystal works.

00:08:14.000 It would be fantastic to have something like that, allowing you to define fibers like threads, perform blocking operations, and the system would magically convert them into non-blocking communication within the Ruby VM. However, this feature is only planned for Ruby 3.

00:08:51.000 So my colleague and I came up with a solution involving bouncing select. The key to this approach is having the arrays used in IO.select be dynamic. This code shows the updated method with adjusted elements for readability when performing reads and writes. Here, you see that you are dynamically managing the arrays, meaning when you call IO.select, you remove the ready sockets from the arrays.

00:09:22.000 After data transmission, you push them back to the end of the arrays. Because of this, if socket A is ready for only reading, it is removed from one of the arrays. The next IO.select call won’t have socket A included, preventing the potential for spin-locks while iterating.
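One way the dynamic arrays could work is sketched below. This is a reconstruction from the description, not the speaker's code: the class name, `pair`, `tick`, and the two "already reported ready" sets are assumptions. The write uses plain `write` for brevity; a fuller implementation would use `write_nonblock` and keep any unwritten remainder.

```ruby
require 'socket'
require 'set'

class BouncingProxy
  def initialize
    @peer      = {}       # socket => the socket it is paired with
    @reads     = []       # sockets we still wait on for readability
    @writes    = []       # sockets we still wait on for writability
    @can_read  = Set.new  # sockets already reported readable
    @can_write = Set.new  # sockets already reported writable
  end

  def pair(a, b)
    @peer[a] = b
    @peer[b] = a
    @reads.push(a, b)
    @writes.push(a, b)
  end

  # One pass of the loop. Returns the number of bytes moved.
  def tick(timeout = 1)
    ready = IO.select(@reads, @writes, nil, timeout)
    return 0 unless ready

    # Ready sockets leave the arrays, so they cannot make the next
    # IO.select return immediately while their peer is not ready yet.
    ready[0].each { |s| @reads.delete(s); @can_read << s }
    ready[1].each { |s| @writes.delete(s); @can_write << s }

    @can_read.to_a.sum do |src|
      dst = @peer[src]
      dst && @can_write.include?(dst) ? transfer(src, dst) : 0
    end
  end

  private

  def transfer(src, dst)
    data = src.read_nonblock(4096, exception: false)
    @can_read.delete(src)
    case data
    when :wait_readable
      @reads << src
      0
    when nil # EOF: tear down both ends of the pair
      drop(src, dst)
      0
    else
      dst.write(data) # dst was reported writable, so this normally returns at once
      @can_write.delete(dst)
      @reads << src   # after the transfer, both go back to the end of the arrays
      @writes << dst
      data.bytesize
    end
  end

  def drop(*sockets)
    sockets.each do |s|
      [@reads, @writes, @can_read, @can_write].each { |c| c.delete(s) }
      @peer.delete(s)
      s.close
    end
  end
end

# Demo with in-process pairs standing in for the WebSocket and VNC sockets.
ws_client, ws_proxy = UNIXSocket.pair
vnc_proxy, vnc_server = UNIXSocket.pair
proxy = BouncingProxy.new
proxy.pair(ws_proxy, vnc_proxy)

ws_client.write('ping')
moved = proxy.tick(0.1)      # ws_proxy is readable and vnc_proxy is writable
forwarded = vnc_server.read(4)

proxy.tick(0.1)              # vnc_proxy is reported writable again, nothing to move
idle = proxy.tick(0.1)       # write array is now empty: the call waits instead of spinning
```

Once every socket has been reported writable, the write array stays empty until a transfer pushes a socket back, so an idle proxy sits in `IO.select` for the full timeout rather than looping.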

00:09:48.000 This approach dynamically edits the array of sockets, but it’s still not the most ideal solution. Each time you call IO.select, you have to push all those sockets to the kernel, and this can become time-consuming, especially if you have a large number of socket descriptors.

00:10:12.000 I investigated an alternative called epoll, which divides the socket registration process and socket readiness checking into two separate operations. It copies the socket only once into a permanent structure, then separately checks its readiness. However, this option is currently only available on Linux.

00:10:33.000 I apologize to the Mac users; however, there's an alternative for BSD, known as kqueue, that works on Mac. If anyone wants to implement this feature, please feel free to do so! As of now, I'm falling back to using normal IO.select there, since our setup runs Linux in production.

00:11:02.000 A really cool feature of epoll is it has a flag called EPOLLONESHOT. When you register a socket with this flag, it gets removed automatically from the internal epoll structure after becoming ready for reading or writing.

00:11:31.000 This means that the second removal step that we talked about happens on its own. To implement this, we wrote some C code for socket registration, which works well with our Ruby structure and compiles on both platforms.

00:11:51.000 Now, let’s pivot to the second part of the talk, focusing on WebSockets. WebSockets are bi-directional connections that are built on top of HTTP. Instead of using the request-response model, you ask the server to upgrade the connection from HTTP to something else via a handshake.

00:12:17.000 Your browser communicates with the web server, sending a request with connection upgrade headers. The server responds to confirm it can speak WebSockets, allowing you to switch protocols during runtime. Essentially, you open one connection, and it becomes something else, which is pretty fantastic.
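The server's side of that handshake can be sketched in a few lines. The details below come from RFC 6455 rather than from the talk: the server proves it understood the request by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID. The helper names are hypothetical.

```ruby
require 'digest/sha1'

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'

# Value for the Sec-WebSocket-Accept response header.
def websocket_accept(key)
  [Digest::SHA1.digest(key + WEBSOCKET_GUID)].pack('m0') # 'm0' = Base64, no newlines
end

# The "101 Switching Protocols" response that completes the upgrade.
def upgrade_response(key)
  "HTTP/1.1 101 Switching Protocols\r\n" \
    "Upgrade: websocket\r\n" \
    "Connection: Upgrade\r\n" \
    "Sec-WebSocket-Accept: #{websocket_accept(key)}\r\n\r\n"
end

# Sample key taken from the example in RFC 6455.
puts upgrade_response('dGhlIHNhbXBsZSBub25jZQ==')
```

After this response, the same TCP connection carries WebSocket frames instead of HTTP.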

00:12:43.000 If we take a look at Rack, Ruby's web server interface, it follows a straightforward request-response model. You define a lambda or method that handles incoming requests. It returns three elements: the status code, a hash of HTTP response headers, and a body array that contains the content.

00:13:01.000 This is a basic Hello World example from the Rack website, where the server simply responds with 'Hello' to any HTTP request. Each time you open a website, it triggers that function call, essentially functioning as a server.
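For reference, a Rack application of that shape looks roughly like this (a sketch in the spirit of the Rack documentation, not a copy of the slide). Calling the lambda directly shows the three-element response without needing a web server.

```ruby
# A Rack app is any object that responds to `call(env)` and returns
# [status, headers, body]. In a config.ru file you would end with `run app`.
app = lambda do |_env|
  [200, { 'content-type' => 'text/plain' }, ['Hello']]
end

# Invoke it by hand with an empty environment hash.
status, headers, body = app.call({})
puts status
puts headers['content-type']
puts body.join
```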

00:13:26.000 However, this pattern doesn’t work well for long-lived connections, as you'd become stuck in that function call and exhaust resources.

00:13:44.000 To address this, the developer behind Rack proposed socket hijacking. This technique essentially asks the Rack web server to hand over the socket so that you can manage it independently. I came across a fun reference in popular culture about hijacking; there's a character in a movie who misunderstands the greeting 'hi' as hijack.

00:14:10.000 It's a humorous take on the idea of hijacking sockets. This model allows you to take control of how connections are managed instead of strictly adhering to Rack’s defaults. So, for instance, you can implement your proxy that operates in the background, managing connections independently.

00:14:41.000 With that in mind, you can connect to a remote endpoint VNC socket and push this to the proxy while returning something dummy so that Rack does not complain about ongoing processing.
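A rough sketch of that handoff using Rack's full hijacking API, where `env['rack.hijack']` is a callable that returns the client socket. Everything else is a stand-in: `build_app`, the queue representing the background proxy, and the `connect_remote` lambda are hypothetical, and the WebSocket handshake that would still have to be written to the hijacked socket is left out.

```ruby
require 'socket'

# `proxy` only needs to accept [client, remote] pairs; a Queue stands in
# for the background proxy here. `connect_remote` opens the VNC connection.
def build_app(proxy, connect_remote)
  lambda do |env|
    hijack = env['rack.hijack']
    unless hijack
      return [501, { 'content-type' => 'text/plain' }, ['hijacking not supported']]
    end

    client = hijack.call          # the server hands over the raw socket
    remote = connect_remote.call  # e.g. TCPSocket.new(host, port) in real use
    proxy << [client, remote]     # the proxy owns both sockets from now on

    [200, {}, []]                 # dummy response; ignored after a full hijack
  end
end

# Demo with a fake Rack environment and in-process socket pairs.
_browser_side, hijacked_io = UNIXSocket.pair
remote_io, _vnc_side = UNIXSocket.pair
handoff = Queue.new

app = build_app(handoff, -> { remote_io })
response = app.call('rack.hijack' => -> { hijacked_io })
client, remote = handoff.pop
```

The function returns right away, so the long-lived connection no longer ties up the request handler.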

00:15:02.000 Now, let’s talk about some smuggling and crazy alternatives. When I was exploring the HTTP upgrade feature, I wondered if we could upgrade to something different. The answer turned out to be yes—it works!

00:15:23.000 Initially, my goal was to experiment with changing the architecture slightly. By using this trick, it would be possible to run a standard VNC desktop client connection through an HTTP proxy, converting it back to VNC at the remote proxy to connect to your VM. Crazy, right?

00:15:48.000 Now you may still be wondering, why would I consider doing this? From the perspective of some of our customers, the experience of remote consoles isn't always smooth. You often lack features like clipboard support and proper key mapping, making it quite uncomfortable to navigate through the HTML5 canvas, which can be slow.

00:16:10.000 It becomes a challenge when you compare the functionality of browser-based tools against dedicated desktop applications, which are typically faster and more efficient. Additionally, using SSH from your preferred terminal instead of from the browser window can be a better experience due to key shortcut issues.

00:16:39.000 In software engineering, two notoriously hard problems are cache invalidation and the act of naming things. I tried to come up with a name for this protocol upgrade that reflects its purpose. You've likely seen the term PURR, which stands for 'Protocol Upgrade Raw Request.' It's about upgrading or downgrading from HTTP to VNC, SSH, or any other protocol.

00:17:05.000 This method allows for TCP smuggling through an HTTP connection. With our product, the plan is to execute it as follows: we maintain a cluster with our ManageIQ system running, alongside a web server proxy.

00:17:26.000 Let's break down the workflow: the browser accesses a VM and initiates the connection through the web server, which in turn communicates with the proxy using an HTTP upgrade to establish a protocol connection. The proxy sends back an HTTP 101 response, transitioning to a PURR connection that effectively behaves like a VNC connection but is initiated through an HTTP handshake.
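The upgrade step can be illustrated with a local client and server. This is only a sketch of the idea: the `example-raw` upgrade token and the `/vnc` path are invented for the example, and the server simply sends the 12-byte version banner that a VNC (RFB 3.8) server opens with, instead of forwarding to a real VM.

```ruby
require 'socket'

UPGRADE_TOKEN = 'example-raw' # hypothetical protocol name, not from the talk

# Read header lines up to the blank line that ends an HTTP head.
def read_head(io)
  lines = []
  while (line = io.gets("\r\n"))
    line = line.chomp("\r\n")
    break if line.empty?
    lines << line
  end
  lines
end

server = TCPServer.new('127.0.0.1', 0)

server_thread = Thread.new do
  conn = server.accept
  begin
    head = read_head(conn)
    if head.any? { |h| h.casecmp?("Upgrade: #{UPGRADE_TOKEN}") }
      conn.write("HTTP/1.1 101 Switching Protocols\r\n" \
                 "Upgrade: #{UPGRADE_TOKEN}\r\n" \
                 "Connection: Upgrade\r\n\r\n")
      # From here on the connection is no longer HTTP, just raw bytes.
      conn.write("RFB 003.008\n") # stand-in for the VNC server's first message
      conn.read(12)               # the client's raw reply
    else
      conn.write("HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n")
      nil
    end
  ensure
    conn.close
  end
end

client = TCPSocket.new('127.0.0.1', server.addr[1])
client.write("GET /vnc HTTP/1.1\r\n" \
             "Host: localhost\r\n" \
             "Connection: Upgrade\r\n" \
             "Upgrade: #{UPGRADE_TOKEN}\r\n\r\n")

status, *_headers = read_head(client)
banner = client.read(12) # raw protocol bytes arriving over the "HTTP" connection
client.write(banner)     # reply in the raw protocol, no HTTP framing

server_saw = server_thread.value
client.close
server.close
```

Anything between the client and the server that only permits HTTP sees an ordinary request followed by a 101 response, which is what makes the tunnel possible.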

00:17:48.000 Through this connection, illustrated by the blue line on screen, you create a tunnel to your VM over an HTTP connection, effectively bypassing firewall rules. This enables accessibility even in restricted environments where only HTTP is permitted.

00:18:12.000 But then we hit another hurdle: the plug-in doesn’t make it easy, as you cannot just open TCP ports directly from a browser. Browsers have limitations, and while there are some potential solutions being discussed for future standards, they aren't entirely available yet. Thus, a native library approach emerges as an alternative.

00:18:36.000 Currently, my client application, written in Go for its excellent concurrency capabilities, manages both Windows and Linux configurations. I still need a front-end library, and I'm open to any JavaScript developers who’d like to contribute. What I've created so far operates similarly to a Rack application.

00:19:01.000 The server handles HTTP requests to establish the connections necessary for the underlying applications and potential clients. By leveraging middleware practices, I implemented features for managing authentication, logging, routing, and more.

00:19:20.000 Although it still relies on both the browser plug-in and native libraries, this functionality remains in its nascent stages. But it is fully operational, and I am continually improving the capabilities of this system.

00:19:43.000 I really hope this live demonstration is going smoothly, and I have about six minutes remaining; that's certainly enough time!

00:20:05.000 Here, you can see the demo is running. I’ll show you what this Vagrant setup looks like. Should I zoom in more? Okay, perhaps that’s too much. Anyway, I've installed Docker and a few essential packages. Ignore the first line; I pulled down two containers—a VNC server that runs a desktop environment within a container and an SSH server.

00:20:35.000 Both servers are running on a VM that isn't directly accessible from my computer. On the VM itself, I set up Puma—a web server for handling requests—and used aliases to facilitate simpler access to the resources.

00:20:55.000 Currently, we have a shell script running Puma alongside the VNC server, which forwards requests based on the incoming URL, either to the VNC server or SSH server as appropriate. The index file for the application contains JavaScript code necessary for interacting with the browser plug-in.

00:21:20.000 The script dispatches an app or request event, opening either VNC or SSH depending on the button clicked. So now I will start the server and check the VM address.

00:21:42.000 Okay, this URL is something you shouldn’t see, but here is a demo to trigger VNC connection. I really hope this works; usually, it does.

00:22:03.000 The password for accessing it is simply ‘demo.’ Here, you can see the VNC connection running through the HTTP browser plug-in. If you are concerned about the performance, I assure you it’s effective.

00:22:31.000 Now you can see it’s successfully accessing the VNC session tunneled through the HTTP web server and routed back to the container where it’s running.

Ruby Unconf 2019