00:00:08.400
Okay, first of all, thank you all for coming.
00:00:10.920
I know I'm the only thing standing between you and lunch, so I really appreciate you making it today.
00:00:13.880
We're going to talk about how to build an exchange. Yes, I know you're probably thinking, 'Not another talk about exchanges!' Every Ruby conference seems to have one of these by now.
00:00:22.760
Building financial exchanges can sound a bit boring, right? Well, I built two before breakfast already, but hopefully, we'll learn something together.
00:00:36.360
But first, we have to have a speaker introduction. My name is Tim, and let's skip the last name because, well, leave it up to the Germans to confuse everybody with umlauts!
00:00:40.920
What you need to know about me is that I work in finance at a German exchange called Börse Stuttgart. I don't have a trust fund, I'm not 65, and I don't have blue eyes. So, that's basically it. I'm really sorry I'm not the whole package!
00:01:15.200
Now let's start by defining what an exchange is. An exchange is a marketplace where people meet to buy and sell securities, derivatives, and other financial instruments.
00:01:22.200
Now that we know how Investopedia defines a financial exchange, let’s consider what it really is. Considering this picture, perhaps back then they didn’t have color, and so that’s how they rolled.
00:01:37.399
Nobody noticed that the suits were pretty boring; it was just boring people in boring suits shouting prices at each other and swapping papers in a building—that's all there was to it.
00:01:47.600
These days, you're probably more familiar with the New York Stock Exchange, where big screens display prices on television. Revolutionary stuff, really, but don't be fooled! These days, people still shout prices at each other; it’s really just a glorified TV studio.
00:02:11.120
The real trading takes place elsewhere—admittedly, this is a Google data center, and to be honest, financial data centers are usually not this fancy.
00:02:15.480
They focus on cost-cutting, so there’s generally no light; it’s just darkness. And this is the reason we talk about it today.
00:02:30.480
While I have seen a trading floor from above, I haven’t been on one yet. So, if it’s just a boring building or a data center, why go to an exchange in the first place? Sure, I can shout prices myself, but an exchange offers something crucial.
00:03:14.200
The responsibility of an exchange is to match buyers and sellers fairly. One crucial feature is price transparency; you know what something is worth and what people are willing to pay.
00:03:24.920
This also includes predetermined rules. For example, when two people want to buy something at the same price, how do we determine who comes first? It's generally not a good practice to leave such matters up for chance.
00:03:36.440
For stocks, that would be very bad for business! So, how do we manage this? We maintain a central list of buy and sell orders for each market.
00:03:44.480
What does that really mean? A market can be anything: a commodity like onions, stocks, or derivatives. Each stock represents a separate market. For instance, Stock A is one market, while Stock B is another—we need to make clear they are not the same!
00:04:03.440
If you have a buy order, it goes into the list, and if you have a sell order, it goes into that list. We keep a separate list for each stock, derivative, onion—whatever it may be.
00:04:13.040
Now let’s examine the heart of the exchange. At its core, it’s really quite creative. If you know about an admin interface, you would be familiar with this table—it’s pretty straightforward.
00:04:18.000
This table shows the buy side, sell side, prices, and quantities. The red line differentiates between the buy and sell sides. How do you read this? Let’s look at the first row.
00:04:30.800
We see there's a demand to buy 20 shares at a price of 8 or less. The 'less' is crucial here; if someone comes around and offers you the shares for 7, you would be foolish to say 'no, I wanted to pay 8.' That's not smart business practice! So the limit here is clear: I'm willing to pay up to 8.
00:05:09.720
And the same logic applies to the sell side. If somebody wants to sell shares for 15 or more, and an offer comes in for 16, they' would take it—it’s natural to want the extra money.
00:05:15.680
To really understand how an order works, let’s run through an example. Suppose I want to buy 20 shares at a maximum price of 12. If we look at the order book—it’s essentially the same as what we had before—I’d point out that we have an overlap: 12 on the buy side and 12 on the sell side. And you know what that means? We can create a trade!
00:05:42.800
Amazing, right? After updating everything, we note that we wanted to buy 20 shares, but only 23 are available for sale. Therefore, we write the remaining 3 on the sell side. From here, we end up completing the trade.
00:06:20.880
A trade consists of two sides: a buy side and a sell side. The agreed-upon quantities and price are fundamental components of any trade. Sometimes, additional information may also be important, but for now, that is the essence of it.
00:06:40.360
For those who might want to know, a real trade always has two sides. If there’s only one side, it’s not a true trade—just ask Bernie Madoff!
00:07:00.760
Now that we've discussed the fundamentals, let’s dive into the technical side of things. This is where we encounter the concept of Big O notation, which relates to data structures.
00:07:09.680
One important aspect of an exchange is speed, which also involves latency. Therefore, all common operations ideally should occur in constant time. In terms of performance, we target O(1) as the ideal; less than that may be acceptable, but generally, O(1) is our goal.
00:07:40.080
Let’s look at the data structures in use. Limit orders are represented in the table, as we are willing to pay up to a specific price level, which includes price and the orders to buy or sell at that price.
00:08:04.240
It's essentially a linked list. So if you've learned about that in university, it still exists and is applied in real life. We work with this structure as a queue, typically popping from the beginning; then the first order is processed, and we append to the end.
00:08:26.520
Now let’s introduce limit orders; these are basic Ruby structs. We assign an ID to identify the order at the end of the day, along with user ID. We need to know who entered the trade, the order quantity, and the specified price.
00:08:42.200
Remember, a trade always requires both a buy and a sell side, so they are both included in the trade structure. The order book is a bit more complex, as it consists of a buy side and a sell side.
00:09:11.040
They are organized as sorted maps, with mapping established from prices to limits. Each limit already has a price, but we desire this explicit mapping for ease of use. And for sorted maps, we generally employ structures like a B-tree or a red-black tree—all the fancy data structures you may have heard of but never used.
00:09:49.360
Alright, let’s now focus on the matching process. We’ve already done a theoretical match on paper, but now we need to implement it in code.
00:10:08.000
It’s fairly simple: you get the highest buy order and the lowest sell order, meaning the highest price on the buy side and the lowest sell price. After identifying these two orders, you check for overlaps—whether they match at any capacity.
00:10:30.160
Establishing the minimum quantity for exchange is crucial. Please remember: if one person is willing to sell 23 shares and the other wants to buy 20, we can't swap out 23, given that someone is going to end up with more than they initially wanted.
00:10:53.600
So we must check the available quantities before we proceed to create the trade object we discussed earlier. Then, we update our order book accordingly at that price level, publishing the trade to proceed.
00:11:10.680
This code in the presentation is a simplified version of the actual implementation, but it’s pretty close to the real thing. For reference, there’s a loop in Ruby that you may have never encountered, but it is part of the language.
00:11:32.400
Now let’s consider something interesting: the system components. We need system requirements that prioritize high availability. If you start trading in the morning at 7 AM and wrap it up at 10 PM, you need to be up and running the entire time.
00:12:05.040
When substantial sums of money are involved, people frown upon exchanges going down, so high availability should be considered a standard parameter.
00:12:24.680
During a trading day, processing requests and orders can be overwhelming. We also have to focus on auditability since the sum of money traded is significant. If someone contests an order, you must be able to pinpoint where it entered the system.
00:12:54.440
Lastly, we need to ensure determinism in our system—nobody wants to be the developer caught saying, 'I know it's a bug, but I couldn't reproduce it in RPC.' We can't have the excuse of 'well, there was a full moon, and trades just happened,' so we need consistent performance across all transactions.
00:13:43.760
So, how do we achieve it? Let’s lay some ground rules: first, no threading—not now, not ever. We'll utilize a single while loop to handle our exchange, processing one message after another without involving complicated threading logic.
00:14:00.040
Secondly, we avoid any IO operations. For those familiar with functional programming, we won't read data randomly or conduct system operations. Instead, we'll manage the system like this: we have a state, apply an event, and generate a new state.
00:14:27.680
This structure essentially exemplifies a state machine. Everyone usually has a state machine in their system to some extent, but often it isn’t well-defined. What we need is a clearly articulated state machine, laying the foundation of our structure.
00:14:58.080
Consider it easy with this setup. Clients can connect via WebSocket, or in many cases, a TCP socket found outside the browser realm. Every message received enters a message queue, and then we advance to the matching engine.
00:15:32.640
This is the heart of the exchange, processing messages according to the established rules. Each process sends the result back via the message queue to the WebSocket, ensuring a full ordering of events as they remain sequenced in the message queue.
00:16:09.200
This method allows us to tackle the issue of determinism efficiently, as we can replay everything if needed.
00:16:34.350
Now let's address the elephant in the room: where’s the database? We don’t require an actual relational database. Yes, databases have their strengths, including ACID compliance, transaction safety, and isolation.
00:16:55.440
However, they often come with a cost—speed and latency—and are not viable for our setup.
00:17:25.280
To provide persistence without sacrificing speed, we can take the message bus, rename it to an event bus, and introduce a persisted log on it. This might be a write-ahead log or a key-value store. Through the event bus, we can maintain specific ordering, adding sequence numbers to messages.
00:17:52.680
The end objective is to ensure that everything is persisted efficiently. Naturally, we still have to clarify how the guarantees that provide persistence actually work.
00:18:15.680
In this process, we should discuss availability. Theoretically, we could run this entire operation on a single machine, but we know how unpredictable that can be.
00:18:48.640
With only one machine, while it's easy to set up, it also poses higher risks. Therefore, we must consider the reliability provided by TCP.
00:19:12.640
This protocol serves as the backbone of the internet, providing reliability, flow control, and congestion management—all critical for a proper network operation.
00:19:38.440
However, these reliability mechanisms impose costs, primarily due to the constant acknowledgment processes required.
00:20:01.160
Let’s explore UDP, which is the unreliable protocol without the need for a connection—imagine screaming into the void! People who have never used UDP often gasp in horror at the thought of losing messages.
00:20:37.000
But it can be beneficial! Remember: we have our sequence numbers. Messages can be checked chronologically; should any be missing, we can request that specific message again.
00:20:58.400
Doing so allows us to implement a very optimistic protocol and thus unleash potential improvements.
00:21:19.680
So, why choose this? We prioritize UDP to avoid the hassles of constant acknowledgment while also considering the advancements in network infrastructure, such as multicast.
00:21:38.400
Multicast is a fantastic technique where a single message gets replicated and sent to multiple subscribers without needing to know all recipients individually.
00:22:00.120
This approach saves on bandwidth and reduces latency, making it quite effective, especially with modern hardware switches that handle routing efficiently.
00:22:41.720
The architecture now looks a bit different. We still have WebSockets for connectivity, but without the event bus. Instead, we introduce a sequencer that organizes messages.
00:23:06.520
This sequencer tags each message with a number, after which it uses multicast for distribution to multiple matching engines.
00:23:31.960
The state machine ensures that same inputs yield the same outputs. As long as messages are sent to all matching engines, they will eventually produce consistent results.
00:23:46.920
The sequencer utilizes multicast to relay messages to all processing engines. Each of these then sends results to a publisher. The publisher gathers and compares outputs from each matching engine.
00:24:16.080
The consistent messaging helps identify any discrepancies in order results, ensuring accurate agreement among the engines.
00:24:49.520
From the publisher, we then persist the logs, which allows us to maintain our auditability standards. This continuity ensures that if a matching engine crashes, we can restore operations quickly.
00:25:16.680
In the event of a crash, a new matching engine can boot up and listen to the multicast group. It will acquire the last processed state and apply any missed events to ensure continuity in operations.
00:25:43.680
Thus, we've significantly minimized downtime caused by any potential engine failures Additionally, because the engines hold little state, redeploying them becomes a swift process.
00:26:04.040
So, to summarize, you may not need a traditional database setup. A well-structured system can effectively reduce fallout in case of failures.
00:26:31.520
Lastly, a clearly defined state machine can make a significant difference in managing operations. That’s it! You can find my slides on the left. You can also check out repo.timler.me where you have access to a fully operational version with a React front end. Enter your own price and quantity, and perform paper trades. Thank you very much for your attention, and I hope you learned something!