Building a Collaborative Text Editor

00:00:11 Hey, I'm looking at a slide counter that says slide 1 of 177, so I'm just going to get started. I hope your day has been going well so far. Have you seen some cool talks and learned some new and exciting things? Thanks for coming by, and I'll do my best not to keep you from the happy hour.

00:00:22 I'm Justin Weiss, and I'm on the engineering team at AHA, which is the world's number one product roadmap company. We're a distributed company, and we're hiring remote positions all across North America. I also occasionally write for my site, justinweiss.com, and I wrote a book called Practicing Rails to help developers learn Rails without getting overwhelmed.

00:00:35 I've been using Rails for over a decade, and one of my favorite things about Rails is how effectively it solves the problems of about 99% of websites. You know, take some information, do some stuff with it, and show it to some other users.

00:00:47 But eventually, hopefully your site becomes super popular, and you start collecting a lot of information from many people. Sometimes, you may find yourself trying to gather lots of data from numerous individuals into the same place and at the same time, and this can be disastrous.

00:01:05 Have you ever written a beautiful multi-page bug description, thrown some photos in there, and maybe added a short screencast? Then, a coworker comes back from lunch, notices a typo, and helpfully destroys your entire morning's work? Or consider a situation where several smart people try to put flight information into the same table, constantly overwriting each other's changes.

00:01:30 Wouldn't it be great if everyone could just work on a document at the same time? If your text editor could handle all of those updates instead of you having to coordinate on Slack like a bunch of stranded preteens from Lord of the Flies?

00:01:52 As more companies and individuals move their documents online, collaborative editing becomes an essential feature. Google Docs is probably the first example that comes to mind, but there are new apps like Quip and Coda that are building entire businesses around the idea of collaborative editing.

00:02:12 Other text editors are incorporating this feature too, like Sublime and Atom, and even Confluence now supports collaborative editing. If you want to avoid the hassle of passing around the editing football just to get work done, you need it in your app.

00:02:33 What does collaborative work look like for developers? As developers, you probably envision making changes to your document while someone else does the same. But then you have to merge and resolve any conflicts that arise.

00:02:55 That's great if you can make changes without the need to check in and check out the file, or to ensure the other person is done before starting your edits. This approach is known as making changes optimistically.

00:03:10 However, it’s not ideal if you and I are editing the same document simultaneously; we don’t want to be interrupted every few minutes to deal with conflicts. Conflicts don’t occur often, only when our changes interfere with one another.

00:03:27 When a conflict happens, instead of being bugged about it, what if the document could just make its best guess on what to do? One of us can come back and fix it if it guessed wrong. In theory, this sounds terrible and likely wouldn't work, but in practice, it mostly does.

00:03:53 That's because the system doesn’t need to be perfect; it just needs to maintain consistency and understand your intent. Here's what that means: if I type the word 'hello' into a document, the system should do its very best to ensure that 'hello' ends up in the final state of that document.

00:04:12 If someone else types 'bye' in the same spot at the same time, the two documents should eventually end up exactly the same, whether that's 'hello bye' or 'bye hello.' They need to be consistent in the end.

00:04:29 Now, let's talk about how we resolve conflicts. People are natural conflict-resolving machines. If you're walking down the hallway and someone is about to walk into you, you'll both stop and likely move over to the same side. If you're unlucky, you might both move back over to the original side.

00:04:43 Eventually, one person will stand still while the other one crosses, and the conflict is resolved. You want your editor to respond quickly to the person using it. If I make a change, I don't want to wait for a round trip to the server before I can see the result.

00:05:02 You also want others to see the changes that I made as best as they can, and you want that to happen quickly. So how can you achieve this?

00:05:15 The first thought might be sending diffs, such as 'I changed line 5 from this to that.' However, it's challenging to see intent in a diff. It merely shows what changed and not why.

00:05:35 A better way is to conceptualize it in terms of actions a person could take, like 'I inserted character A at position 5' or 'I deleted character B at position 8.' You can define these actions or operations to suit your needs.

00:05:50 Everything from inserting text to making text bold can fit into these actions. You can apply these operations to a document. When you apply an operation, the document changes.

00:06:10 For instance, applying an operation here would change the document from 'Hello' to 'Hello World.' If client A sends an insert 'World' at position five to client B, after applying that operation, both could end up with the same document.
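
As a rough sketch of what operations like these could look like in code: the names `Insert`, `Delete`, and `apply` below are made up for illustration and are not taken from the demo app, and the document is assumed to be a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where the text goes
    text: str  # text to insert


@dataclass(frozen=True)
class Delete:
    pos: int         # index of the first character to remove
    length: int = 1  # how many characters to remove


def apply(doc: str, op) -> str:
    """Return a new document with `op` applied; the input is left untouched."""
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    if isinstance(op, Delete):
        return doc[:op.pos] + doc[op.pos + op.length:]
    raise TypeError(f"unknown operation: {op!r}")


# Inserting " World" (with a leading space) at position 5 of "Hello".
result = apply("Hello", Insert(pos=5, text=" World"))
print(result)  # Hello World
```

Any client that starts from the same document and applies the same operation ends up with the same result, which is what makes operations worth sending over the wire.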

00:06:31 However, this isn't foolproof. Both clients can be changing the document simultaneously. For example, starting from 'at,' if the left client types the letter 'C' at position zero, making the word 'Cat,' and the right client types 'R' at position one, forming 'Art,' they are now on a collision course.

00:06:50 The client on the right receives 'Insert C at 0' and ends up with 'Cart.' But then, when the client on the left gets the 'Insert R at 1' operation, it results in 'Crat.' Now the two documents are no longer consistent.

00:07:11 If the client on the left deletes the character 'A', it actually deletes 'R' on the other document, which is incorrect and can be confusing. To address this issue, we introduce operational transformation.
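
To make the collision concrete, here is a small sketch that replays it with plain strings. The helper names `insert` and `delete` are assumptions for this example, and everything is lowercased to keep the strings easy to compare.

```python
def insert(doc, pos, text):
    """Insert `text` at index `pos`."""
    return doc[:pos] + text + doc[pos:]


def delete(doc, pos):
    """Delete the single character at index `pos`."""
    return doc[:pos] + doc[pos + 1:]


start = "at"

# Each client applies its own edit first.
left = insert(start, 0, "c")   # "cat"
right = insert(start, 1, "r")  # "art"

# Then each applies the other's operation exactly as received.
left = insert(left, 1, "r")    # "crat"
right = insert(right, 0, "c")  # "cart"
diverged = left != right

# The left client deletes its 'a' (index 2 in "crat") and sends "delete at 2".
left_after = delete(left, 2)    # "crt"
right_after = delete(right, 2)  # "cat": the 'r' was removed instead
```

Nothing here is transformed, which is the whole problem: the positions in each operation only make sense for the document they were created on.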

00:07:30 Operational transformation allows us to take operations and modify them. Let's revisit the problem: we have two documents where operations occur at the same moment. When I say 'at the same time,' I mean on the same document state.

00:07:48 When two operations happen on the same document, we may have to change one of them because these operations can affect each other. The order of operations matters, so the insert 'R' operation needs to follow the insert 'C,' which means that it should shift its position.

00:08:06 If 'C' has been inserted, the position that was initially one becomes position two. The same holds for the other client; it should insert 'C' at position zero without adjustments because nothing precedes it.

00:08:26 This is an example of how one operation may need to change due to another operation. Essentially, you are attempting to ascertain how to adjust operation B based on what happens with operation A before applying it.

00:08:46 This might be hard, especially as operations get more complex. I draw boxes to visually lay out the process. In the top left corner, I note the document state, draw an arrow to the right, and write one operation, like 'Insert C at 0.' Then, I diagram the new result in the top right.

00:09:06 Next, I draw a line going down for the other operation, 'Insert R at 1,' and write down the resulting document state after that operation. Now, in the lower right-hand corner, what should the document look like after both operations have occurred?

00:09:25 Sometimes this is clear, while other times you must think about user expectations. In this case, it is apparent that the answer is 'Cart.' To achieve this, we have two arrows to fill in to demonstrate what needs to occur to transition from one word to another.

00:09:42 Transformations become easier to test once you've drawn them out this way. If you practice TDD, these diagrams work as test cases: the top and left arrows are the inputs, and the arrows you filled in are the expected results to compare against what your transformation function returns.

00:10:00 However, before writing the transformation function, one important consideration is how to resolve conflicts when both clients try to insert text into the same position. You need to have a consistent tie-breaker.

00:10:16 If you are using client-server communication, for instance, you could decide that the server always wins. Alternatively, in a client-to-client scenario, you can assign random client IDs and say the largest client ID always wins.

00:10:33 Now that we have some operations to transform and the results we expect back, let's look at what this transformation function would look like. It transforms the top operation against the left one to get the bottom arrow, and the left against the top to get the right arrow, completing our square.

00:10:54 But this merely shifts the question further down the road. The key question is, how do you take the left operation into account with the top operation to create the right outcome? For now, let's focus specifically on inserting text.

00:11:12 This isn't too complex; the function handles most of it. We return a new operation to avoid modifying the original. We must consider if someone inserts text before our operation or at the same position, which may require a change.

00:11:28 If a character is inserted before us, we must shift our position. The size of the shift depends on the number of characters inserted before our position. If they insert one character, we shift by one; if they insert two, we shift by two.

00:11:46 For instance, if someone types a 'C' before our operation, we need to change our operation’s position by one. While this is a simple example of a transformation function, most follow this pattern: figure out if the previous operation can affect yours.
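
A minimal sketch of an insert-against-insert transformation function might look like this. The `Insert` class, the `client_id` field, and the rule that the larger client ID goes first on a tie are all assumptions for illustration; the talk only requires that the tie-breaker be consistent.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client_id: int  # hypothetical tie-breaker: larger ID wins


def transform_insert(op: Insert, against: Insert) -> Insert:
    """Return a copy of `op` adjusted so it can be applied after `against`."""
    inserted_before = against.pos < op.pos
    same_spot_and_they_win = (
        against.pos == op.pos and against.client_id > op.client_id
    )
    if inserted_before or same_spot_and_they_win:
        # Shift right by however many characters landed in front of us.
        return replace(op, pos=op.pos + len(against.text))
    return op  # nothing landed in front of us, so no change is needed


insert_c = Insert(pos=0, text="c", client_id=1)
insert_r = Insert(pos=1, text="r", client_id=2)

shifted_r = transform_insert(insert_r, insert_c)    # moves to position 2
unchanged_c = transform_insert(insert_c, insert_r)  # stays at position 0
```

Returning a new object with `replace` instead of mutating `op` keeps the original operation intact, which matters once the same operation has to be transformed against several others.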

00:12:05 Transformation functions are quite mathematical; they'll return the same outputs for the same inputs. This consistency makes it easy to unit test. There are more properties these functions must fulfill.

00:12:26 For example, given two documents in the same state, if you apply the first operation and then the transformed second operation, you arrive at the same state as if you apply the second operation and then the transformed first.

00:12:46 This roundabout way of putting it conveys that no matter which order you engage with these operations, you end up in the same state. Having a mathematical basis simplifies testing for transformation functions.
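
That property is easy to check mechanically. The sketch below assumes operations are `(pos, text, client_id)` tuples and uses a hypothetical `transform` with a larger-ID-wins tie-breaker; it tries every pair of insert positions on a tiny document and confirms both orders converge.

```python
from itertools import product


def apply(doc, op):
    pos, text, _client_id = op
    return doc[:pos] + text + doc[pos:]


def transform(op, against):
    """Adjust insert `op` so it can be applied after insert `against`."""
    pos, text, client_id = op
    a_pos, a_text, a_client_id = against
    if a_pos < pos or (a_pos == pos and a_client_id > client_id):
        return (pos + len(a_text), text, client_id)
    return op


def converges(doc, a, b):
    """True if a-then-b' and b-then-a' produce the same document."""
    via_a = apply(apply(doc, a), transform(b, a))
    via_b = apply(apply(doc, b), transform(a, b))
    return via_a == via_b


doc = "at"
positions = range(len(doc) + 1)
all_ok = all(
    converges(doc, (i, "c", 1), (j, "rr", 2))
    for i, j in product(positions, repeat=2)
)
print(all_ok)  # True
```

Swapping the tie-breaker for one that is not consistent (for example, always shifting on a tie) makes the same-position cases fail, which is a quick way to see why the tie-breaker matters.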

00:13:04 The two-dimensional square diagrams only work with two clients performing operations at the same time. If you have three clients, you create three-dimensional diagrams; four clients lead to four-dimensional diagrams. Each path must lead to the same endpoint.

00:13:23 When you have a single source of truth, like a server or a database, things become easier. Instead of navigating a complex diagram, you have straightforward two-dimensional diagrams, each for the client-server connection.

00:13:40 From now on, I'm going to operate under the assumption that we have a server. We just discussed transformation functions, which adjust operations so that you can have one operation affect the outcome of another.

00:14:00 Yet another crucial piece is knowing when to transform operations. The control algorithm must understand whether two documents are the same, ultimately determining if two operations occurred at the same time.

00:14:16 This is straightforward when working with a server. We can assign each document state a unique version number and say that two operations happened at the same time if they originated from the same document version.

00:14:35 Once we have the document version, we can assign a version to each operation. This makes it easy to identify when two operations take place simultaneously; simply compare the versions.
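
In code, the version check can be very small. This sketch assumes each operation carries the version of the document it was created on; `VersionedOp`, `is_concurrent`, and `needs_transform` are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedOp:
    base_version: int  # document version this operation was created on
    pos: int
    text: str


def is_concurrent(a: VersionedOp, b: VersionedOp) -> bool:
    """Two operations happened 'at the same time' if they share a base version."""
    return a.base_version == b.base_version


def needs_transform(op: VersionedOp, server_version: int) -> bool:
    """An operation built on an older version has missed some changes."""
    return op.base_version < server_version


insert_c = VersionedOp(base_version=0, pos=0, text="c")
insert_r = VersionedOp(base_version=0, pos=1, text="r")
insert_s = VersionedOp(base_version=2, pos=4, text="s")
```

Here `insert_c` and `insert_r` were both made on version 0, so one has to be transformed against the other, while `insert_s` was made after both were applied and can go in as is.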

00:14:54 What happens if you go a step further before syncing? For instance, what if each client runs two operations before collaborating instead of just one? This situation complicates things, so a square diagram comes in handy.

00:15:07 Just like before, we have arrows at the top and on the left, and we want to fill in the square by completing the arrows on the right and at the bottom. Each side must transform the top against the left operation.

00:15:25 As one operation becomes the left side of the subsequent square, this flow is vital. In a sense, the end of one stage becomes the start of the next. You must carry all related operations forward.

00:15:45 By working row by row, you transform every operation in the top list in an inner loop, calling the transformation function once for each square. Each call gives you back the transformed operations for that square.

00:16:06 Remember, the recent operation in your inner loop becomes the left operation for the next iteration. Meanwhile, the bottom operation gets pushed into the bottom list. Continue this until each row is processed.

00:16:27 With each completed cycle, the last operation is added to the right list. Each subsequent time through the loop sees the bottom list carry over as the new top list. Continue until you finish processing.

00:16:50 When we've reached the conclusion of these transformations, we can present the completed lists back to the user, as they contain all the updated operations that have been applied.
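
The row-by-row loop described above can be sketched roughly like this. It assumes insert-only operations stored as `(pos, text, client_id)` tuples and a hypothetical `transform` function; `transform_lists` is a made-up name for the control loop.

```python
def apply(doc, op):
    pos, text, _client_id = op
    return doc[:pos] + text + doc[pos:]


def transform(op, against):
    """Adjust insert `op` so it can be applied after insert `against`."""
    pos, text, client_id = op
    a_pos, a_text, a_client_id = against
    if a_pos < pos or (a_pos == pos and a_client_id > client_id):
        return (pos + len(a_text), text, client_id)
    return op


def transform_lists(top, left):
    """Fill in the grid of squares; return the (bottom, right) lists."""
    right = []
    for left_op in left:          # one row of squares per left operation
        bottom = []
        for top_op in top:        # one square per top operation
            bottom.append(transform(top_op, left_op))
            # The transformed left op becomes the left side of the next square.
            left_op = transform(left_op, top_op)
        right.append(left_op)     # the last one in the row joins the right list
        top = bottom              # this row's bottom is the next row's top
    return top, right


doc = "at"
top = [(0, "c", 1), (1, "h", 1)]   # "at" -> "cat" -> "chat"
left = [(1, "r", 2), (3, "s", 2)]  # "at" -> "art" -> "arts"
bottom, right = transform_lists(top, left)

via_top = doc
for op in top + right:
    via_top = apply(via_top, op)

via_left = doc
for op in left + bottom:
    via_left = apply(via_left, op)
```

Both `via_top` and `via_left` come out as "charts" here, so whichever side you started on, following your own operations and then the transformed ones from the other side lands you in the same place.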

00:17:05 However, it's vital to recognize that this method does not pertain to any specific application’s needs. Rather, it focuses on when to transform operations and how to approach those transformations.

00:17:20 This raises the central inquiry of defining what the operations actually entail. The answer is simple: they should do anything your application requires. By establishing transformation functions that meet these criteria, you can create diverse operations.

00:17:38 However, this approach does present trade-offs; as the specificity of operations increases, you inevitably incur a higher volume of necessary transformation functions. When I tackled this problem, I wound up with 13 distinct operations.

00:17:59 Though this doesn't seem excessive, it led to the need for over a hundred transformation functions to manage them. While relying on helpers and shortcuts eases this burden, it still demands significant effort.

00:18:16 But, the advantage of many specific operations means strong user intent is preserved during document modifications. This level of granularity allows us to reflect closely on what users aim to achieve.

00:18:36 Now, what kinds of choices facilitate easier collaboration, and which ones might hinder it, making collaborative capabilities harder to implement? The first decision you need to consider is thinking in operations instead of document states.

00:18:56 You have to talk about what actions users perform, not the document state. If you only store complete document states, you may hit obstacles down the road. There are ways to reverse-engineer actions from entire documents, but this often leads to a loss of intent.

00:19:18 Thus, frame your analysis in terms of operations, for example, 'insert C at position 0,' instead of thinking of a document that changed from 'at' to 'Cat.' Some applications, like Quill, represent documents as an assembly of operations.

00:19:38 Keep your document states linear to simplify transformations. If possible, design your document like an array of characters or objects, making it straightforward to handle array indexes.

00:19:55 Represent trees linearly when suitable; for instance, you can use elements in your array that signify entering or exiting a subtree. This adds some complexity to the document, but it makes transformations much easier.
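
One way to picture a linear tree is a flat list with marker elements for entering and leaving a subtree. The `("open", tag)` and `("close", tag)` markers and the `flatten` helper below are assumptions for this sketch, not a format the talk prescribes.

```python
def flatten(node):
    """Turn a nested (tag, children) tree into a flat list of items."""
    if isinstance(node, str):
        return list(node)            # plain text becomes one item per character
    tag, children = node
    flat = [("open", tag)]           # marker: entering the subtree
    for child in children:
        flat.extend(flatten(child))
    flat.append(("close", tag))      # marker: leaving the subtree
    return flat


# A paragraph containing "Hi " followed by a bold "yo".
tree = ("p", ["Hi ", ("b", ["yo"])])
flat = flatten(tree)
bold_start = flat.index(("open", "b"))
```

Because every element, marker or character, has a single array index, an operation like "insert at position 6" means the same thing here as it does for a plain string.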

00:20:13 Keep real-time data transformations as straightforward as possible. For example, strings and numbers are easy to merge, while resolving conflicts in custom objects poses challenges. Maintaining user intent in such scenarios becomes difficult.

00:20:29 At a high level, observe how everything ties together. You have document states, which I'll simplify as arrays of characters. Each document has a version, and both clients and servers have copies of the document in sync.

00:20:45 When you apply an operation to your document, you execute it immediately and send it to the server. The server disseminates it to other clients, ensuring that they see the updates as they happen.

00:21:05 Sometimes the server accepts your operation without any issues and tells you that you're now at the new version. Other times, it tells you that someone else has already submitted something newer.

00:21:25 If a version conflict arises, the server will send you the operations that happened between the two versions. You must transform your operations against these changes because your actions have already been applied locally.

00:21:42 Then, you transform your operations against the server version, followed by sending your modified operations to synchronize everything. This is how multiple users can collaborate successfully.
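
Here is a heavily simplified sketch of that round trip. It assumes insert-only `(pos, text, client_id)` operations, a synchronous in-memory `Server`, and a `Client` with at most one pending operation; all of these names are invented for the example, and the broadcast to other clients is left out.

```python
def apply(doc, op):
    pos, text, _client_id = op
    return doc[:pos] + text + doc[pos:]


def transform(op, against):
    pos, text, client_id = op
    a_pos, a_text, a_client_id = against
    if a_pos < pos or (a_pos == pos and a_client_id > client_id):
        return (pos + len(a_text), text, client_id)
    return op


class Server:
    def __init__(self, doc):
        self.doc = doc
        self.log = []  # log[i] takes the document from version i to i + 1

    def submit(self, op, base_version):
        if base_version < len(self.log):
            # The client is behind: hand back what it missed.
            return "conflict", self.log[base_version:]
        self.doc = apply(self.doc, op)
        self.log.append(op)
        return "ack", len(self.log)


class Client:
    def __init__(self, doc):
        self.doc = doc
        self.version = 0

    def edit(self, server, op):
        self.doc = apply(self.doc, op)  # apply optimistically, then send
        status, payload = server.submit(op, self.version)
        while status == "conflict":
            for missed in payload:
                # Our op is already applied locally, so the missed op must be
                # transformed against it, and ours against the missed op.
                self.doc = apply(self.doc, transform(missed, op))
                op = transform(op, missed)
                self.version += 1
            status, payload = server.submit(op, self.version)
        self.version = payload


server = Server("at")
alice, bob = Client("at"), Client("at")
alice.edit(server, (0, "c", 1))  # accepted right away
bob.edit(server, (1, "r", 2))    # conflicts, gets transformed, then accepted
```

After both edits, Bob's copy and the server's copy agree on "cart". Alice would catch up once the server pushed Bob's operation to her, which this sketch does not model.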

00:22:01 However, a truly great collaborative experience requires more than just functionalities to handle multiple users editing simultaneously. For example, consider cursor synchronization.

00:22:19 What exactly is a cursor? When viewing a document as an array, the cursor indicates a specific index. For instance, for the document 'hello,' if my cursor is before 'e,' it’s at position 1.

00:22:36 For other users, the cursor can be represented by numbers, but it’s essential to attach identifiers such as client IDs to differentiate whose cursor is whose.

00:22:54 Thus, a remote cursor consists of a position and a client ID. We now have an array of things, a version, our cursor, and a list of remote cursors, which allows us to render all cursors visually as desired.

00:23:12 When an action occurs, you may insert a character or remove one. This action modifies the document and requires us to consider where and if we should move remote cursors.

00:23:30 For instance, suppose the document reads 'Cart' and client two’s cursor is positioned between ‘A’ and ‘R,’ at position 2. After the next operation, 'Insert H at Position 1,' the document reads 'Chart.' Where should we now place client two’s cursor?

00:23:50 It makes sense to keep their cursor between the same two characters as before, which means moving it from position 2 to position 3. This lets us think of cursor placements as operations to be transformed alongside the operations that change the document.

00:24:05 Transforming cursors tends to be relatively simple, mirroring the logic used in text insertions, so it’s straightforward to keep track of cursor movements.
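
A cursor transformation could be sketched like this. The `RemoteCursor` class and the `("insert" | "remove", pos, text)` operation tuples are assumptions for illustration, and leaving the cursor in place when text is inserted exactly at its position is a judgment call rather than a rule from the talk.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RemoteCursor:
    client_id: int  # whose cursor this is
    pos: int        # index in the document array


def transform_cursor(cursor: RemoteCursor, op) -> RemoteCursor:
    """Move `cursor` so it stays next to the same characters after `op`."""
    kind, pos, text = op
    if kind == "insert":
        if pos < cursor.pos:
            return replace(cursor, pos=cursor.pos + len(text))
        return cursor
    if kind == "remove":
        end = pos + len(text)
        if cursor.pos <= pos:
            return cursor                                   # removal is after us
        if cursor.pos >= end:
            return replace(cursor, pos=cursor.pos - len(text))
        return replace(cursor, pos=pos)                     # we were inside it
    raise ValueError(f"unknown operation kind: {kind!r}")


# "cart" with client two's cursor between 'a' and 'r'.
cursor = RemoteCursor(client_id=2, pos=2)
moved = transform_cursor(cursor, ("insert", 1, "h"))  # document is now "chart"
```

The cursor ends up at position 3, still sitting between 'a' and 'r'.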

00:24:20 When a client shares their cursor, it’s also vital to know which version of the document it comes from. If the cursor references a version of the document you haven't seen, you cannot render it, because you don't know what that position refers to.

00:24:41 If the cursor comes from an older version, it might not apply to your document. However, if it references the current version, it would need transforming against any pending operations.

00:24:57 Of course, if the cursor references a future version, you can either hold it for display later or discard it, depending on your server and internal logic.
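
Those three cases can be written down as a small decision function. This is only a sketch under assumptions: `resolve_cursor` and its return labels are invented here, pending operations are insert-only `(pos, text)` pairs, and a stale cursor is simply dropped.

```python
def shift_for_insert(cursor_pos, op):
    """Shift a cursor right if the insert lands in front of it."""
    pos, text = op
    return cursor_pos + len(text) if pos < cursor_pos else cursor_pos


def resolve_cursor(cursor_pos, cursor_version, doc_version, pending_ops):
    """Decide what to do with a remote cursor based on its version."""
    if cursor_version < doc_version:
        return "stale", None         # from an older document; may not apply
    if cursor_version > doc_version:
        return "hold", cursor_pos    # from a future version; keep or discard
    for op in pending_ops:           # same version: account for unsent edits
        cursor_pos = shift_for_insert(cursor_pos, op)
    return "render", cursor_pos


current = resolve_cursor(2, cursor_version=5, doc_version=5,
                         pending_ops=[(1, "h")])
older = resolve_cursor(2, cursor_version=4, doc_version=5, pending_ops=[])
future = resolve_cursor(2, cursor_version=6, doc_version=5, pending_ops=[])
```

A server that keeps operation history could instead transform a stale cursor forward through the operations it missed; dropping it is just the simplest option.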

00:25:16 Regarding undo functionality, we need to evaluate how it works, particularly in collaborative settings. Initially, let's think about a general approach to undoing changes.

00:25:38 By framing changes as operations, like 'insert A at position 3,' you can indicate how to undo them, translating into 'remove A at position 3.' To redo it, you just reinstate it.

00:25:59 Undo actions form a stack where the most recent action is on top. So when you perform an operation, you first carry out the insert, then invert it, and push that inverse onto your undo stack.

00:26:16 If you decide to undo, you simply pop the top and execute that operation. After, if you want to support redo, you invert that operation again and push it back onto your redo stack.
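
For a single user, the two stacks could be sketched as follows. The `History` class and the `("insert" | "remove", pos, text)` tuples are illustrative assumptions, and clearing the redo stack on a new edit is a common convention that the talk does not spell out.

```python
def apply(doc, op):
    kind, pos, text = op
    if kind == "insert":
        return doc[:pos] + text + doc[pos:]
    return doc[:pos] + doc[pos + len(text):]  # "remove"


def invert(op):
    """'insert A at 3' becomes 'remove A at 3', and the other way around."""
    kind, pos, text = op
    return ("remove" if kind == "insert" else "insert", pos, text)


class History:
    def __init__(self, doc=""):
        self.doc = doc
        self.undo_stack = []
        self.redo_stack = []

    def perform(self, op):
        self.doc = apply(self.doc, op)
        self.undo_stack.append(invert(op))
        self.redo_stack.clear()  # assumption: a new edit discards redo history

    def undo(self):
        if self.undo_stack:
            op = self.undo_stack.pop()
            self.doc = apply(self.doc, op)
            self.redo_stack.append(invert(op))

    def redo(self):
        if self.redo_stack:
            op = self.redo_stack.pop()
            self.doc = apply(self.doc, op)
            self.undo_stack.append(invert(op))


history = History("cat")
history.perform(("insert", 3, "s"))
after_edit = history.doc
history.undo()
after_undo = history.doc
history.redo()
after_redo = history.doc
```

The document goes from "cats" back to "cat" and then to "cats" again, with each step just moving an inverted operation from one stack to the other.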

00:26:34 This structure allows you to avoid fumbled undo stacks, helping you prevent the chaotic management of operations.

00:26:50 Now let's explore how this unravels when multiple users are active. When you perform the operation 'insert s at position 4,' it pushes 'remove s at position 4' onto the undo stack.

00:27:09 Later, when the server sends an operation, such as 'insert H at position 1,' it reflects no simultaneous conflicts, so you apply it directly. This leads to a document state of 'Charts.'

00:27:30 Taking a closer look at the undo stack reveals a potential failure: if you undo now, it tries to remove 's' at position 4, but 's' is no longer at that position. That's a problem.

00:27:47 This situation highlights that every time an operation is received, you must transform the entire undo stack against the new operation.

00:28:00 Therefore, you must ensure that key components of functionality, such as undo, retain essential synchronization when receiving new operations. This principle preserves the integrity of your edit stack.

00:28:19 To outline how local undo can function within a collaborative environment, when you perform any operation, you take its inverse and place it on the stack.

00:28:38 On receiving an operation, you first transform your entire undo stack against this incoming operation. When you undo, you pop the top operation off the undo stack and execute it.
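
Putting that together for the 'Carts' example, a sketch might look like this. It only handles incoming inserts, uses made-up names (`shift_for_insert` and the `("insert" | "remove", pos, text)` tuples), and shifts each stack entry independently, which is a simplification of a full transformation.

```python
def apply(doc, op):
    kind, pos, text = op
    if kind == "insert":
        return doc[:pos] + text + doc[pos:]
    return doc[:pos] + doc[pos + len(text):]  # "remove"


def invert(op):
    kind, pos, text = op
    return ("remove" if kind == "insert" else "insert", pos, text)


def shift_for_insert(op, incoming):
    """Shift a stacked op right if the incoming insert lands at or before it."""
    kind, pos, text = op
    _in_kind, in_pos, in_text = incoming
    if in_pos <= pos:
        return (kind, pos + len(in_text), text)
    return op


doc = "cart"
undo_stack = []

# Local edit: insert s at position 4, and remember how to undo it.
local = ("insert", 4, "s")
doc = apply(doc, local)            # "carts"
undo_stack.append(invert(local))   # remove s at 4

# The server sends insert h at position 1.
incoming = ("insert", 1, "h")
doc = apply(doc, incoming)         # "charts"
undo_stack = [shift_for_insert(op, incoming) for op in undo_stack]

# Undo now removes the s from where it actually is.
undo_op = undo_stack.pop()
doc = apply(doc, undo_op)
```

Without the transformation step, the undo would have removed the character at position 4, which is the 't' in "charts", instead of the 's'.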

00:28:57 This maintains synchronization effectively across platforms when other clients see your updates and allows you to manage subsequent requests.

00:29:13 However, this method will not guarantee that undo actions will appear consistent to all users. If all users execute undos and then redos in a different sequence, not every user would see the document revert to its previous state.

00:29:27 This is a reasonable balance between complexity and usability, and it's a familiar one: collaborative editors like Google Docs show this same edge case.

00:29:45 To summarize, collaboration functionality across many apps can indeed thrive on a comprehensive structure; one that commences with a linear document, operates on shared versions, and tracks multiple operations.

00:30:04 Key components include transformation functions, synchronization management, cursor handling, and meaningful edits resulting in shared document states.

00:30:23 A collaborative editing app thrives on operational transformation, where the necessity for instantaneous updates coexists with user intent and expectations.

00:30:41 What remains vital is that the process must be adaptable enough to accommodate any context, all while balancing the intricacies of various applications.

00:31:01 I encourage everyone involved in collaborative development to explore methodologies enabling effective document sharing, allowing for real-time updates and an optimal structure.

00:31:19 Additionally, I’ve relished working at AHA, delving into countless compelling projects, and I encourage those interested in solving engaging problems for great customers to reach out. We're hiring!

00:31:38 My email address is shown here, so feel free to reach out with your questions. As a last note, if you're inclined to save one slide, it should be this one containing a link to the demo app, the source code, and a collection of other resources on this topic.

00:31:56 There's a lot of interesting content I didn't get to touch on, so if you want to explore more or discuss any of it, please find me. I'll gladly talk with you about collaborative editing or other technical subjects.