Part 1 | Part 2 | Part 3 | Part 4 |

In the second installment of this multi-part blog series that dives deeply into Yjs, the real-time collaboration framework, we inspected how Yjs works, its algorithm, and how Yjs achieves impressive performance outcomes despite its peer-to-peer nature. As part of its recent evaluation of shared editing frameworks, Tag1 Consulting chose Yjs for use with ProseMirror on an implementation of real-time collaborative editing for a major Fortune 50 company.

A little while ago, your correspondent (Preston So, Editor in Chief at Tag1 and author of Decoupled Drupal in Practice) had the privilege to collaborate on a Tag1 Team Talks webinar and podcast with Kevin Jahns (Real-Time Collaboration Systems Lead at Tag1 and creator of Yjs), Fabian Franz (Senior Technical Architect and Performance Lead at Tag1), and Michael Meyers (Managing Editor at Tag1) about Yjs and the specific features that make it shine in the realm of real-time collaboration: namely awareness, offline editing, and versioning. In the next two installments of this blog series, we cover all three of these essential topics.

Recap: OT vs. CRDT

Just before you continue reading, we highly recommend that you take a look at the first and second installments of this Yjs blog series in order to familiarize yourself with some of the most important concepts in the Yjs ecosystem.

However, before I move on to discussing how awareness features are implemented in Yjs, let’s recap the difference between conflict-free replicated data types (CRDT), the approach employed by Yjs, and operational transformation (OT), a common approach for real-time collaboration. Fortunately, Fabian has an apt metaphor to describe the primary difference between these two complex concepts: the addresses and streets of New York City.

In New York City, we often speak of city blocks made up of streets and avenues. For instance, if someone asks where Grand Central is located, and the questioner is on 38th Street, we may tell them either “four blocks away” (a relative location) or “42nd Street” (an absolute location). Suppose, for instance, that it is possible, SimCity-style, to add arbitrary blocks to the Manhattan street grid, and we choose to interpolate another street ahead of 42nd Street before Grand Central.

In this hypothetical scenario, adding an arbitrary block now means that we must say “five blocks away” instead of four to our uninitiated visitor. This sort of relative positioning is what we find in operational transformation (OT). Meanwhile, even though we have added another block, the address “42nd Street” cannot change until we have renamed the street itself. This absolute address is how CRDT functions. In other words, as Fabian states, CRDT has addresses, while OT has positions.

The Yjs ecosystem

As the recap above shows, collaborative editing, at its core, is simply about representing characters and the operations performed on those characters. In the previous installment of this series, we learned about how Yjs conducts conflict resolution and how characters are represented, together with examples of how structs are defined.

Note that the Yjs ecosystem is broader than the Yjs framework on its own. Yjs itself is primarily responsible for concurrency, conflict resolution, and defining data types. Other features key to the implementation of collaborative applications, meanwhile, such as communication with other servers, storing data updates in a database, and data bindings to other editors for actual collaboration in user interfaces, are all separate modules. These distinct modules can be readily found on the Yjs GitHub organization.

Awareness

Fundamental to any application handling real-time collaboration is the notion of awareness. Awareness is a topic common in other fields beyond computer science, and it is a well-worn research topic as well; in fact, there are scholarly journals whose sole focus is awareness in the context of computer science.

Awareness information is highly contextual. For instance, a high level of awareness can be garnered in spaces that are conducive to collaboration through contextual cues. In video conferencing solutions, for example, we can learn much about the other attendees’ emotional states and current actions based on the visual cues we see. Meetings that occur in the same physical space are also conducive to more efficient collaboration. Such meeting rooms are perhaps the best place to collaborate with others, but they do distract us from completing actual work due to the need to participate actively in ongoing discussions (and this may be a big part of the reason why developers are so averse to meetings!).

Another key example of awareness is collaboration through Git and other source control mechanisms. While committing code to Git, we tend to be focused on accomplishing our task at hand rather than inspecting the work that others have completed. And in the end, the only way to be fully aware of what others are doing on a codebase is to pull other developers’ branches and see what they’ve done in the past.

In short, awareness is important in real-time collaboration, because we need to find just the right amount of awareness for the task at hand. In collaborative editing, we frequently have access to a manifest of users in the document (with information such as who is currently online or editing and whether they are actively working). Thanks to visual cues such as Google Docs’ cursors, we can understand what others are doing and if any work we are pushing forward conflicts with theirs.

In real-time collaboration, conflict resolution can be uniquely challenging. Awareness is important not only from the standpoint of business value but also from the perspective of working relationships. Whether it’s 12 or 200 people, awareness gives us important information about what others are doing and how we can avoid human conflict as well. It can help us recognize when it’s the right time to step away and say, “Since you’re working on this paragraph, I’ll take a break and get some coffee until you’re finished.”

Awareness in Yjs

In Yjs, awareness is governed by a new protocol that is built atop the framework. We return to the particulars of this protocol shortly, but first we pivot to some of the most important aspects of awareness in Yjs and considerations for performance.

Implementing awareness features with shared types using CRDT can lead to certain performance issues. For instance, consider a scenario where we need to register a user’s cursor location by representing it as a struct element in the data structure we defined in the previous installment of this blog series. This would inflate the size of the document significantly. Because we as users have a tendency to jump around documents quite frequently with our cursors, it would be a mistake to include this information in the CRDT model.

Shared cursors

In Yjs, rather than approaching shared cursors in a manner that potentially introduces considerable bloat, we exchange the identifier of the struct we are pointing to given the cursor location. For instance, consider the following document:

	ABC

As we learned in the previous blog post in this series, each character in the document above is uniquely addressed in Yjs (in other words, we have a “42nd Street” rather than “four blocks away” reference point). When we create a cursor, we compute the location to which we are currently pointing. If your cursor lies to the left of character A, Yjs broadcasts the following message: “My cursor location is just before character A.”

In a peer-to-peer context, this is how other clients are aware of where to insert the cursor in question and how to play nicely with other people working in the same document. As a result, inserting content before or after the cursor doesn’t change the location of the cursor, because it remains associated with character A.

This is entirely different from other solutions, such as OT, in which everything is position-based. If my cursor is currently located at position 0 (in this document, before “A”), and someone inserts something before that position, the cursor location no longer points to character “A”; it points to the character inserted at the start of the document. Fortunately, Yjs solves this problem gracefully.
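To make the contrast concrete, here is a minimal sketch in TypeScript. It is not how Yjs implements cursors internally: the `Char`, `Cursor`, and `resolveCursor` names are hypothetical, the document is modeled as a flat array with one ID per character, and falling back to the end of the document when the anchor cannot be found is an assumption made purely for illustration.

```typescript
// Hypothetical model: every character carries a unique (client, clock) ID.
type Id = { client: number; clock: number };
type Char = { id: Id; value: string };

const sameId = (a: Id, b: Id): boolean =>
  a.client === b.client && a.clock === b.clock;

// A cursor stores the ID of the character it sits in front of
// (null means "end of document"), never a numeric index.
type Cursor = { before: Id | null };

// Resolve the cursor to an index only at the moment it must be rendered.
function resolveCursor(doc: Char[], cursor: Cursor): number {
  const target = cursor.before;
  if (target === null) return doc.length;
  const index = doc.findIndex((c) => sameId(c.id, target));
  // Assumption for this sketch: an unknown anchor falls back to the end.
  return index === -1 ? doc.length : index;
}

// Client 1 typed "ABC"; clocks 0, 1, and 2 address A, B, and C.
const doc: Char[] = "ABC".split("").map((value, clock) => ({
  id: { client: 1, clock },
  value,
}));

// "My cursor location is just before character A."
const cursor: Cursor = { before: { client: 1, clock: 0 } };
const indexBefore = resolveCursor(doc, cursor); // 0

// Client 2 inserts "X" at the very start of the document.
doc.unshift({ id: { client: 2, clock: 0 }, value: "X" });

// A stored index of 0 would now point at "X"; the ID still resolves to "A".
const indexAfter = resolveCursor(doc, cursor); // 1
```

The cursor message itself never changes when other users type. Only the index computed from it does, and that computation happens locally on each client.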

The Yjs awareness protocol

The awareness protocol in Yjs, found in y-protocols, resolves many more problems than just this illustration of conflicts around cursor locations. y-protocols is responsible for implementing awareness represented through messages that are broadcast and that have unique identifiers.

For example, if a user broadcasts the message “I’m online,” this message is assigned an identifier of 1. A second message, such as “I’ve changed my cursor location to this new location” would receive an incremented identifier of 2. This differs noticeably from a CRDT solution in that this awareness feature has nothing to do with concurrency; instead, it solely issues messages broadcast by the user’s local environment.

Another distinguishing characteristic of y-protocols is that it is safe for concurrency and commutativity. Messages are commutative in that we can apply them in any order, an important trait for peer-to-peer networks and distributed systems. These same messages are also how document updates are stored offline. Because we send messages to other peers from our node on the network, we can also store all document updates in a single canonical database.
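The idea of numbered messages that can be applied in any order can be sketched as follows. This is an illustration under stated assumptions, not the y-protocols wire format or API: the `AwarenessMessage` shape, the `applyMessage` and `applyAll` helpers, and the rule "keep only the message with the highest identifier per client" are all hypothetical choices for this example.

```typescript
// Hypothetical message shape; not the actual y-protocols encoding.
type AwarenessMessage = {
  client: number; // who is broadcasting
  clock: number; // identifier incremented by the sender for each message
  state: { online: boolean; cursor?: string };
};

type AwarenessStates = Map<number, AwarenessMessage>;

// Keep a message only if it is newer than what we already hold for
// that client. Older or duplicate messages are ignored.
function applyMessage(states: AwarenessStates, msg: AwarenessMessage): void {
  const known = states.get(msg.client);
  if (known === undefined || msg.clock > known.clock) {
    states.set(msg.client, msg);
  }
}

function applyAll(messages: AwarenessMessage[]): AwarenessStates {
  const states: AwarenessStates = new Map();
  for (const msg of messages) applyMessage(states, msg);
  return states;
}

// "I'm online" receives identifier 1; the cursor change receives 2.
const online: AwarenessMessage = {
  client: 7,
  clock: 1,
  state: { online: true },
};
const moved: AwarenessMessage = {
  client: 7,
  clock: 2,
  state: { online: true, cursor: "before A" },
};

// Delivery order does not matter: both peers end up holding message 2.
const inOrder = applyAll([online, moved]);
const reversed = applyAll([moved, online]);
```

Because each peer keeps only the newest message per client, awareness data in this sketch stays small no matter how often a cursor moves, and none of it needs to be written into the shared document.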

How Yjs encodes delete operations

In the previous installment of this Yjs series, we also covered how structs are represented with a structure similar to the following JSON:

{
  content: "ABC",
  id: {
    user: 1,
    clock: 0
  }
}

This particular struct consists of three operations that collectively add the characters “ABC” to the document. As we mentioned previously, this also demonstrates one of the unique characteristics of Yjs in that we can collect multiple operations in a single struct.

For deletions, however, we use a different encoding. We can summarize each delete operation as a deletion from a particular client and the number of operations we wish to delete. Hence, if we have two operations, we can use the following notation to encode deletions:

[1, 0, 2]

In this array, the first member represents the user identifier (the client performing the deletion), the second member the clock (which marks the location of the operation in the sequence, as we saw previously), and the third member the number of operations we wish to delete. This merger of structs is highly convenient, because we can now define delete operations much more efficiently. In short, the struct will be split into two, and the first two characters (“AB”) are now deleted.
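The split described above can be sketched in a few lines of TypeScript. The `Struct` and `DeleteRange` types and the `applyDelete` function are hypothetical names for this illustration, and the sketch keeps the deleted text around with a `deleted` flag only so the result is easy to inspect; it makes no claim about how Yjs stores deleted content internally.

```typescript
// Hypothetical in-memory struct: `clock` addresses the first character,
// and each following character has the next clock value.
type Struct = {
  client: number;
  clock: number;
  content: string;
  deleted: boolean;
};

// [client, clock, length], as in the notation [1, 0, 2].
type DeleteRange = [number, number, number];

function applyDelete(structs: Struct[], del: DeleteRange): Struct[] {
  const [client, start, length] = del;
  const end = start + length;
  const result: Struct[] = [];

  for (const s of structs) {
    const sEnd = s.clock + s.content.length;
    // Untouched: different client, or no overlap with the range.
    if (s.client !== client || end <= s.clock || start >= sEnd) {
      result.push(s);
      continue;
    }
    const from = Math.max(start, s.clock);
    const to = Math.min(end, sEnd);
    // Emit a piece covering clocks [a, b), skipping empty pieces.
    const piece = (a: number, b: number, deleted: boolean): void => {
      if (b > a) {
        const content = s.content.slice(a - s.clock, b - s.clock);
        result.push({ client: s.client, clock: a, content, deleted });
      }
    };
    piece(s.clock, from, s.deleted); // part before the deleted range
    piece(from, to, true); // the deleted part
    piece(to, sEnd, s.deleted); // part after the deleted range
  }
  return result;
}

const structs: Struct[] = [
  { client: 1, clock: 0, content: "ABC", deleted: false },
];

// Delete two operations from client 1, starting at clock 0.
const afterDelete = applyDelete(structs, [1, 0, 2]);
// afterDelete holds "AB" (clock 0, deleted) and "C" (clock 2, not deleted).

const visibleText = afterDelete
  .filter((s) => !s.deleted)
  .map((s) => s.content)
  .join(""); // "C"
```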

All structs in Yjs are encoded efficiently in a binary block, and delete operations are no different in terms of their efficiency, because we can merge delete operations together as well. For instance, consider the subsequent delete operation we undertake:

[1, 2, 1]

This operation deletes one operation from user 1 with a clock of 2 (meaning it comes after the first deletion we described in this section). The result of this would be the deletion of the third character (“C”) from the document. However, both of these operations described in this section can be merged into a single operation:

[1, 0, 3]

If we have considerable information to encode, this merge action can come in very handy. Moreover, this approach means that we can encode deletions and store them in a database without significant bloat. In short, we can encode the entire document as a single binary block consisting of all of these structs and delete operations, which can all be sent over the wire to clients.
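As a rough sketch of this merge step, the following TypeScript combines adjacent or overlapping delete ranges from the same client. The `mergeDeletes` function is a hypothetical helper written for this post, not a Yjs API, and it assumes the same `[client, clock, length]` notation used above.

```typescript
// [client, clock, length], as in the notation [1, 0, 2].
type DeleteRange = [number, number, number];

function mergeDeletes(ranges: DeleteRange[]): DeleteRange[] {
  // Sort a copy by client, then by clock, so neighbors end up adjacent.
  const sorted = ranges.slice().sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  const merged: DeleteRange[] = [];

  for (const [client, clock, length] of sorted) {
    const last = merged[merged.length - 1];
    // Same client, and this range starts at or before the previous end.
    if (last !== undefined && last[0] === client && clock <= last[1] + last[2]) {
      const end = Math.max(last[1] + last[2], clock + length);
      last[2] = end - last[1]; // extend the previous range
    } else {
      merged.push([client, clock, length]);
    }
  }
  return merged;
}

// The two deletions from this section, delivered in either order.
const merged = mergeDeletes([
  [1, 2, 1],
  [1, 0, 2],
]);
// merged is [[1, 0, 3]]
```

Ranges from different clients are never merged in this sketch, because their clocks belong to separate sequences.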

Conclusion

Awareness can be a deeply complex concept, but it has ramifications across all personas that need to collaborate in real time on a regular basis. In Yjs, we can represent awareness information in a uniquely efficient way thanks to an addressing system that ensures your cursor remains in the same place no matter what occurs in its surroundings. In addition, owing to the efficiency with which Yjs can represent deletions, we can avoid document bloat that would otherwise ratchet up the amount of data we send over the wire.

In the next installment in this blog series, we conclude our analysis of key features in Yjs that help collaborative applications shine beyond awareness. In particular, we turn our attention to offline editing, tracking changes, and revision histories. In the meantime, don’t forget to subscribe to Tag1 Consulting’s YouTube channel for more Tag1 Team Talks episodes and further information about fascinating technologies like Yjs, including the episode with Yjs creator Kevin Jahns that inspired the content in this blog post.

Special thanks to Fabian Franz, Kevin Jahns, and Michael Meyers for their feedback during the writing process.

For more Yjs content, see Yjs - Add real-time collaboration to any application.
