Building offline-first applications with Yjs: Garbage collection and content revisioning - part 6

It’s an all-too-common scenario. You board a flight only to hear the flight crew announce, to groans all around, that Wi-Fi is unavailable. How will you deliver the document that your colleagues were supposed to review later today? Fortunately, with the help of emerging web technologies like Yjs, an open-source real-time collaboration framework, and IndexedDB, a browser database that can house offline content, any developer can design an offline-first architecture that also functions well for peer-to-peer collaboration use cases. With the addition of Service Workers, you can add another layer of caching that ensures not only content persistence but also better performance.

Recently, your correspondent (Preston So, Editor in Chief at Tag1 and author of Decoupled Drupal in Practice) had the opportunity and privilege to host a wide-ranging conversation with Tag1 luminaries Kevin Jahns (creator of Yjs and Real-Time Collaboration Systems Lead at Tag1), Fabian Franz (Senior Technical Architect and Performance Lead at Tag1), and Michael Meyers (Managing Director at Tag1) about Yjs and how it can enable effective offline-first applications. In this multi-part blog series, we talk through all of the ways in which Yjs can support offline capabilities. In this sixth installment, we dive head-first into garbage collection and content revisioning.

Garbage collection

If you have not already, I recommend reading the first, second, third, fourth, and fifth installments of this blog series to gain a full understanding of how Yjs enables offline-first applications. The previous installments cover in detail many of the emerging web technologies that matter most to any implementation of offline-first functionality, including IndexedDB, WebRTC, and Service Workers. In addition, we cover some of the ways in which these tools can enable peer-to-peer offline collaboration in conjunction with Yjs.

Conflict resolution and garbage collection

Let us return to the scenario we illustrated at the end of the fifth installment of this blog series, in which heavy conflict resolution leaves behind so much garbage that some kind of garbage collection becomes necessary. If Aditya has created a large quantity of content and Bianca has also created a substantial amount of content, a potentially large amount of garbage can result as a by-product. A key question Kevin asks during our Tag1 Team Talks episode is: How do you create a proper synchronization process that will make perfect sense to the human who has to review such a conflict?
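To make the notion of "garbage" more concrete, here is a minimal sketch of how deleted content can linger in a collaborative document and what collecting it might look like. This is a toy model, not Yjs's internal data structures: the `ToyItem` type and the `render`, `collectGarbage`, and `storedCharacters` functions are hypothetical names invented for illustration.

```typescript
// Toy model of deleted content and garbage collection; not Yjs's real internals.
interface ToyItem {
  id: string;             // unique identifier, e.g. "aditya:1"
  content: string | null; // null once the payload has been collected
  deleted: boolean;
}

// Visible text is the content of every item that has not been deleted.
function render(items: ToyItem[]): string {
  return items
    .filter((item) => !item.deleted && item.content !== null)
    .map((item) => item.content)
    .join("");
}

// Drop the payload of deleted items but keep their identifiers,
// so that other edits can still be positioned relative to them.
function collectGarbage(items: ToyItem[]): ToyItem[] {
  return items.map((item) =>
    item.deleted ? { id: item.id, content: null, deleted: true } : item
  );
}

// Count how many characters of payload the document still stores.
function storedCharacters(items: ToyItem[]): number {
  return items.reduce(
    (total, item) => total + (item.content === null ? 0 : item.content.length),
    0
  );
}

const draft: ToyItem[] = [
  { id: "aditya:0", content: "Hello ", deleted: false },
  { id: "aditya:1", content: "cruel ", deleted: true },
  { id: "bianca:0", content: "world", deleted: false },
];

const collected = collectGarbage(draft);
```

In this sketch, the visible text is identical before and after collection, but the collected document stores fewer characters. The trade-off is that collected content can no longer be shown in a revision history, which is why garbage collection and versioning pull in opposite directions. In Yjs itself, garbage collection can be switched off per document through the `gc` option on `Y.Doc`.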

We can also make sense of this question in terms of Git, the popular version control system. In Git terms, if there is a significant and irreconcilable merge conflict, how can we resolve it without the assistance of a human at the reins? This is best illustrated by a situation in which Aditya has modified a document in response to a request to fix an issue. Meanwhile, Bianca, while disconnected, has made similar revisions in a separate instance of that document that also correct the problem in question.

Yjs versions

In any case, with unresolvable merge conflicts that require the input of a human, it is generally desirable to have a central server instance that humans can monitor and whose reconciliation issues they can inspect. Fortunately, Yjs has a feature, Yjs versions, that permits highlighting the differences between versions, including changes that were made while a user was offline.

For example, consider a scenario in which Aditya has now reconnected and needs to synchronize with other clients on a peer-to-peer network. Bianca can see the evolution that transpired while she was disconnected. How? Because Yjs shows the differences between the changes that just came in and the changes that Aditya applied, Bianca can view from her perspective how the changes relate to one another. If Aditya made a change and Bianca needs to revert it, the process is very similar to a typical git merge: Git resolves the differences and makes the modifications that make sense for the new state of the application.
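The reconnection step described above can be sketched as an exchange of state vectors, where each peer tells the other what it has already seen and receives only what it is missing. The following is a simplified toy model rather than the Yjs wire format: the `Op` and `StateVector` types and the `stateVectorOf` and `missingFor` functions are hypothetical. Yjs exposes the real equivalents as `Y.encodeStateVector`, `Y.encodeStateAsUpdate`, and `Y.applyUpdate`.

```typescript
// Toy model of state-vector-based sync; not the Yjs wire format.
interface Op {
  client: string; // who created the change
  clock: number;  // per-client counter, starting at 0
  text: string;
}

type StateVector = { [client: string]: number };

// A state vector records, per client, how many of its ops we have seen.
function stateVectorOf(ops: Op[]): StateVector {
  const sv: StateVector = {};
  for (const op of ops) {
    const seen = sv[op.client];
    const next = op.clock + 1;
    sv[op.client] = seen === undefined || next > seen ? next : seen;
  }
  return sv;
}

// Ops that the remote peer has not seen yet, judged by its state vector.
function missingFor(remote: StateVector, localOps: Op[]): Op[] {
  return localOps.filter((op) => {
    const seen = remote[op.client];
    return op.clock >= (seen === undefined ? 0 : seen);
  });
}

// Both peers start from the same shared op, then diverge while offline.
const shared: Op[] = [{ client: "aditya", clock: 0, text: "Intro. " }];
const adityaOps: Op[] = shared.concat([
  { client: "aditya", clock: 1, text: "Fix A. " },
  { client: "aditya", clock: 2, text: "Fix B. " },
]);
const biancaOps: Op[] = shared.concat([
  { client: "bianca", clock: 0, text: "Note. " },
]);

// On reconnection, each side sends only what the other is missing.
const incomingForBianca = missingFor(stateVectorOf(biancaOps), adityaOps);
const incomingForAditya = missingFor(stateVectorOf(adityaOps), biancaOps);
```

The incoming list is exactly the set of changes a client could highlight as "what happened while you were away." This sketch deliberately ignores how concurrent edits are ordered within the text, which a real CRDT also has to handle.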

Revisioning

To return to the Drupal ecosystem for a moment, let’s turn our attention to how Drupal handles content revisions. Every time a Drupal user makes a change, that change is added to an existing revision history; the loosely comparable concept in Yjs is the state vector, which summarizes the changes a client has seen. Drupal has a record of every single change across every single point in time, in a manner similar to macOS’s Time Machine, which allows you to restore a backup from any point in time available on the machine. With Time Machine, you can revert to any state that your backups have captured.

Yjs versions

In scenarios like Time Machine, because we have all of the revision history of our data recorded in one place, we can open the door to still more compelling use cases. Consider a scenario in which Cem and Doretta are working on an important document together. Cem and Doretta are teasing out a particularly difficult couple of paragraphs and require Eunsook’s involvement in order to untangle the mess.

If the conflict resolution in question requires sentence-level granularity rather than larger-scale changes, garbage collection becomes a significant problem, and this is where Yjs versions enter the picture. Cem can see how his changes affect Doretta’s version and how her changes affect his version. With Yjs versions, you can also “rewrite history,” much as git rebase does. Both actions rewrite the history of revisions according to the whims of the developer.
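As a rough illustration of how a saved version lets one collaborator see another's changes, the sketch below records a snapshot and later compares the document against it. It is a toy model under stated assumptions, not Yjs's implementation: the `ToyItem` and `ToySnapshot` types and the `takeSnapshot`, `renderAt`, and `diffSince` functions are hypothetical, and array order stands in for document order. The real counterpart in Yjs is `Y.snapshot`, which likewise captures a state vector together with the set of deletions.

```typescript
// Toy model of versions (snapshots); not Yjs's real data structures.
interface ToyItem {
  client: string;
  clock: number;
  content: string;
  deleted: boolean;
}

interface ToySnapshot {
  seen: { [client: string]: number }; // per-client count of known items
  deletedIds: string[];               // items already deleted at that time
}

function idOf(item: ToyItem): string {
  return item.client + ":" + item.clock;
}

function takeSnapshot(items: ToyItem[]): ToySnapshot {
  const seen: { [client: string]: number } = {};
  const deletedIds: string[] = [];
  for (const item of items) {
    const known = seen[item.client];
    if (known === undefined || item.clock + 1 > known) {
      seen[item.client] = item.clock + 1;
    }
    if (item.deleted) deletedIds.push(idOf(item));
  }
  return { seen: seen, deletedIds: deletedIds };
}

function existedAt(item: ToyItem, snap: ToySnapshot): boolean {
  const known = snap.seen[item.client];
  return known !== undefined && item.clock < known;
}

function wasDeletedAt(item: ToyItem, snap: ToySnapshot): boolean {
  return snap.deletedIds.indexOf(idOf(item)) !== -1;
}

// The document as it read when the snapshot was taken.
function renderAt(items: ToyItem[], snap: ToySnapshot): string {
  return items
    .filter((item) => existedAt(item, snap) && !wasDeletedAt(item, snap))
    .map((item) => item.content)
    .join("");
}

// What changed between the snapshot and the current state.
function diffSince(items: ToyItem[], snap: ToySnapshot) {
  const added = items
    .filter((item) => !existedAt(item, snap) && !item.deleted)
    .map((item) => item.content);
  const removed = items
    .filter((item) => existedAt(item, snap) && item.deleted && !wasDeletedAt(item, snap))
    .map((item) => item.content);
  return { added: added, removed: removed };
}

// Cem writes two fragments, and a version is saved.
const opening: ToyItem = { client: "cem", clock: 0, content: "The plan is ", deleted: false };
const ending: ToyItem = { client: "cem", clock: 1, content: "simple.", deleted: false };
const fragments: ToyItem[] = [opening, ending];
const cemVersion = takeSnapshot(fragments);

// Doretta later replaces Cem's second fragment.
ending.deleted = true;
fragments.push({ client: "doretta", clock: 0, content: "ambitious.", deleted: false });

const textAtCemVersion = renderAt(fragments, cemVersion);
const textNow = renderAt(fragments, takeSnapshot(fragments));
const changes = diffSince(fragments, cemVersion);
```

Note that the sketch only works because the deleted fragment's content is still stored. If that content had been garbage collected, the earlier version could no longer be rendered, which is the tension between versions and garbage collection that this section describes.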

Stashing and applying versions

In addition, Eunsook can, in theory, revert to Cem’s version by copying it somewhere else, a mechanism familiar to frequent users of the git stash command, before deleting the modifications so that it is as if those changes had never existed. If Eunsook goes ahead and reapplies those changes, the intelligent garbage collector in Yjs understands that there is no conflict, because the content deleted was the same as the content added. As we discussed in a previous Tag1 Team Talks episode, this result is made possible by the fact that each character has a unique identifier in Yjs.
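One consequence of giving every character a unique identifier is that receiving the same change more than once is harmless, because a change that is already known can simply be skipped. The sketch below illustrates only that property. It is a toy model with hypothetical names (`CharOp`, `ToyDoc`), and it does not attempt to model the stash-and-reapply workflow or how Yjs orders concurrent insertions.

```typescript
// Toy model of identifier-based deduplication; not Yjs's real internals.
interface CharOp {
  id: string;   // unique identifier, e.g. "cem:0"
  char: string; // the character inserted
}

class ToyDoc {
  private known: { [id: string]: boolean } = {};
  private chars: CharOp[] = [];

  // Applies a batch of ops and returns how many of them were new.
  apply(ops: CharOp[]): number {
    let applied = 0;
    for (const op of ops) {
      if (this.known[op.id] === true) continue; // already seen: skip it
      this.known[op.id] = true;
      this.chars.push(op);
      applied++;
    }
    return applied;
  }

  text(): string {
    return this.chars.map((op) => op.char).join("");
  }
}

const cemEdits: CharOp[] = [
  { id: "cem:0", char: "O" },
  { id: "cem:1", char: "K" },
];

const eunsookDoc = new ToyDoc();
const firstPass = eunsookDoc.apply(cemEdits);  // both ops are new
const secondPass = eunsookDoc.apply(cemEdits); // the same ops again
```

Because the second batch carries identifiers the document already knows, it changes nothing, and no human needs to decide whether the repeated content conflicts with what is already there.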

Therefore, Cem can tell Doretta and Eunsook: Simply accept this version as the canonical and authoritative version, and let us apply these changes manually together. As you can see, there are many potential opportunities for conflict resolution and garbage collection even without the support that a centralized server provides. Kevin does agree that many Git conflicts have to be resolved by humans, because machines have an intrinsic inability to understand merge conflicts at the level of nuance humans have by default. As such, it is essential that a human-friendly tool be available to resolve conflicts, and Yjs does just that with Yjs versions.

Thanks to Yjs versions, you can even see how your document is evolving live when you have been absent from the collaborative document for a long period of time, and Kevin admits this is one of the coolest features he has ever seen. After all, giving a human the tools to resolve conflicts is a much more empowering approach, with a Git-like reconciliation process reserved for scenarios in which significant conflicts arise. With the support of Yjs versions, you can rest assured that all differences incurred due to changes applied to the document during collaboration are not only recorded but also present on every collaborator’s local environment.

Conclusion

In recent years, thanks to large enterprises like Twitter, the prospect of offline-first applications has become more pressing and urgent. Many organizations are unaccustomed to offline-first application architectures and are rightfully reluctant to jump aboard a bandwagon that has seen significant innovation over the last few years. Fortunately, with the assistance of Yjs, an open-source real-time collaboration framework for more than just text, and IndexedDB, a database technology for local storage at the browser level, offline-first applications are not only around the corner but also made especially easy.

In this sixth installment of a multi-part blog series about how Yjs enables offline-first applications and offline collaborative editing, we covered garbage collection and how Yjs versions are well positioned to enable performant peer-to-peer collaboration in offline settings. In addition, we discussed conflict resolution and some of the ways in which Yjs versions enable a more human-friendly experience for reconciling conflicts. In the seventh installment of this blog series, we dive into the other challenges surrounding offline shared editing and other prospective applications for y-indexeddb.

Special thanks to Fabian Franz, Kevin Jahns, and Michael Meyers for their feedback during the writing process.

For more Yjs content, see Yjs - Add real-time collaboration to any application.

Photo by bantersnaps on Unsplash

Preston So

Editor in Chief
