It seems like an increasingly common request these days: Make my application work offline! But many developers are rightfully asking: Are offline collaborative applications truly feasible in this day and age? After all, consistent connectivity can be quite the luxury when it comes to less-than-ideal environments like Amtrak trains and commercial flights. Luckily, Yjs, the open-source real-time collaboration framework, is here to save the day, in conjunction with emerging web technologies that run the gamut from Service Workers (okay, not that old) to IndexedDB, a browser-based local database that is optimized for offline use cases.

Not long ago, yours truly (Preston So, Editor in Chief at Tag1 and author of Decoupled Drupal in Practice) produced a Tag1 Team Talks episode with my valued colleagues at Tag1 Kevin Jahns (creator of Yjs and Real-Time Collaboration Systems Lead at Tag1), Fabian Franz (Senior Technical Architect and Performance Lead at Tag1), and Michael Meyers (Managing Director at Tag1) regarding Yjs and how it enables offline shared editing use cases in a robust fashion. In this multi-part blog series covering offline shared editing with Yjs, we cover how Yjs and its ecosystem support a variety of offline-first features. In this seventh installment in the series, we discuss other challenges involved in offline shared editing and how enterprise organizations are leveraging offline editing with Yjs today.

Challenges in offline shared editing

Before we begin, I encourage my dear readers to take a look at installments number one, two, three, four, five, and six of this multi-part blog series in order to grasp some of the key fundamentals involved in offline shared editing in Yjs, starting with what constitutes a true offline application. Over the course of this blog series, we also discuss in detail the technologies that are required to build an effective offline application, such as IndexedDB, WebRTC, Service Workers, and Web Workers. In addition, we cover some of the ways in which conflict resolution in offline shared editing is strikingly similar to how we reconcile conflicts in source control systems like Git.

Garbage collection

Kevin contends during our Tag1 Team Talks episode that the trickiest aspect of offline shared editing is garbage collection, a familiar theme from the previous two installments of this blog series. In the previous installment, we spelunked into the depths of how versions operate in Yjs. As we saw there, one of the most important considerations for editors is the ability to revert content to previous versions and save those iterations as the authoritative, most current revision.

However, it is critical to recall that if we were to store all of the content and all of the modifications that led to the creation of that content, our documents would be oversized to the point of being unstorable. To help illustrate this situation, Kevin discusses the case of the official website for the Yjs ecosystem, Yjs.dev. Kevin admits that if he were to store all of the content that website visitors arbitrarily changed to experiment with Yjs, that document size would balloon to the point of unusability.

One particularly funny situation that Kevin described is that many users who visited Yjs.dev attempted to stretch Yjs to the very limits of its capabilities. For example, many visitors to the demo site would repeatedly copy and paste entire Wikipedia articles into the Yjs.dev website in order to test the extremes of its performance. Deleting that Wikipedia article afterwards is just fine from the perspective of Yjs. However, because every collaborator must still be able to synchronize, the junk Wikipedia content can never be truly deleted from the document while other peers are involved.

Garbage collection between versions

Fortunately, Yjs can garbage-collect content that was created and then deleted in between versions. Consider the following scenario: you visit Yjs.dev and create content; as long as that content remains in the document, it is never garbage-collected. But if you paste a Wikipedia article into the Yjs.dev website and someone else deletes that same article, the deleted content held in memory is garbage-collected and is not restored in the Yjs document on any peer. In addition, Yjs tracks at a fine granularity which content was deleted from the document.
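To make the idea concrete, here is a toy model in TypeScript. It is a sketch under stated assumptions and not how Yjs is actually implemented: `ToyDoc`, `Item`, and `collectGarbage` are hypothetical names invented for this illustration. The model shows the two steps described above: a delete only marks content, and garbage collection later drops the deleted text while keeping a small tombstone that records what was deleted.

```typescript
// Toy model only: these types are hypothetical and are NOT Yjs internals.
interface Item {
  id: number;
  content: string | null; // null once the text has been garbage-collected
  length: number;         // kept so the size of the deletion stays known
  deleted: boolean;
}

class ToyDoc {
  items: Item[] = [];
  private nextId = 0;

  insert(content: string): number {
    const id = this.nextId++;
    this.items.push({ id, content, length: content.length, deleted: false });
    return id;
  }

  // Deleting only marks the item; the text is still stored at this point.
  delete(id: number): void {
    for (const item of this.items) {
      if (item.id === id) item.deleted = true;
    }
  }

  // Garbage collection drops the text of deleted items but keeps a
  // tombstone (id and length) so the deletion itself is not forgotten.
  collectGarbage(): void {
    for (const item of this.items) {
      if (item.deleted) item.content = null;
    }
  }

  // The visible document text: everything that has not been deleted.
  text(): string {
    let out = "";
    for (const item of this.items) {
      if (!item.deleted && item.content !== null) out += item.content;
    }
    return out;
  }

  // How many characters are still physically stored.
  storedCharacters(): number {
    let total = 0;
    for (const item of this.items) {
      if (item.content !== null) total += item.content.length;
    }
    return total;
  }
}
```

In this sketch, pasting a long article and deleting it leaves the visible text unchanged, and after `collectGarbage()` only the tombstone remains in storage. In Yjs itself, garbage collection is governed by the `gc` setting on the document; the model above only illustrates the principle.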

According to Fabian, this fine-grained tracking of deletions is one of the optimizations in Yjs that make conflict-free replicated data types (CRDTs) both practical and robust. To offer up another example from the Drupal world, consider the fact that Drupal 8 now makes available forward revisions with the help of content workspaces. This means that an editor can work on an article for some arbitrarily lengthy period of time, and once they are finally ready to go live with that content, they can publish said content.

Garbage collection and CMS revisions

But now suppose a situation in which they have twenty or thirty revisions in between. These revisions may of course have been important to the author during the writing process, but any efficient CMS would conclude that once the new revision of the article is published (that is, once it has traveled between the states of draft, stage, and publish in a typical editorial workflow), the intervening revisions are no longer necessary. As such, all of those versions, in both the content preview system and content workspaces, are automatically garbage-collected, because we can generally consider those revisions to be much less important in the grand scheme of things.

One crucial difference that needs to be emphasized between Drupal and other content management and collaboration platforms is the notion that it is very important for the user to be able to publish a specific version of content, meaning a specific rendition at a certain point in time. For instance, if an editor, say Franklin, wishes to preserve a particular state of content for publication, garbage collection feels much more natural, particularly with a data retention policy in which older revisions are maintained for no more than thirty or ninety days. Most content management systems do this already, of course.
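A retention policy of this kind can be sketched in a few lines of TypeScript. This is a minimal illustration rather than Drupal's or Yjs's actual behavior: the `Revision` shape, the `pruneRevisions` function, and the rule that published revisions are always kept are all assumptions made for the example.

```typescript
// Hypothetical revision record; the field names are assumptions.
interface Revision {
  id: number;
  createdAt: Date;
  published: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Keep every published revision (an assumption of this sketch), plus any
// unpublished revision that is still inside the retention window.
function pruneRevisions(
  revisions: Revision[],
  now: Date,
  retentionDays: number
): Revision[] {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  const kept: Revision[] = [];
  for (const rev of revisions) {
    if (rev.published || rev.createdAt.getTime() >= cutoff) {
      kept.push(rev);
    }
  }
  return kept;
}
```

Calling `pruneRevisions(revisions, new Date(), 30)` would apply the thirty-day policy, and passing `90` the ninety-day one. Keeping the window as a parameter makes the tradeoff between history and storage an explicit configuration choice.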

Balancing revision history and storage space

In another common editorial scenario, an editor may require a previous revision restored after a piece of content has been published, even though it was considered no longer relevant by the normal garbage collection process. As such, one of the most important questions architects need to answer when building offline shared editing is how much space for data they are willing to devote to storing old revisions and at what point they should be deleted. This tradeoff between preserving a full content revision history and retaining disk space is a key issue in today's increasingly collaborative world.

Many organizations, for example those that need to keep entire revision histories for compliance or other reasons, could feasibly take a Yjs document and put it on physical storage such as tape, which would still be the least expensive way to store all of the revisions accumulated since the beginning of editorial time. It isn't a huge logical leap to consider a scenario in which we can preserve all of the revision histories of the entire corpus of Wikipedia articles since the free encyclopedia's inception and thus house the entire human history of all interactions on Wikipedia with Yjs.

If, on the other hand, editorial revisions are deliberately and carefully created and curated by the editor working with content, it is much easier to apply a more human approach to deleting revisions that are no longer necessary. When revisions are created purposefully rather than upon each minuscule change (as happens with autosave, a feature now commonplace in the CMS market, including in Drupal), we can treat that as a signal that the data from revisions that are no longer mission-critical can safely be garbage-collected. This approach can resolve a significant problem in data editing in general, well beyond the confines of offline shared editing.

Conclusion

One key question doubtlessly on decision-makers' minds as they traverse this blog series is whether any large organizations are already leveraging Yjs and IndexedDB for offline editing. Because y-indexeddb was only recently released for version 13 of Yjs, these are still early days. Many developers in the Yjs community chose to use y-indexeddb by default with previous versions of Yjs, as it made their websites faster without any additional overhead, a clear example of the performance enhancements we can derive for free from leveraging these technologies together. Kevin cites two companies using these libraries already, but currently there are no offline-first applications in production that make use of both Yjs and y-indexeddb in the way that Yjs.dev does, working precisely the same way both offline and online.

For readers interested in exploring Yjs, you can find information most efficiently on the Yjs GitHub organization, which contains a variety of repositories, including yjs/demos, where you can find working examples employing Yjs, y-indexeddb, and ProseMirror. The yjs/yjs repository also provides documentation, and for those interested in the inner workings of Yjs, we go into detail in earlier Tag1 Team Talks episodes, and there is also an academic paper covering the same topic at the end of the README. In this multi-part blog series, we examined how Yjs and associated technologies can enable offline-first applications with collaborative editing, an exciting paradigm certain to see an unprecedented amount of evolution in the coming years.

Special thanks to Fabian Franz, Kevin Jahns, and Michael Meyers for their feedback during the writing process.