Peer-to-peer collaborative editing is one of the most fascinating frontiers of editorial collaboration in our industry. As a longstanding requirement with formidable technical challenges, enabling collaborative editing in a decentralized fashion has been a dream for many years. However, with the advent of emerging technologies in the real-time collaboration space, most notably Yjs and WebRTC, peer-to-peer editing is not only realistic but compelling for a wide range of ecosystems. With the help of Yjs and WebRTC, the latter a communication protocol that now enjoys wide browser support, you too can implement real-time collaboration without the need for an expensive central server or untested custom solutions.
Recently, I (Preston So, Editor in Chief at Tag1 and author of Decoupled Drupal in Practice) had the opportunity to sit down with my dear friends and colleagues Kevin Jahns (Real-Time Collaboration Systems Lead at Tag1 and founder and project lead of Yjs), Fabian Franz (Senior Technical Architect and Performance Lead at Tag1), and Michael Meyers (Managing Director at Tag1) for a deep dive into peer-to-peer collaborative editing enabled by Yjs and WebRTC. In this multi-part blog series, we dig into some of the implications of Yjs and WebRTC on real-time collaboration and why you should consider these two technologies in your next foray into peer-to-peer collaborative editing.
The term decentralized application has been defined and redefined over the past few years (as an aside, I've spoken extensively about the realm of decentralized applications when it comes to their place in the decentralized web and in blockchain technology), and there are many opinions about what truly characterizes a decentralized application. Increasingly, according to Kevin Jahns, the notion of a true decentralized application centers around the ability for applications to communicate directly with one another in order to save server load or to be more robust.
WebRTC is one of the technologies gaining considerable steam due to its promising potential in the real-time collaboration space. Now with ample browser support, WebRTC is a communication protocol much like TCP that enables developers to create channels of communication between browsers on different computers. One of the most exciting traits of WebRTC is that it can penetrate network layers, including proxies, to allow messages to arrive at their destination.
One of the most challenging problems in online communication has always been the ability to connect directly with a friend on a different network to share media or collaborate on a single shared document. The issue that we frequently face is that there are many layers separating us from those we wish to communicate with directly — whether it's a router, an internet service provider (ISP), or other obstacles that get in the way. In short, WebRTC ameliorates these problems by providing a protocol that "punches holes" through the network and establishes communication that can transcend those layers of separation.
In Kevin's view, WebRTC is presently the most straightforward way to create a decentralized application on the web. And in his definition, anything that uses WebRTC to share data between parties can be classified as a decentralized application. In fact, many frameworks already employ WebRTC as an underlying technology to enable the sharing, synchronization, and storage of data, most notably the Interplanetary File System (IPFS) and the Dat protocol, which both use WebRTC to a certain degree.
The Interplanetary File System (IPFS) is one of the most prominent decentralized web technologies currently in existence in which data can be stored across many different computers across the globe. You can think of it as a decentralized YouTube. Rather than streaming video directly from YouTube's or Amazon's servers, where all data is housed under the auspices of corporate interests, you can download video from any other individual's computer that also has the data available.
Though WebRTC has only recently been seeing greater attention in the web community, in fact, distributed peer-to-peer communication thanks to WebRTC is already a reality in the form of conference calls. After all, no single company is interested in receiving a prohibitively expensive bill at the end of every month due to all of their web conferencing traffic being routed to a single server. And with WebRTC, data can be shared, though a mechanism for conflict resolution is absent. Because data conflicts across a distributed network will inevitably occur, we need an algorithm or protocol to reconcile conflicts and to indicate to users on the network what is the most authoritative version of shared data. That's where Yjs comes in.
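To make the need for conflict resolution concrete, here is a deliberately simple sketch of one well-known strategy, a last-writer-wins register. It is not the algorithm Yjs uses, and the names `Versioned`, `merge`, `timestamp`, and `clientId` are assumptions made up for this illustration. The point is only that every peer must apply the same deterministic rule so that all of them settle on the same value, no matter in which order updates arrive.

```typescript
// Illustrative only: a last-writer-wins register, NOT the algorithm Yjs uses.
interface Versioned<T> {
  value: T;
  timestamp: number; // logical clock value assigned by the writer
  clientId: string;  // hypothetical unique peer identifier, used to break ties
}

// Pick a winner deterministically so that every peer reaches the same result.
function merge<T>(a: Versioned<T>, b: Versioned<T>): Versioned<T> {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  // Equal timestamps: fall back to comparing client identifiers.
  return a.clientId > b.clientId ? a : b;
}

// Two peers edit the same field concurrently, with the same logical timestamp.
const fromAlice: Versioned<string> = { value: "Hello, world", timestamp: 3, clientId: "alice" };
const fromBob: Versioned<string> = { value: "Hello, web", timestamp: 3, clientId: "bob" };

// Each peer sees the two updates in a different order.
const seenByAlice = merge(fromAlice, fromBob);
const seenByBob = merge(fromBob, fromAlice);

console.log(seenByAlice.value === seenByBob.value); // both peers converge
```

A rule this blunt discards one of the two edits entirely, which is acceptable for a single field but not for shared text, where both users expect their keystrokes to survive. That gap is what a dedicated collaboration framework is meant to close.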
One of the most exciting elements of WebRTC adoption is that it can be used for much more than simple file sharing across a distributed network. In fact, we can also make collaborative text editing in the browser and other collaboration use cases (such as collaborative drawing or drafting) possible. Yjs is an open-source real-time collaboration framework that can make such use cases a reality by operating over any network, irrespective of whether you are employing a traditional communication layer or WebRTC. There is also a WebRTC provider officially supported for Yjs.
WebRTC enables peer-to-peer collaboration in real time by connecting multiple discrete clients directly to one another. When a WebRTC-enabled client wants to connect, it gathers its external internet protocol (IP) address and additional information as candidates describing how other peers on the distributed network can reach it, and submits them to the peer it wants to reach. That second peer receives this information and determines whether the proposed communication method is amenable to both sides. It then discovers its own IP address and sends it back to the first peer as a session description. If the first peer also decides that this means of communication is appropriate, the two can establish a connection to one another.
In order to establish a connection between two parties and exchange session data, you need an additional networking layer to communicate that session description, namely the external IP address or additional information unique to the peer. This networking layer usually takes the form of a signaling server. Though signaling servers are not necessarily a prerequisite for WebRTC-driven communication, as you can use any method of your choice to exchange this signaling data, they are the most common.
For example, peers can use e-mail to send signaling information, and Kevin also cites the case of a colleague in Aachen who successfully utilized QR codes to transmit signaling data from one mobile device to another. While such experimental approaches using e-mail and QR codes adhere to the definition of peer-to-peer communication and are physically more approachable, most implementations currently use a central signaling server in order to reach the other client, establish a temporary connection, exchange session descriptions, and finally enable a direct connection between the peers in question.
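The exchange described above can be sketched as a small in-memory simulation. This is not the browser's WebRTC API: plain objects stand in for real session descriptions, and `SignalingRelay`, `Peer`, `call`, and the example addresses are all hypothetical names chosen for this illustration. What it shows is the shape of the conversation: the relay only forwards an offer and an answer, and after that the peers know how to reach each other directly.

```typescript
// Simulation only: plain objects stand in for real WebRTC session descriptions.
type SignalKind = "offer" | "answer";

interface SignalMessage {
  kind: SignalKind;
  from: string;
  to: string;
  candidates: string[]; // addresses at which the sender believes it is reachable
}

// Hypothetical relay: forwards messages by peer name and does nothing else.
class SignalingRelay {
  private peers = new Map<string, Peer>();

  register(peer: Peer): void {
    this.peers.set(peer.name, peer);
  }

  send(message: SignalMessage): void {
    const target = this.peers.get(message.to);
    if (target === undefined) {
      throw new Error(`unknown peer: ${message.to}`);
    }
    target.receive(message);
  }
}

class Peer {
  readonly name: string;
  // Peers whose candidates we have learned; a real client would now try them.
  readonly connectedTo = new Set<string>();
  private candidates: string[];
  private relay: SignalingRelay;

  constructor(name: string, candidates: string[], relay: SignalingRelay) {
    this.name = name;
    this.candidates = candidates;
    this.relay = relay;
    relay.register(this);
  }

  // Start the handshake by sending our candidates as an offer.
  call(remote: string): void {
    this.relay.send({ kind: "offer", from: this.name, to: remote, candidates: this.candidates });
  }

  receive(message: SignalMessage): void {
    if (message.candidates.length === 0) {
      return; // no usable route was offered, so ignore the message
    }
    this.connectedTo.add(message.from);
    if (message.kind === "offer") {
      // Reply with our own candidates so the caller can reach us too.
      this.relay.send({ kind: "answer", from: this.name, to: message.from, candidates: this.candidates });
    }
  }
}

const relay = new SignalingRelay();
const alice = new Peer("alice", ["203.0.113.7:50000"], relay);
const bob = new Peer("bob", ["198.51.100.4:50001"], relay);

alice.call("bob");
console.log(alice.connectedTo.has("bob"), bob.connectedTo.has("alice")); // true true
```

Because the relay does nothing but pass two messages along, it is easy to see why e-mail or a QR code can play the same role: any channel that can carry the offer one way and the answer back is enough.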
Of course, one of the most important considerations for peer-to-peer collaboration is security and the protection of data, particularly during the transmission of said information. Fortunately, encryption is required by the WebRTC standard, so every WebRTC connection you establish is encrypted by default. WebRTC employs Datagram Transport Layer Security (DTLS), an encryption protocol that is widely used and fully integrated into the WebRTC stack.
In many scenarios, juggling public keys and private keys is required to provide encryption when data is transmitted across networks. However, thanks to WebRTC's wide browser support, there is no need to interact as a user with public and private keys or to worry about the poor user experience that public key cryptography typically requires. Instead, all handling of public and private keys is conducted by the browser and is added automatically to the session description data. In addition, a new certificate is created for every connection that is established under WebRTC.
One outstanding question when it comes to WebRTC encryption is whether trust in the signaling server is required for communication over WebRTC to be truly secure out of the box. Kevin, for his part, has prototyped a signaling server that doesn't require trust, made possible with symmetric encryption. With symmetric encryption, if both peers have a shared secret such as a certificate, they can encrypt the signaling data they exchange and communicate through the signaling server without having to trust it. Though this is a fascinating direction to take, very few are currently exploring the question of trust in signaling servers for WebRTC communication.
As real-time collaborative editing continues to gain steam in the wider web development community, our customers and stakeholders will doubtlessly be interested in how to ensure secure peer-to-peer communication. Though collaborative editing has been a seemingly intractable challenge for many years in web applications, thanks to emerging decentralized approaches and frameworks like Yjs and particularly widening browser support for WebRTC, peer-to-peer editing is fast becoming an achievable reality. WebRTC, after all, is encrypted by default, and with the help of frameworks enabling conflict resolution like Yjs, a wide variety of collaborative editing use cases are now possible.
In this blog post, we inspected some of the key concepts we need to understand how decentralized peer-to-peer collaboration can function in this fast-moving landscape. We explored WebRTC and Yjs as well as some of the security implications inherent in peer-to-peer communication and the motivations for employing a centralized signaling server as opposed to more overt means of communication. In the following installment of this multi-part blog series, we will dive deeper into signaling servers and begin to explore how Yjs enables WebRTC-driven communication.