Part 1 | Part 2 | Part 3 | Part 4 | Part 5 |


When it comes to the most exciting new developments in the field of web applications, perhaps no other phenomenon is gaining as much momentum as the prospect of true peer-to-peer collaborative editing. With its considerable obstacles and difficulties, collaborative editing has long been one of the less-explored realms of editorial workflows. Fortunately, thanks to new solutions like Yjs and better browser support for the WebRTC protocol, real-time collaboration is not only possible but also accessible to a growing range of developers. Armed with the combination of WebRTC and Yjs, you can construct a robust peer-to-peer editing implementation without excessive overhead.

Not long ago, your correspondent (Preston So, Editor in Chief at Tag1 and author of Decoupled Drupal in Practice) was privileged to host yet another Tag1 Team Talks webinar about the future of peer-to-peer collaborative editing with Yjs and WebRTC, together with my friends and colleagues Kevin Jahns (Real-Time Collaboration Systems Lead at Tag1 and founder and project lead of Yjs), Fabian Franz (Senior Technical Architect and Performance Lead at Tag1), and Michael Meyers (Managing Director at Tag1). In this multi-part blog series, we inspect Yjs and WebRTC and why they make an intriguing combination for decentralized collaboration. In this second installment, we take a closer look at signaling servers and begin to discuss the y-webrtc project, which integrates Yjs and WebRTC.

Signaling servers

As we discussed in the previous installment of this blog series, signaling servers are the mechanisms by which peers establish direct connections with one another. Signaling servers are responsible for negotiating between peers and relaying the connection details each client needs, such as IP addresses and other details unique to each client. WebRTC itself does not prescribe how this signaling data travels; it leaves the exchange to a signaling server implementation.

Encryption in signaling servers

Signaling servers can be reimplemented in any language, as they are akin to a publish–subscribe (Pub/Sub) server. When one client sends a message to a room, every client subscribed to that room receives it. The payload itself is encrypted with a key that only the intended peers know, so anyone who is subscribed but not entitled to view the data cannot read it. In this way, we as peers can exchange encrypted data without trusting a signaling server.
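To make the Pub/Sub comparison concrete, here is a minimal in-memory sketch of what such a relay might look like. The class and method names (SignalingRelay, subscribe, publish) are hypothetical and are not taken from Yjs or y-webrtc; a real signaling server would sit behind a network transport such as WebSockets.

```typescript
// Hypothetical in-memory relay; names are illustrative, not the y-webrtc API.
type Handler = (payload: string) => void;

class SignalingRelay {
  // Room name -> handlers subscribed to that room.
  private rooms: { [room: string]: Handler[] } = Object.create(null);

  // Subscribe a handler to a room.
  subscribe(room: string, handler: Handler): void {
    if (!this.rooms[room]) this.rooms[room] = [];
    this.rooms[room].push(handler);
  }

  // Forward an opaque payload to every subscriber of the room and
  // return how many subscribers received it. The relay never parses it.
  publish(room: string, payload: string): number {
    const handlers = this.rooms[room] || [];
    handlers.forEach((handler) => handler(payload));
    return handlers.length;
  }
}
```

Because the relay treats every payload as an opaque string, it works the same way whether the peers send plain text or ciphertext, which is what allows the encryption to live entirely on the clients.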

Kevin shared on our Tag1 Team Talks webinar that he runs several signaling servers that anyone interested can use for free. But is a signaling server of your own required in the first place? If you wish to develop your own decentralized application, you do not necessarily need to set up a signaling server yourself. Instead, you can rely on a default signaling server, and this approach scales easily thanks to the scalability intrinsic to the Pub/Sub model.

How secret messages are exchanged in WebRTC

During our time together, Fabian Franz explained how exactly two peers can exchange a secret message over this decentralized collaboration approach. Consider a scenario in which Fabian and Michael wish to exchange a secret message with one another. They can agree on a code word that enables the secure transmission of a message in a process known as symmetric encryption. With the code word, both of them can read the information contained in the encrypted message.
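The code word scenario can be sketched with Node's built-in crypto module. This is only an illustration under assumptions: the function names are hypothetical, and the choice of scrypt for key derivation and AES-256-GCM for encryption is ours, not something the webinar specified.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "crypto";

// All fields are base64 strings so the message can travel as plain JSON.
interface SealedMessage { salt: string; iv: string; tag: string; data: string }

// Derive a 256-bit key from the shared code word. The salt is not secret.
function deriveKey(codeWord: string, salt: Buffer): Buffer {
  return scryptSync(codeWord, salt, 32);
}

// Encrypt a message so that only holders of the code word can read it.
function encryptWithCodeWord(codeWord: string, plaintext: string): SealedMessage {
  const salt = randomBytes(16);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", deriveKey(codeWord, salt), iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    salt: salt.toString("base64"),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypt with the same code word; throws if the code word is wrong
// or the message was altered in transit.
function decryptWithCodeWord(codeWord: string, sealed: SealedMessage): string {
  const key = deriveKey(codeWord, Buffer.from(sealed.salt, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(sealed.iv, "base64"));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(sealed.data, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}
```

In this sketch, Fabian would call encryptWithCodeWord with the agreed code word and Michael would call decryptWithCodeWord with the same one. Anyone who guesses a different code word gets an error rather than the message.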

Another important component of encryption is public key cryptography, which underpins HTTPS (Hypertext Transfer Protocol Secure). In public key cryptography, each peer holds a pair of keys: a public key that can be shared freely and a private key that is kept secret. Imagine a scenario in which Michael sends Fabian an open box to which only Michael holds the key. Fabian can then place his secret message inside, close the box, and send it back to Michael. Once Fabian closes the box, no one besides Michael can open it, not even Fabian.
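The box analogy maps onto a key pair fairly directly. The following sketch uses Node's built-in crypto module with RSA-OAEP; the function names (makeBox, sealBox, openBox) are hypothetical, and RSA is simply one public key scheme chosen for illustration.

```typescript
import { constants, generateKeyPairSync, KeyObject, privateDecrypt, publicEncrypt } from "crypto";

// The public key plays the open box; the private key is the only key that reopens it.
function makeBox(): { publicKey: KeyObject; privateKey: KeyObject } {
  return generateKeyPairSync("rsa", { modulusLength: 2048 });
}

// Anyone holding the public key can place a message in the box and close it.
function sealBox(publicKey: KeyObject, message: string): Buffer {
  return publicEncrypt(
    { key: publicKey, padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" },
    Buffer.from(message, "utf8")
  );
}

// Only the holder of the matching private key can open the box again.
function openBox(privateKey: KeyObject, box: Buffer): string {
  return privateDecrypt(
    { key: privateKey, padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" },
    box
  ).toString("utf8");
}
```

Here Michael would call makeBox and hand only the publicKey to Fabian, who seals his message with it. With these settings RSA-OAEP fits only short messages (about 190 bytes), so public key encryption is typically used to protect a small secret rather than the full payload.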

Exchanging information with signaling servers

The key issue now is how we can exchange large amounts of information between peers using the approaches outlined above. When a signaling server receives an encrypted message (that is, the box that Fabian has just closed and that only Michael can open), the information contained inside is safe, because no others can access the data in question. Fabian could also send along his public key with the encrypted message, but a key delivered over an untrusted channel could be swapped in transit, which would enable a man-in-the-middle attack in which information is accessed by untrusted parties during transmission between two peers.

If we consider the implications of this process from a Drupal or WordPress context, particularly from the standpoint of editorial sessions, we can see how peer-to-peer collaborative editing can be enabled. Consider, for instance, that in Drupal, editorial sessions determine who has access to which information within Drupal based on their configured roles and permissions. All of these editorial sessions in Drupal are trusted and are HTTPS-enabled. Hypothetically, we can generate an arbitrary token (such as 12345) that is shared by all editorial sessions. Drupal is aware of this token, but the signaling server has no knowledge of it.

Through this approach, we can use the signaling server to transmit information without risking a situation in which the signaling server becomes aware of our secret. As soon as two Drupal editors are connected, they can exchange data using the secret token received from Drupal and use any signaling server on the internet to transmit the information. According to Fabian, setting up one's own signaling server and having to trust that server is an increasingly common problem in this form of peer-to-peer communication.
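The Drupal scenario can be sketched as follows. Everything here is hypothetical: the function names, the use of PBKDF2 with the room name as salt, and the relay function standing in for a signaling server are assumptions made for illustration, and the token 12345 simply mirrors the example above (a real token should be long and random).

```typescript
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from "crypto";

// Both editors receive the same token from Drupal over HTTPS and derive
// the same key from it. Using the room name as salt is an illustrative choice.
function roomKey(token: string, room: string): Buffer {
  return pbkdf2Sync(token, room, 100000, 32, "sha256");
}

// Encrypt with AES-256-GCM and pack iv (12 bytes) + tag (16 bytes) + ciphertext.
function seal(key: Buffer, text: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
}

// Unpack and decrypt; throws if the key is wrong or the payload was altered.
function openSealed(key: Buffer, sealed: string): string {
  const raw = Buffer.from(sealed, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
}

// Stand-in for an untrusted signaling server: it records and forwards
// whatever it is given, and never sees the token or the key.
const relayLog: string[] = [];
function relay(payload: string): string {
  relayLog.push(payload);
  return payload;
}
```

An editor would send relay(seal(roomKey(token, room), message)), and a second editor holding the same token would recover the message with openSealed. The relay's log contains only ciphertext, which is the property that makes the choice of signaling server a matter of availability rather than trust.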

Do we need to trust signaling servers?

But because our encryption is end-to-end in this case, and because we are directly encrypting messages via a secret key we know and trust — the one generated by Drupal in this hypothetical scenario — we no longer need to have trust in a signaling server. This means that a signaling server can be created for us by anyone, including untrusted peers, because the signaling server in question will never be able to access the encrypted messages Drupal editors are exchanging.

The potential advantages of this approach cannot be overstated, as it enables a great degree of flexibility for peer-to-peer collaborative editing. Allowing the use of untrusted signaling servers can facilitate greater adoption of Yjs and other such frameworks, as there is no longer a requirement for a Node.js server or PHP polling server to be instantiated for an initial connection to be established between two peers. Instead, those two peers can depend on any signaling server available to them without any reduction in security.

What is y-webrtc?

The implications of untrusted signaling servers bring us to y-webrtc, which is the officially supported integration between Yjs and WebRTC. With y-webrtc, according to Kevin, you can access many disparate signaling servers at the same time. If two peers are connected to the same signaling server and accessing the same collaboration room, they can now find each other and establish a connection between them. We can use this mechanism to scale our signaling server by setting up many other signaling servers, each of which is tied to a particular room. If you wish to connect to a specific room, you can use one of several signaling servers available to you.

These signaling servers do not need to communicate with one another, since synchronization happens between the clients themselves. The client itself decides which signaling servers to use, so providers avoid having to set up a complex server environment such as Redis or a Pub/Sub node, which would inevitably need to scale to many instances for a large application. Instead, you can connect to many different signaling servers at once, and ideally at least one of them is reachable by every client on the network.
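To illustrate why independent signaling servers are enough, the following sketch simulates discovery: two clients find each other when they are in the same room and share at least one signaling server. It does not use the y-webrtc API; the Client shape, the discoverPeers function, and the example server URLs are all hypothetical.

```typescript
// Hypothetical client description: which room it joins and which servers it can reach.
interface Client { id: string; room: string; servers: string[] }

// Return, for each client, the ids of peers it can discover through
// at least one shared signaling server in the same room.
function discoverPeers(clients: Client[]): { [clientId: string]: string[] } {
  // "server + room" -> ids of the clients announced there.
  const presence: { [key: string]: string[] } = Object.create(null);
  for (const client of clients) {
    for (const server of client.servers) {
      const key = server + "\n" + client.room;
      if (!presence[key]) presence[key] = [];
      presence[key].push(client.id);
    }
  }

  const peers: { [clientId: string]: string[] } = Object.create(null);
  for (const client of clients) peers[client.id] = [];

  // Servers never talk to each other; each one only pairs up its own subscribers.
  for (const key of Object.keys(presence)) {
    const members = presence[key];
    for (const a of members) {
      for (const b of members) {
        if (a !== b && peers[a].indexOf(b) === -1) peers[a].push(b);
      }
    }
  }
  return peers;
}
```

For example, if one client lists wss://signal-a.example and wss://signal-b.example while another lists wss://signal-b.example and wss://signal-c.example, the shared second server is enough for them to meet. Two clients with no server in common would not discover each other directly in this model, which is why listing several servers improves the odds.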

Conclusion

Thanks to advancements in peer-to-peer encryption, decentralized collaboration is quickly becoming a clear possibility for more than the wealthiest organizations. For example, we can make such collaboration possible on Drupal thanks to the fact that untrusted signaling servers lack access to encrypted messages and due to projects like y-webrtc, which enables integration between Yjs and WebRTC. Furthermore, WebRTC now has wide support across modern web browsers, and Yjs is seeing growing adoption from CMS ecosystems like Drupal and WordPress interested in empowering editors with capabilities for collaboration.

In this blog post, we discussed why signaling servers do not need to be trusted in order to allow the transmission of encrypted messages between peers. We also examined a hypothetical scenario in which Drupal would be able to support such exchanges of messages between editors on disparate networks. Finally, we took a brief look at y-webrtc and how it supports rich integrations between WebRTC and Yjs. In the next installment of this blog series, we continue our journey through key concepts in decentralized collaboration and how Yjs and WebRTC can empower editors all over the world to collaborate as if they were in the same room.

Special thanks to Fabian Franz, Kevin Jahns, and Michael Meyers for their feedback during the writing process.

For more Yjs content, see Yjs - Add real-time collaboration to any application.



Photo by Markus Spiske on Unsplash