This is a transcript. For the full video, see Yjs Offline Apps with IndexedDB - Tag1 TeamTalk #009.
Preston So: [00:00:00] Hello and welcome to yet another episode of Tag1 Team Talks. I'm your host Preston So, editor in chief here at Tag1 Consulting and the author of Decoupled Drupal in Practice. Today we're going to be talking about Yjs offline apps with IndexedDB during this webinar and podcast series about emerging web technologies.
Of course, one of the things that we always like to mention on the show at the very beginning is you can check out our past talks at tag1.com/tagteamtalks. And if you liked this webinar session today, please, please remember to upvote, subscribe, and share it with your family and friends. Today we've got an amazing cast of characters on our webinar.
First of all, I'd like to introduce our special guest, Kevin Jahns, the creator of Yjs. He's based out of Berlin, Germany, currently. He's the founder and project lead of Yjs and the realtime collaboration systems lead at Tag1. We've also got with us our dear friend Fabian Franz, senior technical architect and performance lead at Tag1.
Fabian is, of course, one of the five Drupal 7 core branch maintainers. He's also one of the top 50 contributors to Drupal 8 and maintainer for several Drupal 8 core subsystems, including BigPipe, Dynamic Page Cache, and Theme API. And finally we're joined by our good friend, Michael Meyers, managing director at Tag1.
Uh, Michael, do you want to kick us off with a little bit of a description about Tag1 and why we're all interested in this topic?
Michael Meyers: [00:01:23] Awesome. Thanks guys. Really appreciate you being here with us today. Uh, so Tag1 is a web development consulting company specialized in infrastructure and software development.
Uh, we've been doing a lot of projects that involve real-time collaboration, and we're excited to be doing more work with Kevin on these projects. I think we're going to see real-time collaboration become a key part of web applications, or all applications for that matter. Moving forward, enabling people in a workforce to work together in real time is how business gets done.
Uh, so for example, with content management systems like Drupal and WordPress, which we do a lot of work with, that means enabling content editors to edit rich text, you know, to create documents together in real time, to work on layouts and page setups in real time together. So this should be a component of every modern CMS system.
And so that's the stuff we're gonna be talking about today, is how we're making that happen.
Preston So: [00:02:24] I absolutely agree. I think that being able to edit offline is a huge feature that a lot of CMSs need. And here to join us today to talk about this very important topic is Kevin Jahns. Hey, Kevin. Uh, so it's such a pleasure to see you again on the program.
Um, can you tell us right away, what is an offline app and why is this so important?
Kevin Jahns: [00:02:42] Alright. Thanks for that. Hi. Um, yeah, so ever since the web was created, we have misused the web to create web applications. Right? We use web applications because they are always available on all devices and they sync to the cloud, which is really comfortable.
Um, but a huge disadvantage of the web was always that it is not available offline. So when you are on a train, on a plane, or you just have a bad connection at a bad place, you can't access your favorite website or your favorite documents. Just imagine you are flying and you really need to see what's in this, uh, Google Docs document.
You can't really see it because it's not available offline. So that's the huge advantage of offline apps. And there are some apps now emerging on the web that also work offline. And there's a technology, service workers, that makes this possible. And these apps get more and more complicated.
So there are, for example, some blog websites that fetch the data in the background and just store it for you to see when you don't have an internet connection. So you can read your article. Uh, you can read the next article even without an internet connection, which is really comfortable. And there are also more complicated apps, like Twitter,
that work offline: they fetch data in the background and you can tweet away, and these tweets that you create while offline just sync later, as soon as you have an internet connection again. So, for me, this is a really interesting time right now.
Preston So: [00:04:25] So one of the things I think a lot of, um, Tag1 Team Talks viewers have seen is that, you know, there's a proliferation of online applications on the web really emerging, but we're also hearing, you know, this new marketing technique. I hear a lot of people, not only in my spheres (I've done this myself as well), talk about a lot of the frameworks or apps that are out there as offline first. But what does offline first really mean? Is that different from offline enabled?
Kevin Jahns: [00:04:54] Yeah, I think so. Um, I think you can categorize them as follows. Offline enabled, or offline as a progressive enhancement, just means that some parts of the application still work while offline. And a good example for this is, um, a lot of websites that just show a different page when you're offline.
For example: "Sorry, we are offline right now. We can't really do anything for you." But some of them show limited functionality, and some handle the offline case really gracefully. So for example, Twitter: when you open the Twitter website and you lose your internet connection and you post a tweet, this tweet will be stored in a local database
while you don't have an internet connection, and later, when you are online, it is sent to the server. But the whole Twitter app, of course, with all its tweets, is not available offline. Some of the tweets are fetched, though, and you can even see a lot of the tweets while offline. You don't need an internet connection for that.
So there's limited functionality. Offline first, in comparison, is when your whole application works offline as well as it does online. Um, that's how I would categorize it. A good example for that is maybe Google Docs: with a native plugin on Chrome, for example, you can use Google Docs completely offline.
There is not a problem. It's really cool. Even the changes that you make are later synced to their platform. Blogs are often like that too; they're so simple, because it's just a website that you show. So there's basically a background job that checks whether you have an internet connection, and if you do, it will serve the latest content to you.
And if you don't have an internet connection, it will serve the content that is stored in the local database to you. So, um, yeah, this is how I'd categorize it. And here's a fun thing: there are also offline-only applications. Just search for them. They're applications that only work while offline, um, to keep you from all the distractions of the internet.
Preston So: [00:07:15] Well, I for one can certainly use an offline-only application when I have trouble focusing. Um, okay. So I think that this really helps us, you know, this spectrum between offline enabled, offline first, and offline only. Um, very, very interesting. I think that the emergence of this whole spectrum is going to be really important over the next few years.
Um, but let's talk a little bit about how this actually works from the technical standpoint. When we say offline and we're talking about a web application, how does offline actually work on the web? And let's talk a little bit about some of the technologies involved there.
Kevin Jahns: [00:07:51] Great. So the technologies that make it possible are Service Workers and IndexedDB. Um, IndexedDB is a local database in the browser that you can use to store anything, for example the content of a website. And a service worker is a really interesting technology. It basically sits in the background of your website and, um, can intercept all the network requests that you make from the website.
So this is a browser technology, and it is started as one process for each website. And, um, it can intercept all the network requests and do anything with them. So for example, you can check if you have internet access and then grab the latest state of your website: just fetch all the content with all its dependencies, like the CSS, store it in the database, and then serve the content that you have
to the website. So your website will still work if the browser doesn't support service workers. Um, but if service workers are supported, it will intercept the requests and serve different content, or even the same content, but also offline. Now, the advantage is that if you configure this service worker correctly, you can make your web app much faster.
Um, there is, for example, this cache-first approach. The idea here is that instead of always using network requests, you always serve what you have in your local database first, and then you update your local database. And this is, for example, something I use on the Yjs website. Um, it's a cache-first website.
So in the background it always fetches the latest state, but the website loads very fast because it is served the content that it has offline, in a database. So, that's basically it. Um, this is the technology that we have, but of course there are so many problems that you need to figure out.
Um, for example, the UI patterns and how that works, and the syncing of the data in the background. Um, yeah.
Fabian Franz: [00:10:10] Yeah. I quickly want to go into that topic again, in caching terms. It's stale-while-revalidate. That's what it's called; it's a technical term. That means, um, whenever someone comes to you and wants data, you send whatever you have, and then you check, um, in a CDN like Fastly, like Varnish, like, um,
Cloudflare. And while you're doing this, the user is already happily browsing the website with all the things, et cetera. You ask, "Hey, I served that. Might there be a new one?" And then the server says, "Hey, yes, there is." And then you store it in the cache, and the next time the user comes, they automatically get the fresh version.
So of course, um, from a caching perspective, there's usually something like a timeout: you don't want to serve content that is more than 24 hours old or something like that, because that can cause confusion. Um, but that's generally the idea of stale-while-revalidate: if you still have a version and you have many visitors,
it doesn't really matter if someone gets the new cached version 10 seconds faster than someone else. Uh, everyone will still have a fast experience, while the server in the background is busily regenerating the website and then handing out the new information. And that's basically the idea, for those who wanted it in CDN terms.
You can also think of the service worker as your own personal CDN inside the user's browser. It's actually the cache-first approach, a service worker doing what the CDN does in the cloud. So it's kind of funny that they took this approach from the CDN and put it into the browser, so it's even more connected now. Um, but yeah, that's how you can think about it.
You have your own CDN basically as a layer within the user's browser, and you can do whatever you want.
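The stale-while-revalidate policy described in this exchange can be sketched as a small decision function. This is a minimal model rather than the Service Worker API: the `CacheEntry` type, the `decide` and `storeFresh` functions, and the 24-hour `MAX_AGE_MS` limit are assumptions made for illustration, and the cache is a plain `Map` standing in for IndexedDB or the browser's cache storage.

```typescript
// Hypothetical model of a stale-while-revalidate policy; all names are illustrative.
interface CacheEntry {
  body: string;
  storedAt: number; // milliseconds since epoch
}

type Decision =
  | { kind: "serve-cached"; body: string; revalidateInBackground: true }
  | { kind: "go-to-network"; reason: "missing" | "expired" };

// Assumed upper bound on how stale a served copy may be (24 hours).
const MAX_AGE_MS = 24 * 60 * 60 * 1000;

// Decide how to answer a request for `url` at time `now`.
function decide(
  cache: Map<string, CacheEntry>,
  url: string,
  now: number,
  maxAgeMs: number = MAX_AGE_MS
): Decision {
  const entry = cache.get(url);
  if (entry === undefined) {
    return { kind: "go-to-network", reason: "missing" };
  }
  if (now - entry.storedAt > maxAgeMs) {
    // Too old to serve: wait for the network instead of showing stale content.
    return { kind: "go-to-network", reason: "expired" };
  }
  // Serve the stored copy immediately; the caller refreshes it afterwards.
  return { kind: "serve-cached", body: entry.body, revalidateInBackground: true };
}

// Called when a network fetch or a background revalidation completes.
function storeFresh(
  cache: Map<string, CacheEntry>,
  url: string,
  body: string,
  now: number
): void {
  cache.set(url, { body, storedAt: now });
}
```

In a real service worker the same decision would sit inside the `fetch` event handler, with the background refresh performed by an asynchronous network request.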
Michael Meyers: [00:12:21] You said IndexedDB. So this is some kind of database technology. I assume it's not like Oracle embedded in your browser. Is this, you know, is this a SQL database? Uh, how do you query it? Can you tell us a bit more about that?
Kevin Jahns: [00:12:35] Uh, IndexedDB is an object store. It is basically, um, an evolved version of LevelDB, which, uh, Google developed. It's not really a database that you start on your own, but a database that is embedded in your application. And in this case, it is embedded into the browser.
And each website has access to, uh, one IndexedDB database, and it can query it. Um, it has limited query functionality. It is not as powerful as SQL, but it is very fast and you can implement basically anything with it. And the service worker uses IndexedDB, uh, in the background to store the website.
So it has a context to store websites or resources, basically anything. And, um, it can access this database to retrieve other content and serve it to the client.
Michael Meyers: [00:13:38] When we did our WebRTC webinar, uh, you know, Tag1 Team Talk last week, you talked about end-to-end encryption and how with WebRTC it's encrypted by default.
Um, it sounds like, you know, with IndexedDB, I can't access databases for other sites. Uh, how is that information stored on my local computer, though? Is it encrypted by default? Can it be encrypted? What kind of security mechanisms are in place for that?
Kevin Jahns: [00:14:08] Honestly, I don't know. On your local computer, probably everyone who has access to your account can read it. Um, so it is protected by your, um, operating system, by whatever encryption mechanism you use, if you use any. But aside from that, of course, the browser doesn't encrypt the database. Um, of course you could, if you want, create your own encryption mechanism.
Uh, there are examples for that. Um, for example, some, uh, password managers use IndexedDB to store the passwords and, uh, make them available while offline. But you need the key to access them. And IndexedDB is a really interesting technology because you can even store cryptographic keys
in a very secure manner, so that your application can't access the content of the key and can't send the key to other people, but you can store the key, so you can access it and decrypt things. So there's been a lot of thought put into this. And I am not an expert on that, but I can generally say that you can implement secure applications with it.
Michael Meyers: [00:15:36] Wonderful.
Preston So: [00:15:36] And one of the things that I think is really important to stress for our audience today is that, um, not only are we talking about offline-first applications as being something where, uh, you have access to all of this data offline; it also improves the end-user performance, um, because you're constantly fetching from the cache.
I think that's just a very important thing to call out for the audience who might be less familiar with how, uh, this can enable that really quick time to first paint as well in the browser. Um, so let's, let's move a little bit to topics that we know about, um, that we really, really play with on a daily basis.
Um, like Yjs. Um, as I understand it, Yjs is a topic we've covered, uh, many times on this channel, on this series, and it's a realtime collaboration framework that's open source. How does this support offline editing?
Kevin Jahns: [00:16:29] So, yeah, the idea is that it just stores the document offline. Right. Um, and there's even more.
Um, there are a lot of interesting things with IndexedDB and Yjs because, um, well, to start again, uh, I recently created the Y indexedDB adapter for Yjs, which enables you to store your document in the local IndexedDB database, but also to store document updates very efficiently. So, um, you can retrieve it very fast.
And also, in the case of concurrency, when you have multiple pages open that access the same database, they can all retrieve the information and share the content through the database, which is really interesting and something that you want, because, um, often you have different browser tabs open, accessing the same content, and, uh, they need to be able to communicate with each other.
So, um, this is what Y indexedDB enables: even while offline, your browser tabs can communicate with each other and store document updates. And instead of storing the whole document as a big binary blob every time you make a change, you only, um, store small increments of the data to the database.
And when you reload your page, it just accesses the IndexedDB database and retrieves the document again.
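The idea of writing small increments instead of the whole document can be sketched as an append-only update log. This is a simplified model under stated assumptions and not the Y indexedDB implementation: the `UpdateLog` class, its `compactAfter` threshold, and the injected `merge` function are hypothetical, and an in-memory array stands in for the IndexedDB object store.

```typescript
// Hypothetical model of an incremental update log; not the actual adapter code.
class UpdateLog<U> {
  private entries: U[] = [];
  private readonly merge: (updates: U[]) => U;
  private readonly compactAfter: number;

  // `merge` folds several updates into one; `compactAfter` is an assumed threshold.
  constructor(merge: (updates: U[]) => U, compactAfter: number = 500) {
    this.merge = merge;
    this.compactAfter = compactAfter;
  }

  // Persist one small change instead of rewriting the whole document.
  append(update: U): void {
    this.entries.push(update);
    if (this.entries.length >= this.compactAfter) {
      this.compact();
    }
  }

  // Fold the accumulated increments into a single entry to bound growth.
  compact(): void {
    if (this.entries.length > 1) {
      this.entries = [this.merge(this.entries)];
    }
  }

  // On page reload: return the stored entries so the caller can replay them in order.
  load(): U[] {
    return this.entries.slice();
  }

  get size(): number {
    return this.entries.length;
  }
}
```

Each keystroke costs one small write, and a reload replays the stored entries in order; occasional compaction keeps the replay short.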
Fabian Franz: [00:18:07] Just to give a very practical example, assume you have 3000, 500 tabs open, um, in a browser, and you've worked a lot with your CMS and you do things and you, um, edit content and then you go away and, um, have a nice day outside and you come back and you continue working.
And you totally forgot that you had this article open in your Drupal CMS. And, um, then you go back to your CMS, log in, edit the content, and it's all there. Everything you've already changed on the content is all there, automatically. You don't need to do anything. And even if you now change it here and save it, and then you go back to your other draft, basically, that has automatically changed too: you already see all your changes there, and that you already saved, et cetera.
So, um, even if you forget, you don't have conflicts; you don't have to think, "Oh, how did I do this? I need to reproduce this here," et cetera. And in another case, assume we would have had this for, like, Drupal.org issues. You're working on a few issues, writing long texts, et cetera. The computer crashes, the browser crashes, the power goes off. Whatever happens, all is lost that you haven't submitted yet, and you're freaking out. If you had attached this to Y indexedDB, it's all there. You just come back and you can continue working like you had never been away. So, um, it can also help save data,
um, so that you never lose any content. And that's very, very great because it's a real problem. I mean, we have Autosave in Drupal, for example, as a module. It auto-saves, but it saves to the server. So what happens when you're offline? You can't save. And obviously we also tried to put it in, like, local storage, and that works a little bit, but it doesn't really, um, help us in cases where you open another tab with the same content, et cetera. And it's really nice that with Yjs this just works.
Kevin Jahns: [00:20:31] Right? And this is another use case. Um, you talked about the crashing of your local browser, or like losing your internet connection. Uh, just imagine you get on a plane and think, "Oh man, I need to keep this tab open
now, until I can sync the content to the server." Now what you can do is this: because service workers also have access to your IndexedDB database, as soon as you have access to the internet again, the browser will start your service worker, even when your website isn't open. And, um, you can sync the content to your server and persist it so other applications can also see what you created.
Um, so what this really allows you to do is create offline-first applications, uh, that gradually sync, but also progressively enhance your application, because now your content will be available immediately after you open the website. In the case of a Drupal document that is collaboratively edited, um, you start your CMS and you immediately see the content popping up, because it doesn't need to wait for the server.
It will serve it from a local database. And also, the amount of data that needs to be synced is reduced because Yjs already supports, um, differential synchronization. So, um, instead of exchanging the whole document, it will only exchange the parts that changed. So the server will send you a small update of the content that was created by other users, and you will send to the server the content that you created.
So, um, in that regard, IndexedDB with Y indexedDB is a really nice enhancement to your application, regardless of whether you want to use it to create an offline application. It is just there to improve, uh, the performance of your web application.
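Differential synchronization of this kind is commonly built on state vectors: each side reports how many updates it has seen from every client, and the other side sends only what is missing. The sketch below is a simplified model, not Yjs's internal format; the `Replica` class, the `Update` shape, and the `diffFor` method are hypothetical names. Yjs itself exposes the same idea through `Y.encodeStateVector` and `Y.encodeStateAsUpdate`.

```typescript
// Hypothetical model of state-vector-based diffing; names are illustrative.
interface Update {
  client: string;
  clock: number; // position of this update in its client's own sequence
  payload: string;
}

// client id → number of updates already seen from that client
type StateVector = Map<string, number>;

class Replica {
  private readonly log: Update[] = [];
  private readonly seen: StateVector = new Map();

  // Record a local edit made by `client` and return the resulting update.
  edit(client: string, payload: string): Update {
    const update: Update = { client, clock: this.seen.get(client) ?? 0, payload };
    this.apply(update);
    return update;
  }

  // Apply an update; duplicates and gaps are skipped in this sketch.
  apply(update: Update): boolean {
    const next = this.seen.get(update.client) ?? 0;
    if (update.clock !== next) {
      return false;
    }
    this.log.push(update);
    this.seen.set(update.client, next + 1);
    return true;
  }

  stateVector(): StateVector {
    return new Map(this.seen);
  }

  // Only the updates the remote side has not seen yet.
  diffFor(remote: StateVector): Update[] {
    return this.log.filter((u) => u.clock >= (remote.get(u.client) ?? 0));
  }

  get size(): number {
    return this.log.length;
  }
}
```

After one exchange in each direction both replicas hold the same updates, and a later sync transfers only what was created since.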
Fabian Franz: [00:22:35] So Yjs would basically be perfect for a James Bond-like scenario.
They, um, need to send some very important information. They have to, but there's no internet. But they know they will eventually have internet. Even if James Bond has already gone off the rails, his laptop is still there. As soon as it regains the internet connection, it will send the data, and then all is safe.
Kevin Jahns: [00:23:01] All right. Awesome example. Yjs is for James Bond? Let's keep it at that.
Preston So: [00:23:09] I think we can all agree that some of these features are very much secret agent features and very futuristic. Um, so, uh, we've talked a little bit about Y indexedDB, um, as an adapter for Yjs. Um, how exactly does this tie to web workers, though?
Kevin Jahns: [00:23:29] I mean, um, it is not really tied to it; it does work in conjunction with, uh, service workers. Um, so,
Preston So: [00:23:38] Okay, sorry. Yeah, no, we should cut that. We should cut that question out.
Kevin Jahns: [00:23:43] Well, it's an awesome question, because, um, there are also web workers, which are a completely different thing. And, uh, a service worker is a job that sits in the background and intercepts your, um, HTTP requests.
But, um, web workers are just background jobs that you can start, um, for example, to do heavy calculations. And with IndexedDB and web workers, actually, accidentally, this is an awesome question, you can do really awesome things. Um, for example, if you want to do some heavy calculations on your Yjs document, you can start a web worker that fetches the Yjs document, um, from the IndexedDB database
and then does these calculations, maybe does a very expensive sync job with the server, or, I don't know, um, for example, you want to render an HTML page based on the data that is in the Yjs document and then send it back to the main thread. So, um, web workers allow you to do heavy calculations, and IndexedDB allows you to exchange data with service workers or web workers very efficiently, basically with all the browser contexts, because, um, IndexedDB is shared between all the browser contexts.
It's basically the ideal, um, database that multiple processes can access concurrently, and, um, they can use it to do different things. For example, a service worker might want to create a WebSocket connection to the server to, um, sync the data, or even use a different technology to sync the data, to persist it on the web, in the cloud, wherever. And the main thread, your website, um, doesn't really need to do that much, and it doesn't need to be open to sync the content.
Fabian Franz: [00:25:47] Yeah. I think that's a very important thing. Um, so, this thing called web workers. The thing is that synchronization, especially if you have lots of changes on the server and lots of changes locally, which all need to be merged conflict-free, which Yjs can do, still takes a while. And as a user you don't want to wait because of that. You don't want to have, like, a screen saying, "Syncing data. You can start working in 10 minutes. You can start working in nine minutes. You can start working in eight minutes."
Hopefully it doesn't take that long. But anyway, you get the idea. So if you put all of that into a web worker, you can already start working on your document, and you can see live changes that are already happening, and, um, that all happens in the background. Basically, a web worker gives you a different thread than the main thread, where all of the interactions from the user happen.
So a web worker is basically for not blocking the user interaction, which is very, very important because, as we speak about real time and all of that, uh, one of the most important things here is responsiveness. That means that the user really gets feedback really fast on every keystroke.
So it's not like I press a key, then it gets sent to the server, and the server says, "Yes, you can write that key," and sends it back, and then I can type the next key. Because if you try some of the other technologies, like Google Docs, and you, uh, have a high-latency connection, that can happen. I had this happen in one of my, um, tests that I did originally when I was comparing Yjs with Google Docs.
Then suddenly I had a very, very, very slow Google Docs. I was pressing a key and it took, I dunno, I think three or four seconds until the keystroke was there, and then I typed the next key. And so in that regard, um, web workers, IndexedDB, and service workers form, like, a perfect triangle of things that are now available for us to build great offline apps.
Michael Meyers: [00:28:03] You just gave me flashbacks, Fabian, to writing an email in Gmail, and you try and send it when you're offline and it gives you an error. Or when I want to check my mail and it's like, "We're going to retry in five, four," you know? Um, so could Yjs enable, uh, Gmail to work offline? Like, I could write my emails and store them for submitting later when I'm back online.
Kevin Jahns: [00:28:32] You can do that without Yjs, but as soon as you have a collaborative document, you need something like Yjs because so much can happen while offline, and other clients can edit the documents. So you basically need a very, uh, involved synchronization framework, as in the case of a Google Docs document.
I also remember all the forms that I lost because, um, my POST request wasn't performed on the server, because the service was unavailable or whatever, and I got a 404, and all the content that I created was lost. Now with service workers, what you can do is, instead of losing that content, the service worker will store the request offline and send it later, as soon as you have internet access.
And the same goes for Yjs documents, for collaborative documents. For your React app, you can share state with Yjs, basically anything, or for your drawing app: just store it in IndexedDB and let the service worker, um, figure out when to sync the content to the server.
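The "store the request offline, send it later" pattern can be sketched as an outbox queue. This is a minimal model under stated assumptions: the `Outbox` class and the `PendingRequest` and `Send` types are hypothetical, the queue lives in memory rather than in IndexedDB, and the transport is a synchronous callback to keep the example short, whereas a real service worker would await a network request.

```typescript
// Hypothetical outbox model; a real implementation would persist the queue in IndexedDB.
interface PendingRequest {
  url: string;
  body: string;
}

// Returns true when the server accepted the request (assumed synchronous here).
type Send = (request: PendingRequest) => boolean;

class Outbox {
  private queue: PendingRequest[] = [];
  private readonly send: Send;

  constructor(send: Send) {
    this.send = send;
  }

  // Try to deliver right away; keep the request if that is not possible.
  submit(request: PendingRequest): "sent" | "queued" {
    // If older requests are still waiting, queue behind them to preserve order.
    if (this.queue.length === 0 && this.send(request)) {
      return "sent";
    }
    this.queue.push(request);
    return "queued";
  }

  // Retry queued requests in order; stop at the first failure.
  flush(): number {
    let delivered = 0;
    while (this.queue.length > 0) {
      const head = this.queue[0];
      if (head === undefined || !this.send(head)) {
        break;
      }
      this.queue.shift();
      delivered++;
    }
    return delivered;
  }

  get pending(): number {
    return this.queue.length;
  }
}
```

Calling `flush` whenever connectivity returns delivers the stored requests in their original order, so a form submitted offline is not lost.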
Fabian Franz: [00:29:39] And basically that brings us to a very, very, very important point because, um, with Yjs, the technology is changing from a traditional client-server model to one of synchronization.
And that is very, very, very, very key in all of this. And I'll explain why. Um, the reason why this is so key is, um, that, as Kevin said, usually when we have a traditional application, built with whatever you're using, state changes, and those state changes eventually go to the server. And that's usually done with a POST.
And, um, if that fails, okay, the application can try again, et cetera. But imagine that you would not need to deal with any of that. You would just use the Yjs primitives, like the Y.Array, and store it in there, and you would know that eventually it will be sent to the server, when there's a connection and everything. And that's a different way to think about data.
It's no longer: I POST this data, it's stored on the server, and I get my new state back from the server. It's just: I put my data in here, and it's synchronized automatically. And that's a completely different way of thinking about applications, of thinking about data, how it gets in, how it gets out, et cetera. And that's why, um,
offline first with Yjs is so key, and here not just Yjs as a text collaboration tool; it's a tool that's built upon data types, so that you can have a collaborative to-do list basically just by having a data structure that's collaborative. As soon as you put an item in there, it gets sent to the server, and it gets sent to the other users that are collaborating on the to-do list, automatically.
And, um, this is in its own way, a kind of revolution.
Preston So: [00:31:34] Absolutely. I think that the use cases that this enables will really reinvent the way that we interact with probably the vast majority of the applications we use that involve some level of collaboration, really, really easily.
I mean, I can just think of so many ways in which, for example, scheduling collaboratively could become easier with this kind of model. Very, very exciting. And so last time we got together as a group, we talked a little bit about this new communication protocol that allows for browser-based communication, called WebRTC. And for those of you in the audience who are watching this or listening to this, do check out our WebRTC webinar, the Y WebRTC episode, because that goes into a whole lot of detail about some of the really interesting features coming out because of that. So one question I had for you, Kevin, was how does this work in conjunction with Y WebRTC, and in what sorts of ways can we combine these really interesting technologies together?
Kevin Jahns: [00:32:42] Yeah, that's an awesome question. Y WebRTC allows you to create, basically without a server, a peer-to-peer network between all the clients that are interested in the same document. And I think the only downside of this is that it takes some time to create a connection to the other peers.
And it takes some time to synchronize, because with WebRTC, as soon as the connection is established, it is very fast, but establishing that connection sometimes takes a while, especially on, um, bad internet connections. So, um, this fits together with, um, Y indexedDB because it allows you to store the document offline and serve it immediately without waiting for the other peers.
And, um, it also allows you to do some very interesting things. Just imagine our, um, video chat application right now. There are a lot of WebRTC-based chat applications on the web that are accessible for everyone. Um, some of them allow you to collaborate on documents. Actually, I only know of one.
And, um, so with IndexedDB, you can now store the document offline. So the next time you open the chat room that you created for collaborating, for chatting, um, it is still there. And you don't need a server for this because, um, it is persisted locally and among all the other clients, and you can even sync with them, um, as we discussed, even if they don't have the website open, because there are background jobs that can do that.
Um, so this allows for a lot of interesting technology. Especially with Y WebRTC, um, I think it is necessary to have faster access to the document. Um, and how you can do that is now Y indexedDB.
Preston So: [00:34:56] Very interesting. Let's talk a little bit about, some of the kind of byproducts of all of this. Um, how does this not result in, in an overload of garbage? Um, you know, I think that this is a very important topic for, uh, when it comes to performance and scalability.
Kevin Jahns: [00:35:13] Right. I mean, there are two aspects here.
The first one is how you can actually resolve all these conflicts. I think in the second deep dive we went into detail on how the conflict resolution actually works. We didn't explain it in full detail or prove it, but there's a paper. Just go to the Yjs GitHub repository.
At the bottom there's an explanation of how the conflicts are resolved. And what is very important here is that Yjs was always a framework that doesn't need a server connection. It's network agnostic and doesn't need a unique order of the messages that are sent. It basically only listens to document updates.
So it can always sync. As soon as your local document gets all the updates from the other peers, it is synced. There's no central instance that manages how your conflicts are resolved. This is all decentralized.
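To make the "no unique order of messages" point concrete, here is a minimal sketch of a toy document whose state depends only on which updates it has received, not on the order or number of deliveries. It is not the Yjs algorithm (that is described in the paper linked from the Yjs README); the `Update` type, the `ToyDoc` class, and the ordering rule are assumptions made purely for illustration.

```typescript
// Toy model only: hypothetical names, not the Yjs API or its real algorithm.
type Update = { client: number; clock: number; text: string };

class ToyDoc {
  // Updates are keyed by their unique (client, clock) id, so duplicates are ignored.
  private updates = new Map<string, Update>();

  apply(update: Update): void {
    this.updates.set(`${update.client}:${update.clock}`, update);
  }

  // Deterministic rendering: sort by clock, break ties by client id.
  render(): string {
    return Array.from(this.updates.values())
      .sort((x, y) => x.clock - y.clock || x.client - y.client)
      .map((u) => u.text)
      .join("");
  }
}

const updates: Update[] = [
  { client: 1, clock: 0, text: "Hel" },
  { client: 2, clock: 0, text: "lo " },
  { client: 1, clock: 1, text: "wor" },
  { client: 2, clock: 1, text: "ld" },
];

// Peer A receives the updates in the original order.
const docA = new ToyDoc();
updates.forEach((u) => docA.apply(u));

// Peer B receives them in reverse order, and then once more as duplicates.
const docB = new ToyDoc();
updates.slice().reverse().forEach((u) => docB.apply(u));
updates.forEach((u) => docB.apply(u));

console.log(docA.render() === docB.render());
```

Because every update carries its own identity, neither peer needs a server to decide on an order. Both end up with the same text once they have seen the same set of updates.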
Fabian Franz: [00:36:30] Just to remind our audience a little bit about what we talked about last time.
When we have WebRTC, we can also have a partition. Say we are connected across the ocean, Mike in New York, Kevin and me in Europe, and the connection gets severed. There's no connection anymore between the US and Europe. Basically we are, at least from each other's standpoint, offline. And that's why Yjs already handles the offline case.
It must. Every peer-to-peer application that works in a way that it can sync changes after a connection is reestablished is basically offline-capable, in a way. And this just takes it further: if we all close our connections down, the document is not lost, because every one of us has a copy of it.
So let's say we are working on this document across the ocean, the cable goes down, and we are all so frustrated that we close our browsers. If we then start up again, no one would have a copy anymore. No one would have the changes anymore. But if I have my own copy, Kevin has his own copy, and Michael has his own copy, then we can all sync up again as soon as the network is back.
Okay, let's start it up again, and it will sync automatically. So y-indexeddb really complements the y-webrtc approach. For me it's basically even a requirement that we have not only the capability to sync in real time, which is pretty nice for video,
but also to store whatever we worked on.
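The scenario Fabian describes can be sketched roughly as follows: each peer keeps its own copy, saves it locally, and after everyone restarts, the peers exchange only what the other side is missing. This is a toy model under stated assumptions. The `Peer` class, its methods, and the use of a JSON string as a stand-in for IndexedDB are all hypothetical and are not the Yjs or y-indexeddb API.

```typescript
// Toy model only: hypothetical names; a JSON string stands in for local storage.
type Op = { client: string; clock: number; text: string };
type StateVector = Record<string, number>;

class Peer {
  private ops: Op[] = [];
  private id: string;

  constructor(id: string) {
    this.id = id;
  }

  // Record a local edit, stamped with this peer's next clock value.
  edit(text: string): void {
    const clock = this.stateVector()[this.id] ?? 0;
    this.ops.push({ client: this.id, clock, text });
  }

  // For each client, the number of its ops this peer has seen so far.
  stateVector(): StateVector {
    const sv: StateVector = {};
    for (const op of this.ops) {
      sv[op.client] = Math.max(sv[op.client] ?? 0, op.clock + 1);
    }
    return sv;
  }

  // Ops that a peer with the given state vector has not seen yet.
  missingFor(remote: StateVector): Op[] {
    return this.ops.filter((op) => op.clock >= (remote[op.client] ?? 0));
  }

  // Apply remote ops, skipping any this peer already has.
  receive(ops: Op[]): void {
    const sv = this.stateVector();
    for (const op of ops) {
      if (op.clock >= (sv[op.client] ?? 0)) this.ops.push(op);
    }
  }

  // Stand-ins for writing to and reading from local storage.
  save(): string {
    return JSON.stringify(this.ops);
  }

  static restore(id: string, saved: string): Peer {
    const peer = new Peer(id);
    peer.ops = JSON.parse(saved) as Op[];
    return peer;
  }

  size(): number {
    return this.ops.length;
  }
}

// Both peers edit while the connection between them is down.
const kevin = new Peer("kevin");
const fabian = new Peer("fabian");
kevin.edit("Intro");
fabian.edit("Outline");
fabian.edit("Summary");

// Everyone closes their browser; only the locally saved copies survive.
const kevinAgain = Peer.restore("kevin", kevin.save());
const fabianAgain = Peer.restore("fabian", fabian.save());

// The network is back: each side sends only what the other is missing.
kevinAgain.receive(fabianAgain.missingFor(kevinAgain.stateVector()));
fabianAgain.receive(kevinAgain.missingFor(fabianAgain.stateVector()));

console.log(kevinAgain.size(), fabianAgain.size());
```

Without the `save` and `restore` step, both peers would come back empty after the restart and there would be nothing left to exchange, which is the gap that local persistence closes.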
Kevin Jahns: [00:38:28] That's right. I think the second part of Preston's question was that as soon as you sync, there may also be garbage. For example, if I create a lot of content and you create a lot of content, there's possibly a lot of garbage. How do you actually figure out how to make a proper sync that makes sense to a human?
If there are conflicts, how do you resolve them? If I fix a typo in the document, and Fabian, while offline or while not having a connection to me, also fixes the same issue, then when the documents merge we might end up with garbage. We just discussed this briefly in the ProseMirror channel.
One of the arguments against CRDTs is that you don't want to have that garbage. You want to have a central instance that manages and only serves one version of the document, one that makes sense and was proven to be correct by a human. But there is something that you can do against that.
And that's Yjs versions, which allow you, as soon as you sync with the other clients, to show the differences that happened while others were offline or while you were offline. And then, by showing the differences in the changes that just came in, you can figure it out. Okay,
Fabian also fixed this issue. Now we have the fix applied two times, which doesn't make sense anymore. I need to revert his change, or I can just edit his change. And this is basically the same thing that you do when you do a git merge. You figure out the differences and you apply the changes that happened concurrently in a way that makes sense for the new state of the application.
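The duplicated typo fix Kevin describes can be illustrated with a small sketch: a version records which items were visible when it was created, and a diff against the merged document shows what was added and removed since then, and by whom. The `Item` shape and the `createVersion` and `diffSince` helpers are hypothetical and only model the idea; they are not how Yjs represents versions internally.

```typescript
// Toy model only: hypothetical types and helpers, not the Yjs versions API.
type Item = { id: string; author: string; text: string; deleted: boolean };
type Version = Set<string>; // ids of the items visible when the version was created

function createVersion(items: Item[]): Version {
  return new Set(items.filter((item) => !item.deleted).map((item) => item.id));
}

function diffSince(items: Item[], version: Version): { added: Item[]; removed: Item[] } {
  return {
    added: items.filter((item) => !item.deleted && !version.has(item.id)),
    removed: items.filter((item) => item.deleted && version.has(item.id)),
  };
}

function visibleText(items: Item[]): string {
  return items.filter((item) => !item.deleted).map((item) => item.text).join("");
}

// The shared document before anyone goes offline, including the typo "teh".
const beforeOffline: Item[] = [
  { id: "m1", author: "michael", text: "Fix ", deleted: false },
  { id: "m2", author: "michael", text: "teh", deleted: false },
  { id: "m3", author: "michael", text: " bug", deleted: false },
];
const lastSynced = createVersion(beforeOffline);

// Merged state after Kevin and Fabian each fixed the typo while disconnected.
const merged: Item[] = [
  { id: "m1", author: "michael", text: "Fix ", deleted: false },
  { id: "m2", author: "michael", text: "teh", deleted: true },
  { id: "k1", author: "kevin", text: "the", deleted: false },
  { id: "f1", author: "fabian", text: "the", deleted: false },
  { id: "m3", author: "michael", text: " bug", deleted: false },
];

const changes = diffSince(merged, lastSynced);
console.log(visibleText(merged));
console.log(changes.added.map((item) => `${item.author}: +"${item.text}"`));
console.log(changes.removed.map((item) => `-"${item.text}"`));
```

The merge itself is mechanical, and the diff is what lets a person notice that the same fix arrived twice and remove one copy, much like reviewing the result of a git merge.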
Fabian Franz: [00:40:44] What Yjs can do is that, just in a much broader sense. We didn't talk about this, but it's so exciting. If you know Drupal, it's basically revisions. Every time we make a change, we have something like a revision, and Yjs calls it a state vector.
So we have all the history of all time. So anything built on Yjs could fall back to every point in time. It's Time Machine, basically. If you're on a Mac and you have Time Machine, you can also go back to every state that your hard disk was ever in, as long as you have space on your backup anyway.
So, because we have the full history, here is a more practical example. Michael and I are working on a document together. It's important. He worked offline, and I worked offline, then we make our changes and sync, and we have two paragraphs that really don't fit together too well. For that you need someone like Preston, who is an editor in chief, who can then untangle the mess. But what happens if it's within the sentences and it really turns to garbage? That's where Yjs versions come in, because Michael can see how my changes affected his version, and I can see how his changes affected my version.
And now the nicest thing about that, and it's not implemented yet, but we will probably do it, is that you can also rewrite history, a bit like a git rebase. You're basically rewriting the history. So Michael says, hey, this whole lot that Fabian wrote here, that's cool, but I don't really need it. So he could go back to my version, in theory at least, take that part out, copy it somewhere else, stash it for later use in some other document, and then delete it.
So then it's as if I had never made those changes. And then he would reapply his own changes. With Yjs this is possible in theory, because we have all of the changes from everyone: who changed what, which character they changed. And even if you typed total garbage in, you might just say, hey, for this change, let's just take this version from me.
And this is now the version again, and we just apply whatever came in there manually. So there are a lot of possibilities to do that, even without a central server. And I think that's much more important, because in the end, with every git conflict, who resolves it? It's a human.
There are no machines doing that, because they can't understand the content. Maybe machine learning is approaching that, but in the end it's a human, and I think it's much more important to give the human the tools to do that. And Yjs is written as such a tool. You can see everything that has changed since you last synced.
And that's already available. You can see it, and it's quick. It's really fantastic. You can even see how the document has changed since you've been away, in real time. It's really freaking awesome. It's one of the coolest features I've seen. And we will definitely link to that.
But giving the human the tools is, in my opinion, the way to go. And the nice thing is, whenever we cannot make the sync work automatically, we can still involve the human. I think that's much more empowering than to say, it needs to be a human anyway, so we just take a centralized approach.
Kevin Jahns: [00:44:42] Right. Well put. By the way, this is already available on the Yjs.dev website. Just to give a quick overview here: Yjs.dev now works with y-webrtc, so there's no central instance that manages the content. It also has versioning support for the examples, or at least for the ProseMirror example. So you can create versions, you can click on one of the versions, and you can actually see the differences that happened
while you were offline. So all of this is actually possible. The only thing that is missing is creating a UI for your application to show those differences. For text documents and for ProseMirror, this has almost been solved. But I think there are some UI things that you need to consider to do that.
Right? It's mostly a UI thing, but the framework possibilities are all there. They're all open source and they are ready to use right now.
Preston So: [00:45:46] Wonderful. That's one of the things that I'm looking forward to checking out as soon as we hop off this episode. One of the things I wanted to ask was, are there any other challenges in offline sharing that we haven't already mentioned? What are some of the other hidden or lesser-known sharing use cases that this particular approach around y-indexeddb enables?
Kevin Jahns: [00:46:11] Oh yeah. I think the trickiest thing about offline editing is garbage collection. We talked about versions: we want to jump back to previous versions, we want to save versions. But how does that tie in? If we store all the content that was ever created, our documents would blow up.
In the case of my website, Yjs.dev, if I stored all the content from all the users that visit this website in the document, the document size would get huge. There are people who copy Wikipedia articles and just put them there, like, a thousand times.
And the document can get pretty large. As soon as you delete content again, the content should be freed. But if we have versions, if we store all the content that was ever created, we can never delete that content from the Yjs document. And now there's a method that allows you to garbage collect the content that was created in between versions.
So this is really cool right now. You go on the website and you create a version. The content of that version will never be garbage collected. But if you insert a Wikipedia article and somebody else deletes that Wikipedia article again,
the content is basically garbage collected and not stored in the Yjs document. So there's a fine granularity that you can have on what content is actually deleted from the document.
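A rough way to picture garbage collection between versions: deleted content that some saved version still refers to is kept, while deleted content that no version refers to can have its payload dropped, leaving only a small marker behind. The sketch below is an illustration under that assumption. The `Item` shape and `collectGarbage` function are hypothetical and do not mirror the actual Yjs internals.

```typescript
// Toy model only: hypothetical names, not how Yjs implements garbage collection.
type Item = { id: string; content: string | null; deleted: boolean };
type Version = Set<string>; // ids of the items that a saved version refers to

// Drop the payload of deleted items unless some saved version still needs them.
function collectGarbage(items: Item[], versions: Version[]): Item[] {
  return items.map((item) => {
    const pinned = versions.some((version) => version.has(item.id));
    return item.deleted && !pinned ? { ...item, content: null } : item;
  });
}

const items: Item[] = [
  // Part of a saved version, deleted afterwards: must stay restorable.
  { id: "n1", content: "Meeting notes", deleted: true },
  // Pasted and deleted again between versions: safe to collect.
  { id: "w1", content: "A very long pasted article...", deleted: true },
  // Still visible in the document: never collected.
  { id: "t1", content: "Todo list", deleted: false },
];
const savedVersions: Version[] = [new Set(["n1"])];

const collected = collectGarbage(items, savedVersions);
console.log(collected.map((item) => `${item.id}: ${item.content === null ? "collected" : "kept"}`));
```

The design choice this illustrates is that creating a version is what pins content. Anything that comes and goes between two versions costs almost nothing once it has been deleted.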
Fabian Franz: [00:48:04] Yeah. Basically, it's something we talked about early on: what optimizations Yjs makes that make CRDTs feasible, and that's one of them.
Right. And to give an example from the Drupal world: in Drupal 8 we now have forward revisions and workspaces. That means an editor can work on some article
for a long, long, long, long time.
And once they're finally ready to publish it, they publish this article and it goes live. But now they have 20, 30, or however many revisions in between that might have been important while writing this article, but are no longer important once the new version of the article is actually published. So my article goes from draft to published, or they take an existing article to change it again, and it goes from draft back to published.
So all of those versions, at least, for example, in the kind of previous system, where workspaces could be implemented as well, we automatically garbage collect, because those are no longer important. One of the things we've found in terms of how Drupal does these things, as opposed to, say, Google Docs, is that it's very important for the user to be able to publish something, to click on a button and to say, this is the version that I want to save.
This is something I need to be able to save, et cetera, et cetera. This is a state, and I really want to save this state. This is the state I want published. By changing this over to explicit versions, not every keystroke is a new version. You could also compress them a little bit, but really, by saying
that a new revision is created when I click publish, this allows the garbage collection to feel much more natural. Of course, you can still have a backup plan and say, hey, the automatic revisions I still keep for 30 days, or for 90 days. Whatever your data retention policy is, that's a possibility.
And that's basically what we're doing for the previous system in practice. If an editor says, ah, I've written something, I've published it, but I had something in there that I need back. Okay, within two days you can have it back, but after that it's probably not relevant anymore. It depends on how much data space you're willing to sacrifice, because that's basically always the same thing.
It's a trade-off between having all the history and using all the space. But if you really wanted that, you could also take the Yjs document and put it on a tape. Tape seems to still be the cheapest way to store data right now. So you could do that. I mean, no one would hold you back from it, and Kevin could hoard all the Wikipedia articles that users ever create, for all of history, of how humans interact with Yjs. But the main takeaway here is basically versioning, and not just arbitrary versions, like every few milliseconds or whatever, but versions that are deliberately created by users. And wherever no version was created, we can, at least after some point, garbage collect. And this addresses a huge problem of data editing in general:
overall you want all the history, but on the other hand, you don't.
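The retention idea from this part of the conversation, keeping deliberately created versions forever and automatic ones only for a limited window, can be sketched as a simple filter. The `Revision` shape, the `pruneRevisions` function, and the 30-day figure used in the example are assumptions for illustration. They are not taken from Drupal or Yjs.

```typescript
// Toy model only: hypothetical names illustrating a retention policy.
type Revision = { id: number; createdAt: number; explicit: boolean };

const DAY_MS = 24 * 60 * 60 * 1000;

// Keep every explicitly created revision; keep automatic ones only within the window.
function pruneRevisions(revisions: Revision[], now: number, retentionDays: number): Revision[] {
  return revisions.filter(
    (revision) => revision.explicit || now - revision.createdAt <= retentionDays * DAY_MS
  );
}

const now = Date.UTC(2020, 5, 1);
const revisions: Revision[] = [
  { id: 1, createdAt: now - 120 * DAY_MS, explicit: true },  // published long ago: kept
  { id: 2, createdAt: now - 100 * DAY_MS, explicit: false }, // old autosave: dropped
  { id: 3, createdAt: now - 2 * DAY_MS, explicit: false },   // recent autosave: kept
  { id: 4, createdAt: now - 1 * DAY_MS, explicit: true },    // published yesterday: kept
];

const kept = pruneRevisions(revisions, now, 30);
console.log(kept.map((revision) => revision.id));
```

Changing `retentionDays` is the knob for the trade-off between history and storage space: a longer window keeps more intermediate states around, a shorter one frees space sooner.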
Kevin Jahns: [00:51:52] Right.
Preston So: [00:51:53] That's a very good illustration of that conflict. No pun intended, by the way. I think this is very interesting. We're running out of time though, so I wanted to jump to one final question, just for our audience who might be considering adopting
Yjs and y-indexeddb and this incredible suite of technologies. Are there any companies already using y-indexeddb for offline editing? Are there any people out there already making use of this in production?
Kevin Jahns: [00:52:24] So the new implementation of y-indexeddb, which was created for version 13, is pretty fresh.
I'm not aware of anyone using it, except the Yjs.dev website. But for version 12 there was a similar project, also y-indexeddb, but for the previous version, which is now pretty old. I know a lot of people who used it just by default, because it made their website faster and it didn't really cost anything.
It works the same way. It has a different API, but now you can do more stuff. And yes, there are at least two companies I know who use it. I'm not sure if I can name them right now, but basically it's just a progressive enhancement, and you should always use it if you can. Although I want to add, I'm not aware of any offline-only or offline-first application
that uses Yjs and y-indexeddb. This is something that I would be interested in seeing a lot more of. I also want to add that Yjs.dev is completely offline first. Just visit it while offline. There's no difference between the online and offline version.
Preston So: [00:53:41] Wonderful. And for people who want to dig into the code, report issues, and look at how Yjs actually works, where can they find the source on GitHub?
Kevin Jahns: [00:53:51] Okay. Just go to github.com/yjs. This is the GitHub organization for Yjs. There are lots of repositories.
yjs-demos should be your go-to resource for figuring out how to create examples and how to use WebRTC, IndexedDB, ProseMirror. All the things are there. There's a lot of documentation in the README of Yjs. And if you're interested in the inner workings, watch our previous webinars. There is also a paper at the end of the README. And yeah, please come visit us.
Preston So: [00:54:35] Well, I know I'm going to be telling all my friends, as I have been every week that we've been able to have you on the show, Kevin, to try out Yjs. In fact, I'll probably tell some of my colleagues at work later today that we should be looking at Yjs. With that, unfortunately, we're out of time.
But I'm really glad to share this time with our good friends on Tag1 Team Talks today. By the way, we post all of these talks at tag1.com/tagteamtalks. All the links that we mentioned today, including all of the GitHub links and all of the examples that we mentioned, are going to be posted online with the talk.
And as always, if you enjoyed this episode of Tag1 Team Talks, we'd love to hear from you and get your feedback. Please remember to upvote, subscribe, and share it with your friends and family. And if you have any topics you want to hear about or anything you want to talk about on the show,
please write to us at firstname.lastname@example.org. I want to thank Kevin, Michael, and Fabian today, three of my dear good friends on the show. Thank you so much for your time today. I learned a lot, and I'm sure our audience did as well. We'll see you next time on Tag1 TeamTalks!