Transcript: Unraveling the ETL Data Migration Process - Understanding Transform

This is an edited transcript. For the blog post and video, see Unraveling the ETL Process- Part 2: Transform.

[00:00:00] Janez Urevc: Welcome to Tag1 Team Talks, brought to you by Tag1 Consulting. With Drupal 7 rapidly approaching end of life and Drupal 9 already end of life, we are hearing people talk about migrating and upgrading more than ever before. And anyone who's ever been involved with a large-scale migration, migrating a large site or application from one technology stack to another, will tell you that it's complex, time consuming, and it demands expertise.

[00:00:34] Janez Urevc: That's why we're bringing you this series of talks, diving deep into the world of Drupal migrations. And who better to guide us than Tag1's very own Drupal migration experts? From the masterminds and maintainers of Drupal's migration tooling to the individuals behind the most groundbreaking Drupal migrations, we've got an all-star lineup who'll cover everything you need to know about [00:01:00] every aspect of migrating large-scale applications.

[00:01:05] Janez Urevc: This team talk is part of a three-part series about the ETL (extract, transform, and load) process, which is used by many enterprise migration systems, Drupal's Migrate included.

[00:01:18] Janez Urevc: In today's episode, we're going to talk about how to use Drupal's Migrate system to transform the data before loading it into Drupal's database.

[00:01:27] Janez Urevc: Be sure to stick around to the end because we are going to announce the next few talks in our series. Let's dive in. I'm Janez Urevc, senior engineer here at Tag1 and a longtime contributor to Drupal. I'm joined today by two well-known top contributors to Drupal: Benji Fisher, one of the five current Drupal Migrate core system maintainers,

[00:01:49] Janez Urevc: and Mike Ryan, co-creator of Migrate. Welcome. Thank you both for joining me.

[00:01:55] Benji Fisher: Thanks for having us.

[00:01:57] Janez Urevc: We're glad to have you. [00:02:00] Before we dive in, I would just like to mention that in case you didn't already watch or listen to the previous episode in this series about E, Extract, we'd suggest that you do so.

[00:02:11] Janez Urevc: In that episode, we, among other things, provided a high-level overview of what ETL stands for, so we'll not repeat that in this episode. Now, finally, let's dive into today's topic, which is T, Transform. Um, Mike, could you tell us what is being done as part of the transform phase in general and how Drupal does it?

[00:02:35] Janez Urevc: Uh, is it like similar to how other enterprise systems do it or are there any specialties to it?

[00:02:44] Mike Ryan: Well, one difference from your classic ETL is, um, the classic ETL usually goes in bulk. You extract all your data into a big blob. Then you run it through a transformer, which transforms everything. And then you run it into [00:03:00] a loader, which does a bulk load.

[00:03:02] Mike Ryan: Uh, our approach is to run through the data one logical row at a time. We say row because most often we're dealing with, uh, databases as our sources, but technically it could be anything, in any form, like a web service or a CSV. So, um, basically, we use the Drupal plugin system. Um, and for a given pipeline, each field that is being run through the pipeline, um, can go through any number of transformers. Because they're plugins, it's very flexible.

[00:03:48] Mike Ryan: Uh, there's a YAML format you can use to write your migrations, which specifies, for each field, what plugins it's going to be transformed with [00:04:00] and whatever configuration you add. So it takes the output of the source plugin, and all the source plugins, regardless of the source, CSV, et cetera, uh, produce a common data structure, which feeds into the transformer.

[00:04:17] Mike Ryan: And the transform pipeline will take one row from that. And, um, it will go through each piece of the process, uh, the transform step (uh, we call it process in, um, Drupal), and apply the transformers. And each transformer can actually take multiple fields from the source row, or it can take none. Uh, you might use a processor that simply sets a constant value.
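As a rough illustration of the row-at-a-time flow described above, here is a minimal Python sketch. It is not Drupal's API: `run_pipeline`, the field names, and the sample row are all hypothetical, and real migrations declare this in YAML with process plugins rather than in code like this.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
# A transformer receives the whole source row and returns one destination
# value, so it may read several source fields or none at all.
Transformer = Callable[[Row], Any]


def run_pipeline(rows: Iterable[Row], process: Dict[str, Transformer]) -> List[Row]:
    """Transform rows one at a time instead of in one bulk step."""
    results = []
    for row in rows:  # one logical source row per iteration
        results.append({dest: fn(row) for dest, fn in process.items()})
    return results


# Hypothetical process definition: destination field -> transformer.
process = {
    "title": lambda row: row["headline"].strip(),           # one source field
    "byline": lambda row: f'{row["first"]} {row["last"]}',  # several source fields
    "type": lambda row: "article",                          # constant, no source field
}

source_rows = [{"headline": "  Hello  ", "first": "Ada", "last": "Lovelace"}]
migrated = run_pipeline(source_rows, process)
```

The point of the sketch is only the shape: each destination field gets its own small chain of logic, and the whole set runs once per source row.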

[00:04:57] Mike Ryan: Uh, and the [00:05:00] transformers can be very flexible and for the most part, they're not very Drupal dependent. Um, you'll do a lot of string manipulation, for example. Um, let's see, I'm not sure what else there is to say about the general process. But, um,

[00:05:20] Benji Fisher: One thing I'd like to add at this point is that sometimes we cheat.

[00:05:25] Benji Fisher: We don't strictly follow the ETL paradigm. Um, but we take a peek at the database. Um, so for example, we might look for an existing taxonomy term that has, uh, the name dessert, or we might check how the editor is configured. So that's one way in which it's very Drupal specific and, and doesn't strictly follow the ETL paradigm.

[00:05:49] Mike Ryan: Right. You, you have complete flexibility. You can do anything you want. Good or bad.

[00:05:56] Janez Urevc: Like always. [00:06:00]

[00:06:00] Mike Ryan: And, and, well, maybe while we're mentioning bad things to do in these processors, it should be noted that this pipeline is run for each source row in your data, and when dealing with multiple value fields, it might run several times for one source row.

[00:06:21] Mike Ryan: So the processing pipeline is a key place to watch performance. One, one slow processor will kill the overall migration process.

[00:06:36] Janez Urevc: Makes sense. Cause it could be run a lot of times and that adds up, right?

[00:06:42] Mike Ryan: Oh, thousands, millions.
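To make the "that adds up" point concrete, here is a small back-of-the-envelope sketch in Python. The row count, values per row, and per-call timings are made-up figures for illustration only, not measurements from any real migration.

```python
def estimated_hours(rows: int, values_per_row: int, seconds_per_call: float) -> float:
    """Total time spent in one process plugin across the whole migration."""
    calls = rows * values_per_row  # the plugin runs once per value of each row
    return calls * seconds_per_call / 3600


# Hypothetical figures: 1,000,000 rows with 3 values each.
fast = estimated_hours(1_000_000, 3, 0.001)  # 1 ms per call
slow = estimated_hours(1_000_000, 3, 0.050)  # 50 ms per call
```

With these assumed numbers, the fast plugin costs under an hour in total, while the slow one costs well over a day.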

[00:06:44] Janez Urevc: Yeah. And it can take days as we discussed before. So Benji, I heard you state in the past that the transform stage is the most interesting part of the migration and I know for [00:07:00] a fact that you are probably the most excited about it in the whole ETL migrate world. Um, why is that?

[00:07:11] Benji Fisher: Yeah, you're, you're right. And this is something that I decided, uh, when, when I first started working on migrations. Um, and by the way, I, I am the most junior member of the current maintainers and, uh, I have a lot less experience than most of them, or than Mike, so I defer to Mike on questions of, uh, experience and performance in large-scale migrations.

[00:07:37] Benji Fisher: But I do have some pretty strong opinions about the transform stage, the process plugins. So, the first reason that it's the most interesting is that any migration project will be broken up into a bunch of different migrations. And each one of those migrations will have a single source and a single destination.

[00:07:58] Benji Fisher: Uh, but any [00:08:00] one migration has many fields. So if you have, uh, a migration for your article nodes, it'll have a body field. It'll have a couple of timestamps. It might have taxonomy and images and so on. Each one of those fields is going to have at least one process plugin, one transformer, as Mike described them, and some fields will have several transformations.

[00:08:25] Benji Fisher: So, in that sense, it's, um, it's where the most variety is. You know, each migration again has, has one source plugin, one destination plugin, but can have many, many transformation plugins or process plugins. Um, the second thing is that the transform stage, the process plugins, are where you have the most opportunity for reusing your code.

[00:08:57] Benji Fisher: So if you look at the source plugin, [00:09:00] it has to understand whatever cruft is involved in your source data, um, the site you're migrating from. And the only time you're going to be able to reuse a source plugin is if you have the same type of source. So once you've written a source plugin for a WordPress XML file, you can reuse that.

[00:09:23] Benji Fisher: And once you've written a source plugin for Drupal 6 or Drupal 7, you can reuse that. Um, the destination plugin: you're almost always migrating into Drupal entities. They could be taxonomy terms or nodes, um, menu links are entities, um, and the core migration system already understands the destination. So that's already done.

[00:09:49] Benji Fisher: Uh, but getting from one to the other is in my opinion, the interesting part and, and the part that has the most opportunity for reusing code. [00:10:00] So that's why I think that the transform stage is, is the most interesting.

[00:10:05] Mike Ryan: Yeah, it, it's, it, for most migrations, you'll find that the source and the, uh, the extract and the load phases, uh, you simply need to use

[00:10:20] Mike Ryan: core plugins and some configuration. Uh, you don't usually need to do very much PHP coding. It's the process plugins where you're most likely to need to write your own plugins, write your own application logic, because that's where, you know, you're transmogrifying your data to fit the, uh, new system.

[00:10:46] Benji Fisher: Although there are some people who prefer to do it all in the source plugin. They'll just write all their custom PHP there and prepare everything so that it's ready to be imported. And,

[00:10:58] Benji Fisher: and again, I don't like that [00:11:00] approach because it, you can't reuse the code if you do it that way.

[00:11:06] Janez Urevc: Yeah, it makes it way harder to reuse it. Um, it's also against the ETL paradigm, I guess, because then you're... basically throwing away this separation of different phases that we're trying to introduce here. Um, so to be a little bit more concrete, what would be the most common transform operations, um, in a migration?

[00:11:37] Janez Urevc: Like what would we do in transform process plugins?

[00:11:42] Benji Fisher: Yeah, so by far the most common is just a straight copy. You know, you have a text field, and you pass it over to the new text field, which often has the same field name. Sometimes you decide to change that as part of your site redesign. [00:12:00] That's the most common.

[00:12:03] Benji Fisher: And, you know, that's almost not like using a transform plugin at all. It's technically using the get plugin, but it's not doing any transformation. Um, another common thing is that your source has a comma separated list of values, and you split that into pieces, and you convert each word into a taxonomy term ID.

[00:12:29] Benji Fisher: Um, so that's something that comes up pretty commonly. Um, another really important one is, since Drupal deals with structured data, you might have, um, references to other nodes, other taxonomy terms, identified by their entity IDs. And if those entity IDs are changing, as they often do in a complex migration, then you have to translate the old entity ID, the ID on the [00:13:00] source system, to the new entity ID.
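The comma-separated case mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea, not Drupal's implementation; `explode`, `to_term_ids`, and the `term_ids` lookup table are hypothetical names and data.

```python
from typing import Dict, List


def explode(value: str, delimiter: str = ",") -> List[str]:
    """Split a delimited string into trimmed, non-empty pieces."""
    return [piece.strip() for piece in value.split(delimiter) if piece.strip()]


def to_term_ids(names: List[str], term_ids: Dict[str, int]) -> List[int]:
    """Map each term name to its ID, skipping names with no known term."""
    return [term_ids[name] for name in names if name in term_ids]


# Hypothetical lookup table: term name -> destination term ID.
term_ids = {"dessert": 7, "vegan": 12}
tags = to_term_ids(explode("dessert, vegan, unknown"), term_ids)
```

Skipping unknown names is one possible policy; a real migration might instead create the missing term or log a message.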

[00:13:03] Benji Fisher: Um, and that's possible because the migration system keeps track of the old and new entity IDs. Um, so that, that's a really important one. Um, some other things you might wanna do is make your site better as you're transforming it. So if you see that people are consistently using CSS classes, font dash bold, size dash large, color dash red, well, you can replace that with my theme dash warning.

[00:13:40] Benji Fisher: And suddenly your CSS markup is a lot more semantic and a lot easier to maintain in the long run. Um, another common one is to convert date formats, like maybe they're in year, year, month, month, day, day format, and you want to convert it to a timestamp or vice versa. [00:14:00] And then there are a whole bunch of utility operations.

[00:14:04] Benji Fisher: And, and you wouldn't think of these as the things you want to do to your data, but they're the things that end up getting used, in the middle of the process. So flatten an array, combine several arrays into one, filter out empty values, or apply a callback function. So, so those I think are the most commonly used, um, process plugins.
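As a hedged illustration of the date conversion and utility operations just listed, here is a short Python sketch. The function names are hypothetical, the `%Y-%m-%d` input format and the UTC assumption are guesses made for the example, and Drupal's own plugins are configured in YAML rather than called like this.

```python
from datetime import datetime, timezone
from typing import Any, List


def to_timestamp(value: str, fmt: str = "%Y-%m-%d") -> int:
    """Convert a date string to a Unix timestamp, treating it as midnight UTC."""
    return int(datetime.strptime(value, fmt).replace(tzinfo=timezone.utc).timestamp())


def flatten(value: Any) -> List[Any]:
    """Recursively flatten nested lists into a single list."""
    if not isinstance(value, list):
        return [value]
    flat: List[Any] = []
    for item in value:
        flat.extend(flatten(item))
    return flat


def filter_empty(values: List[Any]) -> List[Any]:
    """Drop None and empty strings but keep values such as 0."""
    return [v for v in values if v is not None and v != ""]


merged = ["a", ["b", ""]] + [[None, "c"]]  # combine several arrays into one
cleaned = filter_empty(flatten(merged))
stamp = to_timestamp("2023-01-01")
```

None of these steps is interesting on its own; they are the glue that sits between the more meaningful transformations in a chain.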

[00:14:27] Benji Fisher: Uh, Mike, am I leaving anything out?

[00:14:28] Mike Ryan: Yeah, I think that those are the key ones. I'm sort of looking now at the list of all the ones that are in core and maybe we might want to highlight a few other interesting ones here.

[00:14:46] Janez Urevc: While you're looking at it, I just wanted to comment. The callback one is an interesting one because it almost lets you cheat a little bit. Like, if you need to introduce your custom logic, [00:15:00] but you don't want to create a plugin and go through all that,

[00:15:05] Janez Urevc: um, you can always use the existing callback process plugin, and then just create a function in PHP, which will be called.

[00:15:17] Mike Ryan: Or use a basic PHP function. Right, you don't need to wrap trim in a plugin. You simply use callback, specify trim as the callback, and boom, you got it.
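The idea of pointing a generic callback step at an existing function can be sketched in Python as follows. The `CALLABLES` registry and the `callback` helper are hypothetical, and `str.strip` is used only as Python's closest analogue of PHP's `trim`.

```python
from typing import Any, Callable, Dict

# Hypothetical registry of plain functions that may be used as callbacks.
CALLABLES: Dict[str, Callable[[Any], Any]] = {
    "trim": str.strip,        # Python's closest analogue of PHP's trim()
    "strtolower": str.lower,  # likewise for PHP's strtolower()
}


def callback(value: Any, callable_name: str) -> Any:
    """Apply an existing function by name instead of writing a dedicated plugin."""
    return CALLABLES[callable_name](value)


cleaned = callback("  Hello world \n", "trim")
```

The design choice is the same one made in the conversation: when a plain function already does the job, naming it is cheaper than wrapping it.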

[00:15:33] Benji Fisher: Maybe this is a good time to mention that our show notes include some links to the documentation where we list all of the plugins that are in core, and those will be available on our pages after we publish this talk.

[00:15:49] Mike Ryan: Right. So, uh, one of the interesting ones is static map. Um, this is... [00:16:00] Um, basically, it's like translating enums. That is, if the source field contains a finite list of distinct strings and those need to be different on the Drupal side, you use a static map plugin, which says change this string to that string, and that's very handy in a lot of cases.

[00:16:28] Benji Fisher: Or if you're dealing with NFL team names and the Redskins are now the Commanders, you can say that this finite list of names has changed and anything else you pass through unchanged.
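A static map with pass-through for unlisted values might look like this minimal Python sketch. The `static_map` function, its `bypass` flag, and the `team_names` table are illustrative assumptions, not the configuration of Drupal's plugin.

```python
from typing import Dict


def static_map(value: str, mapping: Dict[str, str], bypass: bool = True) -> str:
    """Translate a value from a fixed list; optionally pass unknown values through."""
    if value in mapping:
        return mapping[value]
    if bypass:
        return value  # not in the list: keep the source value unchanged
    raise KeyError(f"No mapping for {value!r}")


# Hypothetical map for a renamed team.
team_names = {"Redskins": "Commanders"}
renamed = static_map("Redskins", team_names)
unchanged = static_map("Patriots", team_names)
```

Turning `bypass` off is useful when an unexpected source value should be treated as an error rather than silently copied.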

[00:16:40] Mike Ryan: Yeah. And the Cleveland Guardians in baseball. Right. Um, well, I'm seeing sub process, and that is one of the more complicated ones. It allows you to do some [00:17:00] really complicated things

[00:17:03] Mike Ryan: when a field consists of a list, an array: it allows you to basically have a sub process pipeline for the pieces of this source field. And this is very, uh, complicated and technical. I'm not going to go through it right now because... I always have to relearn it when I need to use it.

[00:17:31] Janez Urevc: Well, will a subprocess use the same set of plugins as the main migration?

[00:17:38] Janez Urevc: With, like, source and transform and all those things?

[00:17:43] Mike Ryan: Oh, no. The source, rather than being a row from your source plugin, the source is the contents of the field, the extracted field. So it's used on fields which [00:18:00] themselves have structure.

[00:18:01] Benji Fisher: But it does use or have access to all the same process plugins that, that the general transform stage has.
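The sub process idea, where each item of a structured field is treated as its own small source row, can be sketched in Python like this. The `sub_process` function, the image records, and the destination keys are all hypothetical and only meant to show the shape of the technique.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


def sub_process(items: List[Item], process: Dict[str, Callable[[Item], Any]]) -> List[Item]:
    """Run a small pipeline over each item of a structured field.

    Inside the sub-pipeline, the "source row" is the item itself,
    not the row produced by the source plugin.
    """
    return [{dest: fn(item) for dest, fn in process.items()} for item in items]


# Hypothetical source field holding a list of image records.
images = [{"fid": "10", "caption": " Sunset "}, {"fid": "11", "caption": "Harbor"}]
field_images = sub_process(
    images,
    {
        "target_id": lambda item: int(item["fid"]),
        "alt": lambda item: item["caption"].strip(),
    },
)
```

Because the inner pipeline takes the same kind of transformers as the outer one, anything available to the main transform stage is available here too.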

[00:18:10] Janez Urevc: Which obviously is immensely powerful then.

[00:18:15] Mike Ryan: It is. It is. I mean, in theory, instead of migrating your taxonomy terms... well, that's a bad example because we've got a shortcut for that, but user accounts: in theory, instead of doing them in a separate migration from your main content migration, you could do them sort of dynamically within a sub process within your content migration.

[00:18:41] Mike Ryan: We do not recommend that. Like I said, the process plugins are very powerful and you can cut yourself.

[00:18:50] Mike Ryan: Um, there, there is a plugin for copying files, which, you know, if you're going from one system to another, an old version [00:19:00] of Drupal to a new one, uh, you want your images, your videos, your documents to come across too. And the file copy plugin is, um, very flexible because it gives you

[00:19:14] Mike Ryan: uh, a few different options for doing that and, and for doing that performantly. Um, for example, it could simply copy it into Drupal's public files directory. Um, and it can keep track. Um, you can set a flag on it so that if the file already exists at the destination, you don't overwrite it. And that's great for your performance when you're rerunning migrations, especially during development.

[00:19:51] Mike Ryan: Um, you can also, um,

[00:19:55] Mike Ryan: now I'm trying to remember the other, but, uh, of course you could use it to copy [00:20:00] directly into the files directory. You could copy it to an S3 bucket, or some other, you know, uh, storage service, uh,

[00:20:15] Janez Urevc: Or I think that I had a use case in the past where we needed to copy files from like another website into our local file system as part of the migration. And I think that file copy was used for that as well, which is obviously terribly slow.

[00:20:33] Benji Fisher: And a little tidbit is that the file copy relies on the download plugin if the source is remote. And that uses, uh, Guzzle in a way that's slightly different from anywhere else it's used in Drupal Core and caused some interesting test failures years ago.

[00:20:56] Mike Ryan: Yes, you know, you have, sometimes you have to be clever to make [00:21:00] things work and work performantly.

[00:21:03] Janez Urevc: Yes. I also remember... Um, during one of the migrations I was working on, we used file copy to copy files straight from, probably, NFS to another public files folder, um, and we were actually copying files and that slowed the migration a lot. Um, and then we figured out that it's better to rsync before running the migration, and then this, you know, check if the file exists kicks in and you don't need to copy, you just find it there, and that sped up the migration significantly.
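The "skip the copy if the file is already there" behavior that made the rsync trick pay off can be sketched in Python as follows. This is a stand-alone illustration with hypothetical names (`copy_file`, `use_existing`) and temporary paths; it is not how Drupal's file copy plugin is implemented or configured.

```python
import os
import shutil
import tempfile


def copy_file(source: str, destination: str, use_existing: bool = True) -> bool:
    """Copy source to destination; return True only if a copy was made."""
    if use_existing and os.path.exists(destination):
        return False  # already there (e.g. from a previous run or an rsync)
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    shutil.copyfile(source, destination)
    return True


# Demonstration in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "old", "photo.jpg")
dst = os.path.join(workdir, "files", "photo.jpg")
os.makedirs(os.path.dirname(src))
with open(src, "wb") as handle:
    handle.write(b"image bytes")

first_run = copy_file(src, dst)   # copies the file
second_run = copy_file(src, dst)  # finds it in place and skips the copy
```

On a rerun, every file that is already in place costs one existence check instead of a full copy, which is where the speedup in the anecdote comes from.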

[00:21:48] Janez Urevc: But we're getting into performance considerations now, which is another talk in the future we will be doing, so. Um, uh, what about contrib? Like, [00:22:00] what kind of interesting, um, process plugins can we find in contrib that are not part of core?

[00:22:09] Mike Ryan: Oh, so many. So, so many, I need to jog my memory here and take a look.

[00:22:20] Mike Ryan: So the Migrate Plus module is an add on to the core migration system, and it has a number of, uh, interesting ones. Um, there are

[00:22:32] Mike Ryan: several, um, several plugins for manipulating and, um, and scanning a DOM, um, document object model. So you can scan your HTML or XML and find, you know, extract, you know, the span with the text-bold class [00:23:00] that's underneath a P, for example, if you need to manipulate that piece of your content.

[00:23:10] Mike Ryan: Um, there is, uh, entity lookup and entity generate, which make it easy, um, to find a matching entity. That's not necessarily one that you migrated and that you can find via the map tables that migration provides. But if you're migrating into a system where you've got, maybe, a taxonomy there you want to hook up to,

[00:23:38] Mike Ryan: you can use entity lookup to find a matching term in that vocabulary and link to it. And you can also use entity generate, which does the same thing, but also, if it doesn't find the matching term, would create it for you. Um, let's see, there's [00:24:00] file blob. You know, if you've got file data in a database blob, you can convert that to a real file with that.
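The difference between looking up a matching term and generating one when it is missing can be sketched in Python. The `Vocabulary` class below is a hypothetical in-memory stand-in for an existing taxonomy on the destination site, not the Migrate Plus API.

```python
from typing import Dict, Optional


class Vocabulary:
    """Hypothetical stand-in for an existing taxonomy on the destination site."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._next_id = 1

    def lookup(self, name: str) -> Optional[int]:
        """Return the ID of a matching term, or None if there is none."""
        return self._ids.get(name)

    def generate(self, name: str) -> int:
        """Return the ID of a matching term, creating the term if needed."""
        term_id = self.lookup(name)
        if term_id is None:
            term_id = self._next_id
            self._ids[name] = term_id
            self._next_id += 1
        return term_id


tags = Vocabulary()
missing = tags.lookup("dessert")    # no such term yet
created = tags.generate("dessert")  # creates it
found = tags.lookup("dessert")      # now it matches
```

Note that both operations read the destination while transforming, which is the "peek at the database" departure from strict ETL mentioned earlier in the talk.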

[00:24:09] Janez Urevc: Um, file blob reminded me of the beginning of my career, which predates Drupal, where I had experience with a proprietary CMS that was really into storing all files in the database. That was fun.

[00:24:27] Mike Ryan: Yeah. So those, and those are the ones that pop out to me immediately.

[00:24:33] Mike Ryan: Um, besides Migrate Plus, which is sort of a grab bag of several different, uh, plugins, uh, there are several other contributed modules, um, that have plugins of all sorts. And, uh, before you go writing your own plugins, take a look through the, uh, contrib modules that are available on drupal.org and you might find someone that's already solved [00:25:00] your problem. Maybe they've got a SOAP plugin. Actually, I know they do because I wrote it. But whatever your scenario is, assume you are not that unique, until you prove you are.

[00:25:17] Janez Urevc: And Migrate has been around for years, like probably more than a decade at this point.

[00:25:24] Janez Urevc: Uh, and it's, uh, migrated many, many enterprise, large-scale applications. So I'd be almost convinced that if there is a use case, it has probably already been done.

[00:25:44] Benji Fisher: Yeah, it started out as a contributed module. Uh, Mike and Moshe Weitzman developed it in Drupal 5 or Drupal 6?

[00:25:54] Mike Ryan: 6. Okay. It was 6. I think we may have started trying it for 5, uh, [00:26:00] and we jumped ahead to 6 because 6 was more conducive to what we were doing.

[00:26:07] Benji Fisher: Yeah. My first experience with it was the Drupal 7 version.

[00:26:11] Mike Ryan: Yeah. The first big project, um, the first client you would know was the Economist. Economist.com, way back in the day.

[00:26:24] Janez Urevc: Which I think also sponsored a lot of initial Migrate module work, right?

[00:26:28] Mike Ryan: Yes. Yes.

[00:26:31] Janez Urevc: After the Economist, it was Examiner.com, and all of them sponsored

[00:26:39] Janez Urevc: a lot of D7 work, right?

[00:26:41] Mike Ryan: Yes, they sponsored most of our port to D7. Martha Stewart Living was about that time too.

[00:26:52] Janez Urevc: Speaking about history, what, do you have any anecdotes or any interesting or unusual [00:27:00] process related use cases that you've experienced in the past?

[00:27:07] Mike Ryan: Oh, boy, they're all jumbled together. Well, one I don't want to remember is the time, uh, our client thought we had created a major security breach because, uh, our development migration, um, suddenly started sending out emails to all their customers.

[00:27:34] Mike Ryan: And this is something to watch out for. Um, the migration system actually explicitly disables, uh, the mail system while running, uh, which we thought was safe. But what happened was a module was enabled which, during entity creation, which happens during migration, uh, [00:28:00] queues emails to be sent. And it was fine for a while, because this was a development system that no one saw. But, um, one little ping on port 88 to that system caused cron to run (it was using the, um, lazy cron or whatever you call it) and, boom, those, yeah, so those emails started going out and caused quite a stir.

[00:28:35] Mike Ryan: So yes, this, this is something you do need to be careful about: um, whatever effects the ultimate website might have beyond itself. Be careful that you control them within your development and testing system.

[00:28:58] Benji Fisher: Which is good advice in [00:29:00] general, not just for...

[00:29:01] Janez Urevc: Yes. And this is where, uh, I find DDEV, which is one of the projects that we are really excited about at Tag1, really useful, because I believe that DDEV will reconfigure your, your development environment to redirect, uh, emails into, like, this MailHog.

[00:29:22] Janez Urevc: It basically redirects everything in there, and it stays just in memory even, I think. So if you are inside DDEV, with regards to mails, you can be pretty sure that no matter what's going on, you're safe.

[00:29:42] Mike Ryan: And, and it's handy too for testing your outgoing emails, testing the formatting or whatever.

[00:29:49] Janez Urevc: Yeah, exactly. That's probably the, the, the usual use case when it was created. Yeah. But it, you'll see as a side effect, it also provides a layer of security and peace of [00:30:00] mind.

[00:30:01] Benji Fisher: So, so, uh, I, there, there are two points here. I, I want to make sure we don't lose track of them. The first is that while you're developing, no emails will be sent out.

[00:30:10] Benji Fisher: But the second one is equally important. You have to look at the emails that did get captured by MailHog because those are the ones that will be sent out in real life when, when you're on production and not local.

[00:30:23] Mike Ryan: Yeah, make sure your tokens are being substituted, all that stuff.

[00:30:28] Janez Urevc: Yes. Benji, what about you?

[00:30:30] Janez Urevc: Do you have any, any interesting stories from the past?

[00:30:34] Benji Fisher: Um, yeah. And, you know, I, I'm really flattered that when Mike was going through the list of, uh, plugins in Migrate Plus, the first ones he singled out were the DOM processing plugins, because that was, uh, one of my contributions. And let me call out, this was on a project for Pegasystems, and I was working for Isovera at the time with Marco Villegas, and we [00:31:00] developed the first DOM processing plugins, and both Isovera and Pega were supportive of contributing that back to Migrate Plus. And I guess the original problem I was trying to solve is that, um, as I said earlier, the node IDs were changing.

[00:31:18] Benji Fisher: So if we have separate entity reference fields, Drupal could already handle that just using the migration lookup plugin. And you could say that, uh, you know, the next article used to be node 123; in the migrated system, it's node 456. You can do that translation. But what if you have a text field, and inside that text field there's an anchor link, and the anchor href goes to node/123? How do you translate that to node/456? And part of the answer was to use proper DOM processing. And so I, I realized, um, I guess I had the idea on an [00:32:00] earlier project and I didn't get to, um, make it happen until it was actually needed here. Um, everyone knows that you shouldn't be processing HTML with regular expressions.

[00:32:13] Benji Fisher: Uh, but people do it anyway. Yes.

[00:32:19] Benji Fisher: People do it anyway because it's, it's the tool they know it's convenient. And so, so, so the first step was to introduce some process plugins to make it easy to do proper DOM processing so that you have less overhead in creating that DOM document object and the XPath object and so forth. Once you've eliminated the overhead, it is often both simpler and more reliable to do the, uh, proper HTML processing rather than to do things with, with regular expressions.
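To show what parsing the markup instead of pattern-matching it can look like, here is a minimal Python sketch that rewrites node links inside a text field. It is not the Migrate Plus DOM plugins: `rewrite_node_links` and the `id_map` of old to new node IDs are hypothetical, and the sketch assumes the fragment is well-formed XML, which real-world HTML often is not.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

NODE_PATH = re.compile(r"/node/(\d+)")


def rewrite_node_links(markup: str, id_map: Dict[int, int]) -> str:
    """Rewrite href="/node/OLD" to href="/node/NEW" using a parsed tree.

    Assumes the fragment is well-formed XML; real-world HTML usually needs
    a more forgiving HTML parser.
    """
    root = ET.fromstring(f"<div>{markup}</div>")  # wrap so there is one root
    for anchor in root.iter("a"):
        match = NODE_PATH.fullmatch(anchor.get("href", ""))
        if match and int(match.group(1)) in id_map:
            anchor.set("href", f"/node/{id_map[int(match.group(1))]}")
    # Serialize the children only, dropping the wrapper element.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )


# Hypothetical map of old node IDs to new node IDs.
body = rewrite_node_links(
    '<p>See the <a href="/node/123">next article</a>.</p>', {123: 456}
)
```

A regular expression still appears here, but only on the value of a single attribute that the parser has already isolated, not on the HTML itself.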

[00:32:59] Benji Fisher: [00:33:00] In fact, if you look at the Search API module, um, I just ran into a case where it's not only simpler and more reliable, it's also more performant to do proper processing, and there's, uh, there's an open issue on the Search API module that, uh, that handles that. So anyway, um, that was my original purpose for putting the, the DOM processing plugins in on that, that project with, with Pega.

[00:33:29] Benji Fisher: Um, since then, um, some other people have done work on that. There's a contrib module that builds on the DOM processing plugins, and it handles the case where you have the Media module on your Drupal 7 site and you want to migrate that to the core Media module in Drupal 10. Um, it sort of understands the, the tokens that the Drupal 7 Media module used and, um, and handles [00:34:00] transforming your text fields.

[00:34:02] Benji Fisher: Um, I had a really complicated project where, um, we not only had to migrate the site, um, from Drupal 7 to Drupal 8 at that point, I think it was. Um, we, we also had to import some really complicated XML documents into the site. And that project gave me a real appreciation for the expressive power of XPath, because that was the only way to manage these really complicated XML structures.

[00:34:43] Benji Fisher: And luckily we already had the DOM processing plugins available. Um, another complicated project I had was that we had these HTML text fields and, um, and each [00:35:00] text field had just some, some image tags. So basic HTML markup, and we wanted to download the files from those, those image tags and save them as files and create media entities out of them and then just insert the media references into the text field. I did that with a custom PHP plugin. And I do want to point out that this is the sort of thing where you can shoot yourself in the foot. It's not following the ETL paradigm. Um, you don't have a, a row creating each, each one of those media items. Um, and, and it does have certain disadvantages because it breaks the ETL paradigm, but it is a practical way to handle that sort of situation.

[00:35:54] Benji Fisher: Another weird one I had was a single HTML [00:36:00] page, um, in the source site, which was Drupal 7, and you would think, looking at this page, oh, this page is a view. It's listing the person content type. But in fact, it was just a basic page and all the markup was just there in the body field. And we wanted to pick it apart and create person nodes and then create a view of the person nodes.

[00:36:24] Benji Fisher: Um, and so luckily, um, the markup was consistent. It always started with an H3 tag. It had a title and that was immediately followed by an image tag. So there was that consistency that I could take advantage of. So, um, I extracted the title into a text field. I extracted the image, created a file media entity, and then, um, and then just stripped those from the body field, and [00:37:00] let an actual view in the Drupal, I think it was Drupal 9 at that point, um, let that put the pieces back together to make something like the original source site. And the, um, you know, last one that I noted down was again for Pega, and we were importing documentation from an external XML-based system. So this wasn't a site migration; this was a recurring migration, in that someone was writing the documentation in this external system, and we had to import it into the Drupal site and make it look like it fit the rest of the site.

[00:37:39] Benji Fisher: And that's where I did the sort of thing I talked about before, where you look for some consistent pattern of CSS classes and say, okay, we're going to replace that with something more semantic. And, um, and again, this, this used the DOM plugins. [00:38:00] And it also, uh, peeked at the current database, the, the destination database, to see how the editor module was configured so that we could pick and choose the CSS classes that the current site editors would naturally be adding through the user interface.

[00:38:22] Benji Fisher: And we added those same CSS classes programmatically through the migration. So, so those are some of the more complicated cases I've had in the transform stage.
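Replacing a consistent combination of presentational classes with one semantic class can be sketched in Python as below. The function name, the class spellings (`font-bold`, `size-large`, `color-red`, `mytheme-warning`), and the rule that all of the old classes must be present are assumptions made for illustration; the real project worked on parsed DOM nodes and read the allowed classes from the destination site's editor configuration.

```python
from typing import Set


def replace_classes(class_attr: str, old: Set[str], new: str) -> str:
    """Swap a full set of presentational classes for one semantic class.

    The attribute is left untouched unless every class in `old` is present.
    """
    classes = class_attr.split()
    if not old.issubset(classes):
        return class_attr
    kept = [name for name in classes if name not in old]
    return " ".join(kept + [new])


# Hypothetical class names, spelled as a guess from the spoken description.
presentational = {"font-bold", "size-large", "color-red"}
semantic = replace_classes(
    "font-bold size-large color-red intro", presentational, "mytheme-warning"
)
partial = replace_classes("font-bold intro", presentational, "mytheme-warning")
```

Requiring the whole combination keeps the replacement conservative: an element that is merely bold is not promoted to a warning.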

[00:38:36] Janez Urevc: Very nice. Um,

[00:38:39] Janez Urevc: I guess that Uh, this brings us to the end of today's episode, unless you have anything more to add.

[00:38:48] Benji Fisher: I'm good.

[00:38:52] Janez Urevc: Um, we have some great talks coming up. Um, our goal is to put one out per week [00:39:00] over the next few months to support the community in the migration process. Uh, performance is something we care deeply about at Tag1, and we did touch on performance in today's episode a little bit, um, because it applies to migrations.

[00:39:16] Janez Urevc: Um, when you're handling really large data sets, um, a full data migration can take 12 hours or even days. Um, we'll do a handful of talks on this topic, including how to profile and tune a migration, and, um, a talk about incremental migrations. Um, every project owner wants their migration to be a success.

[00:39:41] Janez Urevc: So we will dedicate an episode to discussing the most important factors for a successful Drupal 7 to Drupal 10 migration. Um, other topics include porting custom code, um, the future of migrate tooling, how to port a theme, and so much more. [00:40:00] We hope that you'll tune in and enjoy our upcoming team talks.

[00:40:06] Janez Urevc: A huge thank you to the Tag1 team, Benji Fisher and Mike Ryan.

[00:40:11] Janez Urevc: Thank you for joining me. Um, make sure that you check out the other segments in this series. There will be links to them in the show notes, along with links to the modules and documentation and other things that we mentioned today. If you like this talk, please remember to upvote, subscribe and share it.

[00:40:33] Janez Urevc: Check out our past talks at Tag1.com/ttt. That's three T's for Tag1 Team Talks. As always, we'd love your feedback and any topic suggestions. You can always write to us at ttt@tag1.com. Again, that's three T's for Tag1 Team Talks. Um, one more time, big thank you to our guests [00:41:00] and everybody who tuned in.

[00:41:02] Janez Urevc: Thanks for joining us. Thanks. Bye.

Drupal Migration Series
