Series Overview & ToC | Previous Article | Next Article (coming May 22nd)


By default, the Drupal 7 to 10 upgrade path preserves entity IDs. In the previous article, we explained that this would cause problems if content or configuration already exists in the destination Drupal 10 site. Let’s explore this further and evaluate ways to work around the issue.

For brevity, we are going to provide examples of content entities, but what we cover can be applied to configuration entities as well. We’ll present three common scenarios that can produce entity ID conflicts and provide proven solutions.

Scenario 1: The Drupal 10 site had content before the migration process started

Let’s say that in the Drupal 10 site, node 1 was created as an article by user ID 2, tagged with taxonomy term ID 3, and had an image attachment with file ID 5. Now consider that in the Drupal 7 site, node 1 was created as an event by user ID 8, tagged with taxonomy term ID 13, and had an image attachment with file ID 21. Node 1 in Drupal 10 will be overwritten with data from Drupal 7: the node type, title, publication status, author, creation timestamp, any field that also exists in Drupal 7, and more.

Explicit and implicit relationships would also be overwritten. Explicit relationships are entity reference fields to nodes, users, taxonomy terms, files, media entities, paragraphs, groups, commerce products, etc. Implicit relationships are those established by base field definitions, including the content type and the user who created the node. The next article will delve further into explicit and implicit relationships.

For now, consider how disruptive the overwrite can be. Changing the content type would break field data attached to the node. The overwritten values for entity references might point to non-existent entities in Drupal 10 and, even if they existed, the data could be invalid. Imagine a node reference field that allows referencing basic pages, but the ID that is retrieved from Drupal 7 points to a node of type event.
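To make the overwrite concrete, here is a minimal Python sketch that models the Drupal 10 node storage as a dictionary keyed by node ID. It is only an analogy: the function name, the field names, and the dictionary layout are made up for illustration and are not part of Drupal or the Migrate API. The IDs are the ones from the example above.

```python
# Hypothetical in-memory stand-in for the Drupal 10 node storage, keyed by node ID.
drupal10_nodes = {
    1: {"type": "article", "uid": 2, "tags": [3], "image_fid": 5},
}

# Rows as they might come out of the Drupal 7 source.
drupal7_rows = [
    {"nid": 1, "type": "event", "uid": 8, "tags": [13], "image_fid": 21},
]


def import_preserving_ids(rows, storage):
    """Write each source row under its original ID, replacing whatever is there."""
    overwritten = []
    for row in rows:
        nid = row["nid"]
        if nid in storage:
            overwritten.append(nid)
        # The source row wins: type, author, and references are all replaced.
        storage[nid] = {key: value for key, value in row.items() if key != "nid"}
    return overwritten


overwritten = import_preserving_ids(drupal7_rows, drupal10_nodes)
print(overwritten)
print(drupal10_nodes[1])
```

After the import, node 1 is an event authored by user 8. Nothing in this sketch checks whether user 8, term 13, or file 21 exist in the destination, which mirrors the dangling-reference problem described above.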

Scenario 2: Content is created in Drupal 10 in between incremental migrations

Migrating from Drupal 7 to 10 is an iterative process and, depending on the complexity and scale of the project, it can span multiple months. It is common to break up the migration into multiple stages and let content editors interact with the Drupal 10 site along the way.

Whether for testing the migration or trying out features being developed in the new site, content eventually gets created in Drupal 10. Concurrently, the Drupal 7 site continues functioning and new content is added. The next time an incremental migration is executed, the new content from Drupal 7 will overwrite any content created in Drupal 10 that has the same ID.

Scenario 3: Content model updates require changes to entity types

Now let’s look at another scenario. In Drupal 7, there was a speaker content type to collect information about those presenting at a conference. As part of the migration to Drupal 10, the speaker nodes need to be migrated as user entities. In Drupal, each entity type has its own auto-increment counter, so there might already be users whose IDs match the node IDs of the speaker nodes. Therefore, the node ID cannot simply be reused as the user ID as part of the entity type conversion.

There are other scenarios that can lead to potential ID conflicts. You can read more about them on the documentation page for known issues when upgrading from Drupal 6 or 7. For reference, some are:

  • The source site might have data that did not specify an entity ID, but one is required in Drupal 10.
  • A module, theme, or profile in Drupal 10 generates content upon installation.
  • Translations added in Drupal 10 might be overwritten as part of an incremental migration.

We’ve covered three common entity ID conflict scenarios that can occur in Drupal 7 to Drupal 10 data migrations. Now, let’s discuss the two primary ways to work around potential ID conflicts.

Solution 1: Customize the migrations to not preserve entity IDs

While it is possible to preserve the IDs for some entity/bundle combinations and not others, it is better to take an all-or-nothing approach. That way, you treat all migrations the same and can apply the same techniques for handling entity relationships. Most of the time, this will leverage the migration_lookup process plugin to establish the relationships among entities.
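Conceptually, a lookup of this kind translates a source ID into the destination ID that an earlier migration recorded for it. The Python sketch below illustrates that idea only. It is not the plugin's implementation, and the migration names, the map structure, and the IDs are hypothetical.

```python
# Hypothetical ID maps, one per migration: Drupal 7 ID -> Drupal 10 ID.
id_maps = {
    "users": {8: 1042},
    "tags": {13: 57},
}


def lookup(migration, source_id, id_maps):
    """Return the destination ID recorded for source_id, or None if it was never migrated."""
    return id_maps.get(migration, {}).get(source_id)


def transform_node(row, id_maps):
    """Rewrite the references of a Drupal 7 node row so they point at Drupal 10 IDs."""
    return {
        # No node ID is set here: the destination assigns a fresh one.
        "title": row["title"],
        "uid": lookup("users", row["uid"], id_maps),
        "tags": [lookup("tags", tid, id_maps) for tid in row["tags"]],
    }


node = transform_node({"title": "Conference", "uid": 8, "tags": [13]}, id_maps)
print(node)
```

Because every relationship goes through the same lookup, it does not matter which IDs the destination hands out, as long as the referenced entities are migrated before the entities that point to them.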

Here are three key considerations for this option:

1. Public-facing URLs might depend on entity IDs.

While it is common to have URL aliases for most content, I have seen sites that include the entity ID as part of the path or in query parameters. In cases like this, redirects need to be added so the Drupal 7 URLs redirect to the URL with the updated ID in Drupal 10.

The URL alias migration that ships with Drupal 10 core already accounts for changes in entity IDs. If using a custom migration for URL aliases, you would have to map the old entity IDs to the new ones yourself. The migration_lookup process plugin can help with this.

Also consider the impact that changing URLs might have on SEO and inbound traffic from external sites pointing to Drupal 7 URLs. If changing URLs is necessary, it is essential to have a marketing plan in place to anticipate and address dips in organic traffic post-launch.
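For sites whose public paths embed the entity ID, the redirects mentioned above boil down to a mapping from old paths to new ones. Here is a minimal Python sketch, assuming the old-to-new node IDs are available as a dictionary and that paths follow a node/ID pattern. The function name, the path template, and the IDs are illustrative assumptions.

```python
def build_redirects(node_id_map, path_template="node/{nid}"):
    """Map each Drupal 7 path to its Drupal 10 counterpart, skipping unchanged IDs."""
    redirects = {}
    for old_nid, new_nid in sorted(node_id_map.items()):
        if old_nid != new_nid:
            old_path = path_template.format(nid=old_nid)
            redirects[old_path] = path_template.format(nid=new_nid)
    return redirects


# Hypothetical old -> new node IDs, as an ID map might record them.
node_id_map = {1: 100001, 2: 100002, 7: 7}
redirects = build_redirects(node_id_map)
print(redirects)
```

The resulting pairs could then be fed to whatever redirect mechanism the site uses. A redirect from an old path is only safe when no Drupal 10 entity legitimately lives at that same path.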

2. External services might depend on entity IDs.

The Drupal 7 site might expose an API for external services to consume. If the API exposes entity IDs, those services might rely on them to properly function. In cases like this, I have seen a new field added to Drupal 10 to store the legacy Drupal 7 entity ID. This new field is exposed in Drupal 10’s API.

3. Content stored in rich text fields might depend on entity IDs.

Depending on the Drupal 7 configuration, some modules might embed entity IDs when storing references to files, images, or other types of media. Links to content on the same Drupal 7 site might also use the internal entity ID. Special handling is required to update the entity IDs embedded in rich text fields.

For updates to media references, the Media Migration and Migrate Media Handler modules can assist. For updates to links in rich text fields, creating custom process plugins to update the references is generally the best approach.

While it is possible to write some complex process pipelines to accomplish the task, it is better for the migration to remain easy to read and maintain. If you still want to follow this approach, make sure to use DOM parsing instead of regular expressions to update HTML markup in rich text fields.
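To illustrate the parser-based approach to rewriting links in rich text, here is a minimal Python sketch built on the standard library's html.parser module. It assumes internal links look like /node/ID and that a dictionary of old-to-new node IDs is available. The class name, function name, and link pattern are assumptions for illustration; in a real Drupal migration this logic would live in a custom process plugin.

```python
import re
from html import escape
from html.parser import HTMLParser


class LinkRewriter(HTMLParser):
    """Re-serialize an HTML fragment, remapping internal /node/<id> links."""

    # Applied to a single attribute value, never to the markup as a whole.
    NODE_PATH = re.compile(r"^/node/(\d+)$")

    def __init__(self, node_id_map):
        super().__init__(convert_charrefs=True)
        self.node_id_map = node_id_map
        self.out = []

    def _rewrite(self, name, value):
        if name == "href" and value:
            match = self.NODE_PATH.match(value)
            if match:
                old = int(match.group(1))
                # Unknown IDs are left untouched.
                return "/node/%d" % self.node_id_map.get(old, old)
        return value

    def _format_tag(self, tag, attrs, closing=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)
            else:
                safe = escape(self._rewrite(name, value), quote=True)
                parts.append('%s="%s"' % (name, safe))
        return "<" + " ".join(parts) + closing + ">"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._format_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._format_tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)


def rewrite_links(html_fragment, node_id_map):
    parser = LinkRewriter(node_id_map)
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.out)


body = '<p>See <a href="/node/1">the event</a> &amp; <a href="https://example.com/node/1">elsewhere</a>.</p>'
print(rewrite_links(body, {1: 100001}))
```

Only the href attribute of the internal link changes. The external link is untouched even though it ends in the same path, which is hard to guarantee with a regular expression run over the raw markup. The sketch re-serializes the fragment, so attribute quoting is normalized, and it does not handle script or style content specially.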

Solution 2: Artificially inflate the auto-increment value for all content entities in Drupal 10

Let’s say that the current Drupal 7 site has 15,000 nodes. One could manipulate the counter for nodes in Drupal 10 to start at 100,000. Then, new content created directly in Drupal 10 will be assigned IDs high enough that they would not collide with the IDs of content imported from Drupal 7. This change would have to be applied to all content entities that can be created from migrated Drupal 7 data. We will demonstrate how to do this in a future article.
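Until then, the idea can be illustrated with a small Python simulation of the numbers above. It models the Drupal 10 node counter as a simple generator. The function name and the figure of 250 editor-created nodes are made up for the example.

```python
from itertools import count


def make_counter(start):
    """Return a callable that hands out consecutive IDs beginning at start."""
    ids = count(start)
    return lambda: next(ids)


drupal7_node_ids = set(range(1, 15001))  # the 15,000 nodes from the example
next_node_id = make_counter(100_000)     # inflated Drupal 10 counter

# Content created by editors directly in Drupal 10.
editor_created = {next_node_id() for _ in range(250)}

# IDs preserved from Drupal 7 never meet the IDs assigned in Drupal 10.
collisions = drupal7_node_ids & editor_created

# IDs still available for content added to Drupal 7 before the final migration.
headroom = 100_000 - 1 - max(drupal7_node_ids)

print(collisions, headroom)
```

The headroom figure matters as much as the starting value: if the Drupal 7 site keeps growing during a long project, the gap has to be large enough to absorb that growth.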

For now, remember to review these four considerations:

1. Don’t forget revisionable entities

If the entity is revisionable, you need to manipulate the auto-increment value for the tables that store the revision data.

2. Different entity types may require different values

Different entity types will likely require a different value for their auto-increment. That is, the number used for nodes will not be the same as that used for users, files, taxonomy terms, paragraphs, etc.

3. Content entities created as part of the migration

Account for all content entities that might be created as part of the data migration. You might not have a dedicated migration for URL aliases, but those might still be imported as part of the node migrations. If the Pathauto module is configured in Drupal 10, URL alias entities might be created even if you did not explicitly ask for it. These entities should also be considered when determining where the auto-increment value should be manipulated.

4. When reusing old entity IDs is not possible

When changing entity types, it might not be possible to reuse the old entity ID. As mentioned in Scenario 3 above, if changing nodes to users, the node ID value in Drupal 7 might already exist as a user ID value in Drupal 10. Another example might be files being migrated as media entities. If the site allows remote videos as media entities, the file ID value in Drupal 7 might already exist as a media ID value in Drupal 10.
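As a rough illustration of considerations 1 and 2, the Python sketch below derives a separate starting value per table, revision tables included, from the highest ID found in the source. Everything here is an assumption to verify against the actual project: the growth factor, the rounding, the sample maximums, the table names, and the MySQL/MariaDB statement syntax. It is not necessarily the technique that the future article will present.

```python
def inflated_start(source_max_id, growth_factor=5, round_to=10_000):
    """Pick a starting ID well above the source maximum, rounded up to a multiple of round_to."""
    target = source_max_id * growth_factor
    # Integer ceiling division avoids floating point surprises.
    return (target + round_to - 1) // round_to * round_to


# Assumed highest IDs per table in the Drupal 7 source; replace with measured values.
# Table names are assumptions and must be checked against the Drupal 10 schema.
source_max = {
    "node": 15_000,
    "node_revision": 62_000,
    "users": 3_200,
    "taxonomy_term_data": 480,
    "file_managed": 27_500,
}

statements = [
    "ALTER TABLE %s AUTO_INCREMENT = %d;" % (table, inflated_start(max_id))
    for table, max_id in source_max.items()
]

for statement in statements:
    print(statement)
```

Deriving each value from its own table makes it obvious when a revision table needs a much higher number than its base table, as is the case for nodes in these sample figures.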

Conclusion

We’ve discussed three common conflict scenarios in Drupal 7 to Drupal 10 migrations and two solution pathways to avoid them. It is important to note that, in my experience, entity ID conflicts in a data migration project are likely to occur. This can happen during the testing phase or after an initial migration is completed and incremental runs are needed. It is not a matter of if they will happen, but of when they will happen.

Therefore, it is best to plan for them from the beginning and define a strategy to work around them. Otherwise, it can be quite costly, if possible at all, to resolve such ID conflicts among multiple databases. As part of the upcoming example project in this series, we will demonstrate how such conflicts can be avoided.




Image by Tom Hill from Pixabay