In the previous article, we saw what a migration file looks like. We made some changes without going too deep into explaining the syntax or structure of the file. Today, we are exploring the language in which migration files are written and the different sections they contain.

YAML Syntax

Before looking at the structure of a migration file, let’s briefly discuss the language used for writing them. YAML stands for YAML Ain't Markup Language. It is a human-friendly data serialization language. Here is a basic example:

# This is a comment
a_number: 1 # Comments can also appear at the end of lines
another_number: 3.14159
a_string: 'Hello world'
an_array: [1, 2, 3]
# The same array, written with one element per line.
same_array_multiline:
  - 1
  - 2
  - 3
another_array: [1, two, 3.14159]
# An array with key names and nested elements.
an_associative_array:
  one: 1
  two: 2
  three: 3
  numbers: [1, 2, 3]
  letters:
    vowels: [a, e, i, o, u]
    consonants: [b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z]

Now we will break down its syntax. Anything after a pound sign (#) is considered a comment. YAML uses key-value pairs separated by a colon (:) with optional nesting of elements. While not a requirement of the language, Drupal uses lowercase letters and an underscore (_) to separate multiple words in a key. The value can be a scalar (like a number or string) or an array (with optional key names and/or nested elements). Drupal quotes string values when they contain more than one word. Array elements can be written on a single line when enclosed by square brackets ([]) and separated by commas (,). Alternatively, each array element can appear on its own line when indented and prefixed by a dash (-). Array elements can be of different types and they can also be nested.

YAML is sensitive to spaces and indentation. At least one space character is required to appear after the colon that separates the key and the value. In terms of indentation, the language does not impose how many space characters to use. To match Drupal's coding standards, we use two spaces for each indentation level.
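The spacing rules above can be illustrated with a minimal sketch. The key names are hypothetical and chosen only to show the layout.

```yaml
# A space after the colon separates the key from its value.
a_string: 'Hello world'
# Each nesting level is indented by two spaces, matching Drupal's coding standards.
parent_key:
  child_key:
    grandchild_key: 1
```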

Drupal uses the .yml extension for all files written in YAML syntax, but other projects might use a different one. For example, DDEV uses .yaml as the file extension. When writing Drupal migrations, make sure that your file name ends with .yml.

Technical note: Drupal uses the Symfony Yaml Component to load and dump YAML files. The YAML language supports more features than the ones presented above. While some features like multiline exports are supported by Drupal, others like parsing PHP constants and enums are not available at the moment. To keep things simple, we will focus on what is most relevant for writing Drupal migrations.
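As an illustration of one feature beyond the basics, here is a small sketch of multiline strings using standard YAML block scalars. It assumes that this is the kind of multiline value the note refers to. The key names are hypothetical.

```yaml
# Literal block scalar (|): line breaks are preserved.
literal_text: |
  First line
  Second line
# Folded block scalar (>): line breaks are folded into spaces.
folded_text: >
  This sentence is written
  across two lines.
```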

Structure of a migration file

Now, let's have a look at the structure of a migration file. We are going to analyze the modified taxonomy vocabulary migration from the previous article.

id: upgrade_d7_taxonomy_vocabulary
class: Drupal\migrate\Plugin\Migration
migration_tags:
  - 'Drupal 7'
  - Configuration
  - taxonomy_vocabulary
  - tag1_configuration
label: 'Taxonomy vocabularies'
source:
  key: migrate
  plugin: d7_taxonomy_vocabulary
process:
  vid:
    -
      plugin: make_unique_entity_field
      source: machine_name
      entity_type: taxonomy_vocabulary
      field: vid
      length: 30
      migrated: true
  label:
    -
      plugin: get
      source: name
  name:
    -
      plugin: get
      source: name
  description:
    -
      plugin: get
      source: description
  weight:
    -
      plugin: get
      source: weight
destination:
  plugin: 'entity:taxonomy_vocabulary'
migration_dependencies:
  required: {  }
  optional: {  }

The three major parts of the migration are source, process, and destination. They correspond to the extract, transform, and load steps in ETL data migrations. In addition to these three, only the id and label keys are required. Other keys provide extra information to the system like dependencies among migrations. The keys presented above are the most frequently used. When working through the examples, we will highlight other keys as necessary.

Now we will review each key-value pair in the file.

The id key expects a string to be used as the internal identifier of the migration. Drupal and the Migrate API use this value to execute the migration and keep track of its status. It is customary for the id to match the migration file name, but without the .yml extension. The value should consist of alphanumeric characters, optionally using underscores to separate words.

As for the label key, it is a human-readable string used to name the migration in various interfaces.

The source key contains an associative array with the definition of the source plugin. The only required sub-key is plugin, which indicates which source plugin to use. There are many plugins and each can accept different configuration options. Refer to this article for more information on source plugins. Our series focuses on migrating from Drupal 7 to Drupal 10, and this requires connecting to the Drupal 7 database. Internally, this uses the SqlBase abstract class, which allows specifying a key to determine which database connection to use. This was explained in article 9. Some source plugins allow filtering of the data to migrate. For example, the d7_taxonomy_term plugin, used to migrate taxonomy term content entities, uses the bundle key to limit which vocabularies to migrate terms from. Similarly, the d7_node plugin uses the node_type key to limit which content types to migrate nodes from.
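As a rough sketch of the filtering described above, the source section of a taxonomy term migration might look like this. The plugin name and the key and bundle options come from the text. The tags vocabulary is a hypothetical value used only for illustration.

```yaml
source:
  key: migrate
  plugin: d7_taxonomy_term
  # Hypothetical vocabulary machine name: only terms from it are migrated.
  bundle: tags
```

In a separate node migration file, the equivalent filter could look like the following, where article is again a hypothetical content type.

```yaml
source:
  key: migrate
  plugin: d7_node
  # Hypothetical content type machine name: only nodes of this type are migrated.
  node_type: article
```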

The process key contains an associative array with the process pipeline the migration implements. This determines how source data will be processed and transformed to match the expected destination structure. That is a big enough topic to require a dedicated article, and so we will talk about this in the next article in our migration series.

The destination key contains an associative array with the definition of the destination plugin. The only required sub-key is plugin, which indicates which destination plugin to use. Similar to source plugins, there are many destination plugins and each can contain different configuration options. Refer to this article for more information on destination plugins. When migrating from Drupal 7 to Drupal 10, most plugins deal with creating entities. In general, the plugin id follows the pattern entity:[ENTITY_TYPE] where [ENTITY_TYPE] is the machine name of the entity being created. Some examples are entity:node_type to create content type configuration entities and entity:node to create node content entities. The Entity abstract class used for entity migrations allows setting the default_bundle to indicate which entity bundle this migration is responsible for creating. For content entities in particular, it is also possible to use the validate key to indicate if we want to perform entity validation on the data that is being migrated.
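The destination options mentioned above could be combined as in this minimal sketch. The default_bundle and validate keys come from the text. The article bundle is a hypothetical value, and whether validation is appropriate depends on the project.

```yaml
destination:
  plugin: 'entity:node'
  # Hypothetical bundle: nodes created by this migration default to this content type.
  default_bundle: article
  # Run entity validation on each node before it is saved.
  validate: true
```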

The class key determines which PHP migration plugin class is used to execute the migration. A migration plugin acts like a container for the information about a single migration such as the source, process and destination plugins. Drupal\migrate\Plugin\Migration is the default for this key and could be left out if the default were to be used. A few examples exist in Drupal core in which this class is extended for extra customizations. Those deal with field migrations, node translations, and user profile data. When working with generated migrations, this class key is one to be aware of but will rarely change.

The migration_tags key contains an array of tags for the current migration. Tags can have an arbitrary value and in most cases they do not have special logic attached to them. That being said, when executing migrations, you can read the tags present and perform actions based on them. For instance, if you have a project-specific tag, you can filter out any migration that does not include it to keep the list of migrations short in commands like drush migrate:status. Also, using the --tag flag in drush migrate:import and drush migrate:rollback, you can execute those operations on multiple migrations using a single command. As explained in the previous article, our example project adds two tags to each migration. One tag indicates what we are creating and the other whether we are creating content or configuration. In the snippet above, they are taxonomy_vocabulary and tag1_configuration respectively.
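To connect the tags with the commands mentioned above, here is a short sketch. The tags are the ones from our example migration. The commands in the comments assume that the --tag option takes the tag name as its value.

```yaml
migration_tags:
  - 'Drupal 7'
  - Configuration
  - taxonomy_vocabulary
  - tag1_configuration
# Possible usage from the command line (assumed invocation):
#   drush migrate:status --tag=tag1_configuration
#   drush migrate:import --tag=tag1_configuration
#   drush migrate:rollback --tag=tag1_configuration
```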

The migration_dependencies key contains an array with two keys (required and optional) listing the migrations that this migration depends on. The required migrations must be run first and completed successfully. The optional migrations will be executed if they are present. Note that if a dependency is added, all the records for that migration must be processed. Let's say we are doing an incremental migration. Our migration for the event content type declares a dependency on the migration for the venue content type. If new venue nodes were added in the source site since the last time we executed a full migration, those venue nodes need to be imported prior to importing the new event nodes. Otherwise, you will get an error when trying to execute the event migration indicating that the dependencies have not been met. As part of the examples, we will be updating the dependencies of many migrations: in some cases to remove duplicate entries, and in others to reflect content model changes.
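The event and venue scenario could be declared as in the sketch below. Both migration ids are hypothetical and only illustrate the shape of the key.

```yaml
# Hypothetical id of the event migration.
id: upgrade_d7_node_event
migration_dependencies:
  required:
    # Hypothetical id of the venue migration, which must be fully processed first.
    - upgrade_d7_node_venue
  optional: {  }
```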

All the keys covered so far are supported by Drupal core. As alluded to in the previous article, more are available when using migration configuration entities with the Migrate Plus module. Refer to this article for more information on other root-level keys used in migration files.

Now that we have unpacked the YAML file structure and syntax, we are ready to tackle the next piece in the migration puzzle. In our upcoming article, we will dive into the migration process pipeline—where all these elements start to come together.

Image by Adina Voicu from Pixabay