Drupal Migration - Onion Skin, or Per-node?

A few words about the Migrate module. Migrate is a powerful tool for moving content from other CMSs into Drupal. It lets you map source data to Drupal destinations and import it with a batch script. Currently, there is support for pages (nodes), categories and tags (taxonomy), media (images, audio, and video), addresses and geodata, and field collections. Some kinds of content work out of the box, some require additional modules (like migrate_extras), and some require patches (field collections). Of the existing tools, Migrate is the most capable at handling complex sites, and it supports multiple source formats, the most common being databases and XML.

Now to the point. There are two basic ways to import complex data. One is an "onion-skin" migration, and the other is a per-node (node-based) migration.

The difference between the approaches comes partly from the logic and convenience of the migration process, and partly from the architecture and limitations of the Migrate module itself. Though quite powerful, the Migrate module is built around the idea of a 1:1 transfer from a mapped source item to a destination item, be it a node, file, or user. If you want a node with field collections that in turn contain files, you are already stretching the whole process a bit too thin.

Onion-skin approach

How do you handle migration of complex content? The onion-skin approach is the default approach for Migrate. You do your migration from the bottom up. First, you migrate the files (audio, video, and images), using the corresponding handlers or writing your own if needed. Once the files have been migrated, you migrate the field collections, importing files into them or using the file migration as a 'source migration' if there is an obvious relationship between them. Third, you import the nodes, specifying the field collections as a 'source migration'.

With the onion-skin approach, if you succeed, you end up with a set of dependent migrations.
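The bottom-up ordering can be sketched in a few lines. This is a language-agnostic illustration in Python, not Migrate module code: the migration names, source rows, and the `run_migrations` helper are all hypothetical. It only shows the idea that each layer runs after the layer it depends on, and resolves its references through the map left behind by that earlier layer.

```python
from graphlib import TopologicalSorter

# Hypothetical source data: each layer refers to the layer below by source id.
SOURCE = {
    "files": [{"id": "img-7", "path": "a.jpg"}],
    "field_collections": [{"id": "fc-3", "file": "img-7"}],
    "nodes": [{"id": "article-1", "collection": "fc-3"}],
}

# migration -> (migration it depends on, source field holding the reference)
REFERENCES = {
    "files": None,
    "field_collections": ("files", "file"),
    "nodes": ("field_collections", "collection"),
}


def run_migrations(source, references):
    """Run every migration after its dependency and record source->destination maps."""
    graph = {name: ({ref[0]} if ref else set()) for name, ref in references.items()}
    order = list(TopologicalSorter(graph).static_order())
    maps, imported, next_id = {}, [], 1
    for name in order:
        maps[name] = {}
        for row in source[name]:
            item = {"type": name, "id": next_id}
            ref = references[name]
            if ref:
                parent, field = ref
                # Look the referenced item up in the earlier migration's map.
                item["ref"] = maps[parent][row[field]]
            maps[name][row["id"]] = next_id
            imported.append(item)
            next_id += 1
    return order, maps, imported


order, maps, imported = run_migrations(SOURCE, REFERENCES)
print(order)
print(maps)
```

The per-migration maps are what make the dependencies traceable afterwards, and they are also what ties the migrations together: rerunning a lower layer invalidates the ids that the layers above it looked up.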

Some benefits of this approach:

  1. The history and relationships of these migrations are stored in your database. You can refer to the migration tables and trace which image went into which field collection, and which field collection went into which node. This is a clear benefit if you later want to do post-migration fixes or changes.
  2. It lets you make better use of the existing field handlers when you import media files.
  3. If all the field handlers work, everything goes smoothly, and there are no complications, then this approach can be much faster and more time-efficient.

However, the onion-skin approach has some shortcomings that can make it less preferable than the other:

  1. For complex sites, interdependent migrations become a dangerous knot of dependencies that gets harder and harder to work with.
  2. Complex migrations become very fragmented. You can end up with up to a dozen migrations for a single node if you import every image, geofield, or video field as its own migration. With larger and more complex setups, migrations and their interrelations become unmanageable very fast.
  3. If your data is not very standard (multivalue media fields, complex field collections, or, even worse, multivalue field collections), then you stretch the existing functionality thin very fast and end up writing lots of code anyway.
  4. If you later find yourself in the not-too-uncommon position of having to fix or reformat some migrated data (say, images got imported without some metadata because of a client oversight, your own oversight, or a third-party developer's oversight that left data out of the source export - everything happens), then you will find it easier to re-import than to remediate. Why? Because all your functionality is scattered across handlers, and you have to either abstract it out or write it yourself.
  5. Last but not least, because the migrations are intertwined, you end up remigrating lots and lots of related data to fix some obscure error. And then you end up untangling the onion, crying.

Node-based approach

The node-based approach means migrating nodes. You single out the pieces of content that will become nodes, and you map those fields that can be mapped safely. Then you use the Migrate module's complete() hook to import files, field collections, and all the rest. The bulk of the node data is processed in custom code that handles field collections, files, media, and locations.

You can ease the process by taking the functionality from the existing migration field handlers and putting it in a base migration class of your own, adjusting it as you need, and having the node migrations use its functionality as needed.
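The shape of such a base class can be sketched as follows. Again, this is a Python illustration of the structure only, not Drupal or Migrate code: `BaseNodeMigration`, `ArticleMigration`, `attach_files`, and all field names are made up for the example. The point is that the simple fields are mapped declaratively, and everything else is attached in one place once the node exists.

```python
class BaseNodeMigration:
    """Hypothetical base class: map simple fields, then finish each node in code."""

    # source field -> destination field, for fields that can be mapped safely
    simple_fields = {}

    def import_row(self, row):
        node = {dest: row[src] for src, dest in self.simple_fields.items()}
        # Everything that is not a simple 1:1 mapping happens in complete().
        self.complete(node, row)
        return node

    def complete(self, node, row):
        """Override in subclasses to attach files, field collections, and so on."""

    def attach_files(self, node, field, paths):
        """Shared helper (stands in for logic borrowed from field handlers)."""
        node[field] = [{"uri": path} for path in paths]


class ArticleMigration(BaseNodeMigration):
    simple_fields = {"headline": "title", "text": "body"}

    def complete(self, node, row):
        # Multivalue file field.
        self.attach_files(node, "field_images", row.get("images", []))
        # Multivalue field collection, each item carrying its own files.
        node["field_gallery"] = [
            {"caption": g["caption"], "files": [{"uri": p} for p in g["files"]]}
            for g in row.get("galleries", [])
        ]


row = {
    "headline": "Hello",
    "text": "Body text",
    "images": ["a.jpg", "b.jpg"],
    "galleries": [{"caption": "Trip", "files": ["c.jpg"]}],
}
node = ArticleMigration().import_row(row)
print(node)
```

Because helpers like `attach_files` live in the base class rather than inside a migration run, a later remediation script could call them directly on already imported nodes.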

Some benefits of this approach are:

  1. You have a clear migration scheme that gives you nodes as the output. The rest of the data is migrated directly in code, with no need to migrate it separately.
  2. You have better control over the media import process, with more flexible processing of the imported data.
  3. Having the processing code abstracted makes it easier to remediate any oversights without re-migrating the content.
  4. You can add multivalue fields or field collections as you like.
  5. You can handle complex sites without stretching anything other than your skill.

There are also some shortcomings of the node-based approach to migration:

  1. You end up maintaining your own reference table, in which you record the nodes and field collections, and possibly also the files, along with their source content IDs in case a remediation script has to be written later.
  2. It will take more time than an onion-skin migration, though much of this can be offset by abstracting code from the migration handlers. Of course, for some complex sites, writing your own code can be faster than trying to stretch the onion-skin style.
  3. You don't contribute to the community as much. Having your custom code abstracted keeps you from sending patches upstream for the handlers you would otherwise have updated or extended.
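The reference table from the first shortcoming does not need to be elaborate. Below is a minimal sketch using an in-memory SQLite database from Python; the table name `migration_reference`, its columns, and the helper functions are assumptions made for illustration, not something the Migrate module provides.

```python
import sqlite3


def open_reference_table():
    """Create a hypothetical table linking source ids to imported entities."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE migration_reference (
               entity_type TEXT NOT NULL,       -- 'node', 'field_collection', 'file'
               source_id TEXT NOT NULL,         -- id in the old system
               destination_id INTEGER NOT NULL, -- id in the new site
               parent_node_id INTEGER,          -- owning node, NULL for nodes
               PRIMARY KEY (entity_type, source_id)
           )"""
    )
    return db


def record(db, entity_type, source_id, destination_id, parent_node_id=None):
    """Store or update one source-to-destination reference."""
    db.execute(
        "INSERT OR REPLACE INTO migration_reference VALUES (?, ?, ?, ?)",
        (entity_type, source_id, destination_id, parent_node_id),
    )


def children_of(db, node_id):
    """List everything that was attached to a node, for later remediation."""
    cursor = db.execute(
        "SELECT entity_type, source_id, destination_id FROM migration_reference "
        "WHERE parent_node_id = ? ORDER BY entity_type, source_id",
        (node_id,),
    )
    return cursor.fetchall()


db = open_reference_table()
record(db, "node", "article-1", 9)
record(db, "field_collection", "fc-3", 55, parent_node_id=9)
record(db, "file", "img-7", 101, parent_node_id=9)
print(children_of(db, 9))
```

A remediation script can then start from a source ID, find the imported item, and fix it in place without rerunning any migration.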


So, these are the two approaches. Which one should you choose? It depends on the site architecture and context, and it's your call. My general experience is this: if you doubt that the onion-skin functionality can be safely stretched, or if the site is complex, or if you have reason to believe that future remediation may be needed, then you may want to take the node-based approach.

Also, remember that you don't have to follow either approach to the letter. You can use one approach for one node type and the other for another node type, as your experience as a developer suggests.