Getting a grip on a large database migration

Michael is working on migrating a custom website with hundreds of database tables to Drupal, and he wanted to know if I had any advice for keeping track of table mappings and other migration tasks.

I’ve worked on small migration projects before (including migrating my own blog from lots of Planner-mode text files to WordPress!), but no large projects like the ones Michael described. But if I needed to do something like that, here’s what I’d probably do. I’d love to hear your tips!

I’d list all the tables and start mapping them to entities. What content types would I need to create? What fields would I need to define? How are the content types related to each other? An entity relationship diagram can help you get an overview of what’s going on in the database.
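To make that inventory concrete, the mapping can live in a small machine-readable structure rather than just a diagram. Here's a minimal Python sketch; all the table, content-type, and field names are invented for illustration, not taken from Michael's database:

```python
# Hypothetical mapping of legacy tables to Drupal content types and fields.
# Every name here is made up for illustration.
TABLE_MAP = {
    "articles": {
        "content_type": "article",
        "fields": {"title": "title", "body_text": "body", "author_id": "field_author"},
        "references": ["authors"],  # legacy tables this one points to
    },
    "authors": {
        "content_type": "author_profile",
        "fields": {"name": "title", "bio": "body"},
        "references": [],
    },
}

def unmapped_tables(all_tables, table_map):
    """Return legacy tables that don't have a mapping yet, so nothing
    silently falls through the cracks during planning."""
    return sorted(set(all_tables) - set(table_map))
```

With hundreds of tables, a check like `unmapped_tables` makes it easy to see at a glance how much of the mapping work remains.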

Then I’d start untangling the entities to see which ones I can migrate first. If you have entities with node references, it makes sense to migrate the referenced data before the data that refers to it. If I could get a slice of the database – not all the records, just enough to flesh out the different relationships – that would make testing the migrations faster and easier. I would probably write a custom Drupal module to do the migrations, because it’s much easier to programmatically create nodes than it is to insert all the right entries into all the right tables.
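That "migrate the referenced data first" rule is exactly a topological sort over the entity dependency graph. A sketch of how the ordering could be computed, using Python's standard library (the entity names are hypothetical):

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency graph: each entity maps to the set of entities
# it references. Referenced entities must be migrated first.
DEPENDS_ON = {
    "article": {"author", "tag"},
    "comment": {"article", "author"},
    "author": set(),
    "tag": set(),
}

def migration_order(depends_on):
    """Return an order in which every referenced entity comes before
    the entities that refer to it. Raises CycleError on circular
    references, which is a useful early warning that those entities
    need special handling (e.g. a second pass to fill in references)."""
    return list(TopologicalSorter(depends_on).static_order())
```

If `migration_order` raises a `CycleError`, that flags the circular references mentioned below as things to revisit, typically by migrating one side with empty references and patching them in afterwards.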

I’d commit the custom module to source code control frequently. I’d write some code to migrate an entity type or two, test the migration, and commit the source code. As I migrated more and more of the relationships, I’d probably check them off or colour them differently in the diagram, making note of anything I’d need to revisit (circular references, etc.).

I might break the custom module up into steps to make it easier to rerun or test. That way, I’m also not reconstructing the entire database in a single request.
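The step-by-step idea can be sketched as a batched loop that tracks its own progress. This is only an illustration of the shape of it: `load_batch` and `migrate_record` are hypothetical stand-ins for the real database reads and node-creation calls a Drupal module would make.

```python
# Sketch of a batched, resumable migration loop. load_batch and
# migrate_record are hypothetical placeholders for real I/O.
def run_migration(load_batch, migrate_record, batch_size=50):
    """Migrate records in small batches so no single run has to
    reconstruct the whole database, and a failure only loses the
    current batch. Returns the number of records migrated."""
    offset = 0
    migrated = 0
    while True:
        records = load_batch(offset, batch_size)
        if not records:
            break  # source exhausted
        for record in records:
            migrate_record(record)
            migrated += 1
        offset += batch_size
    return migrated
```

Persisting `offset` between runs (in a state table or file) would let a failed migration resume from the last completed batch instead of starting over.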

I’d take notes on design decisions. When you migrate data, you’ll probably come across data that challenges your initial assumptions. This might require redesigning your entities and revising your earlier migration code. When I make design decisions, I often write about the options I’m considering and the reasons for or against them. This makes those decisions easier to revisit when new data might invalidate my assumptions, because I can see what may need to be changed.

How would you handle a migration project that’s too large to hold in your head?

  • The manner in which you talk about entities and mapping them to the particular source data tables is spot on. That is exactly how I have approached my own sets of migrations. I try to see if there are any core migrations that don’t have dependencies that I could migrate first, before moving on to ones that depend on the core entities, and then to entities that are more complex (with node references and the like). When you have entities and relationships that are bi-directional, you have to think a little more about how that might be managed in an easy way. Also, while the module to handle the migration will be custom, I would look at any other projects that might help with the whole migration process.

    I hope I am not going off topic, but since we are specifically talking Drupal, the migrate module is already geared up to handle various kinds of entities (all core entity types are supported, with migrate extras coming in to fill the role for other entities such as media and rules, and also working with non-core fields and various other things). It already has the ability to run a migration for X number of content items (or X amount of time), with the ability to roll things back. And something I haven’t tried yet…but it has the ability to work with content that has references in both directions (so you can pull in the content for one bundle with their node references not yet filled in, bring in the other bundle for import, and both will get the references mapped out correctly). As someone who has done migrations both via a custom module and via the migrate module, migrate is a far better way to go if it is feasible.

  • Hi,

    We have written a custom solution in Perl for migrating data from D5 to D6 and D6 to D7. It works with custom content types and the core schema. It works directly at the database schema level; proper mappings can be given in configuration so that it knows how to migrate custom types/fields into the target system. The migration is extremely fast, very flexible, and can be run from the command line. Since we wanted something we could re-use, this was taken into account during design. I am sure we could adapt the solution to sources other than the Drupal CMS. Something else to consider during migration is data filtering/cleanup before it’s passed into the target system.


  • NicolasH

    All good points, and I’d second Ashok Modi’s suggestion to use the migrate module. It’s got a fairly steep initial learning curve, but with some of the handbook examples and especially some of the linked resources, it should only take a day or two of tinkering to get going.

    From then on, it will make the whole process *a lot* saner. You can tell that the people who wrote it have done a lot of migrations and addressed all the common and not so common details and pitfalls.

    One hugely beneficial add-on you get with the migrate module is that it exposes all your mappings in the interface, where you can also assign notes. This makes it possible to involve non-technical people in the process right from the start without having to write tons of documentation – to an extent it’s self-documenting. You can also hook it up to external ticketing systems…I have not used that functionality yet.

  • Ooh, that’s even awesomer. I love blogging these notes – I always discover better ways to do things from people like you! =)