July 2, 2011

Getting a grip on a large database migration

Categories: drupal, geek

Michael is working on migrating a custom website with hundreds of database tables to Drupal, and he wanted to know if I had any advice for keeping track of table mappings and other migration tasks.

I’ve worked on small migration projects before (including migrating my own blog from lots of Planner-mode text files to WordPress!), but nothing as large as what Michael described. Still, if I needed to do something like that, here’s what I’d probably do. I’d love to hear your tips!

I’d list all the tables and start mapping them to entities. What content types would I need to create? What fields would I need to define? How are the content types related to each other? An entity relationship diagram can help you get an overview of what’s going on in the database.
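Even a tiny script can serve as the mapping inventory. Here is a minimal sketch in Python (not Drupal code), assuming made-up table, content type, and field names; the `status` values are just one possible convention for tracking progress.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TableMapping:
    """One row of the inventory: a source table and where its data should go."""
    source_table: str
    content_type: str  # target content type; empty if not decided yet
    fields: Dict[str, str] = field(default_factory=dict)  # source column -> target field
    status: str = "todo"  # assumed convention: "todo", "in progress", "done", "revisit"
    notes: str = ""


# Hypothetical tables, purely for illustration.
mappings = [
    TableMapping("customers", "customer", {"name": "title", "email": "field_email"}),
    TableMapping("orders", "order", {"placed_at": "field_order_date"}),
    TableMapping("legacy_audit_log", "", status="revisit", notes="May not need migrating"),
]


def summarize(mappings: List[TableMapping]) -> Dict[str, int]:
    """Count how many tables are in each status."""
    counts: Dict[str, int] = {}
    for m in mappings:
        counts[m.status] = counts.get(m.status, 0) + 1
    return counts


def unmapped(mappings: List[TableMapping]) -> List[str]:
    """List source tables that have no target content type yet."""
    return [m.source_table for m in mappings if not m.content_type]
```

With hundreds of tables, a spreadsheet would work just as well; the point is to have one place that answers "what is left?" at a glance.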

Then I’d start untangling the entities to see which ones I can migrate first. If you have entities with node references, it makes sense to migrate the data being referred to before migrating the data that refers to it. If I can get a slice of the database (not all the records, just enough to flesh out the different relationships), that would make testing the migrations faster and easier. I would probably write a custom Drupal module to do the migrations, because it’s much easier to programmatically create nodes than it is to insert all the right entries into all the right tables.
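Working out which entities to migrate first is a dependency-ordering problem, so a topological sort can do the untangling. This is a rough sketch using Python's standard `graphlib` module (Python 3.9+), again outside Drupal; the content types and their references are invented for the example.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical content types. Each one maps to the set of types it refers to,
# which therefore have to be migrated before it.
references = {
    "order_item": {"order", "product"},
    "order": {"customer"},
    "product": set(),
    "customer": set(),
}


def migration_order(references):
    """Return the types in an order where referenced types come first."""
    return list(TopologicalSorter(references).static_order())


def find_cycle(references):
    """Return one circular chain of references, or None if there isn't one."""
    try:
        migration_order(references)
    except CycleError as error:
        return error.args[1]  # graphlib reports the nodes that form the cycle
    return None


order = migration_order(references)
print(order)
```

Circular references can't be ordered this way, so `find_cycle` at least tells you which entities need a second pass, for example migrating the nodes first and filling in the references afterwards.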

I’d commit the custom module to source code control frequently. I’d write some code to migrate an entity type or two, test the migration, and commit the source code. As I migrated more and more of the relationships, I’d probably check them off or colour them differently in the diagram, making note of anything I’d need to revisit (circular references, etc.).

I might break the custom module up into steps to make it easier to rerun or test. That way, I’m also not reconstructing the entire database in one request.
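One way to make steps rerunnable is to record which ones have finished and skip them next time. The sketch below shows the idea in Python, assuming a small JSON file as the progress record; the `run_steps` helper and the step names are hypothetical, and a real Drupal module would more likely keep this state in the database.

```python
import json
from pathlib import Path


def run_steps(steps, state_file):
    """Run each (name, function) step once, recording finished steps on disk.

    Returns the names of the steps that actually ran this time.
    """
    path = Path(state_file)
    done = json.loads(path.read_text()) if path.exists() else []
    ran = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, so skip it
        func()
        done.append(name)
        path.write_text(json.dumps(done))  # save progress after every step
        ran.append(name)
    return ran
```

If a step fails partway through, the earlier steps stay recorded, so the next run picks up at the failed step instead of starting over. To rerun a single step while testing, remove its name from the state file.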

I’d take notes on design decisions. When you migrate data, you’ll probably come across data that challenges your initial assumptions. This might require redesigning your entities and revising your earlier migration code. When I make design decisions, I often write about the options I’m considering and the reasons for or against them. This makes those decisions easier to revisit when new data might invalidate my assumptions, because I can see what may need to be changed.

How would you handle a migration project that’s too large to hold in your head?