August 8, 2008

Drupal 5: Migrating a production database to a QA server

Building on the configuration management strategy I described last time, I wrote some scripts to make it easier for other developers to migrate the production database to the QA server or to get a copy of the production database for their local system. I needed to consider the following factors:

  • Domain name changes: Because we use Domain Access to serve multiple subdomains from a single Drupal installation with shared sign-on, we needed to make sure that all instances of the domain root were replaced and the correct domain_root was set. For example, the site URLs might be:
    Production          QA                      Local
    example.com         qa.example.com          local.example.com
    foo.example.com     foo.qa.example.com      foo.local.example.com
    bar.example.com     bar.qa.example.com      bar.local.example.com
  • Privacy concerns: The QA database and local copies of the database should not contain sensitive user information. All users aside from the administrator should be removed from the system, and all content on the site should be limited to editorial content.
  • Efficiency: We don’t need data like access logs or watchdog logs in QA or for local testing. This saves quite a lot of time and space during the data migration process.

Here’s how I did it:

  1. I identified tables I needed to copy over and tables that I could just leave empty. I did this by going through the output of “SHOW TABLES” in a MySQL console.
  2. In my Makefile, I declared a DB_DATA variable that contained a list of tables I wanted to copy.
  3. I wrote a backup_partial target in my Makefile that issued a series of commands (the assembled target is sketched after this list):
    mysqldump -u ${DB_USER} --password=${DB_PASSWORD} ${DB} --no-data > partial.dump  # Export the schema
    mysqldump -u ${DB_USER} --password=${DB_PASSWORD} ${DB} --opt --complete-insert ${DB_DATA} >> partial.dump  # Dump some of the data
    mysqldump -u ${DB_USER} --password=${DB_PASSWORD} ${DB} --opt --complete-insert --where='uid<=1' users users_roles >> partial.dump  # Dump only the admin and anonymous users
    echo "UPDATE node SET uid=1;" >> partial.dump  # Set all the node authors to admin
    echo "REPLACE INTO search_index_queue (nid, module, timestamp) SELECT nid, type, UNIX_TIMESTAMP(NOW()) FROM node;" >> partial.dump  # Queue all nodes for reindexing
    
  4. I wrote a shell script on an internal server that accepted an argument (the domain to translate to) and performed the following steps (also sketched after this list):
    1. ssh into the production server and run make with my backup_partial target, compressing the resulting partial.dump
    2. scp the partial.dump.gz from the production server onto the internal server
    3. unpack partial.dump.gz
    4. figure out what $DOMAIN is supposed to be based on the arguments
    5. run perl -pi -e "s/example\.com/$DOMAIN/g" partial.dump to rewrite every occurrence of the domain
    6. load partial.dump into my database
    7. run cron.php if it can
  5. I added two buttons to my web-based deploy script: one button to migrate the production database to the QA server, and one button to make a copy of the production database for the domain “local.example.com”. Both buttons call the shell script from the previous step, passing in the appropriate domain.
  6. I created multisite settings.php files in my Drupal directory (ex: sites/local.example.com and sites/qa.example.com). The production settings go in sites/default/settings.php, and each multisite settings.php overrides them like this:
    $conf = array(
      'domain_root' => 'local.example.com',
    );
    $cookie_domain = '.' . $conf['domain_root'];
    

    $conf allows you to override Drupal variables returned by variable_get. With the settings above, variable_get('domain_root', 'example.com') returns 'local.example.com' without touching the variable table.
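
Putting steps 2 and 3 together, the relevant part of the Makefile looks roughly like this. The credentials and the DB_DATA list are placeholders; the real table list comes from the “SHOW TABLES” pass in step 1, and recipe lines must start with tabs:

    # Illustrative values only; substitute your own credentials and tables
    DB_USER = drupal
    DB_PASSWORD = secret
    DB = drupal_prod
    DB_DATA = node node_revisions term_data term_node url_alias variable

    backup_partial:
            mysqldump -u ${DB_USER} --password=${DB_PASSWORD} ${DB} --no-data > partial.dump
            mysqldump -u ${DB_USER} --password=${DB_PASSWORD} ${DB} --opt --complete-insert ${DB_DATA} >> partial.dump
            # ... followed by the users dump and the echo commands from step 3 ...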

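Here’s a minimal sketch of the migration script from step 4, assuming hypothetical host names, paths, and credentials:

    #!/bin/sh
    # migrate.sh TARGET-DOMAIN: load a sanitized production dump under a new domain.
    # Host names, paths, and credentials below are illustrative.
    set -e
    DOMAIN=$1
    [ -n "$DOMAIN" ] || { echo "Usage: $0 target-domain" >&2; exit 1; }

    # Steps 1-3: build the partial dump on production, fetch it, and unpack it
    ssh deploy@www.example.com 'cd /var/www/drupal && make backup_partial && gzip -f partial.dump'
    scp deploy@www.example.com:/var/www/drupal/partial.dump.gz .
    gunzip -f partial.dump.gz

    # Steps 4-5: rewrite every occurrence of the production domain
    perl -pi -e "s/example\.com/$DOMAIN/g" partial.dump

    # Step 6: load the dump into the target database
    mysql -u drupal --password=secret drupal_qa < partial.dump

    # Step 7: run cron to rebuild the search index, if the site is reachable
    curl -s "http://$DOMAIN/cron.php" > /dev/null || true
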
So now, I can click on a button to migrate a sanitized copy of the production database to the QA server or to my local system. Sweet!

Deploying to servers

I’m heading to the Philippines tomorrow, and to make life easier for the two other Windows-based PHP developers on my team, I updated the web-based deployment script I mentioned in Development kaizen: deployment and testing. I added the ability to push a specified revision to the production server. It took me less time than I thought it would (I love it when things Just Work!), so I decided to spend time documenting it just in case I ever need to do it again (almost certainly) or in case it breaks while I’m away (hope not).

Behind the scenes, there are a number of moving parts (sketched after the list):

  • Key-based authentication. Because I need to copy files and run commands on the production and QA servers non-interactively, I needed to set up key-based authentication using SSH. I’m somewhat nervous about using a passphrase-less key, but I couldn’t see a way to work around this.
  • Rsync. I use rsync over ssh to transfer files to the remote system. It’s good at efficiently transferring changed files. I couldn’t use --delete to get rid of old files, though, as our source tree does not include the complete system.
  • A shell script with the suid bit. The shell script is responsible for exporting the requested revision to a temporary directory, rsyncing it over to the selected host, and running a few commands on the server in order to reset file permissions and clear the cache. The suid bit is there so that it takes my identity and uses the key that I set up. I resorted to suid because I couldn’t figure out how to make sure that Apache had its own key. I tried associating it with the user that Apache ran as, but I kept running into “no tty”-type errors. The suid workaround solved the problem quickly.
  • A PHP script that displays a form and the last 20 revisions. The form includes a drop-down box of the revisions displayed, a button for deploying to QA, and a button for deploying to production. When submitted, the script does some error-checking, then uses system() to call the relevant shell script. The script determines the list of revisions by using shell_exec to store the output of svn log … --limit 20 in a string, then using preg_match_all to match all instances of /r([0-9]+)/. Seems to work.
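
Setting up the key for the first bullet is the standard ssh-keygen/authorized_keys dance; the user and host names here are illustrative:

    ssh-keygen -t rsa -N ''              # passphrase-less key in ~/.ssh/id_rsa
    ssh-copy-id deploy@www.example.com   # append the public key to authorized_keys

And here’s a rough sketch of what the suid shell script does. The paths, host names, and cache-clearing commands are assumptions, not the actual script:

    #!/bin/sh
    # deploy.sh REVISION HOST: export a Subversion revision and push it to a server.
    # Paths, hosts, and credentials below are illustrative.
    set -e
    REV=$1
    HOST=$2
    TMP=$(mktemp -d)

    # Export the requested revision to a clean temporary directory
    svn export -r "$REV" http://svn.example.com/repos/site/trunk "$TMP/site"

    # Push changed files over ssh; no --delete, since the tree is not the whole site
    rsync -az -e ssh "$TMP/site/" "deploy@$HOST:/var/www/drupal/"

    # Reset file permissions and clear Drupal 5's cache tables on the remote side
    ssh "deploy@$HOST" 'chown -R www-data:www-data /var/www/drupal/files &&
      mysql -u drupal --password=secret drupal -e \
        "DELETE FROM cache; DELETE FROM cache_filter; DELETE FROM cache_menu; DELETE FROM cache_page;"'

    rm -rf "$TMP"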