category - drupal :: Sacha Chua

Posted: Aug 4, 2011 - Modified: Feb 19, 2012| drupal, geek, work

Drupal 6’s drupal_json function encodes ampersands incorrectly for jQuery 1.5, causing the rather cryptic error:

Uncaught Syntax error, unrecognized expression: ...

(If you’re lucky.)

The way to fix this is to borrow the JSON-handling code from Drupal 7. Here’s something you might be able to use:

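// Expects a string that has already been through json_encode().
// Escapes <, > and & as \uXXXX sequences so they never appear raw in the output.
function yourmodule_json_encode($var) {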
  return str_replace(array('<', '>', '&'), array('\u003c', '\u003e', '\u0026'), $var);
}

// Fix Drupal JSON problems from http://witti.ws/blog/2011/03/14/jquery-15-json-parse-error
function yourmodule_json($var) {
  drupal_set_header('Content-Type: text/javascript; charset=utf-8');
  if (isset($var)) {
    echo yourmodule_json_encode(json_encode($var));
  }
}

Use yourmodule_json instead of drupal_json wherever applicable.
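
To see what the extra escaping does, here is a minimal standalone sketch in plain PHP, outside Drupal. The function name example_json_for_html and the sample data are made up for illustration; the replacement list is the same one used in the snippet above.

```php
<?php
// Hypothetical standalone helper (not part of Drupal): encode a value as
// JSON, then escape <, > and & as \uXXXX sequences.
function example_json_for_html($var) {
  $json = json_encode($var);
  return str_replace(
    array('<', '>', '&'),
    array('\u003c', '\u003e', '\u0026'),
    $json
  );
}

// Made-up sample data.
$data = array('title' => 'Q&A <b>session</b>');
$json = example_json_for_html($data);

// The raw ampersand no longer appears in the output...
var_dump(strpos($json, '&') === FALSE);
// ...but decoding the JSON gives back the original string.
$decoded = json_decode($json, TRUE);
var_dump($decoded['title'] === 'Q&A <b>session</b>');
```

Because the escaped sequences are valid JSON string escapes, a JSON parser on the other end should decode them back to the original characters.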

Hat tip to Greg Payne (Witti) for pointing me in the right direction!

2011-08-04 Thu 14:01

You can view 4 comments or e-mail me at sacha@sachachua.com.

Hacking Drupal views and taxonomy: looking for 100% matching of terms

Jul 6, 2011| drupal, geek

I’m working on a Drupal 6 site that helps match volunteers to speaking opportunities, or sessions. I use Taxonomy to keep track of the qualifications so that I can maintain the qualification hierarchy. Given a list of subject areas that a person is interested in, I need to find all sessions that match any of those subject areas. The quirk: the session must have at least one of the person’s terms, and the person must also have all the session’s terms.

Let’s say that our volunteer is interested in speaking about biology and physics. I couldn’t use a straightforward AND search: if I searched for biology AND physics, I wouldn’t get sessions for just biology. I couldn’t use a straightforward OR search either, because I shouldn’t list sessions that require both biology and another subject the person hadn’t listed, such as chemistry.
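
The matching rule can be sketched in plain PHP, with arrays of term IDs standing in for the database. This is only an illustration of the rule, not the code used on the site; the function name and the term IDs are made up, and it assumes each session lists a term at most once.

```php
<?php
// Hypothetical helper: returns TRUE when a session should be listed for a
// person, given plain arrays of taxonomy term IDs.
function example_session_matches(array $person_tids, array $session_tids) {
  // Terms the session has in common with the person.
  $common = array_intersect($session_tids, $person_tids);
  // At least one shared term, and no session term that the person lacks.
  return count($common) > 0 && count($common) == count($session_tids);
}

// Made-up term IDs.
$biology = 1;
$physics = 2;
$chemistry = 3;
$person = array($biology, $physics);

var_dump(example_session_matches($person, array($biology)));             // bool(true)
var_dump(example_session_matches($person, array($biology, $physics)));   // bool(true)
var_dump(example_session_matches($person, array($biology, $chemistry))); // bool(false)
var_dump(example_session_matches($person, array()));                     // bool(false)
```

Comparing the number of shared terms with the number of terms on the session is what rules out the biology-plus-chemistry case.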

Views didn’t seem to have a built-in way to do it. I couldn’t think of a standard-ish way to describe my challenge in order to find relevant posts on drupal.org. Content recommendation modules seemed similar, but I wasn’t familiar enough with any of them to know which one would be the closest to hack for my cross-type comparisons and 100% match requirements. So it was time to hack my Views query.

After several attempts, I settled on the approach of precalculating how many terms were associated with each session node. I created a table with the information and used the following query to populate it in my install file.

db_query("INSERT INTO {node_term_count} 
  SELECT nid, vid, count(tid) AS term_count 
  FROM {term_node} GROUP BY nid, vid");

I also used hook_nodeapi to update the table on insert, update, and delete operations.

Then I started experimenting through the SQL console. I used COUNT and GROUP BY to find out how many terms the session had in common with the person. Selecting from that MySQL subquery let me filter the list to the nodes where the number of matching terms equaled the number of terms the session had. I ended up with a query that looked like this:

SELECT nid, vid FROM (SELECT tns.nid, tns.vid, 
  COUNT(tns.tid) AS match_count, 
  c.term_count FROM term_node tns 
  INNER JOIN node_term_count c ON tns.vid=c.vid 
  WHERE tns.tid in (55, 56, 42, 39, 41) 
  GROUP BY tns.vid) AS result 
WHERE term_count = match_count;

When I was happy with the query, I used hook_views_pre_execute to change my $view->build_info['query'] and $view->build_info['count_query']. With all the other filters I needed, it eventually looked like this:

    $view->build_info['query'] = "SELECT * FROM (
SELECT tns.nid, tns.vid, count(tns.tid) AS match_count, c.term_count, workflow_node.sid FROM node n 
INNER JOIN term_node tns ON (n.vid=tns.vid AND n.nid=tns.nid)
LEFT JOIN workflow_node workflow_node ON n.nid = workflow_node.nid 
INNER JOIN node_term_count c ON tns.vid=c.vid
INNER JOIN content_type_session session ON (n.nid=session.nid AND n.vid=session.vid)
INNER JOIN node school_node ON (session.field_session_school_nid=school_node.nid)
INNER JOIN content_type_school school ON (school_node.nid=school.nid AND school_node.vid=school.vid)
INNER JOIN content_field_session_dates date ON (n.nid=date.nid AND n.vid=date.vid AND date.delta=0)
WHERE (n.type in ('%s')
AND workflow_node.sid=%d
AND session.field_session_request_mode_value = '%s'
AND (n.status <> %d) 
AND (DATE_FORMAT(ADDTIME(date.field_session_dates_value, SEC_TO_TIME(-14400)), '%Y-%m-%%d') >= '" . date('Y-m-d') . "')
AND school.field_school_district_nid IN ($district_where)
AND tns.tid in ($tid_where))
GROUP BY tns.vid
) as result WHERE term_count = match_count AND match_count > 0";

I used variables like $tid_where and $district_where to simplify the query. They use array_fill to create placeholders for the arguments.
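
Here’s a minimal sketch of that placeholder trick in plain PHP. The helper name is made up, and the term IDs are just the ones from the example query above. Drupal 6 core also has a db_placeholders() function that I believe does much the same job.

```php
<?php
// Hypothetical helper: build a comma-separated list of placeholders,
// one per value, for use inside an IN (...) clause.
function example_placeholders(array $values, $placeholder = '%d') {
  return implode(', ', array_fill(0, count($values), $placeholder));
}

$tids = array(55, 56, 42, 39, 41);
$tid_where = example_placeholders($tids);
echo "tns.tid IN ($tid_where)\n";
// prints: tns.tid IN (%d, %d, %d, %d, %d)
```

The actual values still have to be passed as query arguments in the same order as the placeholders. An empty list would produce IN (), which is not valid SQL, so that case needs to be handled before building the query.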

Result: I think it works the way it’s supposed to. It passes my unit tests and manual testing, anyway. If performance becomes an issue, I might precalculate the results and store them in a table. I hope I don’t have to do that, though.

Views 3 is supposed to have arbitrary data stores that let you write views on top of any sort of query or function, but I’m going to stay with Views 2 for now.

Whenever I write about stuff we’re doing with Drupal, I often hear about even awesomer ways to do things. =) Is this one of those times? Is there a little-known module that Does the Right Thing?


Getting a grip on a large database migration

Posted: Jul 2, 2011 - Modified: Jul 1, 2011| drupal, geek

Michael is working on migrating a custom website with hundreds of database tables to Drupal, and he wanted to know if I had any advice for keeping track of table mappings and other migration tasks.

I’ve worked on small migration projects before (including migrating my own blog from lots of Planner-mode text files to WordPress!), but no large projects like the ones Michael described. But if I needed to do something like that, here’s what I’d probably do. I’d love to hear your tips!

I’d list all the tables and start mapping them to entities. What content types would I need to create? What fields would I need to define? How are the content types related to each other? An entity relationship diagram can help you get an overview of what’s going on in the database.

Then I’d start untangling the entities to see which ones I can migrate first. If you have entities with node references, it makes sense to migrate the data referred to before migrating the data that refers to them. If I can get a slice of the database – not all the records, just enough to flesh out the different relationships – that would make testing the migrations faster and easier. I would probably write a custom Drupal module to do the migrations, because it’s much easier to programmatically create nodes than it is to insert all the right entries into all the right tables.
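
One way to work out a migration order is to write down which entity types refer to which, then repeatedly pick the types whose references have already been migrated. Here is a minimal sketch in plain PHP; the function name and the content types are made up for illustration.

```php
<?php
// Hypothetical helper: order entity types so that anything a type refers to
// is migrated before the type itself. $depends_on maps each type to the
// types it references. Returns FALSE if no complete order exists, for
// example because of a circular reference or a reference to an unlisted type.
function example_migration_order(array $depends_on) {
  $order = array();
  $remaining = $depends_on;
  while (!empty($remaining)) {
    $progress = FALSE;
    foreach ($remaining as $type => $deps) {
      // Ready when every dependency has already been placed in the order.
      if (count(array_diff($deps, $order)) == 0) {
        $order[] = $type;
        unset($remaining[$type]);
        $progress = TRUE;
      }
    }
    if (!$progress) {
      return FALSE;
    }
  }
  return $order;
}

// Made-up content types: sessions refer to schools, schools to districts.
$depends_on = array(
  'session' => array('school'),
  'school' => array('district'),
  'district' => array(),
);
print_r(example_migration_order($depends_on));
```

For the sample data, the resulting order is district, school, session. A FALSE result is a hint that some references need a second pass after the nodes exist.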

I’d commit the custom module to source code control frequently. I’d write some code to migrate an entity type or two, test the migration, and commit the source code. As I migrated more and more of the relationships, I’d probably check them off or colour them differently in the diagram, making note of anything I’d need to revisit (circular references, etc.).

I might break the custom module up into steps to make it easier to rerun or test. That way, I’m not reconstructing the entire database in one request, too.

I’d take notes on design decisions. When you migrate data, you’ll probably come across data that challenges your initial assumptions. This might require redesigning your entities and revising your earlier migration code. When I make design decisions, I often write about the options I’m considering and the reasons for or against them. This makes those decisions easier to revisit when new data might invalidate my assumptions, because I can see what may need to be changed.

How would you handle a migration project that’s too large to hold in your head?


Context-switching and a four-project day

Jun 30, 2011| drupal, geek, rails, work

Context-switching among multiple projects can be tough. I’m currently:

working full-time on one project (a Drupal 6 non-profit website)
consulting on another (helping an educational institution with Drupal 7 questions)
supporting a third (Ruby on Rails site I built for a local nonprofit, almost done), and
trying to wrap up on a fourth (PHP/AJAX dashboard for a call center in the US).

I’m doing the Drupal 6 development in a virtual machine on my system, with an integration server set up externally. Consulting for the second project is done on-site or through e-mail. The Rails site is on a virtual server. The dashboard project is now on the company’s servers (IIS6/Microsoft SQL Server), which I can VPN into and use Remote Desktop to access. I’m glad I have two computers and a standing desk (read: kitchen counter) that makes it easy to use both!

Today was one of those days. I helped my new team member set up his system so that he could start working on our project. He’s on Mac OS X. It took us some time to figure out some of the quirky behaviour, such as MySQL sockets not being where PHP expected them to be. Still, we got his system sorted out, so now he can explore the code while I’m on vacation tomorrow.

In between answering his questions, I replied to the consulting client’s questions about Drupal and the virtual image we set up yesterday. That mainly required remembering what we did and how we set it up. Fortunately, that part was fairly recent, so it was easy to answer her questions.

Then I got an instant message from the person I worked with on the fourth project, the call-center dashboard. He asked me to join a conference call. They were having big problems: the dashboard wasn’t refreshing, so users couldn’t mark their calls as completed. It was a little nerve-wracking trying to identify and resolve the problem on the phone. There were two parts to the problem: IIS was unresponsive, and Microsoft SQL Server had stopped replicating. The team told me that there had been some kind of resource-related problem that morning, too, so the lack of system resources might’ve cascaded into this. After some hurried searching and educated guesses about where to nudge the servers, I got the database replication working again, and I set IIS back to using the shared application pool. I hope that did the trick. I can do a lot of things, but I’m not as familiar with Microsoft server administration as I am with the Linux/Apache/MySQL or Linux/Apache/PostgreSQL combinations.

I felt myself starting to stress out, so I deliberately slowed down while I was making the changes, and I took a short nap afterwards to reset myself. (Coding or administering systems while stressed is a great way to give yourself even more work and stress.)

After the nap, I was ready to take on the rest. The client for the Rails project e-mailed me a request to add a column of output to the report. I’d archived my project-related virtual machine already, so I (very carefully) coded it into the site in a not-completely-flexible manner. I found and fixed two bugs along the way, so it was a good thing I checked.

Context-switching between Drupal 6, Drupal 7, and Rails projects is weird. Even Drupal 6 and Drupal 7 differ significantly in terms of API, and Rails is a whole ‘nother kettle of fish. I often look things up, because it’s faster to do that than to rely on my assumptions and debug them when I’m wrong. Clients and team members watching me might think I don’t actually know anything by myself and I’m looking everything up as I go along. Depending on how scrambled my brain is, I’d probably suck in one of those trying-to-be-tough job interviews where you have to write working code without the Internet. But it is what it is, and this helps me build things quickly.

On the bright side, it’s pretty fun working with multiple paradigms. Rails uses one way of thinking, Drupal uses another, and so on. I’ve even mixed in Java before. There were a few weeks I was switching between enterprise Java, Drupal, Rails, and straight PHP. It’s not something I regularly do, but when the company needs it, well… it’s good exercise. Mental gymnastics. (And scheduling gymnastics, too.)

I like having one-project days. Two-project days are kinda okay too. Four-project days – particularly ones that involve solving a problem in an unfamiliar area while people are watching! – are tough, but apparently survivable as long as I remember to breathe. =)

Here are tips that help me deal with all that context-switching. Maybe they’ll help you!

Look things up. It’s okay. I find myself looking up even basic things all the time. For example, did you know that Ruby doesn’t have a straightforward min/max function the way PHP does? The canonical way to do it is to create an array (or other enumerable) and call the min or max member function, like this: [x,y].max. Dealing with little API/language quirks like that is part of the context-switching cost. Likewise, I sometimes find myself wishing I could just use something like rails console in my Drupal sites… =)

Take extensive notes. Even if you’re fully focused on one project and have no problems remembering it now, you might need to go back to something you thought you already finished.

Slow down and take breaks. Don’t let stress drag you into making bad decisions. I felt much more refreshed after a quick nap, and I’m glad I did that instead of trying to force my way through the afternoon. This is one of the benefits of working at home – it’s easy to nap in an ergonomic and non-embarrassing way, while still getting tons of stuff done the rest of the day.

Clear your brain and focus on the top priority. It’s hard to juggle multiple projects. I made sure my new team member had things to work on while I focused on the call center dashboard project so that I wouldn’t be tempted to switch back and forth. Likewise, I wrote the documentation I promised for that project before moving on to the Rails project.

Breathe. No sense in stressing out and getting overwhelmed. Make one good decision at a time. Work step by step, and you’ll find that you’ll get through everything you need to do. Avoid multi-tasking. Single-task and finish as much as you can of your top priority first.

I prefer having one main project, maybe two projects during the transition periods. This isn’t always possible. Programming competitions helped me learn how to deal with multiple chunks of work under time pressure, and I’m getting better at it the more that work throws at me.

What are your tips for dealing with simultaneous projects?

2011-06-30 Thu 16:19


Drupal notes from helping a client improve her development environment

Jun 28, 2011| drupal, geek

Keeping a to-do list helps you keep sane. If you don’t have a full-scale issue tracker, use a wiki page, text file, or something like that. It’s really useful to be able to get the list of things you’re working on or waiting for out of your head and into a form you can review.

Drupal Features help you export configuration into code. This is much better than creating an installation profile because you can update your features with new settings and apply them to existing sites. It’s invaluable when working with multi-sites that may need to be updated. You may need to clear your Drupal cache before you see changes applied.

Version control is really handy even when you’re working on your own. The ability to go back in time to a working setup (code + database) can help you experiment more freely and avoid late nights spent recovering from mistakes.

Drush (Drupal shell) is awesome. It’s a big timesaver. We use it to download and enable modules (dl and en), clear the cache (cc), run database updates (updatedb), launch a SQL console (sqlc), execute PHP (php-eval), run tests, and so on. I use it a lot because I hate clicking around.

It’s even more powerful with a little bit of xargs magic, which makes it easy to run a drush command against all the sites listed in a file, one per line. Like this:

xargs -I {} drush -l {} somecommand < sites.txt

Design decisions: Multisite without shared tables; services or syndication for sharing content between sites; central authentication for admin users…

Bash script to create or clone multisites makes tedious things a little bit simpler. Tasks:

Create a database and give access to a user.
Create the site and files directory.
Create the settings.php with the database settings.
Copy the base database into the new database.
Create a symbolic link.

2011-06-28 Tue 08:55


Managing configuration changes in Drupal

Jun 10, 2011| development, drupal, geek, work

One of our clients asked if we had any tips for documenting and managing Drupal configuration, modules, versions, settings, and so on. She wrote, “It’s getting difficult to keep track of what we’ve changed, when, for what reason, and what settings are in that need to be moved to production versus what settings are there for testing purposes.” Here’s what works for us.

Version control: A good distributed version control system is key. This allows you to save and log versions of your source code, merge changes from multiple developers, review differences, and roll back to a specified version. I use Git whenever I can because it allows much more flexibility in managing changes. I like the way it makes it easy to branch code, too, so I can start working on something experimental without interfering with the rest of the code.

Issue tracking: Use a structured issue-tracking or trouble-ticketing system to manage your to-dos. That way, you can see the status of different items, refer to specific issues in your version control log entries, and make sure that nothing gets forgotten. Better yet, set up an issue tracker that’s integrated with your version control system, so you can see the changes that are associated with an issue. I’ve started using Redmine, but there are plenty of options. Find one that works well with the way your team works.

Local development environments and an integration server: Developers should be able to experiment and test locally before they share their changes, and they shouldn’t have to deal with interference from other people’s changes. They should also be able to refer to a common integration server that will be used as the basis for production code.

I typically set up a local development environment using a Linux-based virtual machine so that I can isolate all the items for a specific project. When I’m happy with the changes I’ve made to my local environment, I convert them to code (see Features below) and commit the changes to the source code repository. Then I update the integration server with the new code and confirm that my changes work there. I periodically load other developers’ changes and a backup of the integration server database into my local environment, so that I’m sure I’m working with the latest copy.

Database backups: I use Backup and Migrate for automatic trimmed-down backups of the integration server database. These are regularly committed to the version control repository so that we can load the changes in our local development environment or go back to a specific point in time.

Turning configuration into code: You can use the Features module to convert most Drupal configuration changes into code that you can commit to your version control repository.

There are some quirks to watch out for:

Features aren’t automatically enabled, so you may want to have one overall feature that depends on any sub-features you create. If you are using Features to manage the configuration of a site and you don’t care about breaking Features into smaller reusable components, you might consider putting all of your changes into one big Feature.
Variables are under the somewhat unintuitively named category of Strongarm.
Features doesn’t handle deletion of fields well, so delete fields directly on the integration server.
Some changes are not exportable, such as nodequeue. Make those changes directly on the integration server.

You want your integration server to be at the default state for all features. On your local system, make the changes you want, then create or update features to encapsulate those changes. Commit the features to your version control repository. You can check if you’ve captured all the changes by reverting your database to the server copy and verifying your functionality (make a manual backup of your local database first!). When you’re happy with the changes, push the changes to the integration server.

Using Features with your local development environment should minimize the number of changes you need to directly make on the server.

Documenting specific versions or module sources: You can use Drush Make to document the specific versions or sources you use for your Drupal modules.

Testing: In development, there are few things as frustrating as finding you’ve broken something that was working before. Save yourself lots of time and hassle by investing in automated tests. You can use Simpletest to test Drupal sites, and you can also use external testing tools such as Selenium. Tests can help you quickly find and compare working and non-working versions of your code so that you can figure out what went wrong.

What are your practices and tips?

2011-06-09 Thu 12:25


Drush, Simpletest, and continuous integration for Drupal using Jenkins (previously Hudson)

Posted: Jun 9, 2011 - Modified: Nov 15, 2011| drupal, geek

One of my development goals is to learn how to set up continuous integration so that I’ll always remember to run my automated tests. I picked up the inspiration to use Hudson from Stuart Robertson, with whom I had the pleasure of working on a Drupal project before he moved to BMO. He had set up continuous integration testing with Hudson and Selenium on another project he’d worked on, and they completed user acceptance testing without any defects. That’s pretty cool. =)

I’m a big fan of automated testing because I hate doing repetitive work. Automated tests also let me turn software development into a game, with clearly defined goalposts and a way to keep score. Automated tests can be a handy way of creating lots of data so that I can manually test a site set up the way I want it to be. I like doing test-driven development: write the test first, then write the code that passes it.

Testing was even better with Rails. I love the Cucumber testing framework because I could define high-level tests in English. The Drupal equivalent (Drucumber?) isn’t quite there yet. I could actually use Cucumber to test my Drupal site, but it would only be able to test the web interface, not the code, and I like to write unit tests in addition to integration tests. Still, some automated testing is better than no testing, and I’m comfortable creating Simpletest classes.

Jenkins (previously known as Hudson) is a continuous integration server that can build and test your application whenever you change the code. I set it up on my local development image by following Jenkins’ installation instructions. I enabled the Git plugin (Manage Jenkins – Manage Plugins – Available).

Then I set up a project with my local git repository. I started with a placeholder build step of Execute shell and pwd, just to see where I was. When I built the project, Hudson checked out my source code and ran the command. I then went into the Hudson workspace directory, configured my Drupal settings.php to use the database and URL I created for the integration site, and configured permissions and Apache with a name-based virtual host so that I could run web tests.

For build steps, I used Execute shell with the following settings:

mysql -u integration integration < sites/default/files/backup_migrate/scheduled/site-backup.mysql
/var/drush/drush test PopulateTestUsersTest
/var/drush/drush test PopulateTestSessionsTest
/var/drush/drush testre MyProjectName --error-on-fail

This loads the backup file created by Backup and Migrate, sets up my test content, and then uses my custom testre command.

Code below (c) 2011 Sacha Chua (sacha@sachachua.com), available under GNU General Public License v2.0 (yes, I should submit this as a patch, but there’s a bit of paperwork for direct contributions, and it’s easier to just get my manager’s OK to blog about something…)

// A Drush command callback.
function drush_simpletest_test_regular_expression($test_re='') {
  global $verbose, $color;
  $verbose = is_null(drush_get_option('detail')) ? FALSE : TRUE;
  $color = is_null(drush_get_option('color')) ? FALSE : TRUE;
  $error_on_fail = is_null(drush_get_option('error-on-fail')) ? FALSE : TRUE;
  // call this method rather than simpletest_test_get_all() in order to bypass internal cache
  $all_test_classes = simpletest_test_get_all_classes();

  // Check that the test class parameter has been set. This has to happen
  // before the pattern is wrapped in delimiters; otherwise an empty
  // pattern becomes "//" and the listing below is never shown.
  if (empty($test_re)) {
    drush_print("\nAvailable test groups & classes");
    drush_print("-------------------------------");
    $current_group = '';
    foreach ($all_test_classes as $class => $details) {
      if (class_exists($class) && method_exists($class, 'getInfo')) {
        $info = call_user_func(array($class, 'getInfo'));
        if ($info['group'] != $current_group) {
          $current_group = $info['group'];
          drush_print('[' . $current_group . ']');
        }
        drush_print("\t" . $class . ' - ' . $info['name']);
      }
    }
    return;
  }

  // Wrap a bare pattern in regular expression delimiters.
  if (!preg_match("/^\/.*\//", $test_re)) {
    $test_re = "/$test_re/";
  }

  // Find test classes that match
  $matching_classes = array();
  foreach ($all_test_classes as $class => $details) {
    if (class_exists($class) && method_exists($class, 'getInfo')) {
      if (preg_match($test_re, $class)) {
        $info = call_user_func(array($class, 'getInfo'));
        $matching_classes[$class] = $info;
      }
    }
  }

  // Sort matching classes by weight
  uasort($matching_classes, '_simpletest_drush_compare_weight');

  // Start with an empty result list so that the summary still works
  // when no test class matches the pattern.
  $results = array();
  foreach ($matching_classes as $class => $info) {
    $main_verbose = $verbose;
    $results[$class] = drush_simpletest_run_single_test($class, $error_on_fail);
    $verbose = $main_verbose;
  }

  $failures = $successes = 0;
  foreach ($results as $class => $status) {
    print $status . "\t" . $class . "\n";
    if ($status == 'fail') {
      $failures++;
    } else {
      $successes++;
    }
  }
  print "Failed: " . $failures . "/" . ($failures + $successes) . "\n";
  print "Succeeded: " . $successes . "/" . ($failures + $successes) . "\n";
  if ($failures > 0) {
    return 1;
  }
}

I didn’t bother hacking Simpletest output to match the Ant/JUnit output so that Jenkins could understand it better. I just wanted a pass/fail status, as I could always look at the results to find out which test failed.
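
For anyone who does want Jenkins to read the results, a rough sketch of turning pass/fail statuses into a minimal JUnit-style XML report might look like the following. This is not part of the command above. The function name, the suite name, and the exact set of attributes are assumptions, so check them against whatever report parser you use.

```php
<?php
// Hypothetical helper: turn an array of class => status strings into a
// minimal JUnit-style XML report. Any status other than 'fail' is treated
// as a pass, mirroring the summary loop in the command above.
function example_junit_report(array $results, $suite = 'simpletest') {
  $failures = count(array_keys($results, 'fail', TRUE));
  $xml = sprintf('<testsuite name="%s" tests="%d" failures="%d">' . "\n",
    htmlspecialchars($suite), count($results), $failures);
  foreach ($results as $class => $status) {
    $name = htmlspecialchars($class);
    if ($status === 'fail') {
      $xml .= "  <testcase classname=\"$name\" name=\"$name\">\n";
      $xml .= "    <failure message=\"Test class failed\"/>\n";
      $xml .= "  </testcase>\n";
    }
    else {
      $xml .= "  <testcase classname=\"$name\" name=\"$name\"/>\n";
    }
  }
  return $xml . "</testsuite>\n";
}

// Made-up results for illustration.
echo example_junit_report(array(
  'PopulateTestUsersTest' => 'pass',
  'MyProjectSessionTest' => 'fail',
));
```

This only records one test case per class, which matches the pass/fail granularity of the summary, not the individual assertions inside each test.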

What does it gain me over running the tests from the command-line? I like having the build history and being able to remember the last successful build.

I’m going to keep this as a local build server instead of setting up a remote continuous integration server on our public machine, because it involves installing quite a number of additional packages. Maybe the other developers might be inspired to set up something similar, though!

2011-06-09 Thu 09:51