Categories: geek » drupal

RSS - Atom - Subscribe via email

Drupal in the trenches: MySQL and DELETE joins; multiple tests

Posted: - Modified: | drupal

Today was the first time I used a DELETE statement that joined multiple tables. I didn’t even know you could do that. =)

I needed to look into this because my update function was just taking so much time and memory. We have a little over 3,000 nodes, most of which are usernodes automatically created by the usernode module. I needed to rebuild the node access entries for a few hundred content nodes to take into account the new partner access module I wrote, but node_access_rebuild took quite a while.

The first approach I thought of taking was to delete usernode because we didn’t seem to need it any more. In order to do that, though, I would still have to delete all the user nodes, though, and invoking all the hooks for each of the 2000+ user nodes took a while as well.

The second approach was to delete just the node access records related to the nodes I needed to update. I found a way to do it even faster – instead of querying the node IDs and then deleting individual rows, I could do something like this instead:

db_query("DELETE FROM na USING {node} n
          INNER JOIN {node_access} na ON (n.nid=na.nid) WHERE n.type='%s'", $type);
$rs = db_query("SELECT nid FROM {node} WHERE type='%s'", $type);
while ($row = db_fetch_array($rs)) {
  $node = node_load($row['nid'], NULL, TRUE);
  if (!empty($node)) {
    node_access_acquire_grants($node);
  }
}

(UPDATE: Fixed MySQL statement)

That was interesting. =)

I was also running into memory problems trying to run all the unit tests in one go. I broke it into individual tests with:

drush test list | grep KEYWORD | xargs -n 1 drush test run | tee out

where KEYWORD is the keyword I use to filter the list for just the tests I wanted to run, and out is the file that stores a copy of the results. It’s not neatly summarized, but oh well. =) It works.

Drupal from the trenches: This is my game

Posted: - Modified: | drupal, geek, goodkarma

I’ve been coaching a senior architect on a Drupal site he’s developing on a tight schedule. With a little bit of help, he was able to build all the functionality needed and keep up with constantly changing requirements. Now it was time to theme the site. As I was walking through how to modify the Zen theme to use the HTML, CSS, and images that he received from the designer, flipping between Vim editors in two Putty sessions connected to the web server, I saw his eyes start to glaze over. Hmm. He was definitely interested in learning how to do it, but I knew he’d enjoy learning it more if he had most of the framework already in place.

I offered to get things started. The senior architect asked me how much time I thought it would take. “Two hours,” I said, which was the first number that came to mind.

After lunch, I headed to the senior architect’s desk with my laptop and wireless mouse. I thought about asking him to change his password to something I could easily type, just in case I needed to start multiple sessions. Then I realized a much better way to do it would be to use my Emacs environment, which is already set up for doing really cool things with Drupal. So I switched my keyboard layout to QWERTY, used ssh-copy-id to copy my authentication ID to the server, and then opened the directory in Emacs using the location /ssh:user@host:/usr/share/drupal6.

Emacs worked like a charm. I edited files on the server as easily as those on my own computer, with all the syntax highlighting and keyboard shortcuts I’d gotten used to. I split windows, moved windows around, copied and pasted regions, and even did a little autocompleting.

I think I made the senior architect’s jaw drop.

I finished almost all the basic theming (minus a few quirky CSS things) in one hour and fifty minutes, ten minutes less than my thumb-in-the-air estimate. The senior architect said it would’ve probably taken him 16 hours over the weekend.

While we were chatting about the changes he’d need to make and the other things he could learn, the senior architect asked me if I played any games. I told him that I play one computer game–Nethack (an old text-based roleplaying game)–and I only play it in airports. I pointed to my laptop and said, “This is my game.” Programming has its own major challenges and minor opponents, it has progress, it has points, it has that adrenaline rush of trial and triumph. Programming is my game. Life is my game.

And it’s tons of fun. =)

Drupal in the trenches: AJAX history makes my brain hurt

| drupal

Many websites use asynchronous Javascript and XML (AJAX) to provide all sorts of whizbang improvements, such as smooth interaction without page reloads. It took me a week to figure out how to do all the effects specified in our information architecture document: callouts, modal dialogs, in-page calendar navigation, and so on. I was pretty happy with what I did, considering it was my first serious work with JQuery.

Then my project manager said, “If I go to the event details page from the month view and hit back, it takes me to the day view instead.”

I said, “Welcome to the wonderful world of AJAX. This might be tough to fix.”

Making the back button work in AJAX applications requires a lot of effort. It doesn’t look like people have a nice and clean solution for it yet, although there are a number of libraries that try to address the situation.

Following the principle of progressive enhancement, I had built all the plain HTML functionality first, then layered the Javascript on top of it using jQuery callbacks. In addition to ensuring that the site still works even if Javascript is disabled, this approach also helps make sure that I have proper URLs for almost all the webpages involved. (I didn’t bother with explicitly transient pages like the year navigator or the day pop-up.)

I started with this Hijax-based approach, because it had the most documentation. I had problems getting it to behave, though, because my AJAX pages have other AJAX links that fail with the history-remote plugin. The history_remote plugin works by replacing all the links with the current page and a code (#remote-1, for example). When the back button is pressed, the library looks for the appropriate link and triggers the click event. This breaks down when the link isn’t actually on the first page. For example, when a user switches from a week to a month view, then goes to the next month, the plugin can’t find the link to the next month on the week view’s page, which is where the user started.

What I really needed to do is encode more information in the URL. If I encode information in the anchor portion of the URL (#…), I can use that to request the page and put that into the appropriate div. For example, if I pass in #transitions_content?new_path=connect/calendar/2009/05 , I might be able to parse that and put the content of new_path into the transitions_content div.

I started going down that rabbit-hole, and then I got myself thoroughly confused, so I decided that the best way would be to just rip out the major AJAX navigation and go back to the simple stuff (which fortunately still works).

Gah. Brain hurts!

Does anyone have a clean way to do this?

Drupal: Timezones and places

| drupal, geek

The Drupal date_timezone module (part of Date) lets you use city names instead of timezone offsets in order to select a timezone, and that picks up Daylight Savings Time rules in a reasonably good manner, too.

A long list of cities can be hard to work with, though. This list is equally long, but it’s organized by GMT offset, which people are also likely to know.

Index: date_api.module
===================================================================
--- date_api.module	(revision 2404)
+++ date_api.module	(working copy)
@@ -490,6 +490,14 @@
           }
         }
       }
+
+      // Now reformat the zonenames so that they're of the form (GMT+0800) Asia/Manila
+      foreach ($zonenames as $name => $zone) {
+        $x = date_make_date('now', $name);
+        $list[$name] = '(GMT' . date_format($x, 'P') . ') ' . str_replace(' ', '_', $zone);
+      }
+      asort($list);
+      $zonenames = $list;
       if (!empty($zonenames)) {
         cache_set('date_timezone_identifiers_list', 'cache', serialize($zonenames));
       }

There’s probably a much more efficient way to do this, but hey, it works.

Drupal from the trenches: Database glitches

| drupal, geek

The problem: Our Drupal site stopped displaying nodes on our fuzzysearch-based pages.

I started breaking down the SQL query to find out which joined table didn’t have the data that was supposed to be there. At first, I thought it might be that the publication dates or expiration dates were incorrect, but that wasn’t it. All of the nodes were set to ‘published’ (status = 1), and all remained correctly associated with the domain through the Domain module.

I checked the search_fuzzy_index table. Hmm. The table was smaller than it should be. When I queried it for a node that should’ve been there, I got zero rows.

I tried forcing the system to reindex the node. Still nothing.

I printed out the SQL statements and ran them myself. ERROR 1062 (23000): Duplicate entry '2147483647' for key 1.

Wha?

That looked like a suspiciously large number. I popped it into Google and found a lot of people with the same problem. Turns out that the autoincrement sequence fuzzysearch was using had jumped all the way to the end, for some reason or another (we couldn’t have been reindexing _that_ much!). I changed the column to a BIGINT, reindexed all the nodes using Drush commands like:

drush eval "\$rs=db_query(\"select nid from node where type='transitions_job'\");while(\$nid=db_fetch_array(\$rs)){fuzzysearch_reindex(\$nid['nid'], 'fuzzysearch');} fuzzysearch_cron();"

(using my custom eval command),

That worked. I reported the results to my team members in our group chat.

The IT architect said we were still having disk space issues. We’d removed all the unnecessary files and stale backups already, and I didn’t think there was much more that we could trim. I used du --max-depth=2 /var | sort -n to look at the disk usage in the /var tree, which was where we were having those problems. The three biggest directories were /var/www, /var/lib/mysql, and /var/spool/mail, and we’d already scrubbed as much as we could.

I used ls -lSr to look at the contents of /var/lib/mysql, and noticed that our watchdog table was 2.3 GBs. Gah! Turned out that Drupal had logged each of the PHP errors raised when we were trying to fix the previous problem. We didn’t want to delete all the logs, so we just deleted the logs of type ‘php’. After the IT architect ran DELETE FROM watchdog WHERE type='php'" in the MySQL client, though, we still didn't have free space.

I guessed that the lack of free space could be solved by compacting the MySQL database, which I did with OPTIMIZE TABLE watchdog. That solved it!

That was an interesting day. =)

Drupal: I’m learning how to be a JQuery/Date+Calendar ninja! =)

| drupal

Our customized Date+Calendar-based Drupal event calendar is coming along quite nicely. The information architect’s design called for extensive customizations, such as:

  • hiding the year view
  • creating a context-sensitive year navigator that displays the entire year, and linking that to the title of the image
  • adding AJAX effects
  • adding a pop-up callout with three of the day’s events
  • displaying times in the user’s timezone generally, and in the user’s timezone and the event’s timezone on the actual event page
  • including dates for the previous month in the month view
  • allowing people to subscribe for notifications for new events in their interest group
  • allowing people to get iCal feeds for their events, all the events, or a group’s events, and these feeds should work without login
  • allowing people to sign up for e-mail reminders

We went with Date+Calendar instead of Event because Date+Calendar seemed more up to date, and its integration with Views meant that it was easy to add in domain access and other constraints. I learned quite a lot of new things in the process of implementing these features, such as:

  • writing a lot of Javascript using jQuery in order to bind events and do AJAX calls
  • writing test cases to check event subscription, event notification, timezone handling, and so on
  • programmatically creating a CCK node type with a date field
  • overriding calendar.inc to modify the way Drupal prepares the calendar
  • overriding calendar.theme to modify the way Drupal presents the calendar
  • making my own set of functions to generate the year navigator, based on the year view in calendar.inc

It took me a bit of time to figure out how this Date+Calendar AJAX patch worked, and I ended up modifying it extensively. I had been getting confused by mini= and view=ajax and all the other parameters floating around. I tried different approaches, including creating a callback function that generated just the HTML for the block, but then I found myself passing in too many parameters to control the URLs for the links.

My aha! moment was when I realized that the way the patch was handling the AJAX was to generate the entire page. When it got to processing the calendar block in the sidebar, the code checked for the $_GET[‘view’] parameter, and if an AJAX view was requested, it would print out the block and exit without printing the rest of the page. While that worked for the general case, we needed to modify our code so that the calendar blocks don’t appear on the calendar detail pages, so I wrote some Javascript code that requested a page within the right context.

This approach of generating the whole page didn’t quite work when it came to the subscription form that we embedded in event node page templates, though, because it printed out the node content before it generated the form. I used jQuery to retrieve the entire page, and then I extracted just the DIV I wanted to keep.

I still don’t like fussing with CSS (particularly when it comes to collapsing borders or dealing with browser issues), so I’ll leave that in the capable hands of our information architect. But now I’m the jQuery ninja on our team, too, and I know I can rock CCK+Views and calendars for future projects. =D

(p.s. Left out details, but if you’re curious about any of the bullet points, comment and I might flesh it out into its own blog post!)

Drupal staging and deployment tips: It’s all code

Posted: - Modified: | drupal

As I talk to more and more developers about practices for working with Drupal, I get the idea that the staging and deployment process adopted by my team isn’t widespread.

Many developers make their changes directly through the web-based interface of a testing server, or even on the production site itself. I think that’s both tedious and scary. =)

Putting all of our behavior-related changes in update function in the install file makes it easier to merge changes and repeatedly test upgrades. Editorial changes (fixing typos, etc.) can happen on the site, but if it’s behavior-related code, it should be in the code repository. In moments of weakness, we’ve made web-based changes to our site, and we almost always regret those–either right away because things broke, or when we try to reconstruct our changes.

The Module Developer’s Guide documents how to write .install files for Drupal 5 and Drupal 6, but doesn’t go into much detail about what else you can put in the update functions. Maybe that’s why many developers use install files only for database-related changes. But you can do so much more: creating nodes, adding permissions, enabling other modules, and so on. 

Watch out for these potential pitfalls:

  • $user is set to the superuser (uid = 1) if you run update.php after logging in as the superuser, but if $access_check is set to FALSE in update.php, then $user will be null. That means that if you’re creating nodes or doing other things that check user_access or node_access, you should temporarily switch to the superuser.
  • If your update works if you apply them one at a time, but not if you apply all of them in one go, look for functions that cache information in static variables. You may need to modify the source code to add an argument that allows you to reset the cache or find another way to deal with the static variables. Static variables are a pain.
  • Another reason why batch updates may fail while incremental updates work has to do with the order that the update functions are called in. Modules are processed according to weight and alphabetical order, and all applicable update functions within one module are run before moving onto the next. If you have a set of related modules, put update functions that affect the related modules into the module with the heaviest weight.
  • .install files may get really long if you create _update_N functions for every change and you frequently deploy to a test server. You can refactor your update functions if other developers know that they should test the updates from a fresh copy of the production database instead of from incremental updates to their current system. Make sure you don’t add code below the update level on the production server.
  • Don’t forget to return an array containing results, or even just array().
  • Here are some general tips on how to find out the programmatic equivalent of a web-based action:

    • Find the form or form_submit function that processes the action and see if there are any API functions you can call to produce the same effect. If you can’t find an API function, consider writing one.
    • If you can’t write an API function, use the Macro module (part of the Devel module for Drupal 5) to record the form submission, and then use drupal_execute to run the recorded macro, OR
    • Directly manipulate the database, making sure to call any hooks necessary.

    And here are some examples of programmatically doing things:

    Setting a variable
    variable_set(‘yourvariable’, ‘yourvalue’);

    Enabling modules
    include_once(‘includes/install.inc’);
    module_rebuild_cache();
    drupal_install_modules(array(‘module1’, ‘module2’));

    Disabling modules
    db_query(“UPDATE {system} SET status=0 WHERE type=’module’ AND name=’%s'”, ‘modulename’);

    Creating nodes 
    global $user;
    $old_user = $user;
    $user = user_load(array(‘uid’ => 1));
    $session = session_save_session();
    $session_save_session(FALSE);
    // Do the work
    $node = new stdClass();
    $node->type = ‘page’;
    $node->title = ‘Title’;
    $node->body = ‘Body’;
    node_save($node);
    // Restore the user
    $user = $old_user;
    session_save_session($session);

    Deleting nodes
    global $user;
    $old_user = $user;
    $user = user_load(array(‘uid’ => 1));
    $session = session_save_session();
    session_save_session(FALSE);
    // Do the work
    node_delete($nid);
    // Restore the user
    $user = $old_user;
    session_save_session($session);

    Updating the list of blocks
    global $theme_key;
    $theme_key = ‘yourtheme’;
    _block_rehash();

    Convenience functions for working with permissions

    function _add_permissions($roles, $permissions) {
      $ret = array();
      foreach ($roles as $rid) {
        if (is_numeric($rid)) {
          $role = db_fetch_array(db_query("SELECT rid, name FROM {role} WHERE rid=%d",
    				      $rid));
        }
        else {
          $role = db_fetch_array(db_query("SELECT rid, name FROM {role} WHERE name='%s'",
    				      $rid));
        }
        $role_permissions =
          explode(', ',
    	      db_result(db_query('SELECT perm FROM {permission} WHERE rid=%d',
    				 $role['rid'])));
        $role_permissions = array_unique(array_merge($role_permissions, $permissions));
        db_query('DELETE FROM {permission} WHERE rid = %d', $role['rid']);
        db_query("INSERT INTO {permission} (rid, perm) VALUES (%d, '%s')",
    	     $role['rid'],
    	     implode(', ', $role_permissions));
        $ret[] = array('success' => true,
    		   'query' => "Added " . implode(', ', $permissions)
    		   . ' permissions for ' . $role['name']);
      }
      return $ret;
    }
    
    function _remove_permissions($roles, $permissions) {
      $ret = array();
      foreach ($roles as $rid) {
        if (is_numeric($rid)) {
          $role = db_fetch_array(db_query("SELECT rid, name FROM {role} WHERE rid=%d",
    				      $rid));
        }
        else {
          $role = db_fetch_array(db_query("SELECT rid, name FROM {role} WHERE name='%s'",
    				      $rid));
        }
        $role_permissions =
          explode(', ',
    	      db_result(db_query('SELECT perm FROM {permission} WHERE rid=%d',
    				 $role['rid'])));
        $role_permissions = array_diff($role_permissions, $permissions);
        db_query('DELETE FROM {permission} WHERE rid = %d', $role['rid']);
        db_query("INSERT INTO {permission} (rid, perm) VALUES (%d, '%s')",
    	     $role['rid'],
    	     implode(', ', $role_permissions));
        $ret[] = array('success' => true,
    		   'query' => "Removed " . implode(', ', $permissions)
    		   . ' permissions for ' . $role['name']);
      }
      return $ret;
    }
    

    Use update functions for all of your behavioral changes. Use source code control. Write regression tests. These practices won’t take all the challenges out of Drupal development, but they certainly make it less stressful–and more fun. =)