Category Archives: drupal

Drupal, HTML Purifier, and embedding IFRAMES from YouTube

I know, I know. I shouldn’t allow IFRAMEs at all. But the client’s prospective users were really excited about images and video, and Drupal’s Media module wasn’t going to be quite enough. So I’ve been fighting with CKEditor, IMCE, and HTML Purifier to figure out how to make it easier. I’m hoping that this will be like practically all my other Drupal posts and someone will comment with a much better way to do things right after I describe what I’ve done. =)

First: images. There doesn’t seem to be a cleaner way than the “Browse server” – “Upload” combination using CKEditor and IMCE. I tried using WYSIWYG, TinyMCE and IMCE. I tried ImageBrowser, but I couldn’t get it to work. I tried FCKEditor, which looked promising, but I got tangled in figuring out how to control other parts of it. I’m just going to leave it as CKEditor and IMCE at the moment, and we can come back to that if it turns out to be higher priority than all the other things I’m working on. This is almost certainly my limitation rather than the packages’ limitations, but I don’t have the time to exhaustively tweak this until it’s right. Someday I may finally learn how to make a CKEditor plugin, but it will not be in the final week of this Drupal project.

Next: HTML Purifier and YouTube. You see, YouTube switched to using IFRAMEs instead of Flash embeds. Allowing IFRAMEs is like allowing people to put arbitrary content on your webpage, because it is. The HTML Purifier folks seem firmly against it because it’s a bad idea, which it also is. But you’ve got to work around what you’ve got to work around. Based on the Allow iframes thread in the HTML Purifier forum, this is what I came up with:

Step 1. Create a custom filter in htmlpurifier/library/myiframe.php.

<?php
// Iframe filter that does some primitive whitelisting in a
// somewhat recognizable and tweakable way
class HTMLPurifier_Filter_MyIframe extends HTMLPurifier_Filter
{
  public $name = 'MyIframe';
  // Disguise iframes as specially marked images before purification.
  public function preFilter($html, $config, $context) {
    $html = preg_replace('/<iframe/i', '<img class="MyIframe"', $html);
    $html = preg_replace('#</iframe>#i', '', $html);
    return $html;
  }
  // Turn the marked images that survived purification back into iframes.
  public function postFilter($html, $config, $context) {
    $post_regex = '#<img class="MyIframe"([^>]+?)>#';
    return preg_replace_callback($post_regex, array($this, 'postFilterCallback'), $html);
  }
  protected function postFilterCallback($matches) {
    // Whitelist the domains we like. The dots are escaped so that they
    // match only literal dots, not any character.
    $ok = (preg_match('#src="http://www\.youtube\.com/#i', $matches[1]));
    if ($ok) {
      return '<iframe ' . $matches[1] . '></iframe>';
    } else {
      // Anything that is not on the whitelist is dropped entirely.
      return '';
    }
  }
}

Step 2. Include the filter in HTMLPurifier_DefinitionCache_Drupal.php. I don’t know if this is the right place, but I saw it briefly mentioned somewhere.

// ... rest of file
require_once 'myiframe.php';

Step 3. Create the HTML Purifier config file. In this case, I was changing the config for “Filtered HTML”, which had the input format ID of 1. I copied config/sample.php to config/1.php and set the following:

function htmlpurifier_config_1($config) {
  $config->set('HTML.SafeObject', true);
  $config->set('Output.FlashCompat', true);
  $config->set('URI.DisableExternalResources', false);
  $config->set('Filter.Custom', array(new HTMLPurifier_Filter_MyIframe()));
}

Now I can switch to the source view in CKEditor, paste in my IFRAME code from YouTube, and view the results. Mostly. I still need to track down why I sometimes need to refresh the page in order to see it, but this is promising.
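
If the whitelist grows, a regular expression can get hard to read, so the domain check could be moved into a small helper that parses the src URL and compares the host exactly. This is only a sketch under a few assumptions: myiframe_src_is_allowed and the $allowed list are made-up names (not part of HTML Purifier), and the attribute string is assumed to look like the one postFilterCallback receives in $matches[1].

```php
<?php
// Hypothetical helper, not part of HTML Purifier: decide whether the
// attribute string captured from an iframe tag points at an allowed host.
function myiframe_src_is_allowed($attributes, array $allowed_hosts) {
  // Pull out the value of the src attribute (double- or single-quoted).
  if (!preg_match('/(?:^|\s)src=(["\'])(.*?)\1/i', $attributes, $m)) {
    return false;
  }
  $parts = parse_url($m[2]);
  if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
    return false;
  }
  // Only plain web URLs, and only exact host matches.
  if (!in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
    return false;
  }
  return in_array(strtolower($parts['host']), $allowed_hosts, true);
}

// Example host list; adjust to taste.
$allowed = array('www.youtube.com');
var_dump(myiframe_src_is_allowed(' src="http://www.youtube.com/embed/abc" width="560"', $allowed));
var_dump(myiframe_src_is_allowed(' src="http://www.youtube.com.evil.example/x"', $allowed));
```

Comparing the parsed host rather than matching a prefix means lookalike domains are rejected without any regex escaping to get wrong. Note that this version also accepts https URLs, which the filter above does not.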

2011-08-05 Fri 16:34

Drupal and jQuery 1.5: Fixing the JSON encoding of ampersands

Drupal 6’s drupal_json function encodes ampersands incorrectly for jQuery 1.5, causing the rather cryptic error:

Uncaught Syntax error, unrecognized expression: ...

(If you’re lucky.)

The way to fix this is to borrow the JSON-handling code from Drupal 7. Here’s something you might be able to use:

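// Takes a string that json_encode() has already produced and escapes
// <, >, and & as Unicode escape sequences.
function yourmodule_json_encode($json) {
  return str_replace(array('<', '>', '&'), array('\u003c', '\u003e', '\u0026'), $json);
}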

// Fix Drupal JSON problems from http://witti.ws/blog/2011/03/14/jquery-15-json-parse-error
function yourmodule_json($var) {
  drupal_set_header('Content-Type: text/javascript; charset=utf-8');
  if (isset($var)) {
    echo yourmodule_json_encode(json_encode($var));
  }
}

Use yourmodule_json instead of drupal_json wherever applicable.
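
If your server runs PHP 5.3 or later, json_encode can do this escaping itself through its JSON_HEX_* option flags, which is, as far as I can tell, what later Drupal 7 releases do. Here is a minimal sketch; yourmodule_json_encode_hex is a made-up name, and the sample data is only for illustration.

```php
<?php
// Hypothetical helper name; not a Drupal API function.
// Assumes PHP 5.3+, where json_encode() accepts option flags.
function yourmodule_json_encode_hex($var) {
  // Escape <, >, &, ' and " as \uXXXX sequences inside JSON strings.
  return json_encode($var, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
}

$data = array('title' => 'Q&A <b>notes</b>');
$json = yourmodule_json_encode_hex($data);
echo $json, "\n";

// The escaped output still decodes back to the original data.
var_dump(json_decode($json, true) === $data);
```

The escaped sequences are ordinary JSON, so anything that parses the response gets the original characters back.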

Hat tip to Greg Payne (Witti) for pointing me in the right direction!

2011-08-04 Thu 14:01

Hacking Drupal views and taxonomy: looking for 100% matching of terms

I’m working on a Drupal 6 site that helps match volunteers to speaking opportunities, or sessions. I use Taxonomy to keep track of the qualifications so that I can maintain the qualification hierarchy. Given a list of subject areas that a person is interested in, I need to find all sessions that match any of those subject areas. The quirk: the session must have at least one of the person’s terms, and the person must also have all the session’s terms.

Let’s say that our volunteer is interested in speaking about biology and physics. I couldn’t use a straightforward AND search: if I searched for biology AND physics, I wouldn’t get sessions for just biology. I couldn’t use a straightforward OR search either, because I shouldn’t list sessions that require both biology and another subject the person hasn’t listed, such as chemistry.

Views didn’t seem to have a built-in way to do it. I couldn’t think of a standard-ish way to describe my challenge in order to find relevant posts on drupal.org. Content recommendation modules seemed similar, but I wasn’t familiar with any of them enough to know which one would be the closest to hack for my cross-type comparisons and 100% match requirements. So it was time to hack my Views query.

After several attempts, I settled on the approach of precalculating how many terms were associated with each session node. I created a table with the information and used the following query to populate it in my install file.

db_query("INSERT INTO {node_term_count} 
  SELECT nid, vid, count(tid) AS term_count 
  FROM {term_node} GROUP BY nid, vid");

I also used hook_nodeapi to update the table on insert, update, and delete operations.

Then I started experimenting through the SQL console. I used COUNT and GROUP BY to find out how many terms the session had in common with the person. Selecting from that MySQL subquery let me filter the list to the nodes where the number of matching terms equaled the total number of terms the session had. I ended up with a query that looked like this:

SELECT nid, vid FROM (SELECT tns.nid, tns.vid, 
  COUNT(tns.tid) AS match_count, 
  c.term_count FROM term_node tns 
  INNER JOIN node_term_count c ON tns.vid=c.vid 
  WHERE tns.tid in (55, 56, 42, 39, 41) 
  GROUP BY tns.vid) AS result 
WHERE term_count = match_count;
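
For anyone who finds the rule easier to read outside SQL, here is the same matching logic as a plain PHP sketch. The function name and the term IDs are hypothetical (say 55 is biology, 56 is physics, and 42 is chemistry), and each list is assumed to contain a term ID at most once.

```php
<?php
// Returns true when a session should be offered to a person: the session
// has at least one term, and every one of its terms is also one of the
// person's terms. This mirrors "term_count = match_count" in the query.
function session_matches_person(array $session_tids, array $person_tids) {
  if (count($session_tids) === 0) {
    return false;
  }
  // Terms the session has in common with the person.
  $match_count = count(array_intersect($session_tids, $person_tids));
  return $match_count === count($session_tids);
}

$person = array(55, 56);  // interested in biology and physics

var_dump(session_matches_person(array(55), $person));      // biology only
var_dump(session_matches_person(array(55, 42), $person));  // biology and chemistry
```

The biology-only session matches, while the biology-and-chemistry session does not, because the person never listed chemistry.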

When I was happy with the query, I used hook_views_pre_execute to change my $view->build_info['query'] and $view->build_info['count_query']. With all the other filters I needed, it eventually looked like this:

    $view->build_info['query'] = "SELECT * FROM (
SELECT tns.nid, tns.vid, count(tns.tid) AS match_count, c.term_count, workflow_node.sid FROM node n 
INNER JOIN term_node tns ON (n.vid=tns.vid AND n.nid=tns.nid)
LEFT JOIN workflow_node workflow_node ON n.nid = workflow_node.nid 
INNER JOIN node_term_count c ON tns.vid=c.vid
INNER JOIN content_type_session session ON (n.nid=session.nid AND n.vid=session.vid)
INNER JOIN node school_node ON (session.field_session_school_nid=school_node.nid)
INNER JOIN content_type_school school ON (school_node.nid=school.nid AND school_node.vid=school.vid)
INNER JOIN content_field_session_dates date ON (n.nid=date.nid AND n.vid=date.vid AND date.delta=0)
WHERE (n.type in ('%s')
AND workflow_node.sid=%d
AND session.field_session_request_mode_value = '%s'
AND (n.status <> %d) 
AND (DATE_FORMAT(ADDTIME(date.field_session_dates_value, SEC_TO_TIME(-14400)), '%Y-%m-%%d') >= '" . date('Y-m-d') . "')
AND school.field_school_district_nid IN ($district_where)
AND tns.tid in ($tid_where))
GROUP BY tns.vid
) as result WHERE term_count = match_count AND match_count > 0";

I used variables like $tid_where and $district_where to simplify the query. They use array_fill to create placeholders for the arguments.
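
As a rough illustration of the array_fill trick, the helper below builds one placeholder per value. yourmodule_placeholders is a made-up name and the term IDs are sample values; Drupal 6 also ships db_placeholders(), which serves the same purpose. The sketch assumes the list is not empty, since an empty IN () is not valid SQL.

```php
<?php
// Hypothetical helper: build a comma-separated list of placeholders,
// one per value, for use inside an SQL IN (...) clause.
// Assumes $values is not empty.
function yourmodule_placeholders(array $values, $placeholder = '%d') {
  return implode(', ', array_fill(0, count($values), $placeholder));
}

$tids = array(55, 56, 42);
$tid_where = yourmodule_placeholders($tids);

// The query text only ever contains placeholders; the actual IDs are
// passed separately as query arguments.
echo "tns.tid IN ($tid_where)", "\n";
```

Keeping the values out of the query string lets the database layer handle the escaping.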

Result: I think it works the way it’s supposed to. It passes my unit tests and manual testing, anyway. If performance becomes an issue, I might precalculate the results and store them in a table. I hope I don’t have to do that, though.

Views 3 is supposed to have arbitrary data stores that let you write views on top of any sort of query or function, but I’m going to stay with Views 2 for now.

Whenever I write about stuff we’re doing with Drupal, I often hear about even awesomer ways to do things. =) Is this one of those times? Is there a little-known module that Does the Right Thing?

Getting a grip on a large database migration

Michael is working on migrating a custom website with hundreds of database tables to Drupal, and he wanted to know if I had any advice for keeping track of table mappings and other migration tasks.

I’ve worked on small migration projects before (including migrating my own blog from lots of Planner-mode text files to WordPress!), but no large projects like the ones Michael described. But if I needed to do something like that, here’s what I’d probably do. I’d love to hear your tips!

I’d list all the tables and start mapping them to entities. What content types would I need to create? What fields would I need to define? How are the content types related to each other? An entity relationship diagram can help you get an overview of what’s going on in the database.

Then I’d start untangling the entities to see which ones I can migrate first. If you have entities with node references, it makes sense to migrate the data referred to before migrating the data that refers to them. If I can get a slice of the database – not all the records, just enough to flesh out the different relationships – that would make testing the migrations faster and easier. I would probably write a custom Drupal module to do the migrations, because it’s much easier to programmatically create nodes than it is to insert all the right entries into all the right tables.
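
One way to work out that order is a depth-first topological sort over the references between content types. The sketch below is only an illustration: migration_order and the content type names are hypothetical, and it assumes PHP 5.3+ for the closure.

```php
<?php
// $depends_on maps each content type to the types it references.
// Returns an order in which referenced types come before the types
// that refer to them; throws if a circular reference is found.
function migration_order(array $depends_on) {
  $order = array();
  $state = array();  // type => 'visiting' or 'done'

  $visit = function ($type) use (&$visit, &$order, &$state, $depends_on) {
    if (isset($state[$type])) {
      if ($state[$type] === 'visiting') {
        throw new RuntimeException("Circular reference involving $type");
      }
      return;  // already placed in the order
    }
    $state[$type] = 'visiting';
    $deps = isset($depends_on[$type]) ? $depends_on[$type] : array();
    foreach ($deps as $dep) {
      $visit($dep);  // migrate what this type refers to first
    }
    $state[$type] = 'done';
    $order[] = $type;
  };

  foreach (array_keys($depends_on) as $type) {
    $visit($type);
  }
  return $order;
}

// Hypothetical content types, for illustration only.
$order = migration_order(array(
  'session'  => array('school', 'speaker'),
  'school'   => array('district'),
  'speaker'  => array(),
  'district' => array(),
));
echo implode(', ', $order), "\n";
```

A circular reference raises an exception instead of looping forever, which flags the content types that need a second pass (for example, creating the nodes first and filling in the references afterwards).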

I’d commit the custom module to source code control frequently. I’d write some code to migrate an entity type or two, test the migration, and commit the source code. As I migrated more and more of the relationships, I’d probably check them off or colour them differently in the diagram, making note of anything I’d need to revisit (circular references, etc.).

I might break the custom module up into steps to make it easier to rerun or test. That way, I’m not reconstructing the entire database in one request, too.

I’d take notes on design decisions. When you migrate data, you’ll probably come across data that challenges your initial assumptions. This might require redesigning your entities and revising your earlier migration code. When I make design decisions, I often write about the options I’m considering and the reasons for or against them. This makes those decisions easier to revisit when new data might invalidate my assumptions, because I can see what may need to be changed.

How would you handle a migration project that’s too large to hold in your head?

Context-switching and a four-project day

Context-switching among multiple projects can be tough. I’m currently:

  • working full-time on one project (a Drupal 6 non-profit website)
  • consulting on another (helping an educational institution with Drupal 7 questions)
  • supporting a third (Ruby on Rails site I built for a local nonprofit, almost done), and
  • trying to wrap up on a fourth (PHP/AJAX dashboard for a call center in the US).

I’m doing the Drupal 6 development in a virtual machine on my system, with an integration server set up externally. Consulting for the second project is done on-site or through e-mail. The Rails site is on a virtual server. The dashboard project is now on the company’s servers (IIS6/Microsoft SQL Server), which I can VPN into and use Remote Desktop to access. I’m glad I have two computers and a standing desk (read: kitchen counter) that makes it easy to use both!

Today was one of those days. I helped my new team member set up his system so that he could start working on our project. He’s on Mac OS X. It took us some time to figure out some of the quirky behaviour, such as MySQL sockets not being where PHP expected them to be. Still, we got his system sorted out, so now he can explore the code while I’m on vacation tomorrow.

In between answering his questions, I replied to the consulting client’s questions about Drupal and the virtual image we set up yesterday. That mainly required remembering what we did and how we set it up. Fortunately, that part was fairly recent, so it was easy to answer her questions.

Then I got an instant message from the person I worked with on the fourth project, the call-center dashboard. He asked me to join a conference call. They were having big problems: the dashboard wasn’t refreshing, so users couldn’t mark their calls as completed. It was a little nerve-wracking trying to identify and resolve the problem on the phone. There were two parts to the problem: IIS was unresponsive, and Microsoft SQL Server had stopped replicating. The team told me that there had been some kind of resource-related problem that morning, too, so the lack of system resources might’ve cascaded into this. After some hurried searching and educated guesses about where to nudge the servers, I got the database replication working again, and I set IIS back to using the shared application pool. I hope that did the trick. I can do a lot of things, but I’m not as familiar with Microsoft server administration as I am with the Linux/Apache/MySQL or Linux/Apache/PostgreSQL combinations.

I felt myself starting to stress out, so I deliberately slowed down while I was making the changes, and I took a short nap afterwards to reset myself. (Coding or administering systems while stressed is a great way to give yourself even more work and stress.)

After the nap, I was ready to take on the rest. The client for the Rails project e-mailed me a request to add a column of output to the report. I’d archived my project-related virtual machine already, so I (very carefully) coded it into the site in a not-completely-flexible manner. I found and fixed two bugs along the way, so it was a good thing I checked.

Context-switching between Drupal 6, Drupal 7, and Rails projects is weird. Even Drupal 6 and Drupal 7 differ significantly in terms of API, and Rails is a whole ’nother kettle of fish. I often look things up, because it’s faster to do that than to rely on my assumptions and debug them when I’m wrong. Clients and team members watching me might think I don’t actually know anything by myself and I’m looking everything up as I go along. Depending on how scrambled my brain is, I’d probably suck in one of those trying-to-be-tough job interviews where you have to write working code without the Internet. But it is what it is, and this helps me build things quickly.

On the bright side, it’s pretty fun working with multiple paradigms. Rails uses one way of thinking, Drupal uses another, and so on. I’ve even mixed in Java before. There were a few weeks I was switching between enterprise Java, Drupal, Rails, and straight PHP. It’s not something I regularly do, but when the company needs it, well… it’s good exercise. Mental gymnastics. (And scheduling gymnastics, too.)

I like having one-project days. Two-project days are kinda okay too. Four-project days – particularly ones that involve solving a problem in an unfamiliar area while people are watching! – are tough, but apparently survivable as long as I remember to breathe. =)

Here are tips that help me deal with all that context-switching. Maybe they’ll help you!

Look things up. It’s okay. I find myself looking up even basic things all the time. For example, did you know that Ruby doesn’t have a straightforward min/max function the way PHP does? The canonical way to do it is to create an array (or other enumerable) and call the min or max member function, like this: [x,y].max. Dealing with little API/language quirks like that is part of the context-switching cost. Likewise, I sometimes find myself wishing I could just use something like rails console in my Drupal sites… =)

Take extensive notes. Even if you’re fully focused on one project and have no problems remembering it now, you might need to go back to something you thought you already finished.

Slow down and take breaks. Don’t let stress drag you into making bad decisions. I felt much more refreshed after a quick nap, and I’m glad I did that instead of trying to force my way through the afternoon. This is one of the benefits of working at home – it’s easy to nap in an ergonomic and non-embarrassing way, while still getting tons of stuff done the rest of the day.

Clear your brain and focus on the top priority. It’s hard to juggle multiple projects. I made sure my new team member had things to work on while I focused on the call center dashboard project so that I wouldn’t be tempted to switch back and forth. Likewise, I wrote the documentation I promised for that project before moving on to the Rails project.

Breathe. No sense in stressing out and getting overwhelmed. Make one good decision at a time. Work step by step, and you’ll find that you’ll get through everything you need to do. Avoid multi-tasking. Single-task and finish as much as you can of your top priority first.

I prefer having one main project, maybe two projects during the transition periods. This isn’t always possible. Programming competitions helped me learn how to deal with multiple chunks of work under time pressure, and I’m getting better at it the more that work throws at me.

What are your tips for dealing with simultaneous projects?

2011-06-30 Thu 16:19

Drupal notes from helping a client improve her development environment

Keeping a to-do list helps you keep sane. If you don’t have a full-scale issue tracker, use a wiki page, text file, or something like that. It’s really useful to be able to get the list of things you’re working on or waiting for out of your head and into a form you can review.

Drupal Features help you export configuration into code. This is much better than creating an installation profile because you can update your features with new settings and apply them to existing sites. Invaluable when working with multi-sites that may need to be updated. You may need to clear your Drupal cache before you see changes applied.

Version control is really handy even when you’re working on your own. The ability to go back in time to a working setup (code + database) can help you experiment more freely and avoid late nights spent recovering from mistakes.

Drush (Drupal shell) is awesome. It’s a big timesaver. We use it to download and enable modules (dl and en), clear the cache (cc), run database updates (updatedb), launch a SQL console (sqlc), execute PHP (php-eval), run tests, and so on. I use it a lot because I hate clicking around.

It’s even more powerful with a little bit of xargs magic, which makes it easy to run a drush command against all the sites listed (one per line) in a text file. Like this:

xargs -I {} drush -l {} somecommand < sites.txt

Design decisions: Multisite without shared tables; services or syndication for sharing content between sites; central authentication for admin users…

A Bash script to create or clone multisites makes tedious things a little bit simpler. Tasks:

  • Create a database and give access to a user.
  • Create the site and files directory.
  • Create the settings.php with the database settings.
  • Copy the base database into the new database.
  • Create a symbolic link.

2011-06-28 Tue 08:55