Hacking Drupal views and taxonomy: looking for 100% matching of terms

| drupal, geek

I’m working on a Drupal 6 site that helps match volunteers to speaking opportunities, or sessions. I use Taxonomy to keep track of the qualifications so that I can maintain the qualification hierarchy. Given a list of subject areas that a person is interested in, I need to find all sessions that match any of those subject areas. The quirk: the session must have at least one of the person’s terms, and the person must also have all the session’s terms.

Let’s say that our volunteer is interested in speaking about biology and physics. I couldn’t use a straightforward AND search. If I searched for biology AND physics, I wouldn’t get sessions for just biology. It also means I can’t use a straightforward OR search, because I shouldn’t list sessions that require both biology AND another subject the person hadn’t listed, such as chemistry.

Views didn’t seem to have a built-in way to do it. I couldn’t think of a standard-ish way to describe my challenge in order to find relevant posts on drupal.org. Content recommendation modules seemed similar, but I wasn’t familiar with any of them enough to know which one would be the closest to hack for my cross-type comparisons and 100% match requirements. So it was time to hack my Views query.

After several attempts, I settled on the approach of precalculating how many terms were associated with each session node. I created a table with the information and used the following query to populate it in my install file.

db_query("INSERT INTO {node_term_count} 
  SELECT nid, vid, count(tid) AS term_count 
  FROM {term_node} GROUP BY nid, vid");

I also used hook_nodeapi to update the table on insert, update, and delete operations.

Then I started experimenting through the SQL console. I used COUNT and GROUP BY to find out how many terms the session had in common with the person. Selecting from that MySQL subquery let me filter the list to the nodes where the total number of terms equaled the number of terms the session had. I ended up with a query that looked like this:

SELECT nid, vid FROM (SELECT tns.nid, tns.vid, 
  COUNT(tns.tid) AS match_count, 
  c.term_count FROM term_node tns 
  INNER JOIN node_term_count c ON tns.vid=c.vid 
  WHERE tns.tid in (55, 56, 42, 39, 41) 
  GROUP BY tns.vid) AS result 
WHERE term_count = match_count;

When I was happy with the query, I used hook_views_pre_execute to change my $view->build_info['query'] and $view->build_info['count_query']. With all the other filters I needed, it eventually looked like this:

    $view->build_info['query'] = "SELECT * FROM (
SELECT tns.nid, tns.vid, count(tns.tid) AS match_count, c.term_count, workflow_node.sid FROM node n 
INNER JOIN term_node tns ON (n.vid=tns.vid AND n.nid=tns.nid)
LEFT JOIN workflow_node workflow_node ON n.nid = workflow_node.nid 
INNER JOIN node_term_count c ON tns.vid=c.vid
INNER JOIN content_type_session session ON (n.nid=session.nid AND n.vid=session.vid)
INNER JOIN node school_node ON (session.field_session_school_nid=school_node.nid)
INNER JOIN content_type_school school ON (school_node.nid=school.nid AND school_node.vid=school.vid)
INNER JOIN content_field_session_dates date ON (n.nid=date.nid AND n.vid=date.vid AND date.delta=0)
WHERE (n.type in ('%s')
AND workflow_node.sid=%d
AND session.field_session_request_mode_value = '%s'
AND (n.status <> %d) 
AND (DATE_FORMAT(ADDTIME(date.field_session_dates_value, SEC_TO_TIME(-14400)), '%Y-%m-%%d') >= '" . date('Y-m-d') . "')
AND school.field_school_district_nid IN ($district_where)
AND tns.tid in ($tid_where))
GROUP BY tns.vid
) as result WHERE term_count = match_count AND match_count > 0";

I used variables like $tid_where and $district_where to simplify the query. They use array_fill to create placeholders for the arguments.

Result: I think it works the way it’s supposed to. It passes my unit tests and manual testing, anyway. If performance becomes an issue, I might precalculate the results and store them in a table. I hope I don’t have to do that, though.

Views 3 is supposed to have arbitrary data stores that let you write views on top of any sort of query or function, but I’m going to stay with Views 2 for now.

Whenever I write about stuff we’re doing with Drupal, I often hear about even awesomer ways to do things. =) Is this one of those times? Is there a little-known module that Does the Right Thing?

You can comment with Disqus or you can e-mail me at sacha@sachachua.com.