Over ten years, my WordPress blog had ballooned to more than 500 categories. Part of it was because Org2Blog makes creating new categories super-easy, so I just piled them on (occasionally misspelling a few). Part of it was because I don’t really know what I’ll write a lot about until I write, so I had categories with one or two posts and then I moved on. Part of it was because I hadn’t decided what I was going to use categories for and what I was going to use tags for (yes, even after all those years).
I wanted to revamp my categories so that the Life, Geek, and Visual categories and their corresponding feeds might be useful to people who find my daily posts awesome-but-overwhelming and who would prefer a slice of my blog tailored to their interests. (There are even more categories on my archive page.) This meant organizing the categories into a hierarchy, but first I wanted to cut the number down to something more manageable.
The WordPress interface for managing categories leaves much to be desired when it comes to bulk actions. Fortunately, the Term Management Tools plugin makes it easy to merge categories or convert them to tags using additional Bulk Actions on the standard Categories screen.
I merged a few of the common typos, then converted any category with fewer than 10 posts into a tag. The original version of the plugin failed silently when converting a category if a tag with the same name already existed, so I patched my copy to quietly merge the two terms instead.
The Screen Options menu let me change the number displayed on screen to 100 items, which made it much easier for me to weed out most of my categories. Term Management Tools also provides a bulk action for setting a category parent, which was great for quickly reorganizing my categories into a hierarchy.
I still had almost 3,000 uncategorized posts. Since I haven’t quite found or written an automatic N-gram text classifier for WordPress posts (if you have one, please share!), I decided to see if I could make a dent in this manually. I started by prioritizing the posts with comments. I assigned categories using the Posts screen, but that took a while and far too many mouse clicks. The Categorized plugin automatically unchecks the default category once you select a different one, which saved me one click per post, but it still wasn’t enough. I ended up extracting a list of posts from my database with the following SQL command:
SELECT p.id, p.post_title, p.post_date, p.comment_count FROM wp_posts p INNER JOIN wp_term_relationships r ON (p.id=r.object_id AND r.term_taxonomy_id=1) WHERE p.post_type='post' AND p.post_status='publish' INTO OUTFILE '/tmp/published.txt';
and another list of terms and taxonomy IDs:
SELECT t.*, tt.term_taxonomy_id FROM wp_terms t INNER JOIN wp_term_taxonomy tt ON (t.term_id=tt.term_id AND tt.taxonomy='category') INTO OUTFILE '/tmp/terms.txt';
After a little spreadsheet manipulation involving VLOOKUP-ing the category name that I manually entered for each one, I copied the term taxonomy IDs and post IDs into an Emacs buffer and used a keyboard macro to change each line into the form:
UPDATE wp_term_relationships SET term_taxonomy_id=? WHERE object_id=? AND term_taxonomy_id=1;
where 1 was the term_taxonomy_id corresponding to the default Uncategorized category.
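For anyone who would rather script the lookup-and-macro step, here is a rough Python sketch of the same idea. It is not what I actually ran (that was a spreadsheet plus an Emacs keyboard macro). The function names, the sample rows, and the assumed column order of the terms export (term_id, name, slug, term_group, term_taxonomy_id) are all illustrative assumptions, so check them against your own tables first.

```python
import csv
import io

# term_taxonomy_id of the default category, matching the queries above.
UNCATEGORIZED_ID = 1


def load_category_ids(terms_file):
    """Return {category name: term_taxonomy_id} from a tab-separated export.

    Assumes the columns are term_id, name, slug, term_group,
    term_taxonomy_id, and that no field contains a tab.
    """
    reader = csv.reader(terms_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[1]: int(row[-1]) for row in reader if row}


def build_updates(assignments, category_ids):
    """Yield one UPDATE statement per (post_id, category_name) pair.

    An unknown category name raises KeyError, so typos surface
    before anything touches the database.
    """
    for post_id, name in assignments:
        yield (
            "UPDATE wp_term_relationships "
            f"SET term_taxonomy_id={category_ids[name]} "
            f"WHERE object_id={int(post_id)} "
            f"AND term_taxonomy_id={UNCATEGORIZED_ID};"
        )


if __name__ == "__main__":
    # Made-up rows standing in for /tmp/terms.txt and the spreadsheet.
    terms = io.StringIO(
        "1\tUncategorized\tuncategorized\t0\t1\n"
        "7\tEmacs\temacs\t0\t12\n"
    )
    ids = load_category_ids(terms)
    for statement in build_updates([(101, "Emacs")], ids):
        print(statement)
```

The generated statements can be reviewed by eye before being pasted into a MySQL session. Direct updates like these bypass WordPress's own bookkeeping, so the cached per-term post counts in wp_term_taxonomy may need recalculating afterwards.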
Since I was on a roll, I decided to categorize everything from 2007 onwards, which is farther back than my manual index goes. That got me through about a thousand items before I decided it was enough filing for one day. At the time of writing, there are 6,512 posts on my blog. 4,536 posts (70%) belong to various categories, while 1,976 are still uncategorized.
I hope this work pays off! =) I expect that it will make my blog a little easier to browse.