
"An Easy Method for Beginners in Latin" and macron-insensitive search for TiddlyWiki

As previously mentioned, W- and I are retyping parts of Albert Harkness’ 1890 textbook "An Easy Method for Beginners in Latin", which was digitized and uploaded to Google Books as a PDF of images. The non-searchable book was driving W- mad, so we’re typing up the lessons. It’s a decent way to review, and I’m sure it will be a great resource for other people too.

Here’s what we have so far: An Easy Method for Beginners in Latin, Lessons 1-9

We’re starting off with TiddlyWiki because it’s a wiki system that W- has been using a lot for his personal notes, and he’s familiar with the markup. It’s not ideal because Google doesn’t index it, the file size is bigger than it needs to be (0.5MB!), and it’s JavaScript-based. It’s a good start, though, and I should be able to convert the file to another format with a little scripting. My first instinct would be to start with Org Mode for Emacs, of course, but we already know what W- thinks of Emacs. ;)

Most of the text was easy to enter. Harkness is quite fond of footnotes, numbered sections, and lots of bold and italic formatting. We’re going to skip the illustrations for now.

Typing all of this in and using it as our own reference, though, we quickly ran into a limitation of the standard TiddlyWiki engine (and really, probably all wiki engines): you had to search for the exact word to find something. In order to find poēta, you had to type poēta, not poeta. That’s because ē and e are two different characters.
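To make the problem concrete, here is a small sketch (not from the plugin) showing that ē and e are separate code points, so a plain substring search for poeta cannot find poēta. It assumes the text uses the precomposed character U+0113 for ē.

```javascript
// "ē" (U+0113) and "e" (U+0065) are distinct code points, so a plain
// substring search treats them as different letters.
var withMacron = "po\u0113ta"; // poēta, using the precomposed character
var plain = "poeta";

var codePointMacron = withMacron.charCodeAt(2).toString(16); // "113"
var codePointPlain = plain.charCodeAt(2).toString(16);       // "65"

// Searching the macron text for the plain spelling finds nothing.
var plainFindsMacron = withMacron.indexOf(plain) !== -1;          // false
var macronFindsMacron = withMacron.indexOf("po\u0113ta") !== -1;  // true

console.log(codePointMacron, codePointPlain, plainFindsMacron, macronFindsMacron);
```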

We wanted to keep the macrons as pronunciation and grammar guides. We didn’t want to require people to know or type letters with macrons. Hmm. Time to hack TiddlyWiki.

TiddlyWiki plugins use JavaScript. I found a sample search plugin that showed me the basics of what I needed.

I considered two approaches:

  1. Changing the search text to a regular expression that included macron versions of each vowel
  2. Replacing all vowels in the Tiddler texts with non-macron vowels when searching
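For comparison, the second approach could be sketched roughly like this. This is not code from the plugin: `stripMacrons` and `plainTextContains` are hypothetical helpers, and the sketch assumes a JavaScript engine that supports `String.prototype.normalize` (ES2015 or later). Note that it has to reprocess the tiddler text on every search.

```javascript
// Hypothetical helper (not part of the plugin): strip macrons by decomposing
// each character (NFD), dropping the combining macron U+0304, and recomposing.
function stripMacrons(text) {
  return text.normalize("NFD").replace(/\u0304/g, "").normalize("NFC");
}

// Hypothetical helper: compare macron-free, lowercased versions of both strings.
function plainTextContains(tiddlerText, query) {
  var haystack = stripMacrons(tiddlerText).toLowerCase();
  var needle = stripMacrons(query).toLowerCase();
  return haystack.indexOf(needle) !== -1;
}

var stripped = stripMacrons("r\u0113g\u012Bnae"); // rēgīnae
var found = plainTextContains("Po\u0113ta r\u0113g\u012Bnam laudat.", "regina");

console.log(stripped); // reginae
console.log(found);    // true
```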

The first approach was cleaner and looked much more efficient, so I chose that route. If the search text contained a macron, I assumed the searcher knew what he or she was doing, so I left the text alone. If the text did not contain a macron, I replaced every vowel with a regular expression matching the macron equivalents. Here’s what that part of the code looked like:

var macronPattern = /[āĀēĒīĪōŌūŪ]/;
if (!s.match(macronPattern)) {
  // Replace every vowel with the corresponding macron matcher
  // (the g flag expands all occurrences, not just the first one)
  s = s.replace(/a/g, "[aāĀA]");
  s = s.replace(/e/g, "[eēĒE]");
  s = s.replace(/i/g, "[iīĪI]");
  s = s.replace(/o/g, "[oōŌO]");
  s = s.replace(/u/g, "[uūŪU]");
}
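Here is the same idea as a self-contained sketch that can be run outside TiddlyWiki. The function name `expandVowels` is made up for illustration. It uses the g flag on each replacement so that every occurrence of a vowel is expanded, not only the first; without it, a word with a repeated vowel such as amare would only be partly expanded.

```javascript
var macronPattern = /[āĀēĒīĪōŌūŪ]/;

// Hypothetical helper: turn plain search text into a regular expression
// source in which each vowel also matches its macron forms.
function expandVowels(s) {
  if (macronPattern.test(s)) {
    return s; // the searcher typed a macron, so leave the text alone
  }
  return s.replace(/a/g, "[aāĀA]")
          .replace(/e/g, "[eēĒE]")
          .replace(/i/g, "[iīĪI]")
          .replace(/o/g, "[oōŌO]")
          .replace(/u/g, "[uūŪU]");
}

var reginaSource = expandVowels("regina");
var findsReginae = new RegExp(reginaSource, "i").test("rēgīnae");
var findsAmare = new RegExp(expandVowels("amare"), "i").test("amāre");
var macronQueryIsExact = new RegExp(expandVowels("poēta"), "i").test("poeta");

console.log(reginaSource);       // r[eēĒE]g[iīĪI]n[aāĀA]
console.log(findsReginae);       // true
console.log(findsAmare);         // true
console.log(macronQueryIsExact); // false
```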

That got me almost all the way there. I could search for most of the words using plain text (so poeta would find poēta and regina would find rēgīnae), but some words still couldn’t be found.

A further quirk of the textbook is that the characters in a word might be interrupted by formatting. For example, poēt<strong>am</strong> is written as poēt''am'' in TiddlyWiki markup. So I also inserted a regular expression matching any number of ' or / characters (the bold and italic markers when doubled) before each letter:

s = s.replace(/(.)/g, "['/]*$1");

It’s important to do this before the macron substitution, or you’ll have regexp classes inside other classes.
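A small sketch of why the order matters, using two hypothetical helper functions that mirror the two steps. If the vowels are expanded first, the formatting matcher is also inserted before every character inside the generated classes, and the resulting pattern no longer matches the word.

```javascript
// Hypothetical helpers mirroring the two steps described above.
function allowFormatting(s) {
  // Allow any number of ' or / characters before each character
  return s.replace(/(.)/g, "['/]*$1");
}

function expandVowels(s) {
  return s.replace(/a/g, "[aāĀA]")
          .replace(/e/g, "[eēĒE]")
          .replace(/i/g, "[iīĪI]")
          .replace(/o/g, "[oōŌO]")
          .replace(/u/g, "[uūŪU]");
}

// Right order: formatting matcher first, then the vowel classes.
var right = expandVowels(allowFormatting("am"));
// Wrong order: the formatting matcher lands inside the vowel classes.
var wrong = allowFormatting(expandVowels("am"));

var text = "poēt''am''";
var rightMatches = new RegExp(right, "i").test(text);
var wrongMatches = new RegExp(wrong, "i").test(text);
var fullWordMatches =
  new RegExp(expandVowels(allowFormatting("poetam")), "i").test(text);

console.log(right);           // ['/]*[aāĀA]['/]*m
console.log(rightMatches);    // true
console.log(wrongMatches);    // false
console.log(fullWordMatches); // true
```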

That’s the core of the macron search. Here’s what it looks like. I was so thrilled when I got all of this lined up! =)

[image: screenshot of the macron search]

And the source code:

// Macron Search Plugin
// (c) 2011 Sacha Chua - Creative Commons Attribution ShareAlike 3.0 License
// Based on http://devpad.tiddlyspot.com/#SimpleSearchPlugin by FND

if(!version.extensions.MacronSearchPlugin) { //# ensure that the plugin is only installed once
version.extensions.MacronSearchPlugin = { installed: true };

if(!config.extensions) { config.extensions = {}; }

config.extensions.MacronSearchPlugin = {
  heading: "Search Results",
  containerId: "searchResults",
  btnCloseLabel: "Close search",
  btnCloseTooltip: "dismiss search results",
  btnCloseId: "search_close",
  btnOpenLabel: "Open all search results",
  btnOpenTooltip: "Open all search results",
  btnOpenId: "search_open",

  displayResults: function(matches, query) {
    story.refreshAllTiddlers(true); // update highlighting within story tiddlers
    var el = document.getElementById(this.containerId);
    query = '"""' + query + '"""'; // prevent WikiLinks
    if(el) {
      removeChildren(el);
    } else { //# fallback: use displayArea as parent
      var container = document.getElementById("displayArea");
      el = document.createElement("div");
      el.id = this.containerId;
      el = container.insertBefore(el, container.firstChild);
    }
    var msg = "!" + this.heading + "\n";
    if(matches.length > 0) {
      msg += "''" + config.macros.search.successMsg.format([matches.length.toString(), query]) + ":''\n";
      this.results = [];
      for(var i = 0 ; i < matches.length; i++) {
        this.results.push(matches[i].title);
        msg += "* [[" + matches[i].title + "]]\n";
      }
    } else {
      msg += "''" + config.macros.search.failureMsg.format([query]) + "''\n"; // XXX: do not use bold here!?
    }
    wikify(msg, el);
    createTiddlyButton(el, "[" + this.btnCloseLabel + "]", this.btnCloseTooltip, config.extensions.MacronSearchPlugin.closeResults, "button", this.btnCloseId);
    if(matches.length > 0) { // XXX: redundant!?
      createTiddlyButton(el, "[" + this.btnOpenLabel + "]", this.btnOpenTooltip, config.extensions.MacronSearchPlugin.openAll, "button", this.btnOpenId);
    }
  },

  closeResults: function() {
    var el = document.getElementById(config.extensions.MacronSearchPlugin.containerId);
    removeNode(el);
    config.extensions.MacronSearchPlugin.results = null;
    highlightHack = null;
  },

  openAll: function(ev) {
    story.displayTiddlers(null, config.extensions.MacronSearchPlugin.results);
    return false;
  }
};

// override Story.search()
Story.prototype.search = function(text, useCaseSensitive, useRegExp) {
  var macronPattern = /[āĀēĒīĪōŌūŪ]/;
  var s = text;
  // Deal with bold and italics in the middle of words
  s = s.replace(/(.)/g, "['/]*$1");
  if (!s.match(macronPattern)) {
    // Replace the vowels with the corresponding macron matchers
    // (the g flag expands every occurrence of a vowel, not just the first one)
    s = s.replace(/a/g, "[aāĀA]");
    s = s.replace(/e/g, "[eēĒE]");
    s = s.replace(/i/g, "[iīĪI]");
    s = s.replace(/o/g, "[oōŌO]");
    s = s.replace(/u/g, "[uūŪU]");
  }
  var searchRegexp = new RegExp(s, "img");
  highlightHack = searchRegexp;
  var matches = store.search(searchRegexp, null, "excludeSearch");
  config.extensions.MacronSearchPlugin.displayResults(matches, text);
};

// override TiddlyWiki.search() to ignore macrons when searching
TiddlyWiki.prototype.search = function(s, sortField, excludeTag, match) {
    // s is the regular expression built in Story.search(); start from the
    // tiddlers selected by the excludeTag lookup, then test title and text
    var candidates = this.reverseLookup("tags", excludeTag, !!match);
    var matches = [];
    for(var t = 0; t < candidates.length; t++) {
        if (candidates[t].title.search(s) != -1 ||
            candidates[t].text.search(s) != -1) {
            matches.push(candidates[t]);
        }
    }
    return matches;
};

} //# end of "install only once"
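One caveat: the search text goes into `new RegExp` without escaping, so a query containing a character such as ( or ? is treated as regular expression syntax and may fail to compile. If literal matching is preferred, one possible variation (not part of the plugin above) is to build the pattern one character at a time. `buildSearchSource` and `vowelClasses` are hypothetical names used only for this sketch.

```javascript
var macronPattern = /[āĀēĒīĪōŌūŪ]/;
var vowelClasses = {
  a: "[aāĀA]", e: "[eēĒE]", i: "[iīĪI]", o: "[oōŌO]", u: "[uūŪU]"
};

// Hypothetical helper: build the regular expression source character by
// character, so metacharacters can be escaped before anything is inserted.
function buildSearchSource(query) {
  var expand = !macronPattern.test(query);
  var source = "";
  for (var i = 0; i < query.length; i++) {
    var ch = query.charAt(i);
    var piece;
    if (expand && vowelClasses.hasOwnProperty(ch)) {
      piece = vowelClasses[ch];
    } else {
      // Escape regular expression metacharacters so they match literally
      piece = ch.replace(/[.*+?^${}()|[\]\\]/, "\\$&");
    }
    // Allow bold or italic markers before each character
    source += "['/]*" + piece;
  }
  return source;
}

var dotIsLiteral = new RegExp(buildSearchSource("est."), "i");
var parensCompile = new RegExp(buildSearchSource("(a)"), "i");

console.log(dotIsLiteral.test("Quis est.")); // true
console.log(dotIsLiteral.test("esta"));      // false
console.log(parensCompile.test("(ā)"));      // true
```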

To add this to your TiddlyWiki, create a new tiddler, paste in the source code, and give it the systemConfig tag (the case is important). Save and reload your TiddlyWiki file, and the macron search should be available.

It took me maybe 1.5 hours to research possible ways to do it and hack the search plugin together for TiddlyWiki. I’d never written a plugin for TiddlyWiki before, but I’ve worked with JavaScript, and it was easy to pick up. I had a lot of fun coding it with W-, who supplied plenty of ideas and motivation. =) It’s fun geeking out!

Short URL: http://sachachua.com/blog/p/22225
