NEW: For a prettier blog interface, see the Wordpress version!

~/.diary schedule

No entries

Tasks

Priorities - A: high, B: medium, C: low; Status - _: unfinished, X: finished, C: cancelled, P: pending, o: in progress, >: delegated. Covey quadrants - Q1 & Q3: urgent, Q1 & Q2: important
AXPackage new planner

Notes

1. Japanese word list generator

Categories: None -- Permalink
<blockquote> MWAHAHAHAHA! I just pulled off a really neat Emacs hack. &lt;grin&gt; It's Japanese-related. So I've been translating this document for the past two days. It's really slow and boring work because there's no soft copy, so I have to write the characters (blurry because this is a photocopy of a photocopy) using the mouse, and hope I don't make any mistakes along the way. In the course of copying down kanji (Chinese characters) for later translation, I created a spreadsheet with two columns: the kanji word and the number of the slide it appears on. Then I exported that to CSV, opened that in Emacs, and wrote an Emacs Lisp function that split the words up into individual characters. I passed this through shell-command-on-region to sort and uniquify the characters. I then went back to the CSV with words and slide numbers, wrote another Emacs Lisp function that searched edict (Jim Breen's electronic Japanese dictionary) for the words, split the word into individual characters, and filed the word info under each character, also marking words that were not found in the dictionary. After that, I wrote yet another function to add table markup and individual character definitions to each line, then copied the result into an HTML file. </blockquote>

This should probably be rewritten as a Perl script.

;; Generate the list of characters
;; Use add-all to add all the words to the list
;; call sacha/kanji/format-kanji-with-references

(defvar sacha/kanji/output-file "~/tmp/kanji")

(defun sacha/kanji/process-csv ()
  (interactive)
  (sacha/kanji/split-dictionary)
  (sacha/kanji/add-all)
  (sacha/kanji/format-kanji-with-references))

(defun sacha/kanji/split-dictionary ()
  (interactive)
  (let ((buffer (current-buffer)))
    (with-current-buffer (find-file-noselect sacha/kanji/output-file)
      (erase-buffer)
      (insert-buffer-substring buffer)
      (goto-char (point-min))
      (while (re-search-forward "^\"\\([^\"]+\\)\":" nil t)
        (delete-region (match-end 1) (line-end-position))
        (goto-char (line-beginning-position))
        (delete-char 1)
        (while (not (eolp))
          (forward-char 1)
          (unless (bolp)
            (insert "\n"))))
      (goto-char (line-beginning-position))
      (delete-matching-lines "^\\s-*$")
      (shell-command-on-region (point-min) (point-max) "sort | uniq"  nil t))))

(defun sacha/kanji/add-all ()
  (interactive)
  (while (not (eobp))
    (sacha/add-word)
    (forward-line 1)))

(defun sacha/kanji/format-kanji-with-references ()
  "Add character meaning and table markup."
  (interactive)
  (find-file sacha/kanji/output-file)
  (goto-char (point-min))
  (while (not (eobp))
    (goto-char (line-beginning-position))
    (unless (= (char-after (point)) ?<)
      (forward-char 1)
      (let* ((kanji (buffer-substring (line-beginning-position) (point)))
             (definition (sacha/kanji/find-definition kanji)))
        (when definition
          (save-excursion
            (forward-char -1)
            (insert "<tr><td class=\"kanji\">"))
          (insert "</td><td class=\"meaning\">")
          (insert definition)
          (goto-char (line-end-position))
          (insert "</td></tr>"))))
    (forward-line)))

(defun sacha/kanji/find-definition (kanji)
  "Look up kanji definition."
  (with-current-buffer
      (find-file-noselect "/usr/share/edict/kanjidic")
    (goto-char (point-min))
    (when (and (search-forward kanji)
               (re-search-forward "\\({[^}]+}\\( {[^}]+}\\)+\\)"))
      (match-string 0))))  ;; kanji definitions

(defun sacha/kanji/lookup-word (key)
  "Return the definition of the current word. Ensure edict is loaded before running this."
  (with-current-buffer edict-buffer
    (goto-char (point-min))
    (when (re-search-forward (concat "^" key " \\[\\([^]]+\\)\\] /\\(.*\\)") nil t)
      (list (match-string 1) (match-string 2)))))

(defun sacha/add-word ()
  "Look up this word's definition and add the word to individual character entries."
  (interactive)
  (when (looking-at "^\"\\([^\"]+\\)\".*?:\\([0-9]+\\)")
    (let ((word (match-string 1))
          (slide (match-string 2))
          definition
          chars)
      (setq definition (sacha/kanji/lookup-word word))
      (setq chars (split-string word "" t))
      (while chars
        (with-current-buffer (find-file-noselect sacha/kanji/output-file)
          (goto-char (point-min))
          (when (re-search-forward (concat "^" (car chars)) nil t)
            (goto-char (line-end-position))
            (insert "<br><span class=\"slide\">S:" slide "</span> <span class=\"word\">" word "</span>")
            (if definition
                (insert " <span class=\"reading\">" (elt definition 0) "</span>"
                        " <span class=\"meaning\">" (elt definition 1) "</span>")
              (insert "???"))))
        (delete (car chars) chars)
        (setq chars (cdr chars))))))

On Technorati: ,

Previous day | Next day

I'd love to hear about any questions, comments, suggestions or links that you might have. Your comments will not be posted on this website immediately, but will be e-mailed to me first. You can use this form to get in touch with me, or e-mail me at [email protected] .

Page: 2005.01.07
Updated: 2005-12-1702:25:5802:25:58+0800
NOTE: ANTI-SPAM MEASURE NOW IN PLACE. Please answer the following question with the right number in order to send me your comment.
What is two minus one? (hint: one ;) )
Name:
E-mail:
URL:
Comments: