Category Archives: japanese

Sampizcat, canna and kinput2

I was tracking down a Red Hat Japanese language support problem for
someone on #linuxhelp. Sampizcat wanted to turn off kana-kanji
conversion, but it wasn’t straightforward, so he or she is doing a
reinstall instead. Wish I could have helped more.

Japanese word list generator

MWAHAHAHAHA! I just pulled off a really neat Emacs
hack. <grin> It’s Japanese-related.

I’ve been translating this document for the past two days. It’s slow,
boring work because there’s no soft copy, so I have to draw the
characters with the mouse, working from a blurry photocopy of a
photocopy, and hope I don’t make any mistakes along the way.

While copying down kanji (Chinese characters) for later translation,
I built a spreadsheet with two columns: the kanji word and the number
of the slide it appears on. I exported that to CSV, opened it in
Emacs, and wrote an Emacs Lisp function that split the words into
individual characters, then passed the result through
shell-command-on-region to sort and uniquify them.

Next, I went back to the CSV of words and slide numbers and wrote
another Emacs Lisp function that looked up each word in EDICT (Jim
Breen’s electronic Japanese dictionary), split the word into
individual characters, and filed the word’s information under each
character, marking words that weren’t in the dictionary. Finally, I
wrote one more function to add table markup and each character’s
definition to every line, and copied the result into an HTML file.
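
The split, sort and uniquify step can also be done without leaving Lisp. Here is a minimal sketch that takes a plain list of words instead of a CSV buffer; the name `my/kanji-unique-characters` is hypothetical, and it sorts by character code, which may not match the locale-dependent order that `sort` uses.

```elisp
(defun my/kanji-unique-characters (words)
  "Return the distinct characters in WORDS, a list of strings.
The result is a list of one-character strings sorted by character code,
a pure-Lisp stand-in for piping the characters through \"sort | uniq\"."
  (let (chars)
    (dolist (word words)
      ;; (append STRING nil) turns a string into a list of characters.
      (dolist (c (append word nil))
        (unless (memq c chars)
          (push c chars))))
    ;; Convert back to one-character strings and sort by code point.
    (sort (mapcar #'char-to-string chars) #'string<)))
```

For example, `(my/kanji-unique-characters '("漢字" "字体"))` returns `("体" "字" "漢")`: each character appears once, ordered by Unicode code point.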

This should probably be rewritten as a Perl script.

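;; Usage:
;; 1. In the CSV buffer, run sacha/kanji/split-dictionary to generate the
;;    list of characters in sacha/kanji/output-file.
;; 2. Load edict.el (sacha/kanji/lookup-word searches `edict-buffer'), go to
;;    the top of the CSV buffer, and run sacha/kanji/add-all to file every
;;    word under each of its characters.
;; 3. Run sacha/kanji/format-kanji-with-references to add the kanjidic
;;    definitions and table markup.

(defvar sacha/kanji/output-file "~/tmp/kanji"
  "File that holds one line per character.")

(defun sacha/kanji/split-dictionary ()
  "Copy the current CSV buffer into `sacha/kanji/output-file', one character per line.
Lines are expected to start with a double-quoted word followed by a colon.
The result is sorted and has duplicate characters removed."
  (interactive)
  (let ((buffer (current-buffer)))
    (with-current-buffer (find-file-noselect sacha/kanji/output-file)
      (erase-buffer)
      (insert-buffer-substring buffer)
      (goto-char (point-min))
      (while (re-search-forward "^\"\\([^\"]+\\)\":" nil t)
        ;; Keep only the word: drop the rest of the line, then the opening quote.
        (delete-region (match-end 1) (line-end-position))
        (goto-char (line-beginning-position))
        (delete-char 1)
        ;; Put every character of the word on its own line.
        (while (not (eolp))
          (forward-char 1)
          (insert "\n")))
      ;; Remove the blank lines left behind (from the top of the buffer),
      ;; then sort and uniquify.
      (goto-char (point-min))
      (delete-matching-lines "^\\s-*$")
      (shell-command-on-region (point-min) (point-max) "sort | uniq" nil t))))

(defun sacha/kanji/add-all ()
  "Run `sacha/add-word' on every line from point to the end of the buffer."
  (interactive)
  (while (not (eobp))
    (sacha/add-word)
    (forward-line 1)))

(defun sacha/kanji/format-kanji-with-references ()
  "Turn each line of `sacha/kanji/output-file' into an HTML table row.
Each row holds the character, its kanjidic meanings, and the word references
added by `sacha/add-word'.  Lines that already start with < are skipped."
  (interactive)
  (find-file sacha/kanji/output-file)
  (goto-char (point-min))
  (while (not (eobp))
    (unless (or (eolp) (= (char-after) ?<))
      (let* ((kanji (buffer-substring (point) (1+ (point))))
             (definition (sacha/kanji/find-definition kanji)))
        (when definition
          ;; Assumed markup: one <tr> per character, with three cells.
          (insert "<tr><td>")
          (forward-char 1)
          (insert "</td><td>" definition "</td><td>")
          (goto-char (line-end-position))
          (insert "</td></tr>"))))
    ;; Always move on, so the loop terminates.
    (forward-line 1)))

(defun sacha/kanji/find-definition (kanji)
  "Return the English meanings of KANJI from kanjidic, or nil if not found."
  (with-current-buffer (find-file-noselect "/usr/share/edict/kanjidic")
    (goto-char (point-min))
    ;; Each kanjidic entry starts with the character; meanings are in {...}.
    (when (and (re-search-forward (concat "^" (regexp-quote kanji) " ") nil t)
               (re-search-forward "{[^}]+}\\( {[^}]+}\\)*" (line-end-position) t))
      (match-string 0))))  ;; kanji definitions

(defun sacha/kanji/lookup-word (key)
  "Return (READING MEANINGS) for KEY from EDICT, or nil if it is not found.
Load edict.el before running this, since it searches `edict-buffer'."
  (with-current-buffer edict-buffer
    (goto-char (point-min))
    (when (re-search-forward
           (concat "^" (regexp-quote key) " \\[\\([^]]+\\)\\] /\\(.*\\)") nil t)
      (list (match-string 1) (match-string 2)))))

(defun sacha/add-word ()
  "Look up the word on this CSV line and file it under each of its characters.
Expects lines like \"WORD\"...:SLIDE.  Words missing from EDICT get ???."
  (when (looking-at "^\"\\([^\"]+\\)\".*?:\\([0-9]+\\)")
    (let* ((word (match-string 1))
           (slide (match-string 2))
           (definition (sacha/kanji/lookup-word word)))
      ;; Visit each distinct character once, even if the word repeats it.
      (dolist (char (delete-dups (split-string word "" t)))
        (with-current-buffer (find-file-noselect sacha/kanji/output-file)
          (goto-char (point-min))
          (when (re-search-forward (concat "^" (regexp-quote char)) nil t)
            (goto-char (line-end-position))
            ;; Assumed markup: <br> between the references on a character's line.
            (insert "<br>S:" slide " " word " ")
            (if definition
                (insert (nth 0 definition) " " (nth 1 definition))
              (insert "???"))))))))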
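
`sacha/kanji/lookup-word` hands back the reading and the raw meaning text. As a minimal sketch, assuming the classic EDICT line format `WORD [READING] /meaning/meaning/`, a pure helper (hypothetical name `my/edict-parse-line`) can split a line into its parts without touching any buffer, which also makes the parsing easy to test on its own.

```elisp
(defun my/edict-parse-line (line)
  "Parse one EDICT LINE into (WORD READING MEANINGS).
READING is nil for entries without a [reading].  MEANINGS is a list of
strings.  Return nil if LINE does not look like an EDICT entry."
  (when (string-match
         ;; WORD, a space, an optional \"[READING] \", then /meanings/.
         "\\`\\([^ ]+\\) \\(?:\\[\\([^]]+\\)\\] \\)?/\\(.*\\)/\\'" line)
    ;; Grab all groups before `split-string' clobbers the match data.
    (let ((word (match-string 1 line))
          (reading (match-string 2 line))
          (meanings (match-string 3 line)))
      (list word reading (split-string meanings "/" t)))))
```

For example, `(my/edict-parse-line "漢字 [かんじ] /Chinese characters/kanji/")` returns `("漢字" "かんじ" ("Chinese characters" "kanji"))`.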

More hacks for mangling Japanese CSV

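This is for use with kdrill. It copies the JIS codes of the kanji in a
colon-separated word list to the kill ring, ready to paste into a
kdrill usefile.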

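(defun sacha/kanji/get-ordered-kanji-list ()
  "Return the characters in the first column of a colon-separated buffer.
Characters come back in order of first appearance, without duplicates.
Double quotes are skipped, and everything after the first colon on a
line is ignored."
  (goto-char (point-min))
  (let (kanji-list)
    (while (not (eobp))
      (let ((c (char-after)))
        (cond
         ((= c ?\"))
         ((= c ?\n))
         ;; Jump to the end of the line; the forward-char below steps past it.
         ((= c ?:) (forward-line 1) (forward-char -1))
         ((not (memq c kanji-list)) (push c kanji-list))))
      (forward-char 1))
    (nreverse kanji-list)))

(defun sacha/kanji/ordered-usefile-to-kill ()
  "Copy the JIS codes of the kanji in this buffer to the kill ring, one per line.
The JIS code is the field that follows the character in kanjidic."
  (interactive)
  (let ((list (sacha/kanji/get-ordered-kanji-list)))
    ;; Look up kanji in kanjidic
    (with-current-buffer (find-file-noselect "/usr/share/edict/kanjidic")
      (kill-new
       (mapconcat
        #'identity
        (delq nil
              (mapcar
               (lambda (kanji)
                 (goto-char (point-min))
                 (when (re-search-forward
                        (concat "^" (regexp-quote (char-to-string kanji)) " ") nil t)
                   (skip-syntax-forward " ")
                   ;; Copy the next whitespace-delimited field.
                   (buffer-substring-no-properties
                    (point) (progn (skip-syntax-forward "^ ") (point)))))
               list))
        "\n")))))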
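
The field `sacha/kanji/ordered-usefile-to-kill` copies is the one right after the character in a KANJIDIC line, the hexadecimal JIS X 0208 code. Below is a minimal sketch of a pure parser for such a line (hypothetical name `my/kanjidic-parse-line`), assuming the usual layout: the character, the JIS code, assorted index and reading fields, and the English meanings in braces.

```elisp
(defun my/kanjidic-parse-line (line)
  "Parse a KANJIDIC LINE into (KANJI JIS-CODE MEANINGS).
MEANINGS is the list of {...} groups, in order.  Return nil for empty
lines and # comment lines."
  (when (string-match "\\`[^#]" line)
    (let ((fields (split-string line " " t))
          (meanings nil)
          (start 0))
      ;; Collect every {...} group from left to right.
      (while (string-match "{\\([^}]*\\)}" line start)
        (push (match-string 1 line) meanings)
        (setq start (match-end 0)))
      ;; The first two space-separated fields are the kanji and its JIS code.
      (list (nth 0 fields) (nth 1 fields) (nreverse meanings)))))
```

For the 亜 entry, for instance, the second element of the result is the JIS code `3021`, which is exactly the string the usefile command collects.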

Japanese flashcards

This extracts all the kanji in the buffer, looks up their meanings in
kanjidic, and converts them to the "question : answer" format expected
by flashcard.el.

(defun sacha/kanji/get-ordered-kanji-list ()
  "Return the kanji in the buffer, in order of first appearance."
  (goto-char (point-min))
  (let (kanji-list)
    (while (not (eobp))
      (let ((c (char-after)))
        ;; Use the script table rather than comparing against ?亜:
        ;; in Unicode, kanji such as 一 have lower codes than 亜.
        (when (and (eq (aref char-script-table c) 'han)
                   (not (memq c kanji-list)))
          (push c kanji-list)))
      (forward-char 1))
    (nreverse kanji-list)))

(defun sacha/kanji/to-flashcard-j2e (&optional list)
  "Return a Japanese-English flashcard set for the characters in LIST.
If LIST is nil, use the kanji in the current buffer.  Interactively,
copy the result to the kill ring instead."
  (interactive (list (sacha/kanji/get-ordered-kanji-list)))
  (unless list (setq list (sacha/kanji/get-ordered-kanji-list)))
  (let ((result
         (with-current-buffer (find-file-noselect "/usr/share/edict/kanjidic")
           (mapconcat
            (lambda (kanji)
              (goto-char (point-min))
              ;; Capture everything from the first { to the last } on the line.
              (if (re-search-forward (format "^%c.*?{\\(.*\\)}" kanji) nil t)
                  (format "%c : %s\n" kanji
                          (replace-regexp-in-string "}\\s-+{" "," (match-string 1)))
                ""))
            list ""))))
    (if (called-interactively-p 'interactive) (kill-new result) result)))

(defun sacha/flashcard-method-leitner-check-answer (card answer)
  "Check ANSWER against CARD, accepting any of the comma-separated answers.
If ANSWER is not one of them, show the correct answer and ask whether to
count it as correct anyway."
  (let ((correct (flashcard-card-answer card)))
    (if (member answer (split-string correct ","))
        (progn
          (flashcard-insert "Correct! Answer is:\n"
                            (propertize correct
                                        'face 'flashcard-answer-face
                                        'rear-nonsticky t))
          t)
      (flashcard-insert "The correct answer is:\n"
                        (propertize correct
                                    'face 'flashcard-answer-face
                                    'rear-nonsticky t))
      (y-or-n-p "Was your answer correct? "))))

(setq flashcard-method-check-answer-function 'sacha/flashcard-method-leitner-check-answer)
(add-to-list 'auto-mode-alist '("\\.deck\\'" . flashcard-mode))
(add-hook 'flashcard-mode-hook 'flashcard-add-scroll-to-bottom)
(add-hook 'flashcard-positive-feedback-functions 'flashcard-feedback-highlight-answer)
(add-hook 'flashcard-positive-feedback-functions 'flashcard-feedback-congratulate)
(add-hook 'flashcard-positive-feedback-functions 'flashcard-method-leitner-positive-feedback)
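
The check above counts an answer as correct only when it exactly matches one of the comma-separated meanings, so `asia` or `rank next ` would be rejected. Here is a minimal sketch of a more forgiving comparison; `my/flashcard-answer-correct-p` is a hypothetical helper, not part of flashcard.el, that ignores case and surrounding whitespace and never accepts empty input.

```elisp
(require 'subr-x)  ; for `string-trim'

(defun my/flashcard-answer-correct-p (answer correct)
  "Return non-nil if ANSWER matches one of the comma-separated answers in CORRECT.
Surrounding whitespace and letter case are ignored; empty input never matches."
  (let ((normalized (downcase (string-trim answer))))
    (and (not (string= normalized ""))
         (member normalized
                 ;; Normalize each stored alternative the same way.
                 (mapcar (lambda (choice) (downcase (string-trim choice)))
                         (split-string correct ","))))))
```

Using it would mean replacing the `member` test in `sacha/flashcard-method-leitner-check-answer` with `(my/flashcard-answer-correct-p answer correct)`; the feedback hooks stay the same.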

More Emacs evangelization: flashcard

Aris and I are both struggling with far too much kanji. I used a
combination of kdrill to gain familiarity with kanji and
../emacs/flashcard.el to drill the meaning into my brain, as
flashcard.el requires me to get a question right 5 times in a row
before considering it solved. Aris searched the Internet for flashcard
programs on Windows and played around with things like Kanji Gold and
King Kanji, but couldn’t figure out how to import our wordlist into
them. Kanji Gold looked promising as it also used EDICT, but I
couldn’t figure out the magic number at the end of the dictionary
entry. With over 200 words in our word list, there was no way we were
going to enter those things one by one!
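
The "right five times in a row" rule fits the Leitner (pigeonhole) scheme that flashcard.el’s leitner method is named after. flashcard.el’s own bookkeeping isn’t shown here; this is a generic sketch with hypothetical names, assuming five numbered compartments where a correct answer moves a card up one compartment and a wrong answer sends it back to the first.

```elisp
(defconst my/leitner-compartments 5
  "Number of compartments a card must climb through before it counts as learned.")

(defun my/leitner-next-compartment (compartment correct)
  "Return the compartment a card moves to from COMPARTMENT.
Compartments are numbered from 1.  If CORRECT is non-nil the card moves up
one compartment; otherwise it goes back to compartment 1.  Return nil when
a card in the last compartment is answered correctly, meaning it is learned."
  (cond ((not correct) 1)
        ((>= compartment my/leitner-compartments) nil)
        (t (1+ compartment))))
```

Starting from compartment 1, five correct answers in a row take a card through 2, 3, 4 and 5 and then to nil, while a single wrong answer along the way drops it back to 1.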

I told him to download Emacs and grab Jorgen Schaefer’s flashcard.el
from my ../emacs directory. I then took the dictionary file that
Kanji Gold couldn’t recognize, replaced [ with : so that flashcard
would import it without problems, and set up a deck for him. I tweaked
the default faces a bit, since they’re horrible on light-colored
displays. I copied the suggested feedback configuration and explained
the pigeonhole method to him. I also tweaked the checking function so
that it checked for substrings and treated empty input as a
definitely incorrect answer.
He wanted the answers displayed all the time, so I coded that in as
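
A substring check like the one described above might look like the following sketch. The name `my/flashcard-substring-answer-p` is hypothetical, and the direction of the test (the typed answer must appear somewhere inside the stored answer, ignoring case) is an assumption about what "checked for substrings" meant.

```elisp
(require 'subr-x)  ; for `string-trim'

(defun my/flashcard-substring-answer-p (answer correct)
  "Return t if ANSWER appears inside CORRECT, ignoring case.
Empty or whitespace-only input always counts as incorrect."
  (let ((typed (string-trim answer)))
    (and (not (string= typed ""))
         ;; Bind `case-fold-search' so the match ignores case.
         (let ((case-fold-search t))
           (string-match-p (regexp-quote typed) correct))
         t)))
```

A short answer such as `chinese` then passes against a long EDICT answer like `Chinese characters`, which helps with multi-word dictionary definitions, at the cost of also accepting fragments like `ch`.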

The initial word list was too big, so I copied nine words into a file
and imported them into a deck. When he finishes this deck, I’ll show
him how to create another colon file and import it. I’ll also ask him
whether he wants to tweak the number of compartments.
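
Making each new colon file by hand means repeating the [ to : substitution. A minimal sketch that automates it, assuming each deck line should read `WORD : ANSWER` with the reading and meanings kept in the answer; `my/edict-line-to-colon-line` and `my/edict-region-to-colon-file` are hypothetical names. Unlike a plain search-and-replace, it also drops the closing ] and handles kana-only entries that have no reading.

```elisp
(defun my/edict-line-to-colon-line (line)
  "Turn an EDICT LINE into \"WORD : ANSWER\" for a colon-separated deck file.
Lines that don't look like EDICT entries are returned unchanged."
  (cond
   ;; WORD [READING] /meanings/  ->  WORD : READING /meanings/
   ((string-match "\\`\\([^ ]+\\) \\[\\([^]]+\\)\\] \\(.*\\)\\'" line)
    (concat (match-string 1 line) " : "
            (match-string 2 line) " " (match-string 3 line)))
   ;; WORD /meanings/  ->  WORD : /meanings/
   ((string-match "\\`\\([^ ]+\\) \\(/.*\\)\\'" line)
    (concat (match-string 1 line) " : " (match-string 2 line)))
   (t line)))

(defun my/edict-region-to-colon-file (start end)
  "Convert every EDICT line between START and END to colon format, in place."
  (interactive "r")
  (save-excursion
    ;; A marker keeps track of END while the lines change length.
    (let ((end-marker (copy-marker end)))
      (goto-char start)
      (while (< (point) end-marker)
        (let ((line (buffer-substring-no-properties
                     (line-beginning-position) (line-end-position))))
          (delete-region (line-beginning-position) (line-end-position))
          (insert (my/edict-line-to-colon-line line)))
        (forward-line 1))
      (set-marker end-marker nil))))
```

Running `M-x my/edict-region-to-colon-file` on a region of EDICT lines turns `漢字 [かんじ] /Chinese characters/` into `漢字 : かんじ /Chinese characters/`.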

He’s asked me if I can get YM (Yahoo! Messenger) working in the text
editor as well. I’m currently tunneled through Richi’s host, but I
think I can open a local tunnel for him too, if he feels like using
ERC. ’Course, normal YM might just work, and chances are there’s a
YM-specific client for Emacs somewhere.

I’ve made no effort to hide Emacs’ complexity. I lean over and drop
into Lisp code in front of him because I want him to have a working
environment up and running as soon as possible. Who knows? Maybe he’ll
use Emacs even after the internship. =)

He looks like he’s having fun, and he certainly appreciates that I can
hack the editor to fit how he wants to do things. He wants to match
the readings, too, which means I’ll need to figure out how LEIM (the
Library of Emacs Input Methods) works under Windows. I’ll do that on
Monday.
