Category Archives: japanese

Sampizcat, canna and kinput2

I was tracking down a Red Hat Japanese language support problem for someone on #linuxhelp. Sampizcat wanted to turn off the kana-kanji conversion, but it wasn't straightforward, so he or she is doing a reinstall. Wish I could have helped more.

Japanese word list generator

MWAHAHAHAHA! I just pulled off a really neat Emacs hack. <grin> It's Japanese-related.

So I've been translating this document for the past two days. It's really slow and boring work because there's no soft copy, so I have to write the characters (blurry because this is a photocopy of a photocopy) using the mouse, and hope I don't make any mistakes along the way.

In the course of copying down kanji (Chinese characters) for later translation, I created a spreadsheet with two columns: the kanji word and the number of the slide it appears on. Then I exported that to CSV, opened that in Emacs, and wrote an Emacs Lisp function that split the words up into individual characters. I passed this through shell-command-on-region to sort and uniquify the characters.

I then went back to the CSV with words and slide numbers and wrote another Emacs Lisp function that searched edict (Jim Breen's electronic Japanese dictionary) for the words, split each word into individual characters, and filed the word info under each character, also marking words that were not found in the dictionary. After that, I wrote yet another function to add table markup and individual character definitions to each line, then copied the result into an HTML file.

This should probably be rewritten as a Perl script.
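
As a rough illustration of the first step (splitting words into a sorted list of distinct characters), here is a minimal sketch that does the work in Lisp instead of piping through `sort | uniq`. The helper name `my/kanji-unique-chars` is hypothetical and not part of the hack below; it takes a plain list of words rather than reading the CSV buffer.

```emacs-lisp
;; Sketch only: `my/kanji-unique-chars' is a hypothetical helper,
;; not part of the original hack.
(defun my/kanji-unique-chars (words)
  "Return a sorted list of the distinct characters used in WORDS.
WORDS is a list of strings; each result is a one-character string."
  (let (chars)
    (dolist (word words)
      ;; Splitting on the empty regexp yields one string per character.
      (dolist (ch (split-string word "" t))
        (unless (member ch chars)
          (push ch chars))))
    ;; Sort by character code, roughly what `sort' does in the shell.
    (sort chars #'string<)))
```

Calling `(my/kanji-unique-chars '("日本" "日本語"))` gives one entry per distinct character, which is the shape of list the output file starts out with.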

;; Generate the list of characters
;; Use add-all to add all the words to the list
;; call sacha/kanji/format-kanji-with-references

(defvar sacha/kanji/output-file "~/tmp/kanji")

(defun sacha/kanji/process-csv ()
  (interactive)
  (sacha/kanji/split-dictionary)
  (sacha/kanji/add-all)
  (sacha/kanji/format-kanji-with-references))

(defun sacha/kanji/split-dictionary ()
  (interactive)
  (let ((buffer (current-buffer)))
    (with-current-buffer (find-file-noselect sacha/kanji/output-file)
      (erase-buffer)
      (insert-buffer-substring buffer)
      (goto-char (point-min))
      (while (re-search-forward "^\"\\([^\"]+\\)\":" nil t)
        (delete-region (match-end 1) (line-end-position))
        (goto-char (line-beginning-position))
        (delete-char 1)
        (while (not (eolp))
          (forward-char 1)
          (unless (bolp)
            (insert "\n"))))
      ;; Start from the top so blank lines anywhere in the buffer are removed.
      (goto-char (point-min))
      (delete-matching-lines "^\\s-*$")
      (shell-command-on-region (point-min) (point-max) "sort | uniq"  nil t))))

(defun sacha/kanji/add-all ()
  (interactive)
  (while (not (eobp))
    (sacha/add-word)
    (forward-line 1)))

(defun sacha/kanji/format-kanji-with-references ()
  "Add character meaning and table markup."
  (interactive)
  (find-file sacha/kanji/output-file)
  (goto-char (point-min))
  (while (not (eobp))
    (goto-char (line-beginning-position))
    (unless (= (char-after (point)) ?<)
      (forward-char 1)
      (let* ((kanji (buffer-substring (line-beginning-position) (point)))
             (definition (sacha/kanji/find-definition kanji)))
        (when definition
          (save-excursion
            (forward-char -1)
            (insert ""))
          (insert "")
          (insert definition)
          (goto-char (line-end-position))
          (insert ""))))
    (forward-line)))

(defun sacha/kanji/find-definition (kanji)
  "Look up kanji definition."
  (with-current-buffer
      (find-file-noselect "/usr/share/edict/kanjidic")
    (goto-char (point-min))
    ;; Pass nil t so a missing kanji returns nil instead of signaling an error.
    (when (and (search-forward kanji nil t)
               (re-search-forward "\\({[^}]+}\\( {[^}]+}\\)+\\)" nil t))
      (match-string 0))))  ;; kanji definitions

(defun sacha/kanji/lookup-word (key)
  "Return the definition of the current word. Ensure edict is loaded before running this."
  (with-current-buffer edict-buffer
    (goto-char (point-min))
    (when (re-search-forward (concat "^" (regexp-quote key) " \\[\\([^]]+\\)\\] /\\(.*\\)") nil t)
      (list (match-string 1) (match-string 2)))))

(defun sacha/add-word ()
  "Look up this word's definition and add the word to individual character entries."
  (interactive)
  (when (looking-at "^\"\\([^\"]+\\)\".*?:\\([0-9]+\\)")
    (let ((word (match-string 1))
          (slide (match-string 2))
          definition
          chars)
      (setq definition (sacha/kanji/lookup-word word))
      (setq chars (split-string word "" t))
      (while chars
        (with-current-buffer (find-file-noselect sacha/kanji/output-file)
          (goto-char (point-min))
          (when (re-search-forward (concat "^" (car chars)) nil t)
            (goto-char (line-end-position))
            (insert "\nS:" slide " " word "")
            (if definition
                (insert " " (elt definition 0) "" " " (elt definition 1) "")
              (insert "???"))))
        ;; Drop later repeats of this character so the word is filed once per character.
        (delete (car chars) chars)
        (setq chars (cdr chars))))))
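
To make the regular expression in `sacha/kanji/lookup-word` easier to follow, here is a small sketch that applies a similar pattern to a single string instead of the edict buffer. It assumes edict's usual line layout of `WORD [READING] /gloss/gloss/`; the function name `my/edict-parse-line` and the sample entry are made up for illustration.

```emacs-lisp
;; Sketch only: `my/edict-parse-line' is a hypothetical helper.
(defun my/edict-parse-line (line)
  "Return (READING GLOSSES) for an edict-style LINE, or nil if it doesn't match.
GLOSSES is a list of the slash-separated definitions."
  (when (string-match "^\\([^ ]+\\) \\[\\([^]]+\\)\\] /\\(.*\\)/$" line)
    (list (match-string 2 line)
          (split-string (match-string 3 line) "/" t))))

;; Example with an illustrative entry:
;; (my/edict-parse-line "日本語 [にほんご] /(n) Japanese (language)/")
```

A nil result corresponds to the "???" marker that `sacha/add-word` writes for words missing from the dictionary.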


More hacks for mangling Japanese CSV

This is for use with kdrill.

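(defun sacha/kanji/get-ordered-kanji-list ()
  "Return the distinct kanji from point to end of buffer, in order of appearance.
Skip quotes and newlines, and ignore everything after a colon on each line."
  (let (kanji-list)
    (while (not (eobp))
      (let ((c (char-after (point))))
        (cond
         ((= c ?\"))
         ((= c ?\n))
         ;; Skip the slide number: jump to the end of this line.
         ((= c ?:) (forward-line 1) (forward-char -1))
         ;; `add-to-list' is unreliable on let-bound variables, so test and push.
         ((not (memq c kanji-list)) (push c kanji-list))))
      (forward-char 1))
    (nreverse kanji-list)))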

(defun sacha/kanji/ordered-usefile-to-kill ()
  (interactive)
  ;; Look up kanji in kanjidic
  (let ((list (sacha/kanji/get-ordered-kanji-list)))
    (kill-new
     (with-current-buffer (find-file-noselect "/usr/share/edict/kanjidic")
       (mapconcat
        (lambda (kanji)
          (goto-char (point-min))
          (when (search-forward (char-to-string kanji) nil t)
            (skip-syntax-forward " ")
            (buffer-substring-no-properties (point) (and (skip-syntax-forward "^ ") (point)))))
        list
        "\n")))))

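For readers who haven't looked inside kanjidic, this sketch shows the two pieces of a line that these hacks pull out: the code right after the kanji (the JIS code in kanjidic's layout, which is presumably what kdrill's usefile wants) and the English meanings in braces. The helper names are hypothetical, and the sample line is abbreviated, so treat it as an illustration of the layout rather than a verbatim entry.

```emacs-lisp
;; Sketch only: both helpers are hypothetical and work on a string,
;; not on the kanjidic buffer.
(defun my/kanjidic-code (line)
  "Return the second whitespace-separated field of kanjidic LINE."
  (nth 1 (split-string line "[ \t]+" t)))

(defun my/kanjidic-meanings (line)
  "Return the list of {braced} meanings found in kanjidic LINE."
  (let ((start 0) meanings)
    (while (string-match "{\\([^}]+\\)}" line start)
      (push (match-string 1 line) meanings)
      (setq start (match-end 0)))
    (nreverse meanings)))

;; Abbreviated sample line (assumption: real entries carry more fields):
;; (my/kanjidic-code "亜 3021 U4e9c B1 G8 S7 ア つ.ぐ {Asia} {rank next}")
```
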

Japanese flashcards

This extracts all kanji in the buffer and converts them to the format expected by flashcard.el.

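(defun sacha/kanji/get-ordered-kanji-list ()
  "Return a list of the distinct kanji in the buffer, in order of appearance."
  (goto-char (point-min))
  (let (kanji-list)
    (while (not (eobp))
      (let ((c (char-after (point))))
        ;; `add-to-list' is unreliable on let-bound variables, so test and push.
        (when (and (>= c ?亜) (not (memq c kanji-list)))
          (push c kanji-list)))
      (forward-char 1))
    (nreverse kanji-list)))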

(defun sacha/kanji/to-flashcard-j2e (&optional list)
  "Return a Japanese-English flashcard set.
If LIST is non-nil, use that instead of the current buffer."
  (interactive (list (sacha/kanji/get-ordered-kanji-list)))
  (unless list (setq list (sacha/kanji/get-ordered-kanji-list)))
  (let ((result
         (with-current-buffer (find-file-noselect "/usr/share/edict/kanjidic")
           (mapconcat
            (lambda (kanji)
              (goto-char (point-min))
              (when (re-search-forward (format "^%c.*?{\\(.*\\)}" kanji) nil t)
                (format "%c : %s\n"
                        kanji
                        (replace-regexp-in-string "}\\s-+{" "," (match-string 1)))))
            list
            ""))))
    ;; `interactive-p' is obsolete; `called-interactively-p' replaces it.
    (if (called-interactively-p 'any) (kill-new result) result)))

(defun sacha/flashcard-method-leitner-check-answer (card answer)
  "Check answer for correctness. Allow multiple correct answers and provide feedback."
  (if (member answer (split-string (flashcard-card-answer card) ","))
      (progn
        (flashcard-insert "Correct! Answer is:\n"
                          (propertize (flashcard-card-answer card)
                                      'face 'flashcard-answer-face
                                      'rear-nonsticky t)
                          "\n"
                          "\n")
        t)
    (flashcard-insert "The correct answer is:\n"
                      (propertize (flashcard-card-answer card)
                                  'face 'flashcard-answer-face
                                  'rear-nonsticky t)
                      "\n"
                      "\n")
    (y-or-n-p "Was your answer correct? ")))

(setq flashcard-method-check-answer-function 'sacha/flashcard-method-leitner-check-answer)
(add-to-list 'auto-mode-alist '("\\.deck\\'" . flashcard-mode))
(add-hook 'flashcard-mode-hook 'flashcard-add-scroll-to-bottom)
(add-hook 'flashcard-positive-feedback-functions 'flashcard-feedback-highlight-answer)
(add-hook 'flashcard-positive-feedback-functions 'flashcard-feedback-congratulate)
(add-hook 'flashcard-positive-feedback-functions 'flashcard-method-leitner-positive-feedback)
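
The card format and the multiple-answer check can be tried out on plain strings. This is only a sketch: the `my/...` helper names are hypothetical, and it mirrors the "kanji : meaning,meaning" lines produced above rather than calling anything in flashcard.el.

```emacs-lisp
;; Sketch only: these helpers are hypothetical and independent of flashcard.el.
(defun my/meanings-to-answer (meanings)
  "Turn MEANINGS like \"Asia} {rank next\" into \"Asia,rank next\"."
  (replace-regexp-in-string "}[ \t]+{" "," meanings))

(defun my/card-line (kanji meanings)
  "Format KANJI (a character) and MEANINGS (a string) as one card line."
  (format "%c : %s" kanji (my/meanings-to-answer meanings)))

(defun my/answer-listed-p (answer card-answer)
  "Return t if ANSWER is exactly one of the comma-separated CARD-ANSWER items."
  (and (member answer (split-string card-answer ",")) t))
```

The exact-match check is why a card with several meanings accepts any single one of them as the answer.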


More Emacs evangelization: flashcard

Aris and I are both struggling with far too much kanji. I used a combination of kdrill to gain familiarity with kanji and ../emacs/flashcard.el to drill the meaning into my brain, as flashcard.el requires me to get a question right 5 times in a row before considering it solved. Aris searched the Internet for flashcard programs on Windows and played around with things like Kanji Gold and King Kanji, but couldn't figure out how to import our wordlist into them. Kanji Gold looked promising as it also used EDICT, but I couldn't figure out the magic number at the end of the dictionary entry. With over 200 words in our word list, there was no way we were going to enter those things one by one!

I told him to download Emacs and grab Jorgen Schaefer's flashcard.el from my ../emacs directory. I then grabbed the dictionary file that Kanji Gold couldn't recognize, replaced [ with : to get flashcard to recognize it without problems, then set up a deck for him. I tweaked the default faces a bit, since they're horrible on light-colored displays. I copied the suggested feedback config and explained the pigeonhole method to him. I tweaked the checking function so that it checked for substrings and treated empty input as a definitely incorrect answer. He wanted the answers displayed all the time, so I coded that in as well.
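
The substring tweak could look something like the sketch below. The actual modified checking function isn't shown in this post, so this is a guess at the core test only, with a hypothetical name, written as a plain predicate on strings.

```emacs-lisp
;; Sketch only: `my/answer-substring-p' is hypothetical; the real tweak
;; to the checking function is not shown in this post.
(defun my/answer-substring-p (answer card-answer)
  "Return t if ANSWER is a non-empty substring of CARD-ANSWER.
Empty input always counts as incorrect."
  (and (not (string= answer ""))
       ;; `regexp-quote' keeps characters like \".\" in ANSWER literal.
       ;; Matching follows `case-fold-search', like other Emacs searches.
       (string-match-p (regexp-quote answer) card-answer)
       t))
```

With this kind of test, typing part of a long definition is enough, while pressing RET on an empty prompt never passes.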

The initial word list was too big, so I copied 9 words and put them into a file, then imported them into a deck. Later, when he finishes this deck, I'll show him how to create another colon-separated file and import it. I'll also ask him if he wants to tweak the number of compartments.

He's asked me if I can get YM working in the text editor as well. I'm currently tunneled through Richi's host, but I think I can open a local tunnel for him as well, if he feels like using ERC. 'course, normal YM just might work, and chances are there's a YM-specific client somewhere in Emacs.

I've made no efforts to hide Emacs' complexity. I lean over and drop into Lisp code in front of him because I want him to have a working environment up and running as soon as possible. Who knows? Maybe he'll use Emacs even after the internship. =)

He looks like he's having fun, and certainly appreciates the fact that I can hack the editor to fit how he wants to do things. He wants to match the readings, too, which means I'll need to figure out how leim works under Windows. I'll do that on Monday.
