Category Archives: japanese

More hacks for mangling Japanese CSV

This is for use with kdrill.

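(defun sacha/kanji/get-ordered-kanji-list ()
  "Return the kanji between point and the end of the buffer.
Each character is listed once, in order of first appearance."
  (let (kanji-list)
    (while (not (eobp))
      (let ((c (char-after (point))))
        (cond
         ((= c ?\"))                    ; skip quotes
         ((= c ?\n))                    ; skip line breaks
         ;; A colon ends the kanji part of a line; jump to the line end.
         ((= c ?:) (forward-line 1) (forward-char -1))
         ;; `add-to-list' needs a dynamically bound variable, so test
         ;; and push by hand instead.
         ((not (memq c kanji-list)) (push c kanji-list))))
      (forward-char 1))
    (nreverse kanji-list)))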

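(defun sacha/kanji/ordered-usefile-to-kill ()
  "Copy the kanjidic codes for the kanji after point to the kill ring."
  (interactive)
  ;; Look up kanji in kanjidic
  (let ((list (sacha/kanji/get-ordered-kanji-list)))
    ;; The wrapper forms are missing from the listing; `kill-new' and
    ;; `mapconcat' are assumed from the function name.
    (kill-new
     (with-current-buffer (find-file-noselect "/usr/share/edict/kanjidic")
       (mapconcat
        (lambda (kanji)
          (goto-char (point-min))
          (when (search-forward (char-to-string kanji) nil t)
            ;; The token after the kanji is its code.
            (skip-syntax-forward " ")
            (buffer-substring-no-properties
             (point) (progn (skip-syntax-forward "^ ") (point)))))
        list "\n")))))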
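
The lookup above relies on the layout of a kanjidic line: the kanji comes first, then its JIS code in hex, and the English meanings sit in braces at the end. Here is a minimal sketch of pulling those pieces out of a single line, assuming that layout. `my/kanjidic-parse-line` is a hypothetical helper, and the sample line is abbreviated (real entries carry many more fields).

```elisp
;; Hypothetical helper for illustration; not part of the original hack.
(defun my/kanjidic-parse-line (line)
  "Return (KANJI JIS-CODE MEANINGS) parsed from a kanjidic LINE.
MEANINGS is a list of the strings found inside braces."
  (let ((fields (split-string line " " t))
        (meanings '())
        (start 0))
    ;; Meanings may contain spaces, so collect them with a regexp
    ;; instead of relying on the space-separated fields.
    (while (string-match "{\\([^}]+\\)}" line start)
      (push (match-string 1 line) meanings)
      (setq start (match-end 0)))
    (list (nth 0 fields) (nth 1 fields) (nreverse meanings))))

;; Abbreviated sample line, for illustration only.
(my/kanjidic-parse-line "亜 3021 U4e9c B1 G8 S7 {Asia} {rank next} {come after}")
;; => ("亜" "3021" ("Asia" "rank next" "come after"))
```

Working on a string like this makes it easy to try the pattern in the scratch buffer before pointing it at the real dictionary file.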


Japanese word list generator

MWAHAHAHAHA! I just pulled off a really neat Emacs hack. <grin> It's Japanese-related.

So I've been translating this document for the past two days. It's really slow and boring work because there's no soft copy, so I have to write the characters (blurry because this is a photocopy of a photocopy) using the mouse, and hope I don't make any mistakes along the way.

In the course of copying down kanji (Chinese characters) for later translation, I created a spreadsheet with two columns: the kanji word and the number of the slide it appears on. Then I exported that to CSV, opened that in Emacs, and wrote an Emacs Lisp function that split the words up into individual characters. I passed this through shell-command-on-region to sort and uniquify the characters. I then went back to the CSV with words and slide numbers, wrote another Emacs Lisp function that searched edict (Jim Breen's electronic Japanese dictionary) for the words, split the word into individual characters, and filed the word info under each character, also marking words that were not found in the dictionary. After that, I wrote yet another function to add table markup and individual character definitions to each line, then copied the result into an HTML file.

This should probably be rewritten as a Perl script.
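
The middle of that pipeline, filing each word under every character it contains, can be sketched in memory without any files or dictionaries. This is only an illustration of the idea. `my/group-words-by-char` and the sample words are made up for the example.

```elisp
;; Hypothetical name; an in-memory sketch, not one of the functions below.
(defun my/group-words-by-char (entries)
  "Group ENTRIES by character.
ENTRIES is a list of (WORD . SLIDE) pairs.  Return an alist of
\(CHAR . PAIRS), sorted by CHAR, where PAIRS lists every entry whose
word contains CHAR, in input order."
  (let ((table (make-hash-table :test 'equal))
        (result '()))
    (dolist (entry entries)
      ;; `delete-dups' keeps a word from being filed twice under a
      ;; character that it repeats.
      (dolist (char (delete-dups (split-string (car entry) "" t)))
        (puthash char (cons entry (gethash char table)) table)))
    (maphash (lambda (char pairs)
               (push (cons char (reverse pairs)) result))
             table)
    (sort result (lambda (a b) (string< (car a) (car b))))))

(my/group-words-by-char '(("日本" . 3) ("本日" . 5) ("日々" . 7)))
;; => (("々" ("日々" . 7))
;;     ("日" ("日本" . 3) ("本日" . 5) ("日々" . 7))
;;     ("本" ("日本" . 3) ("本日" . 5)))
```

The hash table stands in for the sorted, one-character-per-line output file that the buffer-based version builds.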

;; Generate the list of characters
;; Use add-all to add all the words to the list
;; call sacha/kanji/format-kanji-with-references

(defvar sacha/kanji/output-file "~/tmp/kanji")

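(defun sacha/kanji/process-csv ()
  "Build the kanji table from the CSV in the current buffer.
The original body is missing from this listing; this one simply
follows the steps in the comments above and is an assumption."
  (interactive)
  (sacha/kanji/split-dictionary)   ; generate the list of characters
  (goto-char (point-min))
  (sacha/kanji/add-all)            ; file each word under its characters
  (sacha/kanji/format-kanji-with-references))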

(defun sacha/kanji/split-dictionary ()
  (let ((buffer (current-buffer)))
    (with-current-buffer (find-file-noselect sacha/kanji/output-file)
      (insert-buffer-substring buffer)
      (goto-char (point-min))
      (while (re-search-forward "^\"\\([^\"]+\\)\":" nil t)
        (delete-region (match-end 1) (line-end-position))
        (goto-char (line-beginning-position))
        (delete-char 1)
        (while (not (eolp))
          (forward-char 1)
          (unless (bolp)
            (insert "\n"))))
      ;; `delete-matching-lines' only looks at text after point, so
      ;; start from the top of the buffer.
      (goto-char (point-min))
      (delete-matching-lines "^\\s-*$")
      (shell-command-on-region (point-min) (point-max) "sort | uniq"  nil t))))

(defun sacha/kanji/add-all ()
  "Add every word from point to the end of the buffer."
  (while (not (eobp))
    (sacha/add-word)
    (forward-line 1)))

(defun sacha/kanji/format-kanji-with-references ()
  "Add character meaning and table markup."
  (find-file sacha/kanji/output-file)
  (goto-char (point-min))
  (while (not (eobp))
    (goto-char (line-beginning-position))
    ;; Lines that already start with markup are word entries; skip them.
    (unless (or (eolp) (= (char-after (point)) ?<))
      (forward-char 1)
      (let* ((kanji (buffer-substring (line-beginning-position) (point)))
             (definition (sacha/kanji/find-definition kanji)))
        (when definition
          ;; The tag strings are assumed; the original markup is
          ;; missing from this listing.
          (forward-char -1)
          (insert "<tr><td>")
          (forward-char 1)
          (insert "</td><td>")
          (insert definition)
          (goto-char (line-end-position))
          (insert "</td></tr>"))))
    (forward-line 1)))

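(defun sacha/kanji/find-definition (kanji)
  "Look up kanji definition."
  (with-current-buffer (find-file-noselect "/usr/share/edict/kanjidic")
    (goto-char (point-min))
    ;; Return nil instead of signalling an error when nothing is found,
    ;; and stay on the kanji's own line so that a kanji with a single
    ;; meaning does not pick up the next entry's meanings.
    (when (and (search-forward kanji nil t)
               (re-search-forward "\\({[^}]+}\\( {[^}]+}\\)*\\)"
                                  (line-end-position) t))
      (match-string 0))))  ;; kanji definitions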

(defun sacha/kanji/lookup-word (key)
  "Return the definition of the current word. Ensure edict is loaded before running this."
  (with-current-buffer edict-buffer
    (goto-char (point-min))
    (when (re-search-forward (concat "^" (regexp-quote key) " \\[\\([^]]+\\)\\] /\\(.*\\)") nil t)
      (list (match-string 1) (match-string 2)))))

(defun sacha/add-word ()
  "Look up this word's definition and add the word to individual character entries."
  (when (looking-at "^\"\\([^\"]+\\)\".*?:\\([0-9]+\\)")
    (let* ((word (match-string 1))
           (slide (match-string 2))
           (definition (sacha/kanji/lookup-word word))
           (chars (split-string word "" t)))
      (while chars
        (with-current-buffer (find-file-noselect sacha/kanji/output-file)
          (goto-char (point-min))
          (when (re-search-forward (concat "^" (regexp-quote (car chars))) nil t)
            (goto-char (line-end-position))
            ;; The tags in these strings are assumed; the original
            ;; markup is missing from this listing.  Entries start with
            ;; "<" so that `sacha/kanji/format-kanji-with-references'
            ;; skips them.
            (insert "\n<tr><td>S:" slide " " word "</td>")
            (if definition
                (insert "<td>" (elt definition 0) " " (elt definition 1) "</td></tr>")
              (insert "<td>???</td></tr>"))))
        ;; Drop this character and any repeats of it in the word.
        (setq chars (delete (car chars) chars))))))
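
To see what the regexp in sacha/kanji/lookup-word picks out, here is a small sketch that applies a similar pattern to one string instead of the edict buffer. It assumes the usual edict layout of `word [reading] /meaning/meaning/`. `my/edict-parse-line` is a hypothetical helper, and it also splits the meanings apart, which the function above does not do.

```elisp
;; Hypothetical helper for illustration; works on a string, not a buffer.
(defun my/edict-parse-line (line)
  "Return (WORD READING MEANINGS) parsed from an edict-style LINE.
Return nil when LINE has no bracketed reading."
  (when (string-match "^\\([^ ]+\\) \\[\\([^]]+\\)\\] /\\(.*\\)/$" line)
    (list (match-string 1 line)
          (match-string 2 line)
          ;; Meanings are separated by slashes.
          (split-string (match-string 3 line) "/" t))))

(my/edict-parse-line "漢字 [かんじ] /(n) Chinese characters/kanji/")
;; => ("漢字" "かんじ" ("(n) Chinese characters" "kanji"))
```

Entries written only in kana have no bracketed reading, so this sketch returns nil for them, just as the lookup above reports such words as not found.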


Sampizcat, canna and kinput2

I was tracking down a Red Hat Japanese language support problem for someone on #linuxhelp. Sampizcat wanted to turn off the kana-kanji conversion, but it wasn't straightforward, so he or she is doing a reinstall. Wish I could have helped more.