Updating my audio braindump workflow to take advantage of WhisperX

| emacs, speechtotext, org

I get word timestamps for free when I transcribe with WhisperX, so I can skip the Aeneas alignment step. That means I can update my previous code for handling audio braindumps . Breaking the transcript up into sections Also, I recently updated subed-word-data to colour words based on their transcription score, which draws my attention to things that might be uncertain.

Here's what it looks like when I have the post, the transcript, and the annotated PDF.

2024-11-17_20-44-30.png
Figure 1: Screenshot of draft, transcript, and PDF

Here's what I needed to implement my-audio-braindump-from-whisperx-json (plus some code from my previous audio braindump workflow):

(defun my-whisperx-word-list (file)
  (let* ((json-object-type 'alist)
         (json-array-type 'list))
    (seq-mapcat (lambda (seg)
                  (alist-get 'words seg))
                (alist-get 'segments (json-read-file file)))))

;; (seq-take (my-whisperx-word-list (my-latest-file "~/sync/recordings" "\\.json")) 10)
(defun my-whisperx-insert-word-list (words)
  "Inserts WORDS with text properties."
  (require 'subed-word-data)
  (mapc (lambda (word)
            (let ((start (point)))
              (insert
               (alist-get 'word word))
              (subed-word-data--add-word-properties start (point) word)
              (insert " ")))
        words))

(defun my-audio-braindump-turn-sections-into-headings ()
  (interactive)
  (goto-char (point-min))
  (while (re-search-forward "START SECTION \\(.+?\\) STOP SECTION" nil t)
    (replace-match
     (save-match-data
       (format
        "\n*** %s\n"
        (save-match-data (string-trim (replace-regexp-in-string "^[,\\.]\\|[,\\.]$" "" (match-string 1))))))
     nil t)
    (let ((prop-match (save-excursion (text-property-search-forward 'subed-word-data-start))))
      (when prop-match
        (org-entry-put (point) "START" (format-seconds "%02h:%02m:%02s" (prop-match-value prop-match)))))))

(defun my-audio-braindump-split-sentences ()
  (interactive)
  (goto-char (point-min))
  (while (re-search-forward "[a-z]\\. " nil t)
    (replace-match (concat (string-trim (match-string 0)) "\n") )))

(defun my-audio-braindump-restructure ()
  (interactive)
  (goto-char (point-min))
  (my-subed-fix-common-errors)
  (org-mode)
  (my-audio-braindump-prepare-alignment-breaks)
  (my-audio-braindump-turn-sections-into-headings)
  (my-audio-braindump-split-sentences)
  (goto-char (point-min))
  (my-remove-filler-words-at-start))

(defun my-audio-braindump-from-whisperx-json (file)
  (interactive (list (read-file-name "JSON: " "~/sync/recordings/" nil nil nil (lambda (f) (string-match "\\.json\\'" f)))))
  ;; put them all into a buffer
  (with-current-buffer (get-buffer-create "*Words*")
    (erase-buffer)
    (fundamental-mode)
    (my-whisperx-insert-word-list (my-whisperx-word-list file))
    (my-audio-braindump-restructure)
    (goto-char (point-min))
    (switch-to-buffer (current-buffer))))

(defun my-audio-braindump-process-text (file)
  (interactive (list (read-file-name "Text: " "~/sync/recordings/" nil nil nil (lambda (f) (string-match "\\.txt\\'" f)))))
  (with-current-buffer (find-file-noselect file)
    (my-audio-braindump-restructure)
    (save-buffer)))
;; (my-audio-braindump-from-whisperx-json (my-latest-file "~/sync/recordings" "\\.json"))

Ideas for next steps:

  • I can change my processing script to split up the Whisper TXT into sections and automatically make the PDF with nice sections.
  • I can add reminders and other callouts. I can style them, and I can copy reminders into a different section for easier processing.
  • I can look into extracting PDF annotations so that I can jump to the next highlight or copy highlighted text.
This is part of my Emacs configuration.
View org source for this post
You can comment with Disqus or you can e-mail me at sacha@sachachua.com.