Updating my audio braindump workflow to take advantage of WhisperX
| emacs, speechtotext, orgI get word timestamps for free when I transcribe with WhisperX, so I can skip the Aeneas alignment step. That means I can update my previous code for handling audio braindumps . Breaking the transcript up into sections Also, I recently updated subed-word-data to colour words based on their transcription score, which draws my attention to things that might be uncertain.
Here's what it looks like when I have the post, the transcript, and the annotated PDF.
Here's what I needed to implement my-audio-braindump-from-whisperx-json
(plus some code from my previous audio braindump workflow):
(defun my-whisperx-word-list (file) (let* ((json-object-type 'alist) (json-array-type 'list)) (seq-mapcat (lambda (seg) (alist-get 'words seg)) (alist-get 'segments (json-read-file file))))) ;; (seq-take (my-whisperx-word-list (my-latest-file "~/sync/recordings" "\\.json")) 10) (defun my-whisperx-insert-word-list (words) "Inserts WORDS with text properties." (require 'subed-word-data) (mapc (lambda (word) (let ((start (point))) (insert (alist-get 'word word)) (subed-word-data--add-word-properties start (point) word) (insert " "))) words)) (defun my-audio-braindump-turn-sections-into-headings () (interactive) (goto-char (point-min)) (while (re-search-forward "START SECTION \\(.+?\\) STOP SECTION" nil t) (replace-match (save-match-data (format "\n*** %s\n" (save-match-data (string-trim (replace-regexp-in-string "^[,\\.]\\|[,\\.]$" "" (match-string 1)))))) nil t) (let ((prop-match (save-excursion (text-property-search-forward 'subed-word-data-start)))) (when prop-match (org-entry-put (point) "START" (format-seconds "%02h:%02m:%02s" (prop-match-value prop-match))))))) (defun my-audio-braindump-split-sentences () (interactive) (goto-char (point-min)) (while (re-search-forward "[a-z]\\. " nil t) (replace-match (concat (string-trim (match-string 0)) "\n") ))) (defun my-audio-braindump-restructure () (interactive) (goto-char (point-min)) (my-subed-fix-common-errors) (org-mode) (my-audio-braindump-prepare-alignment-breaks) (my-audio-braindump-turn-sections-into-headings) (my-audio-braindump-split-sentences) (goto-char (point-min)) (my-remove-filler-words-at-start)) (defun my-audio-braindump-from-whisperx-json (file) (interactive (list (read-file-name "JSON: " "~/sync/recordings/" nil nil nil (lambda (f) (string-match "\\.json\\'" f))))) ;; put them all into a buffer (with-current-buffer (get-buffer-create "*Words*") (erase-buffer) (fundamental-mode) (my-whisperx-insert-word-list (my-whisperx-word-list file)) (my-audio-braindump-restructure) (goto-char (point-min)) (switch-to-buffer (current-buffer)))) (defun my-audio-braindump-process-text (file) (interactive (list (read-file-name "Text: " "~/sync/recordings/" nil nil nil (lambda (f) (string-match "\\.txt\\'" f))))) (with-current-buffer (find-file-noselect file) (my-audio-braindump-restructure) (save-buffer))) ;; (my-audio-braindump-from-whisperx-json (my-latest-file "~/sync/recordings" "\\.json"))
Ideas for next steps:
- I can change my processing script to split up the Whisper TXT into sections and automatically make the PDF with nice sections.
- I can add reminders and other callouts. I can style them, and I can copy reminders into a different section for easier processing.
- I can look into extracting PDF annotations so that I can jump to the next highlight or copy highlighted text.
This is part of my Emacs configuration.
You can comment with Disqus or you can e-mail me at sacha@sachachua.com.