Categories: geek

Updating my audio braindump workflow to take advantage of WhisperX

| emacs, speechtotext, org

I get word timestamps for free when I transcribe with WhisperX, so I can skip the Aeneas alignment step. That means I can update my previous code for handling audio braindumps, including the part that breaks the transcript up into sections. Also, I recently updated subed-word-data to colour words based on their transcription score, which draws my attention to things that might be uncertain.

Here's what it looks like when I have the post, the transcript, and the annotated PDF.

2024-11-17_20-44-30.png
Figure 1: Screenshot of draft, transcript, and PDF

Here's what I needed to implement my-audio-braindump-from-whisperx-json (plus some code from my previous audio braindump workflow):

(defun my-whisperx-word-list (file)
  "Return the words from all the segments in the WhisperX JSON FILE."
  (require 'json)
  (let* ((json-object-type 'alist)
         (json-array-type 'list))
    (seq-mapcat (lambda (seg)
                  (alist-get 'words seg))
                (alist-get 'segments (json-read-file file)))))

;; (seq-take (my-whisperx-word-list (my-latest-file "~/sync/recordings" "\\.json")) 10)
(defun my-whisperx-insert-word-list (words)
  "Insert WORDS with text properties for timing and score."
  (require 'subed-word-data)
  (mapc (lambda (word)
            (let ((start (point)))
              (insert
               (alist-get 'word word))
              (subed-word-data--add-word-properties start (point) word)
              (insert " ")))
        words))

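(defun my-audio-braindump-turn-sections-into-headings ()
  "Turn \"START SECTION ... STOP SECTION\" markers into Org headings.
Store the start time of the next word in the START property."
  (interactive)
  (goto-char (point-min))
  (while (re-search-forward "START SECTION \\(.+?\\) STOP SECTION" nil t)
    (replace-match
     ;; Cleaning up the title can clobber the match data that
     ;; `replace-match' needs, so save it.
     (save-match-data
       (format
        "\n*** %s\n"
        (string-trim (replace-regexp-in-string "^[,\\.]\\|[,\\.]$" "" (match-string 1)))))
     nil t)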
    (let ((prop-match (save-excursion (text-property-search-forward 'subed-word-data-start))))
      (when prop-match
        (org-entry-put (point) "START" (format-seconds "%02h:%02m:%02s" (prop-match-value prop-match)))))))

(defun my-audio-braindump-split-sentences ()
  "Put each sentence on its own line."
  (interactive)
  (goto-char (point-min))
  (while (re-search-forward "[a-z]\\. " nil t)
    (replace-match (concat (string-trim (match-string 0)) "\n"))))

(defun my-audio-braindump-restructure ()
  (interactive)
  (goto-char (point-min))
  (my-subed-fix-common-errors)
  (org-mode)
  (my-audio-braindump-prepare-alignment-breaks)
  (my-audio-braindump-turn-sections-into-headings)
  (my-audio-braindump-split-sentences)
  (goto-char (point-min))
  (my-remove-filler-words-at-start))

(defun my-audio-braindump-from-whisperx-json (file)
  (interactive (list (read-file-name "JSON: " "~/sync/recordings/" nil nil nil (lambda (f) (string-match "\\.json\\'" f)))))
  ;; put them all into a buffer
  (with-current-buffer (get-buffer-create "*Words*")
    (erase-buffer)
    (fundamental-mode)
    (my-whisperx-insert-word-list (my-whisperx-word-list file))
    (my-audio-braindump-restructure)
    (goto-char (point-min))
    (switch-to-buffer (current-buffer))))

(defun my-audio-braindump-process-text (file)
  (interactive (list (read-file-name "Text: " "~/sync/recordings/" nil nil nil (lambda (f) (string-match "\\.txt\\'" f)))))
  (with-current-buffer (find-file-noselect file)
    (my-audio-braindump-restructure)
    (save-buffer)))
;; (my-audio-braindump-from-whisperx-json (my-latest-file "~/sync/recordings" "\\.json"))

Ideas for next steps:

  • I can change my processing script to split up the Whisper TXT into sections and automatically make the PDF with nice sections.
  • I can add reminders and other callouts. I can style them, and I can copy reminders into a different section for easier processing.
  • I can look into extracting PDF annotations so that I can jump to the next highlight or copy highlighted text.
This is part of my Emacs configuration.
View org source for this post

2024-11-18 Emacs news

| emacs, emacs-news

Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, r/planetemacs, Mastodon #emacs, Hacker News, lobste.rs, programming.dev, lemmy.world, lemmy.ml, communick.news, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

Changing Org Mode underlines to the HTML mark element

| org

Apparently, HTML has a mark element that is useful for highlighting. ox-html.el in Org Mode doesn't seem to export that yet. I don't use _ to underline things because I don't want that confused with links. Maybe I can override org-html-text-markup-alist to use it for my own purposes…

(with-eval-after-load 'org
  (setf (alist-get 'underline org-html-text-markup-alist)
        "<mark>%s</mark>"))

Okay, let's try it with:

Let's see _how that works._

Let's see how that works. Oooh, that's promising.

Now, what if I want something fancier, like the way it can be nice to use different-coloured highlighters when marking up notes in order to make certain things jump out easily? A custom link might come in handy.

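(defun my-org-highlight-export (link desc format _)
  "Export an hl: link as a mark element, using LINK as the class if set."
  (pcase format
    ((or '11ty 'html)
     (format "<mark%s>%s</mark>"
             ;; Use an empty string when there is no class, so that the
             ;; output is not "<marknil>".
             (if (and link (not (string= link "")))
                 (format " class=\"%s\"" link)
               "")
             desc))))
(with-eval-after-load 'org
  (org-link-set-parameters "hl" :export 'my-org-highlight-export))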

A green highlight might be good for ideas, while red might be good for warnings. (Idea: I wonder how to font-lock them differently in Emacs…)

I shouldn't rely only on the colours, since people reading through RSS won't get them and also since some people are colour-blind. Still, the highlights could make my blog posts easier to skim on my website.

Of course, now I want to port Prot's excellent colours from the Modus themes over to CSS variables so that I can have colours that make sense in both light mode and dark mode. Here's a snippet that exports the colours from one of the themes:

(format ":root {\n%s\n}\n"
        (mapconcat
         (lambda (entry)
           (format "  --modus-%s: %s;"
                   (symbol-name (car entry))
                   (if (stringp (cadr entry))
                       (cadr entry)
                     (format "var(--modus-%s)" (symbol-name (cadr entry))))))
         modus-operandi-palette
         "\n"))

So now my style.css has:

/* Based on Modus Operandi by Protesilaos Stavrou */
:root {
   /* ... */
   --modus-bg-red-subtle: #ffcfbf;
   --modus-bg-green-subtle: #b3fabf;
   --modus-bg-yellow-subtle: #fff576;
   /* ... */
}
@media (prefers-color-scheme: dark) {
   /* Based on Modus Vivendi by Protesilaos Stavrou */
   :root {
      /* ... */
      --modus-bg-red-subtle: #620f2a;
      --modus-bg-green-subtle: #00422a;
      --modus-bg-yellow-subtle: #4a4000;
      /* ... */
   }
}
mark { background-color: var(--modus-bg-yellow-subtle) }
mark.green { background-color: var(--modus-bg-green-subtle) }
mark.red { background-color: var(--modus-bg-red-subtle) }

Interesting, interesting…

Checking caption timing by skimming with Emacs Lisp or JS

| js, emacs, subed

Sometimes automatic subtitle timing tools like Aeneas can get confused by silences, extraneous sounds, filler words, mis-starts, and text that I've edited out of the raw captions for easier readability. It's good to quickly check each caption. I used to listen to captions at 1.5x speed, watching carefully as each caption displayed. This took a fair bit of time and focus, so… it usually didn't happen. Sampling the first second of each caption is faster and requires a little less attention.

Skimming with subed.el

Here's a function that I wrote to play the first second of each subtitle.

(defvar my-subed-skim-msecs 1000 "Number of milliseconds to play when skimming.")
(defun my-subed-skim-starts ()
  (interactive)
  (subed-mpv-unpause)
  (subed-disable-loop-over-current-subtitle)
  (catch 'done
    (while (not (eobp))
      (subed-mpv-jump-to-current-subtitle)
      (let ((ch
             (read-char "(q)uit? " nil (/ my-subed-skim-msecs 1000.0))))
        (when ch
          (throw 'done t)))
      (subed-forward-subtitle-time-start)
      (when (and subed-waveform-minor-mode
                 (not subed-waveform-show-all))
        (subed-waveform-refresh))
      (recenter)))
  (subed-mpv-pause))

Now I can read the lines as the subtitles play, and I can press any key to stop so that I can fix timestamps.

Skimming with JavaScript

I also want to check the times on the Web in case there have been caching issues. Here's some JavaScript to skim the first second of each cue in the first text track for a video, with some code to make it easy to process the first video in the visible area.

function getVisibleVideo() {
  const videos = document.querySelectorAll('video');
  for (const video of videos) {
    const rect = video.getBoundingClientRect();
    if (
      rect.top >= 0 &&
      rect.left >= 0 &&
      rect.bottom <= (window.innerHeight || document.documentElement.clientHeight) &&
      rect.right <= (window.innerWidth || document.documentElement.clientWidth)
    ) {
      return video;
    }
  }
  return null;
}

async function skimVideo(video=getVisibleVideo(), msecs=1000) {
  // Nothing to do if no video is fully visible
  if (!video) return;
  // Get the first text track (assumed to be captions/subtitles)
  const textTrack = video.textTracks[0];
  if (!textTrack || !textTrack.cues) return;
  const remaining = [...textTrack.cues].filter((cue) => cue.endTime >= video.currentTime);
  video.play();
  // Play the first msecs milliseconds of each remaining cue until the video is paused
  for (let i = 0; i < remaining.length && !video.paused; i++) {
    video.currentTime = remaining[i].startTime;
    await new Promise((resolve) => setTimeout(resolve, msecs));
  }
}

Then I can call it with skimVideo(). Actually, in our backstage area, it might be useful to add a Skim button so that I can skim things from my phone.

function handleSkimButton(event) {
  const vid = event.target.closest('.vid').querySelector('video');
  skimVideo(vid);
}

document.querySelectorAll('video').forEach((vid) => {
  const div = document.createElement('div');
  const skim = document.createElement('button');
  skim.textContent = 'Skim';
  div.appendChild(skim);
  vid.parentNode.insertBefore(div, vid.nextSibling);
  skim.addEventListener('click', handleSkimButton);
});

Results

How much faster is it this way?

Some code to help figure out the speedup
(-let* ((files (directory-files "~/proj/emacsconf/2024/cache" t "--main\\.vtt"))
        ((count-subs sum-seconds)
         (-unzip (mapcar
                  (lambda (file)
                    (list
                     (length (subed-parse-file file))
                     (/ (compile-media-get-file-duration-ms
                         (concat (file-name-sans-extension file) ".webm")) 1000.0)))
                  files)))
        (total-seconds (-reduce #'+ sum-seconds))
        (total-subs (-reduce #'+ count-subs)))
  (format "%d files, %.1f hours, %d total captions, speed up of %.1f"
          (length files)
          (/ total-seconds 3600.0)
          total-subs
          (/ total-seconds total-subs)))

It looks like for EmacsConf talks where we typically format captions to be one long line each (< 60 characters), this can be a speed-up of about 4x compared to listening to the video at normal speed. More usefully, it's different enough to get my brain to do it instead of putting it off.
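The arithmetic behind that estimate is simple enough to sketch in JavaScript. This is only an illustration: estimateSkimSpeedup is a made-up name, the sample numbers are invented rather than taken from the EmacsConf files, and it assumes that skimming takes a fixed amount of time per caption.

```javascript
// Hypothetical helper: estimate how much faster skimming is than
// listening at normal speed. Assumes each caption takes skimMsecs to skim.
function estimateSkimSpeedup(totalSeconds, totalCaptions, skimMsecs = 1000) {
  if (totalCaptions <= 0 || skimMsecs <= 0) return NaN;
  const skimSeconds = totalCaptions * (skimMsecs / 1000);
  return totalSeconds / skimSeconds;
}

// Invented numbers: a 20-minute talk with 300 captions.
console.log(estimateSkimSpeedup(20 * 60, 300).toFixed(1)); // prints "4.0"
```

Shorter captions mean more captions per minute, so the speed-up shrinks as captions get shorter.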

Most of the automatically-generated timestamps are fine. It's just a few that might need tweaking. It's nice to be able to skim them with fewer keystrokes.

Yay Emacs 7: Using word timing in caption editing with subed-word-data

| emacs, yay-emacs, subed

When I work with video captions, I often want to split long captions using subed-split-subtitle. If my player is somewhere in the current subtitle, it'll use that timestamp. If not, it'll make a reasonable guess based on character position.

I can use subed-word-data.el to load word-level times from WhisperX JSON or from YouTube SRV2 files. This allows me to split a subtitle using the timestamp for that word.
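Here's a minimal sketch of that idea in JavaScript. It is not how subed does it: splitCueAtWord and the cue shape are made up for illustration, and it assumes word objects with word, start, and end fields (in seconds) like the ones in WhisperX JSON. If the word has no timing, it falls back to a guess based on character position.

```javascript
// Hypothetical sketch: split one cue into two at the word with index splitIndex.
// words: [{word, start, end}] with times in seconds; start may be missing.
function splitCueAtWord(cue, words, splitIndex) {
  if (splitIndex <= 0 || splitIndex >= words.length) {
    throw new RangeError('splitIndex must leave at least one word on each side');
  }
  const before = words.slice(0, splitIndex).map((w) => w.word).join(' ');
  const after = words.slice(splitIndex).map((w) => w.word).join(' ');
  let splitTime = words[splitIndex].start;
  if (typeof splitTime !== 'number') {
    // No timing for this word: guess from the character position instead.
    const fraction = before.length / (before.length + after.length);
    splitTime = cue.start + fraction * (cue.end - cue.start);
  }
  return [
    { start: cue.start, end: splitTime, text: before },
    { start: splitTime, end: cue.end, text: after },
  ];
}

// Invented sample data
const sampleWords = [
  { word: 'Yay', start: 0.0, end: 0.4 },
  { word: 'Emacs', start: 0.5, end: 1.0 },
  { word: 'seven', start: 1.6, end: 2.0 },
];
console.log(splitCueAtWord({ start: 0, end: 2 }, sampleWords, 2));
```

With the sample data, the second cue starts at 1.6 seconds, which is where the third word begins.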

Because subed-word-data colours words based on transcription confidence, I can see where something might need to be closely examined, like when there's no timing information for the words at the start or end.

If I combine that with subed-waveform, I can see silences. Then I can tweak start times by shift-left-clicking on the waveform. This automatically adjusts the end time of the previous subtitle too.

I like how Emacs makes it easy to use word timing data when editing captions. Yay Emacs!

You can watch this on YouTube, download the video, or download the audio.

Note: Sometimes WhisperX gives me overlapping timestamps for captions, so I use M-x subed-align to get the aeneas forced alignment tool to give me subtitle-level timestamps. Then I use the word-level data from WhisperX for further splitting.

Aside: I was trying to find some kind of value-to-color translator for Emacs Lisp for easier visualization, like the way the d3 JavaScript library makes it easy to translate a range of numbers (say, linear 0.0 to 1.0) to colors (e.g. red-yellow-green). I found color-hsl-to-rgb and also the range of colours defined by the faces calendar-scale-1 to calendar-scale-10. There's also prism, which colours code by depth and allows people to specify the colour transformations (saturation, lightness, etc.). I wonder if someone's already written a general-purpose data-to-fg/bg-color Elisp library that supports numerical and categorical data…
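To make the idea concrete, here's a minimal sketch of that kind of mapping in JavaScript. It isn't d3's API: valueToColor is a made-up name, and the fixed saturation and lightness are arbitrary choices. It interpolates the hue from red (0) through yellow (60) to green (120) and returns a CSS hsl() string.

```javascript
// Hypothetical sketch: map a number in [min, max] to a red-yellow-green colour.
function valueToColor(value, min = 0, max = 1) {
  // Avoid dividing by zero when the range is empty; use the middle colour.
  if (max === min) return 'hsl(60, 80%, 50%)';
  // Clamp to the range, then scale to 0..1
  const t = Math.min(1, Math.max(0, (value - min) / (max - min)));
  const hue = Math.round(t * 120); // 0 = red, 60 = yellow, 120 = green
  return `hsl(${hue}, 80%, 50%)`;
}

console.log(valueToColor(0));   // hsl(0, 80%, 50%)
console.log(valueToColor(0.5)); // hsl(60, 80%, 50%)
console.log(valueToColor(1));   // hsl(120, 80%, 50%)
```

A similar function in Emacs Lisp could feed the hue into color-hsl-to-rgb, but I haven't sketched that here.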

2024-11-11 Emacs news

| emacs, emacs-news

Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, r/planetemacs, Mastodon #emacs, Hacker News, lobste.rs, programming.dev, lemmy.world, lemmy.ml, communick.news, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

Using a coloured template on my Supernote A5X

| supernote, design

[2024-11-14 Thu]: stefanvdwalt suggested using hue-rotate in the filter, ooooh. I tweaked my CSS to do hue-rotate to get back to the original colours and boosted the brightness slightly so that the yellow feels more like a highlighter. I also changed my dark red colour to a medium-gray colour, which is more flexible for shading and for layout cues.

The Supernote A5X is an e-ink notebook that lets me draw in black, white, and two shades of gray. It has a drawing app that supports other shades of gray, but the main notebook app and PDF annotation are limited to those two shades of gray.

I like to use a dotted grid in order to write in neat lines. I used to manually change this template to a white one before exporting. Then it occurred to me to make a coloured template:

dot-grid-blue-quad(1).png

Using colour lets me use a darker grid, which is more visible on the Supernote, while still letting that grid blend into the background if I export without processing. Screen mirroring shares the grayscale version, though.

I use my recoloring script to change #a6d2ff (light blue) to #ffffff (white).

Here's the SVG source in case you want to customize it. When I exported the PNG from Inkscape, I needed to make sure that antialiasing was turned off. This involved unchecking the "Hide export settings" checkbox in the Export dialog, then setting Antialias to 0.

My current color scheme is 9d9d9d,c2c2c2,c9c9c9,f6f396,cacaca,f6f396,a6d2ff,ffffff, which maps light gray to a highlighter sort of yellow and dark gray to a light gray. I used to map the dark gray to a dark red like the links on my site, but light gray is more flexible for shading and layout.
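Here's a minimal sketch of how a scheme like that could be read as from,to pairs and applied to pixels. This is not my actual recolouring script: parseColorScheme and recolor are made-up names, and the sketch assumes RGBA data with four bytes per pixel, like the data in a canvas ImageData.

```javascript
// Hypothetical sketch: parse "from,to,from,to,..." into a lookup table.
function parseColorScheme(scheme) {
  const colors = scheme.split(',').map((c) => c.trim().toLowerCase());
  if (colors.length % 2 !== 0) throw new Error('Expected from,to pairs');
  const map = new Map();
  for (let i = 0; i < colors.length; i += 2) map.set(colors[i], colors[i + 1]);
  return map;
}

// Replace exact colour matches in RGBA data, leaving alpha untouched.
function recolor(rgba, map) {
  const out = Uint8ClampedArray.from(rgba);
  for (let i = 0; i < out.length; i += 4) {
    const hex = [out[i], out[i + 1], out[i + 2]]
      .map((n) => n.toString(16).padStart(2, '0')).join('');
    const to = map.get(hex);
    if (to) {
      out[i] = parseInt(to.slice(0, 2), 16);
      out[i + 1] = parseInt(to.slice(2, 4), 16);
      out[i + 2] = parseInt(to.slice(4, 6), 16);
    }
  }
  return out;
}

const scheme = parseColorScheme('9d9d9d,c2c2c2,c9c9c9,f6f396,cacaca,f6f396,a6d2ff,ffffff');
// One light blue pixel from the template and one black pixel
const pixels = [0xa6, 0xd2, 0xff, 255, 0, 0, 0, 255];
console.log(Array.from(recolor(pixels, scheme)));
// [255, 255, 255, 255, 0, 0, 0, 255]
```

Exact matching like this only changes pixels that are precisely one of the listed colours, so blended edge pixels would be left alone.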

Anyway, here's an example of the export from my Supernote and the result after processing:

Books_Page_17.png
Figure 1: Before processing
2024-10-26-01 How to Take Smart Notes - Sonke Ahrens 2017 #visual-book-notes #writing #pkm #book.png
Figure 2: After processing

(This sketch is How to Take Smart Notes, one of my visual book notes.)

I use a CSS rule to invert my sketch colours when viewed in dark mode:

@media (prefers-color-scheme: dark) {
    .sketch-full img, .gallery img, .left-doodle, .right-doodle, .center-doodle { filter: invert(1) hue-rotate(180deg) brightness(150%) contrast(0.9); }
}

which is not fine-tuned or amazing, but it reduces the glare from the white background when I browse on my phone at night.

2024-11-14_08-07-53.png
Figure 3: Screenshot of sketch in dark mode

Sometimes I switch things around and use blue/dark blue instead. I now have some Emacs Lisp code to let me somewhat interactively recolour a sketch from the Emacs text editor so that I can change the colours in a sketch as I'm writing a post about it.

Using a coloured template and a script to change the colours around has made my Supernote workflow more convenient. I don't need to change the template on new pages. I just export the image, sync with Dropbox or use the Browse & Access feature, and run my processing script. My processing script also uses Google Cloud Vision to recognize the text, rename the sketch, and file it in the appropriate directory, so it's pretty smooth. It's idiosyncratic, but you might be able to adapt the ideas to your own setup. Hope this helps!
