Remove filler words at the start and upcase the next word

| audio, speechtotext, emacs

Like many people, I tend to use "So", "And", "You know", and "Uh" to bridge between sentences when thinking. WhisperX does a reasonable job of detecting sentences and splitting them up anyway, but it leaves those filler words in at the start of the sentence. I usually like to remove these from transcripts so that they read more smoothly.

Here's a short Emacs Lisp function that removes those filler words when they start a sentence, capitalizing the next word. When called interactively, it prompts while displaying an overlay. When called from Emacs Lisp, it changes without asking for confirmation.

(defvar my-filler-words-regexp "\\. \\(?:so,?\\|and\\|you know,\\|uh,?\\) \\(.\\)")
(defun my-remove-filler-words-at-start ()
  (interactive)
  (save-excursion
    (while (re-search-forward my-filler-words-regexp nil t)
      (if (and (called-interactively-p) (not current-prefix-arg))
          (let ((overlay (make-overlay (match-beginning 0)
                                       (match-end 0))))
            (overlay-put overlay 'common-edit t)
            (overlay-put
             overlay 'display
             (propertize (concat (match-string 0) " -> . "
                                 (upcase (match-string 1)))
                         'face 'modus-themes-mark-sel))
            (unwind-protect
                (pcase (read-char-choice "Replace (y/n/!/q)? " "yn!q")
                  (?!
                   (replace-match (concat ". " (upcase (match-string 1))) t)
                   (while (re-search-forward "\\. \\(?:So\\|And\\) \\(.\\)" nil t)
                     (replace-match (concat ". " (upcase (match-string 1))) t)))
                  (?y
                   (replace-match (concat ". " (upcase (match-string 1))) t))
                  (?n nil)
                  (?q (goto-char (point-max))))
              (delete-overlay overlay)))
        (replace-match (concat ". " (upcase (match-string 1))) t)))))
This is part of my Emacs configuration.
View org source for this post

Wednesday weblog: week ending November 20, 2024

| weekly, weblog, review
  • Reflection on writing style - 2024-11-18T00:44:18.080Z

    I notice that I have a lot more fun writing tiny workflow tweaks (mostly #Emacs ) and sharing them on my blog versus, say, insightful reflections developed over a longer period of time. I think it's the payoff of being able to enjoy those tweaks. Sometimes abstract thoughts help me come to realizations that I can then try to use to change my concrete behaviours, but it's a lot less straightforward.

    Also, I notice that I prefer to write with a curious, exploratory tone instead of an authoritative one, which is probably also related to my focus on "I" rather than "you". Kinda like: here's what I'm experimenting with, sharing in case it's helpful (and also because I want to be able to find it again), everyone's different and that's awesome, curious about what works for you. :) I'm glad other people can pull off being authoritative/persuasive, though.

    23+ years #blogging and still learning more!

  • Sketchnote blogs - 2024-11-17T18:19:47.826Z

    I'm surprised by how few active blogs I could find about #sketchnotes (or had a category feed for sketchnotes). It's mostly rohdesign and Verbal to Visual, I think. Sketchnote Army still comes out with episodes, but the posts themselves don't seem to be very visual, so people have to click through to the person's website. I guess a lot of people are on Instagram, but that doesn't seem to support RSS any more, and I'm not really keen on scrolling through that. Ah well!

  • dark mode sketch filter - 2024-11-14T13:41:01.165Z

    I tweaked my dark-mode sketch CSS rule thanks to stefanvdwalt's comment. Now I've got

      @media (prefers-color-scheme: dark) {
      .sketch-full img, .gallery img, .left-doodle, .right-doodle,
      .center-doodle { filter: invert(1) hue-rotate(180deg) brightness(150%)
      contrast(0.9); }
      }
    

    Updated: https://sachachua.com/blog/2024/11/using-a-coloured-template-on-my-supernote-a5x/

  • Researched BBB hosting options and compared the costs with self-hosting on Linode.
  • Checked the shell scripts to make sure that hosts can start the videos by using shortcuts.

Quotes

  • Excerpts from Rebecca Solnit's "A Field Guide to Getting Lost" (2006) - 2024-11-19T12:52:11.878Z

    One of the books that has just arrived from the library is "A Field Guide to Getting Lost" (Rebecca Solnit, 2006), which was recommended to me by @janoli .

    Here are some snippets that have resonated with me so far:

    p5. Love, wisdom, grace, inspiration–how do you go about finding these things that are in some ways about extending the boundaries of the self into unknown territory, about becoming someone else?

    p10. and there's another art of being at home in the unknown, so that being in its midst isn't cause for panic or suffering, of being at home with being lost.

    p14. The historian Aaron Sachs, about explorers: "In my opinion, their most important skill was simply a sense of optimism about surviving and finding their way."

    p80. Even in the everyday world of the present, an anxiety to survive manifests itself in cars and clothes for far more rugged occasions than those at hand, as though to express some sense of the toughness of things and of readiness to face them. But the real difficulties, the real arts of survival, seem to lie in more subtle realms. There, what's called for is a kind of resilience of the psyche, a readiness to deal with what comes next.

    p99. Probably it had its origins in protective urges, but it had gone sour long ago.

  • Excerpts from Bill Watterson's speech at Kenyon College in 1990 - 2024-11-19T12:19:15.286Z

    Thanks to @kims for sharing Bill Watterson's speech at Kenyon College, Gambier Ohio, to the 1990 graduating class (https://web.mit.edu/jmorzins/www/C-H-speech.html)

    This section particularly resonated with me: "Creating a life that reflects your values and satisfies your soul is a rare achievement. In a culture that relentlessly promotes avarice and excess as the good life, a person happy doing his own work is usually considered an eccentric, if not a subversive. Ambition is only understood if it's to rise to the top of some imaginary ladder of success. Someone who takes an undemanding job because it affords him the time to pursue other interests and activities is considered a flake. A person who abandons a career in order to stay home and raise children is considered not to be living up to his potential-as if a job title and salary are the sole measure of human worth."

    I also appreciated his resistance to commercializing Calvin & Hobbes:
    "Selling out is usually more a matter of buying in. Sell out, and you're really buying into someone else's system of values, rules and rewards.
    The so-called 'opportunity' I faced would have meant giving up my individual voice for that of a money-grubbing corporation. It would have meant my purpose in writing was to sell things, not say things. My pride in craft would be sacrificed to the efficiency of mass production and the work of assistants. Authorship would become committee decision. Creativity would become work for pay. Art would turn into commerce. In short, money was supposed to supply all the meaning I'd need.
    What the syndicate wanted to do, in other words, was turn my comic strip into everything calculated, empty and robotic that I hated about my old job. They would turn my characters into television hucksters and T-shirt sloganeers and deprive me of characters that actually expressed my own thoughts."

Other links

View org source for this post

Updating my audio braindump workflow to take advantage of WhisperX

| emacs, speechtotext, org

I get word timestamps for free when I transcribe with WhisperX, so I can skip the Aeneas alignment step. That means I can update my previous code for handling audio braindumps . Breaking the transcript up into sections Also, I recently updated subed-word-data to colour words based on their transcription score, which draws my attention to things that might be uncertain.

Here's what it looks like when I have the post, the transcript, and the annotated PDF.

2024-11-17_20-44-30.png
Figure 1: Screenshot of draft, transcript, and PDF

Here's what I needed to implement my-audio-braindump-from-whisperx-json (plus some code from my previous audio braindump workflow):

(defun my-whisperx-word-list (file)
  (let* ((json-object-type 'alist)
         (json-array-type 'list))
    (seq-mapcat (lambda (seg)
                  (alist-get 'words seg))
                (alist-get 'segments (json-read-file file)))))

;; (seq-take (my-whisperx-word-list (my-latest-file "~/sync/recordings" "\\.json")) 10)
(defun my-whisperx-insert-word-list (words)
  "Inserts WORDS with text properties."
  (require 'subed-word-data)
  (mapc (lambda (word)
            (let ((start (point)))
              (insert
               (alist-get 'word word))
              (subed-word-data--add-word-properties start (point) word)
              (insert " ")))
        words))

(defun my-audio-braindump-turn-sections-into-headings ()
  (interactive)
  (goto-char (point-min))
  (while (re-search-forward "START SECTION \\(.+?\\) STOP SECTION" nil t)
    (replace-match
     (save-match-data
       (format
        "\n*** %s\n"
        (save-match-data (string-trim (replace-regexp-in-string "^[,\\.]\\|[,\\.]$" "" (match-string 1))))))
     nil t)
    (let ((prop-match (save-excursion (text-property-search-forward 'subed-word-data-start))))
      (when prop-match
        (org-entry-put (point) "START" (format-seconds "%02h:%02m:%02s" (prop-match-value prop-match)))))))

(defun my-audio-braindump-split-sentences ()
  (interactive)
  (goto-char (point-min))
  (while (re-search-forward "[a-z]\\. " nil t)
    (replace-match (concat (string-trim (match-string 0)) "\n") )))

(defun my-audio-braindump-restructure ()
  (interactive)
  (goto-char (point-min))
  (my-subed-fix-common-errors)
  (org-mode)
  (my-audio-braindump-prepare-alignment-breaks)
  (my-audio-braindump-turn-sections-into-headings)
  (my-audio-braindump-split-sentences)
  (goto-char (point-min))
  (my-remove-filler-words-at-start))

(defun my-audio-braindump-from-whisperx-json (file)
  (interactive (list (read-file-name "JSON: " "~/sync/recordings/" nil nil nil (lambda (f) (string-match "\\.json\\'" f)))))
  ;; put them all into a buffer
  (with-current-buffer (get-buffer-create "*Words*")
    (erase-buffer)
    (fundamental-mode)
    (my-whisperx-insert-word-list (my-whisperx-word-list file))
    (my-audio-braindump-restructure)
    (goto-char (point-min))
    (switch-to-buffer (current-buffer))))

(defun my-audio-braindump-process-text (file)
  (interactive (list (read-file-name "Text: " "~/sync/recordings/" nil nil nil (lambda (f) (string-match "\\.txt\\'" f)))))
  (with-current-buffer (find-file-noselect file)
    (my-audio-braindump-restructure)
    (save-buffer)))
;; (my-audio-braindump-from-whisperx-json (my-latest-file "~/sync/recordings" "\\.json"))

Ideas for next steps:

  • I can change my processing script to split up the Whisper TXT into sections and automatically make the PDF with nice sections.
  • I can add reminders and other callouts. I can style them, and I can copy reminders into a different section for easier processing.
  • I can look into extracting PDF annotations so that I can jump to the next highlight or copy highlighted text.
This is part of my Emacs configuration.
View org source for this post

Looking at my blog post stats by year

| blogging
blog-stats.svg
Figure 1: Blog statistics

I was curious about the shape of my blog over the years, excluding Emacs News and my link-heavy weekly/monthly reviews. It started off with lots of little posts like the way other weblogs were also quick links and notes. As weblogs morphed into blogs with more text, I also settled down into fewer, longer posts with lots of code (analyzed by looking for <pre> blocks). I wrote much less after A+ was born. Interestingly, I've been shifting towards longer posts with more images.

  • Blog posts exclude permalinks that match emacs-news|review|week-ending, which casts a bit of a wide net but should give me the general shape of things.
  • Total words per year and average words per post both exclude code snippets.

Here's how I got those numbers:

(append
 '(("Year" "Posts" "Total words" "Words per post" "Posts with pre" "Posts with images")
   hline)
 (cl-loop for i from 2001 to 2024
          collect
          (let* ((default-directory (expand-file-name (number-to-string i) "~/proj/static-blog/blog"))
                 (exclude (shell-quote-argument "emacs-news|review|week-ending"))
                 (files (format "find . -name '*.html' | grep -v -e '%s' | " exclude))
                 (posts (string-to-number
                         (string-trim
                          (shell-command-to-string (concat files "wc -l")))))
                 (words (string-to-number
                         (replace-regexp-in-string
                          "TOTAL: " ""
                          (shell-command-to-string
                           (concat files "xargs ~/bin/count-words | grep TOTAL")))))
                 (posts-with-images
                  (string-to-number
                   (string-trim
                    (shell-command-to-string (concat files "xargs grep -l '<img' | wc -l")))))
                 (posts-with-pre
                  (string-to-number
                   (string-trim
                    (shell-command-to-string (concat files "xargs grep -l '<pre' | wc -l"))))))
            (list i
                  posts
                  words
                  (/ words posts)
                  posts-with-images
                  posts-with-pre))))
Year Posts Total words Words per post Posts with pre Posts with images
2001 3 438 146 0 0
2002 31 4336 139 0 0
2003 863 64953 75 0 59
2004 967 125789 130 2 98
2005 679 135334 199 4 40
2006 869 171042 196 19 42
2007 489 107011 218 33 32
2008 380 121158 318 85 57
2009 400 175692 439 81 20
2010 335 160289 478 93 19
2011 324 163274 503 93 28
2012 286 124300 434 111 12
2013 273 173021 633 172 11
2014 272 186788 686 138 30
2015 173 133682 772 82 36
2016 25 11560 462 13 6
2017 37 24063 650 6 2
2018 66 46827 709 7 8
2019 18 13054 725 3 6
2020 13 6791 522 4 5
2021 31 17389 560 8 16
2022 21 11264 536 4 9
2023 68 47188 693 26 52
2024 74 58439 789 27 40

And here's how I plotted the charts:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data[1:], columns=data[0])
# Create a figure with subplots
fig, (ax1, ax4, ax2, ax3) = plt.subplots(4, 1, figsize=(10, 12))
fig.suptitle('Blog Statistics by Year', fontsize=16)

# Plot Posts
ax1.bar(df['Year'], df['Posts'], color='lightblue', label='Other posts')
ax1.bar(df['Year'], df['Posts with pre'] , color='darkblue', label='With preformatted blocks')
ax1.set_title('Number of posts per year')
ax1.set_ylabel('Posts')
ax1.legend()

# Plot Posts
ax4.bar(df['Year'], df['Posts'], color='lightblue', label='Other posts')
ax4.bar(df['Year'], df['Posts with images'] , color='darkgreen', label='With images')
ax4.set_title('Number of posts per year')
ax4.set_ylabel('Posts')
ax4.legend()

# Plot Total Words
ax2.bar(df['Year'], df['Total words'], color='lightblue')
ax2.set_title('Total words per year')
ax2.set_ylabel('Total words')

# Plot Words per Post
ax3.bar(df['Year'], df['Words per post'], color='lightblue')
ax3.set_title('Average words per post')
ax3.set_ylabel('Words per post')
ax3.set_xlabel('Year')

# Adjust layout and display
plt.savefig(f)
View org source for this post

2024-11-18 Emacs news

| emacs, emacs-news

Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, r/planetemacs, Mastodon #emacs, Hacker News, lobste.rs, programming.dev, lemmy.world, lemmy.ml, communick.news, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

View org source for this post

Changing Org Mode underlines to the HTML mark element

| org

Apparently, HTML has a mark element that is useful for highlighting. ox-html.el in Org Mode doesn't seem to export that yet. I don't use _ to underline things because I don't want that confused with links. Maybe I can override org-html-text-markup-alist to use it for my own purposes…

(with-eval-after-load 'org
  (setf (alist-get 'underline org-html-text-markup-alist)
        "<mark>%s</mark>"))

Okay, let's try it with:

Let's see _how that works._

Let's see how that works. Oooh, that's promising.

Now, what if I want something fancier, like the way it can be nice to use different-coloured highlighters when marking up notes in order to make certain things jump out easily? A custom link might come in handy.

(defun my-org-highlight-export (link desc format _)
  (pcase format
    ((or '11ty 'html)
     (format "<mark%s>%s</mark>"
             (if link
                 (format " class=\"%s\"" link)
               link)
             desc))))
(with-eval-after-load 'org
  (org-link-set-parameters "hl" :export 'my-org-highlight-export)
  )

A green highlight might be good for ideas, while red might be good for warnings. (Idea: I wonder how to font-lock them differently in Emacs…)

I shouldn't rely only on the colours, since people reading through RSS won't get them and also since some people are colour-blind. Still, the highlights could make my blog posts easier to skim on my website.

Of course, now I want to port Prot's excellent colours from the Modus themes over to CSS variables so that I can have colours that make sense in both light mode and dark mode. Here's a snippet that exports the colours from one of the themes:

(format ":root {\n%s\n}\n"
        (mapconcat
         (lambda (entry)
           (format "  --modus-%s: %s;"
                   (symbol-name (car entry))
                   (if (stringp (cadr entry))
                       (cadr entry)
                     (format "var(--modus-%s)" (symbol-name (cadr entry))))))
         modus-operandi-palette
         "\n"))

So now my style.css has:

/* Based on Modus Operandi by Protesilaous Stavrou */
:root {
   // ...
   --modus-bg-red-subtle: #ffcfbf;
   --modus-bg-green-subtle: #b3fabf;
   --modus-bg-yellow-subtle: #fff576;
   // ...
}
@media (prefers-color-scheme: dark) {
   /* Based on Modus Vivendi by Protesilaous Stavrou */
   :root {
      // ...
      --modus-bg-red-subtle: #620f2a;
      --modus-bg-green-subtle: #00422a;
      --modus-bg-yellow-subtle: #4a4000;
      // ...
   }
}
mark { background-color: var(--modus-bg-yellow-subtle) }
mark.green { background-color: var(--modus-bg-green-subtle) }
mark.red { background-color: var(--modus-bg-red-subtle) }

Interesting, interesting…

View org source for this post

Checking caption timing by skimming with Emacs Lisp or JS

| js, emacs, subed

Sometimes automatic subtitle timing tools like Aeneas can get confused by silences, extraneous sounds, filler words, mis-starts, and text that I've edited out of the raw captions for easier readability. It's good to quickly check each caption. I used to listen to captions at 1.5x speed, watching carefully as each caption displayed. This took a fair bit of time and focus, so… it usually didn't happen. Sampling the first second of each caption is faster and requires a little less attention.

Skimming with subed.el

Here's a function that I wrote to play the first second of each subtitle.

(defvar my-subed-skim-msecs 1000 "Number of milliseconds to play when skimming.")
(defun my-subed-skim-starts ()
  (interactive)
  (subed-mpv-unpause)
  (subed-disable-loop-over-current-subtitle)
  (catch 'done
    (while (not (eobp))
      (subed-mpv-jump-to-current-subtitle)
      (let ((ch
             (read-char "(q)uit? " nil (/ my-subed-skim-msecs 1000.0))))
        (when ch
          (throw 'done t)))
      (subed-forward-subtitle-time-start)
      (when (and subed-waveform-minor-mode
                 (not subed-waveform-show-all))
        (subed-waveform-refresh))
      (recenter)))
  (subed-mpv-pause))

Now I can read the lines as the subtitles play, and I can press any key to stop so that I can fix timestamps.

Skimming with Javascript

I also want to check the times on the Web in case there have been caching issues. Here's some Javascript to skim the first second of each cue in the first text track for a video, with some code to make it easy to process the first video in the visible area.

function getVisibleVideo() {
  const videos = document.querySelectorAll('video');
  for (const video of videos) {
    const rect = video.getBoundingClientRect();
    if (
      rect.top >= 0 &&
      rect.left >= 0 &&
      rect.bottom <= (window.innerHeight || document.documentElement.clientHeight) &&
      rect.right <= (window.innerWidth || document.documentElement.clientWidth)
    ) {
      return video;
    }
  }
  return null;
}

async function skimVideo(video=getVisibleVideo(), msecs=1000) {
  // Get the first text track (assumed to be captions/subtitles)
  const textTrack = video.textTracks[0];
  if (!textTrack) return;
  const remaining = [...textTrack.cues].filter((cue) => cue.endTime >= video.currentTime);
  video.play();
  // Play the first 1 second of each visible subtitle
  for (let i = 0; i < remaining.length && !video.paused; i++) {
    video.currentTime = remaining[i].startTime;
    await new Promise((resolve) => setTimeout(resolve, msecs));
  }
}

Then I can call it with skimVideo();. Actually, in our backstage area, it might be useful to add a Skim button so that I can skim things from my phone.

function handleSkimButton(event) {
   const vid = event.target.closest('.vid').querySelector('video');
   skimVideo(vid);
 }

document.querySelectorAll('video').forEach((vid) => {
   const div = document.createElement('div');
   const skim = document.createElement('button');
   skim.textContent = 'Skim';
   div.appendChild(skim);
   vid.parentNode.insertBefore(div, vid.nextSibling);
   skim.addEventListener('click', handleSkimButton);
});

Results

How much faster is it this way?

Some code to help figure out the speedup
(-let* ((files (directory-files "~/proj/emacsconf/2024/cache" t "--main\\.vtt"))
        ((count-subs sum-seconds)
         (-unzip (mapcar
                  (lambda (file)
                    (list
                     (length (subed-parse-file file))
                     (/ (compile-media-get-file-duration-ms
                         (concat (file-name-sans-extension file) ".webm")) 1000.0)))
                  files)))
        (total-seconds (-reduce #'+ sum-seconds))
        (total-subs (-reduce #'+ count-subs)))
  (format "%d files, %.1f hours, %d total captions, speed up of %.1f"
          (length files)
          (/ total-seconds 3600.0)
          total-subs
          (/ total-seconds total-subs)))

It looks like for EmacsConf talks where we typically format captions to be one long line each (< 60 characters), this can be a speed-up of about 4x compared to listening to the video at normal speed. More usefully, it's different enough to get my brain to do it instead of putting it off.

Most of the automatically-generated timestamps are fine. It's just a few that might need tweaking. It's nice to be able to skim them with fewer keystrokes.

View org source for this post