Remove filler words at the start and upcase the next word

| audio, speechtotext, emacs

[2024-11-21 Thu]: Fixed the second filler words regexp, and made it work at the start of lines too. Thanks to @arialdo@mastodon.online for the feedback!

Like many people, I tend to use "So", "And", "You know", and "Uh" to bridge between sentences when thinking. WhisperX does a reasonable job of detecting sentences and splitting them up anyway, but it leaves those filler words in at the start of the sentence. I usually like to remove these from transcripts so that they read more smoothly.

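Here's a short Emacs Lisp function that removes those filler words when they start a sentence, capitalizing the next word. When called interactively, it shows each proposed change in an overlay and asks for confirmation: y to replace, n to skip, ! to replace all the remaining matches, or q to quit. When called from Emacs Lisp or with a prefix argument, it replaces every match without asking.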

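(defvar my-filler-words-regexp "\\(\\. \\|^\\)\\(?:So?\\|And\\|You know\\|Uh\\)\\(?:,\\|\\.\\.\\.\\)? \\(.\\)"
  "Regexp matching a filler word at the start of a sentence or line.
Group 1 is the preceding sentence boundary (\". \" or empty at the start
of a line) and group 2 is the first character of the next word.")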
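(defun my-remove-filler-words-at-start ()
  "Remove filler words at the start of sentences and upcase the next word.
When called interactively without a prefix argument, confirm each change.
Otherwise, replace all matches from point onward without asking."
  (interactive)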
  (save-excursion
    (let ((case-fold-search nil))
      (while (re-search-forward my-filler-words-regexp nil t)
        (if (and (called-interactively-p 'any) (not current-prefix-arg))
            (let ((overlay (make-overlay (match-beginning 0)
                                         (match-end 0))))
              (overlay-put overlay 'common-edit t)
              (overlay-put
               overlay 'display
               (propertize (concat (match-string 0) " -> "
                                   (match-string 1)
                                   (upcase (match-string 2)))
                           'face 'modus-themes-mark-sel))
              (unwind-protect
                  (pcase (save-match-data (read-char-choice "Replace (y/n/!/q)? " "yn!q"))
                    (?!
                     (replace-match (concat (match-string 1) (upcase (match-string 2))) t)
                     (while (re-search-forward my-filler-words-regexp nil t)
                       (replace-match (concat (match-string 1) (upcase (match-string 2))) t)))
                    (?y
                     (replace-match (concat (match-string 1) (upcase (match-string 2))) t))
                    (?n nil)
                    (?q (goto-char (point-max))))
                (delete-overlay overlay)))
          (replace-match (concat (match-string 1) (upcase (match-string 2))) t))))))
This is part of my Emacs configuration.
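
To see what the regexp does without touching a real transcript, here is a minimal sketch that applies the same pattern to a string in a temporary buffer. The helper name `my-demo-strip-fillers` is made up for illustration and is not part of the configuration above; the regexp is copied from `my-filler-words-regexp` so that the block can be evaluated on its own.

```emacs-lisp
;; Hypothetical helper for experimenting; not part of the original config.
;; The regexp is a copy of `my-filler-words-regexp' so this block is
;; self-contained.
(defun my-demo-strip-fillers (text)
  "Return TEXT with sentence-initial filler words removed.
The character that follows each removed filler word is upcased."
  (let ((regexp "\\(\\. \\|^\\)\\(?:So?\\|And\\|You know\\|Uh\\)\\(?:,\\|\\.\\.\\.\\)? \\(.\\)"))
    (with-temp-buffer
      ;; Bind inside the temp buffer: `case-fold-search' can be
      ;; buffer-local, and only capitalized fillers should match.
      (let ((case-fold-search nil))
        (insert text)
        (goto-char (point-min))
        (while (re-search-forward regexp nil t)
          ;; Group 1 is the sentence boundary, group 2 the next character.
          ;; FIXEDCASE and LITERAL are both t so the text is inserted as-is.
          (replace-match (concat (match-string 1) (upcase (match-string 2)))
                         t t))
        (buffer-string)))))

(my-demo-strip-fillers
 "So, this is a test. And it works. You know... it really does.")
```

Evaluating the last form should return "This is a test. It works. It really does." Lowercase words such as "so" or "and" in the middle of a sentence are left alone because the search is case-sensitive.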