Remove filler words at the start and upcase the next word

| audio, speechtotext, emacs

Like many people, I tend to use "So", "And", "You know", and "Uh" to bridge between sentences when thinking. WhisperX does a reasonable job of detecting sentences and splitting them up anyway, but it leaves those filler words in at the start of the sentence. I usually like to remove these from transcripts so that they read more smoothly.

Here's a short Emacs Lisp function that removes those filler words when they start a sentence and capitalizes the next word. It searches from point to the end of the buffer. When called interactively without a prefix argument, it shows each proposed change in an overlay and asks for confirmation. When called with a prefix argument or from Emacs Lisp, it makes the changes without asking.

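(defvar my-filler-words-regexp "\\. \\(?:so,?\\|and\\|you know,\\|uh,?\\) \\(.\\)"
  "Regexp matching a filler word at the start of a sentence.
Group 1 is the first character of the word after the filler.")

(defun my-remove-filler-words-at-start ()
  "Remove filler words that start sentences, capitalizing the next word.
Search from point to the end of the buffer.  When called
interactively without a prefix argument, preview each change in
an overlay and ask for confirmation."
  (interactive)
  (save-excursion
    (while (re-search-forward my-filler-words-regexp nil t)
      (if (and (called-interactively-p 'any) (not current-prefix-arg))
          (let ((overlay (make-overlay (match-beginning 0)
                                       (match-end 0))))
            (overlay-put overlay 'common-edit t)
            ;; Preview the change.  The face comes from modus-themes.
            (overlay-put
             overlay 'display
             (propertize (concat (match-string 0) " -> . "
                                 (upcase (match-string 1)))
                         'face 'modus-themes-mark-sel))
            (unwind-protect
                ;; Keep the match data intact while reading the answer.
                (pcase (save-match-data
                         (read-char-choice "Replace (y/n/!/q)? " '(?y ?n ?! ?q)))
                  (?!
                   ;; Replace this match and all remaining ones without asking.
                   (replace-match (concat ". " (upcase (match-string 1))) t t)
                   (while (re-search-forward my-filler-words-regexp nil t)
                     (replace-match (concat ". " (upcase (match-string 1))) t t)))
                  (?y
                   (replace-match (concat ". " (upcase (match-string 1))) t t))
                  (?n nil)
                  ;; Quit by jumping to the end so that the search loop stops.
                  (?q (goto-char (point-max))))
              (delete-overlay overlay)))
        (replace-match (concat ". " (upcase (match-string 1))) t t)))))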
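
To see what the non-interactive path does without touching a real transcript, here is a rough sketch that runs the same search-and-replace loop over a string in a temporary buffer. The helper name my-demo-remove-fillers-in-string is hypothetical and not part of my configuration; it carries its own copy of the regexp so that it can be evaluated by itself.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-demo-remove-fillers-in-string (text)
  "Return TEXT with sentence-initial filler words removed.
The word that follows each removed filler is capitalized."
  (let ((case-fold-search t)  ; so that "so" also matches "So"
        ;; Copy of the filler-word regexp, kept local to this sketch.
        (regexp "\\. \\(?:so,?\\|and\\|you know,\\|uh,?\\) \\(.\\)"))
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        ;; Group 1 is the first character after the filler word.
        (replace-match (concat ". " (upcase (match-string 1))) t t))
      (buffer-string))))

(my-demo-remove-fillers-in-string
 "I tried it. So, it worked. And then I stopped. Uh, that is all.")
;; => "I tried it. It worked. Then I stopped. That is all."
```

Because the regexp starts with a period and a space, this only catches fillers that follow a sentence ending in a period. A filler at the very start of the text, or one after a question mark, is left alone.
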
This is part of my Emacs configuration.