Remove filler words at the start and upcase the next word
| audio, speechtotext, emacs
Update: Fixed the second filler words regexp, and made it work at the start of lines too. Thanks to @arialdo@mastodon.online for the feedback!
Like many people, I tend to use "So", "And", "You know", and "Uh" to bridge between sentences when thinking. WhisperX does a reasonable job of detecting sentences and splitting them up anyway, but it leaves those filler words in at the start of the sentence. I usually like to remove these from transcripts so that they read more smoothly.
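Here's a short Emacs Lisp function that removes those filler words when they start a sentence and capitalizes the next word. When called interactively, it shows each proposed change in an overlay and prompts for confirmation (y to replace, n to skip, ! to replace all remaining matches, q to quit). When called from Emacs Lisp or with a prefix argument, it makes the changes without asking for confirmation.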
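(defvar my-filler-words-regexp
  "\\(\\. \\|^\\)\\(?:So?\\|And\\|You know\\|Uh\\)\\(?:,\\|\\.\\.\\.\\)? \\(.\\)"
  "Regexp matching a filler word at the start of a sentence or line.
Group 1 is the preceding sentence break (\". \", or empty at line start).
Group 2 is the first character of the word after the filler.")

(defun my-remove-filler-words-at-start ()
  "Remove sentence-initial filler words after point and capitalize the next word.
When called interactively without a prefix argument, preview each
change in an overlay and ask for confirmation."
  (interactive)
  (save-excursion
    (let ((case-fold-search nil))       ; match "So", "And", etc. case-sensitively
      (while (re-search-forward my-filler-words-regexp nil t)
        (if (and (called-interactively-p 'any) (not current-prefix-arg))
            ;; Interactive: show "old -> new" over the match, then ask.
            (let ((overlay (make-overlay (match-beginning 0) (match-end 0))))
              (overlay-put overlay 'common-edit t)
              (overlay-put overlay 'display
                           (propertize (concat (match-string 0) " -> " (match-string 1) (upcase (match-string 2)))
                                       'face 'modus-themes-mark-sel))
              (unwind-protect
                  (pcase (save-match-data (read-char-choice "Replace (y/n/!/q)? " "yn!q"))
                    ;; !: replace this match and all remaining ones without asking.
                    (?!
                     (replace-match (concat (match-string 1) (upcase (match-string 2))) t)
                     (while (re-search-forward my-filler-words-regexp nil t)
                       (replace-match (concat (match-string 1) (upcase (match-string 2))) t)))
                    (?y
                     (replace-match (concat (match-string 1) (upcase (match-string 2))) t))
                    (?n nil)
                    ;; q: jump to the end so the search loop stops.
                    (?q (goto-char (point-max))))
                (delete-overlay overlay)))
          ;; Non-interactive or prefix argument: replace without confirmation.
          (replace-match (concat (match-string 1) (upcase (match-string 2))) t))))))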
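To try the regexp outside a transcript buffer, here's a minimal sketch that applies the same replacement to a string. `my-demo-remove-fillers-in-string` is a hypothetical helper for illustration, not part of my configuration, and it carries its own copy of the regexp so that it can be evaluated on its own. It has no prompting or overlay; it only covers the non-interactive path.

```emacs-lisp
;; Hypothetical helper for experimenting with the filler-word regexp.
;; The regexp is copied from `my-filler-words-regexp' so this block
;; can be evaluated without the rest of the configuration.
(defun my-demo-remove-fillers-in-string (text)
  "Return TEXT with sentence-initial filler words removed.
The first character after each removed filler word is upcased."
  (let ((regexp "\\(\\. \\|^\\)\\(?:So?\\|And\\|You know\\|Uh\\)\\(?:,\\|\\.\\.\\.\\)? \\(.\\)"))
    (with-temp-buffer
      ;; Bind `case-fold-search' inside the temp buffer so the search
      ;; there is case-sensitive.
      (let ((case-fold-search nil))
        (insert text)
        ;; The search runs forward from point, so start at the top.
        (goto-char (point-min))
        (while (re-search-forward regexp nil t)
          ;; Keep group 1 (the sentence break), drop the filler word,
          ;; and upcase group 2 (the first character of the next word).
          (replace-match (concat (match-string 1) (upcase (match-string 2)))
                         t t))
        (buffer-string)))))

(my-demo-remove-fillers-in-string
 "So, we start here. And then we continue. You know, it works.\nUh... that is all.")
;; => "We start here. Then we continue. It works.\nThat is all."
```

Like the function above, this searches forward from point, which is why the sketch moves to the beginning of the buffer first. When calling `my-remove-filler-words-at-start` from Emacs Lisp, you'd probably want to do the same with `(goto-char (point-min))` if the whole buffer should be processed.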
This is part of my Emacs configuration.
You can comment with Disqus or you can e-mail me at sacha@sachachua.com.