subed.el: Tweaking subtitle times

| emacs, subed

When subtitle times are too far off from the video or audio, people start worrying if their video has frozen or jumped ahead. It's good to keep subtitles roughly in time with the audio.

For EmacsConf, we can get timing information from two places. WhisperX produces a JSON file with word data in the process of doing the speech recognition, and the aeneas forced alignment tool can use synthesized text-to-speech to figure out the timestamps for each line of text compared to a media file.

Aeneas timestamps are more helpful once we start editing, but it can be confused by long silences, extraneous noises, multiple speakers, and inaccurate transcripts (words added or removed).

When I combine the WhisperX word data with subtitles, I can see where the times might need a closer look because matching words weren't found.

2024-12-18_12-27-37.png
Figure 1: Screenshot with word data loaded

Loading word data requires a pretty close match at the moment, but since we change only about 4% of the subtitle text when editing, those cues are still helpful. (I measured this by the Levenshtein distance between the combined cue texts of edited subtitles versus the original WhisperX transcripts, using string-distance to approximate the editing percentage.)

Calculating how much we edited
(let ((sum-original 0)
      (sum-dist 0))
  (append
   (seq-keep
    (lambda (talk)
      (when (and (emacsconf-talk-file talk "--main.vtt")
                 (emacsconf-talk-file talk "--reencoded.json"))
        (let* ((json-object-type 'alist)
               (json-array-type 'list)
               (edited-text
                (mapconcat (lambda (sub) (elt sub 3))
                           (subed-parse-file (emacsconf-talk-file talk "--main.vtt"))
                           " "))
               (original-text
                (mapconcat
                 (lambda (word)
                   (assoc-default 'word word))
                 (assoc-default
                  'word_segments
                  (json-read-file (emacsconf-talk-file talk "--reencoded.json")))
                 " "))
               (dist (string-distance original-text edited-text)))
          (setq sum-original (+ sum-original (length original-text)))
          (setq sum-dist (+ sum-dist dist))
          (list
           (length original-text)
           (length edited-text)
           dist))))
    (emacsconf-get-talk-info))
   '(hline)
   (list
    (list
     sum-original
     (format "%d%%" (/ (* 100.0 sum-dist) sum-original))
     sum-dist))))

To make it easier to correct subtitle timing, I added a few ways to tweak subtitle timing for a region of subtitles.

WhisperX: subed-word-data-fix-subtitle-timing in subed-word-data.el tries to match the word data from WhisperX against the text of the current subtitle, using string-distance for approximate matches. I start at about two words shorter than what's in the subtitle, and then increase the number of words taken from the data while the string distance decreases. I skip the data for words before the beginning of the first subtitle in the region.

Screencast of subed-word-data-fix-subtitle-timing

Aeneas: subed-align-region uses Aeneas to realign the subtitles from the region using the section of the media file between the start of the first subtitle and the end of the last subtitle in the region. When I notice that the times are off, I skim the subtitles (or just skim them visually) to find the last well-timed subtitle. Then I pick a subtitle that's in the incorrectly-timed section. I use subed-mpv-jump-to-current-subtitle (M-j) to jump to that position, and I play back that subtitle. It usually belongs to some text further down, so I reset to that position with M-j, set my mark before the previous correctly-timed subtitle with C-SPC, go to the subtitle that matches that time, and use subed-copy-player-pos-to-start-time (C-c [) to set the proper timestamp. Then I can go to the previous incorrectly-timed subtitle and use M-x subed-align-region. This runs the Aeneas forced alignment tool using just the subtitle text in the region, the starting timestamp of the first subtitle, and the ending timestamp of the last subtitle, making it easy to adjust that section. subed-align-region is in subed-align.el

Retiming by pressing SPC after each subtitle: As an experiment, I've also added a subed-retime-subtitles command that plays through the subtitles so that I can press SPC when the next subtitle starts. It begins with the current subtitle and stops when you press a key that's not in its keymap.

Screencast with audio: subed-retime-subtitles

Manual adjustments: For fine-tuning timestamps, I usually turn on subed-waveform-show-all and shift-left-click (subed-waveform-set-start-and-copy-to-previous) or shift-right-click (subed-waveform-set-stop-and-copy-to-next) on the waveforms because it's easy to see where the words and pauses are. When I'm not sure, I can use middle-click (subed-waveform-play-sample) to play part of the file without changing the subtitle start/stop or the MPV playback position.

Screencast with audio of using the waveforms

I'm experimenting with adding repeating keybindings. There was a subed-mpv-frame-step-map that was bound to C-c C-f, so I've renamed it to subed-mpv-control, added a whole bunch of keybindings to the subed-mpv-control-map based on MPV and Aegisub shortcuts, and made it a repeating transient map.

Screencast with audio, experimenting with the mpv control map

Ideas for next steps:

Gotta get the hang of all these new capabilities through practice! =)

To make my subed-align-region workflow even more convenient, I could use completing-read to let me select a future subtitle with completion, and then Emacs could automatically fix the subtitle start time, go to the previous subtitle, and realign the region.

Also, I think switching the waveforms from overlays to text properties could be a good idea. When I cut text, the overlays get left behind, but I want the waveforms to go away too.

While writing this post and fiddling with subed, I ended up adding a bunch of keybindings and a menu. I figured this was as good a time as any to stop tweaking it and finally publish. (But it's fun! Just one more idea…)

View org source for this post
You can comment with Disqus or you can e-mail me at sacha@sachachua.com.