subed.el: Tweaking subtitle times
When subtitle times are too far off from the video or audio, people start worrying if their video has frozen or jumped ahead. It's good to keep subtitles roughly in time with the audio.
For EmacsConf, we can get timing information from two places. WhisperX produces a JSON file with word data in the process of doing the speech recognition, and the aeneas forced alignment tool can use synthesized text-to-speech to figure out the timestamps for each line of text compared to a media file.
Aeneas timestamps are more helpful once we start editing, but Aeneas can be confused by long silences, extraneous noises, multiple speakers, and inaccurate transcripts (words added or removed).
When I combine the WhisperX word data with subtitles, I can see where the times might need a closer look because matching words weren't found.
Loading word data requires a pretty close match at the moment, but since we change only about 4% of the subtitle text when editing, those cues are still helpful. (I measured this by the Levenshtein distance between the combined cue texts of edited subtitles versus the original WhisperX transcripts, using string-distance to approximate the editing percentage.)
Calculating how much we edited
(let ((sum-original 0)
      (sum-dist 0))
  (append
   ;; One row per talk: original length, edited length, edit distance.
   (seq-keep
    (lambda (talk)
      (when (and (emacsconf-talk-file talk "--main.vtt")
                 (emacsconf-talk-file talk "--reencoded.json"))
        (let* ((json-object-type 'alist)
               (json-array-type 'list)
               ;; Text of the edited cues, joined with spaces.
               (edited-text
                (mapconcat (lambda (sub) (elt sub 3))
                           (subed-parse-file
                            (emacsconf-talk-file talk "--main.vtt"))
                           " "))
               ;; Words from the original WhisperX output, joined with spaces.
               (original-text
                (mapconcat (lambda (word) (assoc-default 'word word))
                           (assoc-default
                            'word_segments
                            (json-read-file
                             (emacsconf-talk-file talk "--reencoded.json")))
                           " "))
               (dist (string-distance original-text edited-text)))
          (setq sum-original (+ sum-original (length original-text)))
          (setq sum-dist (+ sum-dist dist))
          (list (length original-text) (length edited-text) dist))))
    (emacsconf-get-talk-info))
   '(hline)
   ;; Totals row: total original length, edit percentage, total distance.
   (list (list sum-original
               (format "%d%%" (/ (* 100.0 sum-dist) sum-original))
               sum-dist))))
To make timing easier to correct, I added a few ways to adjust the times for a region of subtitles.
WhisperX: subed-word-data-fix-subtitle-timing in subed-word-data.el tries to match the word data from WhisperX against the text of the current subtitle, using string-distance for approximate matches. I start at about two words shorter than what's in the subtitle, and then increase the number of words taken from the data while the string distance decreases. I skip the data for words before the beginning of the first subtitle in the region.
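Here is a minimal sketch of that "take more words while the distance keeps shrinking" idea. It is not the code from subed-word-data.el: the function name my-sketch-count-matching-words is hypothetical, and it assumes the word data has already been reduced to a plain list of strings starting at the right place.

```emacs-lisp
(require 'seq)

;; Sketch only: this function is hypothetical and not part of subed.
(defun my-sketch-count-matching-words (subtitle-text words)
  "Return how many leading WORDS best approximate SUBTITLE-TEXT.
WORDS is a list of strings. Start about two words short of the
subtitle's word count, then keep adding words while the
Levenshtein distance to SUBTITLE-TEXT decreases."
  (let* ((target-count (length (split-string subtitle-text)))
         ;; Start two words short, but stay within 1..(length words).
         (count (min (length words) (max 1 (- target-count 2))))
         (best-dist (string-distance
                     subtitle-text
                     (mapconcat #'identity (seq-take words count) " ")))
         (done nil))
    (while (and (not done) (< count (length words)))
      (let ((dist (string-distance
                   subtitle-text
                   (mapconcat #'identity
                              (seq-take words (1+ count)) " "))))
        (if (< dist best-dist)
            ;; One more word is a closer match, so keep it.
            (setq count (1+ count)
                  best-dist dist)
          ;; The distance stopped improving, so stop here.
          (setq done t))))
    count))
```

For example, calling it with "hello world this is emacs" and the word list ("hello" "world" "this" "is" "emacs" "and" "more") starts at three words and stops at five, because adding "and" makes the distance grow again.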
Aeneas: subed-align-region uses Aeneas to realign the subtitles from the region using the section of the media file between the start of the first subtitle and the end of the last subtitle in the region.
When I notice that the times are off, I skim the subtitles (or just skim them visually) to find the last well-timed subtitle. Then I pick a subtitle that's in the incorrectly-timed section. I use subed-mpv-jump-to-current-subtitle (M-j) to jump to that position, and I play back that subtitle. It usually belongs to some text further down, so I reset to that position with M-j, set my mark before the previous correctly-timed subtitle with C-SPC, go to the subtitle that matches that time, and use subed-copy-player-pos-to-start-time (C-c [) to set the proper timestamp. Then I can go to the previous incorrectly-timed subtitle and use M-x subed-align-region. This runs the Aeneas forced alignment tool using just the subtitle text in the region, the starting timestamp of the first subtitle, and the ending timestamp of the last subtitle, making it easy to adjust that section. subed-align-region is in subed-align.el.
Retiming by pressing SPC after each subtitle: As an experiment, I've also added a subed-retime-subtitles command that plays through the subtitles so that I can press SPC when the next subtitle starts. It begins with the current subtitle and stops when you press a key that's not in its keymap.
Manual adjustments: For fine-tuning timestamps, I usually turn on subed-waveform-show-all and shift-left-click (subed-waveform-set-start-and-copy-to-previous) or shift-right-click (subed-waveform-set-stop-and-copy-to-next) on the waveforms because it's easy to see where the words and pauses are. When I'm not sure, I can use middle-click (subed-waveform-play-sample) to play part of the file without changing the subtitle start/stop or the MPV playback position.
I'm experimenting with adding repeating keybindings. There was a subed-mpv-frame-step-map that was bound to C-c C-f, so I've renamed it to subed-mpv-control, added a whole bunch of keybindings to the subed-mpv-control-map based on MPV and Aegisub shortcuts, and made it a repeating transient map.
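For anyone curious what a repeating transient map looks like in general, here is a small sketch using the built-in set-transient-map. It is not the subed implementation: every my-sketch-* name is hypothetical, the keys are picked just for illustration, and the "playback position" is only a variable so that the example runs without MPV.

```emacs-lisp
;; Sketch only: the my-sketch-* names are hypothetical, not part of subed.
(defvar my-sketch-position 0
  "Pretend playback position in milliseconds.")

(defun my-sketch-step-forward ()
  "Move the pretend playback position forward by 100 ms."
  (interactive)
  (setq my-sketch-position (+ my-sketch-position 100)))

(defun my-sketch-step-backward ()
  "Move the pretend playback position back by 100 ms, stopping at 0."
  (interactive)
  (setq my-sketch-position (max 0 (- my-sketch-position 100))))

(defvar my-sketch-control-map
  (let ((map (make-sparse-keymap)))
    (define-key map (kbd ".") #'my-sketch-step-forward)
    (define-key map (kbd ",") #'my-sketch-step-backward)
    map)
  "Keys that stay active while the transient map is in effect.")

(defun my-sketch-control ()
  "Activate `my-sketch-control-map' until a key outside it is pressed."
  (interactive)
  ;; Passing t as KEEP-PRED keeps the map active for as long as each
  ;; key pressed is bound in the map.
  (set-transient-map my-sketch-control-map t))
```

After M-x my-sketch-control, pressing . or , repeatedly keeps stepping, and any other key drops out of the map and runs normally.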
Ideas for next steps:
Gotta get the hang of all these new capabilities through practice! =)
To make my subed-align-region workflow even more convenient, I could use completing-read to let me select a future subtitle with completion, and then Emacs could automatically fix the subtitle start time, go to the previous subtitle, and realign the region.
Also, I think switching the waveforms from overlays to text properties could be a good idea. When I cut text, the overlays get left behind, but I want the waveforms to go away too.
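The difference between the two is easy to see in a scratch experiment. This sketch does not touch subed-waveform at all: the function name and the my-sketch-waveform property are hypothetical, and the after-string just stands in for a waveform image.

```emacs-lisp
;; Sketch only: names here are hypothetical, not part of subed.
(defun my-sketch-cut-and-inspect ()
  "Delete decorated text and report what survives.
Return a list (OVERLAY-STILL-ATTACHED OVERLAY-LENGTH PROPERTY-FOUND)."
  (with-temp-buffer
    (insert "first cue\nsecond cue\n")
    ;; Decorate the first line with an overlay...
    (let ((ov (make-overlay 1 10)))
      (overlay-put ov 'after-string "[waveform]")
      ;; ...and the second line with a text property.
      (put-text-property 11 21 'my-sketch-waveform t)
      ;; Cut everything, as if killing both cues.
      (delete-region (point-min) (point-max))
      (list
       ;; The overlay is still attached to the buffer,
       (bufferp (overlay-buffer ov))
       ;; but it now covers zero characters.
       (- (overlay-end ov) (overlay-start ov))
       ;; The text property went away with the text.
       (text-property-any (point-min) (point-max)
                          'my-sketch-waveform t)))))
```

The empty overlay that is left over can still show its after-string, which is why decorations made with overlays can linger after the text is gone.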
While writing this post and fiddling with subed, I ended up adding a bunch of keybindings and a menu. I figured this was as good a time as any to stop tweaking it and finally publish. (But it's fun! Just one more idea…)