Using Emacs to fix automatically generated subtitle timestamps
I like how people are making more and more Emacs-related videos. I think subtitles, transcripts, and show notes would go a long way to helping people quickly search, skim, and squeeze these videos into their day.
YouTube's automatically-generated subtitles overlap. I think some players scroll the subtitles, but the ones I use just display them in alternating positions. I like to have non-overlapping subtitles, so here's some code that works with subed.el to fix the timestamps.
(defun my/subed-fix-timestamps ()
  "Change all ending timestamps to the start of the next subtitle."
  (goto-char (point-max))
  (let ((timestamp (subed-subtitle-msecs-start)))
    (while (subed-backward-subtitle-time-start)
      (subed-set-subtitle-time-stop timestamp)
      (setq timestamp (subed-subtitle-msecs-start)))))
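To make the effect easier to see, here's a rough sketch of the same idea on plain data instead of a subed buffer. The function name `my/fix-stops` and the sample subtitles are made up for illustration; they are not part of subed.el. As I read the function above, the last subtitle keeps its original ending time, so the sketch does the same.

```elisp
;; Hypothetical helper: not part of subed.el, just an illustration.
(defun my/fix-stops (subtitles)
  "Return SUBTITLES with each stop time set to the next subtitle's start.
SUBTITLES is a list of (START STOP TEXT) entries with times in
milliseconds, sorted by START.  The last entry is left unchanged."
  (let ((result nil)
        (rest subtitles))
    (while rest
      (let ((current (car rest))
            (next (cadr rest)))
        (push (list (nth 0 current)
                    ;; Use the next start time if there is a next subtitle.
                    (if next (nth 0 next) (nth 1 current))
                    (nth 2 current))
              result))
      (setq rest (cdr rest)))
    (nreverse result)))

;; Made-up overlapping subtitles, similar in shape to auto-generated ones.
(my/fix-stops '((0 4000 "hello everyone")
                (2000 6000 "welcome to")
                (4500 8000 "this screencast")))
;; => ((0 2000 "hello everyone")
;;     (2000 4500 "welcome to")
;;     (4500 8000 "this screencast"))
```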
Then it's easy to edit the subtitles (punctuation, capitalization, special terms), especially with the shortcuts for splitting and merging subtitles.
For transcripts with starting and ending timestamps per paragraph, I like using the merge shortcut to merge all the subtitles for a paragraph together. Here's a sample: https://emacsconf.org/2020/talks/05/
Tonight I edited automatically-generated subtitles for a screencast that was about 40 minutes long. The resulting file had 1157 captions, so about 2 seconds each. I finished it in about 80 minutes, which is about 2x playtime and in line with what I've been seeing. I can probably get a little faster if I figure out good workflows for:
- jumping: avy muscle memory, maybe?
- splitting things into sentences and phrases
- fixing common speech recognition errors (e.g. emax -> Emacs, which I currently handle with regex replaces; maybe keep a list of them?)
I experimented with making a hydra for this before, but thinking about the keys to use slowed me down a bit and it didn't flow very well. Might be worth tinkering with.
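For the speech recognition fixes in the list above, one possible shape for "a list of them" is an alist of regexps and replacements applied in one pass. This is only a sketch: the names `my/subed-common-fixes` and `my/subed-fix-common-errors` are hypothetical, and only the emax entry comes from this post; the second entry is invented to show the format.

```elisp
;; Hypothetical list of (REGEXP . REPLACEMENT) pairs.
;; Only the first entry is from the post; the second is made up.
(defvar my/subed-common-fixes
  '(("\\<emax\\>" . "Emacs")
    ("\\<org mode\\>" . "Org mode"))
  "Alist of common speech recognition errors and their corrections.")

(defun my/subed-fix-common-errors ()
  "Apply `my/subed-common-fixes' to the whole buffer."
  (interactive)
  (save-excursion
    ;; Match regardless of case, so \"Emax\" is also caught.
    (let ((case-fold-search t))
      (dolist (fix my/subed-common-fixes)
        (goto-char (point-min))
        (while (re-search-forward (car fix) nil t)
          ;; FIXEDCASE and LITERAL: insert the replacement exactly as written.
          (replace-match (cdr fix) t t))))))
```

It only uses plain buffer search and replace, so it should work in a subtitle buffer without depending on any subed.el functions.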
Transcribing from scratch takes me about 4-5x playtime. I haven't tweaked my workflow for that one yet because I've only transcribed one talk with subed.el, and there's a backlog of talks that already have automatically generated subtitles to edit. Low-hanging fruit! =)
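As a quick back-of-the-envelope check of those numbers, here's the arithmetic for the 40-minute screencast. The helper name `my/captioning-estimate` is made up for this sketch.

```elisp
;; Hypothetical helper for rough time estimates.
(defun my/captioning-estimate (video-minutes factor)
  "Return estimated working minutes for VIDEO-MINUTES at FACTOR x playtime."
  (* video-minutes factor))

(let ((video-minutes 40)
      (captions 1157))
  (list
   ;; Average caption length in seconds: 2400 / 1157, a little over 2.
   (/ (* video-minutes 60.0) captions)
   ;; Editing automatically generated subtitles at about 2x playtime.
   (my/captioning-estimate video-minutes 2)
   ;; Transcribing from scratch at 4x and 5x playtime.
   (my/captioning-estimate video-minutes 4)
   (my/captioning-estimate video-minutes 5)))
```

For this screencast, that works out to 80 minutes of editing versus roughly 160 to 200 minutes of transcribing from scratch.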
So that's another thing I (or other people) can occasionally do to help out even if I don't have enough focused time to think about a programming challenge or do a podcast myself. And I get to learn more in the process, too. Fun!