Converting our VTT files to TTML

| subed, emacsconf, geek, ffmpeg

I wanted to convert our VTT files to TTML files so that we might be able to use them for training lachesis for transcript segmentation. I downloaded the VTT files from EmacsConf 2021 to a directory and copied the edited captions from the EmacsConf 2022 backstage area (using head -1 ${FILE} | grep -q "captioned" to distinguish them from the automatic ones). I installed the ttconv Python package. Then I used the following command to convert the VTT files to TTML:

# Convert each VTT file to SRT with ffmpeg, then to TTML with ttconv's tt command.
for FILE in *.vtt; do
    BASE=$(basename -s .vtt "$FILE")
    ffmpeg -y -i "$FILE" "$BASE.srt" && tt convert -i "$BASE.srt" -o "$BASE.ttml"
done

I haven't gotten around to installing whatever I need in order to get lachesis to work under Python 2.7, since it hasn't been updated for Python 3. It'll probably be a low-priority project anyway, as EmacsConf is fast approaching. Anyway, I thought I'd stash this in my blog somewhere in case I need to make TTML files again!

subed.el: Word-level timing improvements, TSV support

| emacs, subed

I figured out how to align the subtitles to get word-level timestamps and generate SRV2 files, so now I'm working on improving the support in subed.el so that it can work with those timestamps.

The subed-word-data-load-from-file function in subed-word-data.el should load the word data from the SRV2 file and attempt to match it up with the text, colouring words if they were successfully matched.

Screenshot_2022-10-26_13-46-31.png

Figure 1: After subed-word-data-load-from-file

I also updated and committed code for working with TSV files like the label export from the Audacity audio editor. The concise format might make editing and reviewing easier. The files look like this:

Screenshot_2022-10-26_13-49-00.png

Figure 2: Tab-separated values
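
For reference, Audacity's label export puts one label per line as start time, end time, and text, separated by tabs, with the times in seconds. Here is a minimal JavaScript sketch of reading that layout. parseLabelTrack is a hypothetical name for illustration and is not part of subed.el, which does its parsing in Emacs Lisp.

```javascript
// Hypothetical helper: parse Audacity-style label text into cue objects.
// Each useful line looks like "start<TAB>end<TAB>text", with times in seconds.
function parseLabelTrack(tsv) {
  return tsv.split('\n')
    .map(function(line) { return line.split('\t'); })
    // Skip blank lines and anything that does not start with a number.
    .filter(function(fields) {
      return fields.length >= 3 && !isNaN(parseFloat(fields[0]));
    })
    .map(function(fields) {
      return {
        startMs: Math.round(parseFloat(fields[0]) * 1000),
        stopMs: Math.round(parseFloat(fields[1]) * 1000),
        // Keep any extra tabs as part of the text.
        text: fields.slice(2).join('\t')
      };
    });
}
```

Converting seconds to integer milliseconds up front avoids floating-point drift when the times are compared or adjusted later.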

To convert an existing file, use subed-convert (from subed-common.el). You can also manually turn on subed-tsv-mode from subed-tsv.el when you're visiting a TSV subtitle/label file. Tab-separated values can be in any sort of text file and tsv is a common file extension, so I don't automatically add it to auto-mode-alist.

The changes should be in 1.0.16 or the latest version from the Git repository at https://github.com/sachac/subed .

Coverage reporting in Emacs with Buttercup, Undercover, Coverage, and a Makefile

| emacs, elisp, subed

One of the things that I always wanted to get back to was the practice of having good test coverage. That way, I can have all these tests catch me in case I break something in my sleep-deprived late-night hacking sessions, and I can see where I may have missed a spot.

Fortunately, subed-mode included lots of tests using the Buttercup testing framework. They look like this:

(describe "SRT"
  (describe "Getting"
    (describe "the subtitle ID"
      (it "returns the subtitle ID if it can be found."
        (with-temp-srt-buffer
         (insert mock-srt-data)
         (subed-jump-to-subtitle-text 2)
         (expect (subed-subtitle-id) :to-equal 2)))
      (it "returns nil if no subtitle ID can be found."
        (with-temp-srt-buffer
         (expect (subed-subtitle-id) :to-equal nil))))
    ...))

and I can run them with make test, which the Makefile defines as emacs -batch -f package-initialize -L . -f buttercup-run-discover.

I don't have Cask set up for subed. I should probably learn how to use Cask. In the meantime, I needed to figure out how to get my Makefile to get the buttercup tests to capture the coverage data and report it in a nice way.

It turns out that the undercover coverage recording library works well with buttercup. It took me a little fiddling (and some reference to undercover.el-buttercup-integration-example) to figure out exactly how to invoke it so that undercover instrumented the libraries that I was loading, since the subed files were in one subdirectory and the tests were in another. This is what I eventually came up with for tests/undercover-init.el:

(add-to-list 'load-path "./subed")
(when (require 'undercover nil t)
  (undercover "./subed/*.el" (:report-format 'simplecov) (:send-report nil)))

Then the test files could start with:

(load-file "./tests/undercover-init.el")
(require 'subed-srt)

and my Makefile target for running tests with coverage reporting could be:

test-coverage:
	mkdir -p coverage
	UNDERCOVER_FORCE=true emacs -batch -L . -f package-initialize -f buttercup-run-discover

Displaying the coverage information in code buffers was easy with the coverage package. It looks in the git root directory for the coverage results, so I didn't need to tell it where the results were. This is what it looks like:

2022-01-02-19-00-28.svg

There are a few other options for displaying coverage info. cov uses the fringe and coverlay focuses on highlighting missed lines.

So now I can actually see how things are going, and I can start writing tests for some of those gaps. At some point I may even do the badge thing mentioned in my blog post from 2015 on continuous integration and code coverage for Emacs packages. There are a lot of things I'm slowly remembering how to do… =)

Defining generic and mode-specific Emacs Lisp functions with cl-defmethod

| elisp, emacs, subed

2022-01-27: Added example function description.
2022-01-02: Changed quote to function in the defalias.

I recently took over the maintenance of subed, an Emacs mode for editing subtitles. One of the things on my TODO list was to figure out how to handle generic and format-specific functions instead of relying on defalias. For example, there are SubRip files (.srt), WebVTT files (.vtt), and Advanced SubStation Alpha (.ass). I also want to add support for Audacity labels and other formats.

There are some functions that will work across all of them once you have the appropriate format-specific functions in place, and there are some functions that have to be very different depending on the format that you're working with. Now, how do you do those things in Emacs Lisp? There are several ways of making general functions and specific functions.

For example, the forward-paragraph and backward-paragraph commands use variables to figure out the paragraph separators, so buffer-local variables can change the behaviour.

However, I needed a bit more than regular expressions. An approach taken in some packages like smartparens is to have buffer-local variables have the actual functions to be called, like sp-forward-bound-fn and sp-backward-bound-fn.

(defvar-local sp-forward-bound-fn nil
  "Function to restrict the forward search")

(defun sp--get-forward-bound ()
  "Get the bound to limit the forward search for looking for pairs.
If it returns nil, the original bound passed to the search
function will be considered."
  (and sp-forward-bound-fn (funcall sp-forward-bound-fn)))

Since there were so many functions, I figured that might be a little bit unwieldy. In Org mode, custom export backends are structs that have an alist that maps the different types of things to the functions that will be called, overriding the functions that are defined in the parent export backend.

(cl-defstruct (org-export-backend (:constructor org-export-create-backend)
          (:copier nil))
  name parent transcoders options filters blocks menu)

(defun org-export-get-all-transcoders (backend)
  "Return full translation table for BACKEND.

BACKEND is an export back-end, as return by, e.g,,
`org-export-create-backend'.  Return value is an alist where
keys are element or object types, as symbols, and values are
transcoders.

Unlike to `org-export-backend-transcoders', this function
also returns transcoders inherited from parent back-ends,
if any."
  (when (symbolp backend) (setq backend (org-export-get-backend backend)))
  (when backend
    (let ((transcoders (org-export-backend-transcoders backend))
          parent)
      (while (setq parent (org-export-backend-parent backend))
        (setq backend (org-export-get-backend parent))
        (setq transcoders
              (append transcoders (org-export-backend-transcoders backend))))
      transcoders)))

The export code looked a little bit complicated, though. I wanted to see if there was a different way of doing things, and I came across cl-defmethod. Actually, the first time I tried to implement this, I was focused on the fact that cl-defmethod could call different things depending on the class that you give it. So initially I had created a couple of classes: a subed-backend class, and then subclasses such as subed-vtt-backend. This allowed me to store the backend as a buffer-local variable and differentiate based on that.

(require 'eieio)

(defclass subed-backend ()
  ((regexp-timestamp :initarg :regexp-timestamp
                     :initform ""
                     :type string
                     :custom string
                     :documentation "Regexp matching a timestamp.")
   (regexp-separator :initarg :regexp-separator
                     :initform ""
                     :type string
                     :custom string
                     :documentation "Regexp matching the separator between subtitles."))
  "A class for data and functions specific to a subtitle format.")

(defclass subed-vtt-backend (subed-backend) nil
  "A class for WebVTT subtitle files.")

(cl-defmethod subed--timestamp-to-msecs ((backend subed-vtt-backend) time-string)
  "Find HH:MM:SS,MS pattern in TIME-STRING and convert it to milliseconds.
Return nil if TIME-STRING doesn't match the pattern.
Use the format-specific function for BACKEND."
  (save-match-data
    (when (string-match (oref backend regexp-timestamp) time-string)
      (let ((hours (string-to-number (match-string 1 time-string)))
            (mins  (string-to-number (match-string 2 time-string)))
            (secs  (string-to-number (match-string 3 time-string)))
            (msecs (string-to-number (subed--right-pad (match-string 4 time-string) 3 ?0))))
        (+ (* (truncate hours) 3600000)
           (* (truncate mins) 60000)
           (* (truncate secs) 1000)
           (truncate msecs))))))

Then I found out that you can use major-mode as a context specifier for cl-defmethod, so you can call different specific functions depending on the major mode that your buffer is in. It doesn't seem to be mentioned in the elisp manual, so at some point I should figure out how to suggest mentioning it. Anyway, now I have some functions that get called if the buffer is in subed-vtt-mode and some functions that get called if the buffer is in subed-srt-mode.

The catch is that cl-defmethod can't define interactive functions. So if I'm defining a command, an interactive function that can be called with M-x, then I will need to have a regular function that calls the function defined with cl-defmethod. This resulted in a bit of duplicated code, so I have a macro that defines the method and then defines the possibly interactive command that calls that method. I didn't want to think about whether something was interactive or not, so my macro just always creates those two functions. One is a cl-defmethod that I can override for a specific major mode, and one is the function that actually calls it, which may or may not be interactive. It doesn't handle &rest args, but I don't have any in subed.el at this time.

(defmacro subed-define-generic-function (name args &rest body)
  "Declare an object method and provide the old way of calling it."
  (declare (indent 2))
  (let (is-interactive
        doc)
    (when (stringp (car body))
      (setq doc (pop body)))
    (setq is-interactive (eq (caar body) 'interactive))
    `(progn
       (cl-defgeneric ,(intern (concat "subed--" (symbol-name name)))
           ,args
         ,doc
         ,@(if is-interactive
               (cdr body)
             body))
       ,(if is-interactive
            `(defun ,(intern (concat "subed-" (symbol-name name))) ,args
               ,(concat doc "\n\nThis function calls the generic function `"
                        (concat "subed--" (symbol-name name)) "' for the actual implementation.")
               ,(car body)
               (,(intern (concat "subed--" (symbol-name name)))
                ,@(delq nil (mapcar (lambda (a)
                                      (unless (string-match "^&" (symbol-name a))
                                        a))
                                    args))))
          `(defalias (quote ,(intern (concat "subed-" (symbol-name name))))
             (function ,(intern (concat "subed--" (symbol-name name))))
             ,doc)))))

For example, the function:

(subed-define-generic-function timestamp-to-msecs (time-string)
  "Find timestamp pattern in TIME-STRING and convert it to milliseconds.
Return nil if TIME-STRING doesn't match the pattern.")

expands to:

(progn
  (cl-defgeneric subed--timestamp-to-msecs
      (time-string)
    "Find timestamp pattern in TIME-STRING and convert it to milliseconds.
Return nil if TIME-STRING doesn't match the pattern.")
  (defalias 'subed-timestamp-to-msecs #'subed--timestamp-to-msecs "Find timestamp pattern in TIME-STRING and convert it to milliseconds.
Return nil if TIME-STRING doesn't match the pattern."))

and the interactive command defined with:

(subed-define-generic-function forward-subtitle-end ()
  "Move point to end of next subtitle.
Return point or nil if there is no next subtitle."
  (interactive)
  (when (subed-forward-subtitle-id)
    (subed-jump-to-subtitle-end)))

expands to:

(progn
  (cl-defgeneric subed--forward-subtitle-end nil "Move point to end of next subtitle.
Return point or nil if there is no next subtitle."
                 (when
                     (subed-forward-subtitle-id)
                   (subed-jump-to-subtitle-end)))
  (defun subed-forward-subtitle-end nil "Move point to end of next subtitle.
Return point or nil if there is no next subtitle.

This function calls the generic function `subed--forward-subtitle-end' for the actual implementation."
         (interactive)
         (subed--forward-subtitle-end)))

Then I can define a specific one with:

(cl-defmethod subed--timestamp-to-msecs (time-string &context (major-mode subed-srt-mode))
  "Find HH:MM:SS,MS pattern in TIME-STRING and convert it to milliseconds.
Return nil if TIME-STRING doesn't match the pattern.
Use the format-specific function for MAJOR-MODE."
  (save-match-data
    (when (string-match subed--regexp-timestamp time-string)
      (let ((hours (string-to-number (match-string 1 time-string)))
            (mins  (string-to-number (match-string 2 time-string)))
            (secs  (string-to-number (match-string 3 time-string)))
            (msecs (string-to-number (subed--right-pad (match-string 4 time-string) 3 ?0))))
        (+ (* (truncate hours) 3600000)
           (* (truncate mins) 60000)
           (* (truncate secs) 1000)
           (truncate msecs))))))

The upside is that it's easy to either override or extend a function's behavior. For example, after I sort subtitles, I want to renumber them if I'm in an SRT buffer because SRT subtitles have numeric IDs. This doesn't happen in any of the other modes. So I can define an :after method that runs once the regular sorting code has finished.

(cl-defmethod subed--sort :after (&context (major-mode subed-srt-mode))
  "Renumber after sorting. Format-specific for MAJOR-MODE."
  (subed-srt--regenerate-ids))

The downside is that going to the function's definition and stepping through it is a little more complicated because it's hidden behind this macro and the cl-defmethod infrastructure. I think that if you describe-function the right function, the internal version with the --, then it will list the different implementations of it. I added a note to the regular function's docstring to make it a little easier.

Here's what M-x describe-function subed-forward-subtitle-end looks like:

describe-function.svg

Figure 1: Describing a generic function

I'm going to give this derived-mode branch a try for a little while by subtitling some more EmacsConf talks before I merge it into the main branch. This is my first time working with cl-defmethod, and it looks pretty interesting.

EmacsConf backstage: chapter markers

| subed, emacs

Long videos are easier to navigate with chapter markers, so I've been slowly adding chapter markers to the Q&A sessions for EmacsConf 2021. I wrote an IkiWiki template and some Javascript code so that adding chapter markers to the EmacsConf wiki should be just a matter of adding something like this:

[[!template id="chapters" vidid="mainVideo" data="""
00:00 Introduction
00:11 Upcoming Emacs 28 release
00:24 Org mode 9.5
00:57 Magit major release
01:18 Completion
01:51 Embark
02:12 tree-sitter
02:44 Collaborative editing
03:03 Graphical experiments
03:41 Community
04:00 libera.chat
"""]]

That way, updating the talk pages with chapter descriptions should be less reliant on my Emacs Lisp functions for generating HTML, so it's more likely to be something other people can do.

If you happen to be interested in Emacs and you're planning to watch the talks or Q&A sessions anyway, you can help add chapter markers to videos that don't have them yet. You can either edit the wiki yourself or e-mail me the chapter timestamps. You can also help out by cross-referencing the chapter timestamps with the discussion session on the page, so that people reading the questions can see where to find the answers. If you're feeling extra-helpful, you could even write down the answers for easy reference.

Here are a few pages that have long Q&A sessions. I've linked to the autogenerated captions in the Discussion sections.

You can call dibs by editing https://etherpad.wikimedia.org/p/emacsconf-2021-volunteers .

Little steps towards making things easier to find! =)

Behind the scenes

I used the auto-generated captions from YouTube as a starting point, since I could skim them easily. I found that the .ass format was easier to speed-read than the .vtt format, so I used ffmpeg to convert them. Then I used emacsconf-subed-mark-chapter from emacsconf-subed to capture the timestamps as a .vtt file.

This is what part of the autogenerated captions looks like:

...
Dialogue: 0,0:01:16.11,0:01:18.11,Default,,0,0,0,,First of all, in your opinion, what is
Dialogue: 0,0:01:18.11,0:01:20.11,Default,,0,0,0,,Emacs' achilles heel? it's obviously a
Dialogue: 0,0:01:20.11,0:01:22.35,Default,,0,0,0,,powerful tool but no tool is perfect
...

and this is part of the chapters file I made:

00:00:26.319 --> 00:03:09.598
In your opinion, what is Emacs' Achilles heel?

00:03:09.599 --> 00:05:06.959
What is your opinion about the documentation of Emacs in other languages?
...

I converted the timestamps to a simple text format handy for including in video descriptions and on the wiki.

[[!template id="chapters" vidid="qanda" data="""
00:00 Thanks
00:26 In your opinion, what is Emacs' Achilles heel?
03:09 What is your opinion about the documentation of Emacs in other languages?
...
"""]]
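
The conversion from the chapters VTT file to that simple list can be sketched roughly as follows. This is a JavaScript illustration rather than the code I actually used, and vttToChapterLines is a hypothetical name. It assumes every cue starts with an HH:MM:SS.mmm timing line (no cue identifiers) and it drops the milliseconds and the end times.

```javascript
// Hypothetical helper: turn WebVTT chapter cues into "MM:SS Title" lines.
function vttToChapterLines(vtt) {
  var lines = [];
  // Cue blocks are separated by blank lines.
  vtt.split(/\n\s*\n/).forEach(function(block) {
    var parts = block.trim().split('\n');
    var m = parts[0].match(/^(\d+):(\d+):(\d+)\.\d+\s+-->/);
    if (!m) return; // skips the WEBVTT header and anything that is not a cue
    // Leave out the hours when they are zero, to keep the list compact.
    var hours = parseInt(m[1], 10);
    var start = (hours > 0 ? m[1] + ':' : '') + m[2] + ':' + m[3];
    lines.push(start + ' ' + parts.slice(1).join(' '));
  });
  return lines.join('\n');
}
```

The seconds are truncated rather than rounded, so a chapter link never lands after the point where the chapter actually starts.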

A number of Emacs users browse the web without Javascript, so I wanted the chapter information to be available even then. Putting all the data into a pre tag seems like the easiest way to do it with an ikiwiki template. Here's the template I used:

<pre class="chapters" data-target="<TMPL_VAR vidid>">
<TMPL_VAR data>
</pre>

I also modified the IkiWiki htmlscrubber.pm plugin to allow the attributes I wanted, like data-target and data-start.

If Javascript was enabled, I wanted people to be able to click on the chapters in order to jump to the right spot in the video. I split the content into lines, parsed out the timestamps, and replaced the pre tag with the list of links. I also added the chapters as a hidden track in the video so that I could use the cuechange event to highlight the current chapter. This is what I added to the page.tmpl:

<script>
 // @license magnet:?xt=urn:btih:90dc5c0be029de84e523b9b3922520e79e0e6f08&dn=cc0.txt txt CC0-1.0
 // Copyright (c) 2021 Sacha Chua - CC0 Public Domain
 function displayChapters(elem) {
   var i;
   var chapter;
   var list = document.createElement('ol');
   list.setAttribute('class', 'chapters');
   var link;
   var target = elem.getAttribute('data-target');
   var video = document.getElementById(target);
   var track;
   if (video) {
     track = video.addTextTrack('chapters');
     track.mode = 'hidden';
   }
   elem.textContent.split(/[ \t]*\n+[ \t]*/).forEach(function(line) {
     var m = (line.match(/^(([0-9]+:)?[0-9]+:[0-9]+)[ \t]+(.*)/));
     if (m) {
       var start = m[1];
       var text = m[3];
       chapter = document.createElement('li');
       link = document.createElement('a');
       link.setAttribute('href', '#');
       link.setAttribute('data-video', target);
       link.setAttribute('data-start', start);
       link.setAttribute('data-start-s', parseSeconds(start));
       link.appendChild(document.createTextNode(m[1] + ' ' + text));
       link.onclick = handleSubtitleClick;
       chapter.appendChild(link);
       list.appendChild(chapter);
       if (track) {
         var time = parseSeconds(start);
         if (track.cues.length > 0) {
           track.cues[track.cues.length - 1].endTime = time - 1;
         }
         track.addCue(new VTTCue(time, time, text));
       }
     }
   })
   if (track && track.cues.length > 0) {
     video.addEventListener('durationchange', function() {
       track.cues[track.cues.length - 1].endTime = video.duration;
     });
     track.addEventListener('cuechange', function() {
       if (!this.activeCues[0]) return;
       if (list.querySelector('.current')) {
         list.querySelector('.current').className = '';
       }
       var chapter;
       if (chapter = list.querySelector('a[data-start-s="' + this.activeCues[0].startTime + '"]')) {
         chapter.parentNode.className = 'current';
       }
     });
   }
   elem.parentNode.replaceChild(list, elem);
 }
  
  document.querySelectorAll('pre.chapters').forEach(displayChapters);

 // @license-end
</script>

handleSubtitleClick is also part of the JS on that page. It sets the current time of the video and scrolls so that the video is in view.
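
parseSeconds is not shown above either. The following is a minimal sketch of what those two helpers could look like, written from the description here rather than copied from the page, so treat the bodies as assumptions. It relies on the data-video and data-start-s attributes that displayChapters sets on each link.

```javascript
// Sketch: convert "MM:SS" or "HH:MM:SS" into a number of seconds.
function parseSeconds(timestamp) {
  // Fold the parts left to right: multiply the running total by 60,
  // then add the next part.
  return timestamp.split(':').reduce(function(total, part) {
    return total * 60 + parseFloat(part);
  }, 0);
}

// Sketch: jump the target video to the chapter start and bring it into view.
function handleSubtitleClick(event) {
  var link = event.currentTarget;
  var video = document.getElementById(link.getAttribute('data-video'));
  if (video) {
    video.currentTime = parseFloat(link.getAttribute('data-start-s'));
    video.scrollIntoView();
  }
  event.preventDefault(); // keep the "#" href from scrolling to the top
}
```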

Using word-level timing information when editing subtitles or captions in Emacs

| emacs, subed, video

2022-10-26: Merged word-level timing support into subed.el, so I don't need my old caption functions.

2022-04-18: Switched to using yt-dlp.

I like to split captions at logical points, such as at the end of a phrase or sentence. At first, I used subed.el to play the video for the caption, pausing it at the appropriate point and then calling subed-split-subtitle to split at the playback position. Then I modified subed-split-subtitle to split at the video position that's proportional to the text position, so that it's roughly in the right spot even if I'm not currently listening. That got me most of the way to being able to quickly edit subtitles.
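
The proportional estimate is simple linear interpolation. Here it is as a small JavaScript sketch for illustration. It is not the subed.el implementation, and proportionalSplitTime and its parameters are hypothetical names.

```javascript
// Hypothetical helper: estimate the split time from the cursor position.
// startMs/stopMs: the caption's timestamps in milliseconds.
// text: the caption text; pointIndex: cursor offset into that text.
function proportionalSplitTime(startMs, stopMs, text, pointIndex) {
  if (text.length === 0) return startMs; // nothing to interpolate over
  var fraction = pointIndex / text.length;
  return Math.round(startMs + fraction * (stopMs - startMs));
}
```

Splitting halfway through the text of a caption that runs from 1000 ms to 3000 ms gives 2000 ms. The estimate assumes an even speaking pace, which is why it is only roughly right.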

It turns out that word-level timing is actually available from YouTube if I download the autogenerated SRV2 file using yt-dlp, which I can do with the following function:

(defun my-caption-download-srv2 (id)
  (interactive "MID: ")
  (require 'subed-word-data)
  (when (string-match "v=\\([^&]+\\)" id) (setq id (match-string 1 id)))
  (let ((default-directory "/tmp"))
    (call-process "yt-dlp" nil nil nil "--write-auto-sub" "--write-sub" "--no-warnings" "--sub-lang" "en" "--skip-download" "--sub-format" "srv2"
                  (concat "https://youtu.be/" id))
    (subed-word-data-load-from-file (my-latest-file "/tmp" "\\.srv2\\'"))))

2022-10-26: I can also generate a SRV2-ish file using torchaudio, which I can then load with subed-word-data-load-from-file.

(defun my-caption-fix-common-errors (data)
  (mapc (lambda (o)
          (mapc (lambda (e)
                  (when (string-match (concat "\\<" (regexp-opt (if (listp e) (seq-remove (lambda (s) (string= "" s)) e)
                                                                  (list e)))
                                              "\\>")
                                      (alist-get 'text o))
                    (map-put! o 'text (replace-match (car (if (listp e) e (list e))) t t (alist-get 'text o)))))
                my-subed-common-edits))
        data))

Assuming I start editing from the beginning of the file, then the part of the captions file after point is mostly unedited. That means I can match the remainder of the current caption with the word-level timing to try to figure out the time to use when splitting the subtitle, falling back to the proportional method if the data is not available.
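
The matching step can be sketched like this in JavaScript. This shows only the idea, not the subed.el code: findSplitTime is a hypothetical name, and the word list is assumed to be an array of objects with start (in milliseconds) and text fields. A null result is the signal to fall back to the proportional method.

```javascript
// Hypothetical helper: find where the rest of the caption starts in the word data.
// words: [{start: ms, text: "word"}, ...] from the word-level timing file.
// remainder: the caption text after point.
function findSplitTime(words, captionStartMs, captionStopMs, remainder) {
  var wanted = remainder.trim().toLowerCase().split(/\s+/).filter(Boolean);
  if (wanted.length === 0) return null;
  // Only consider words that start within the current caption.
  var candidates = words.filter(function(w) {
    return w.start >= captionStartMs && w.start <= captionStopMs;
  });
  for (var i = 0; i + wanted.length <= candidates.length; i++) {
    var matches = wanted.every(function(word, j) {
      return candidates[i + j].text.trim().toLowerCase() === word;
    });
    if (matches) return candidates[i].start;
  }
  return null; // no match: use the proportional estimate instead
}
```

This exact comparison breaks as soon as the remainder has been edited (added punctuation, corrected words), which is why it matters that the text after point is mostly untouched.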

(defun subed-avy-set-up-actions ()
  (interactive)
  (make-local-variable 'avy-dispatch-alist)
  (add-to-list
   'avy-dispatch-alist
   (cons ?, 'subed-split-subtitle)))

(use-package subed
  :if my-laptop-p
  :load-path "~/vendor/subed/subed"
  :hook
  (subed-mode . display-fill-column-indicator-mode)
  (subed-mode . subed-avy-set-up-actions)
  :bind
  (:map subed-mode-map
        ("M-," . subed-split-subtitle)
        ("M-." . subed-merge-with-next)
        ("M-p" . avy-goto-char-timer)
        ("M-e" . avy-goto-char-timer)))

That way, I can use the word-level timing information for most of the reformatting, but I can easily replay segments of the video if I'm unsure about a word that needs to be changed.

If I want to generate a VTT based on the caption data, breaking it at certain words, these functions help:

(defvar my-caption-breaks
  '("the" "this" "we" "we're" "I" "finally" "but" "and" "when")
  "List of words to try to break at.")
(defun my-caption-make-groups (list &optional threshold)
  (let (result
        current-item
        done
        (current-length 0)
        (limit (or threshold 70))
        (lower-limit 30)
        (break-regexp (concat "\\<" (regexp-opt my-caption-breaks) "\\>")))
    (while list
      (cond
       ((null (car list)))
       ((string-match "^\n*$" (alist-get 'text (car list)))
        (push (cons '(text . " ") (car list)) current-item)
        (setq current-length (1+ current-length)))
       ((< (+ current-length (length (alist-get 'text (car list)))) limit)
        (setq current-item (cons (car list) current-item)
              current-length (+ current-length (length (alist-get 'text (car list))) 1)))
       (t (setq done nil)
          (while (not done)
          (cond
           ((< current-length lower-limit)
            (setq done t))
           ((and (string-match break-regexp (alist-get 'text (car current-item)))
                 (not (string-match break-regexp (alist-get 'text (cadr current-item)))))
            (setq current-length (- current-length (length (alist-get 'text (car current-item)))))
            (push (pop current-item) list)
            (setq done t))
           (t
            (setq current-length (- current-length (length (alist-get 'text (car current-item)))))
            (push (pop current-item) list))))
          (push nil list)
          (setq result (cons (reverse current-item) result) current-item nil current-length 0)))
      (setq list (cdr list)))
    (reverse result)))

(defun my-caption-format-as-subtitle (list &optional word-timing)
  "Turn a LIST of the form (((start . ms) (end . ms) (text . s)) ...) into VTT.
If WORD-TIMING is non-nil, include word-level timestamps."
  (format "%s --> %s\n%s\n\n"
          (subed-vtt--msecs-to-timestamp (alist-get 'start (car list)))
          (subed-vtt--msecs-to-timestamp (alist-get 'end (car (last list))))
          (s-trim (mapconcat (lambda (entry)
                               (if word-timing
                                   (format " <%s>%s"
                                           (subed-vtt--msecs-to-timestamp (alist-get 'start entry))
                                           (string-trim (alist-get 'text entry)))
                                 (alist-get 'text entry)))
                             list ""))))

(defun my-caption-to-vtt (&optional data)
  (interactive)
  (with-temp-file "captions.vtt"
    (insert "WEBVTT\n\n"
            (mapconcat
             (lambda (entry) (my-caption-format-as-subtitle entry))
             (my-caption-make-groups
              (or data (my-caption-fix-common-errors subed-word-data--cache)))
             ""))))

This is part of my Emacs configuration.

Using Emacs to fix automatically generated subtitle timestamps

| emacs, subed

I like how people are making more and more Emacs-related videos. I think subtitles, transcripts, and show notes would go a long way to helping people quickly search, skim, and squeeze these videos into their day.

YouTube's automatically-generated subtitles overlap. I think some players scroll the subtitles, but the ones I use just display them in alternating positions. I like to have non-overlapping subtitles, so here's some code that works with subed.el to fix the timestamps.

(defun my/subed-fix-timestamps ()
  "Change all ending timestamps to the start of the next subtitle."
  (goto-char (point-max))
  (let ((timestamp (subed-subtitle-msecs-start)))
    (while (subed-backward-subtitle-time-start)
      (subed-set-subtitle-time-stop timestamp)
      (setq timestamp (subed-subtitle-msecs-start)))))
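
To make the effect concrete, here is the same rule applied to a plain list of cues in JavaScript. This is only an illustration with hypothetical names (fixOverlaps, and cue objects with start, stop, and text fields), separate from the subed.el function above.

```javascript
// Hypothetical helper: end every cue where the next one begins.
// The last cue keeps its own stop time because nothing follows it.
function fixOverlaps(cues) {
  return cues.map(function(cue, i) {
    var next = cues[i + 1];
    return {
      start: cue.start,
      stop: next ? next.start : cue.stop,
      text: cue.text
    };
  });
}
```

Like the Emacs Lisp version, this changes every ending timestamp, so it also stretches a cue across any silent gap before the next one.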

Then it's easy to edit the subtitles (punctuation, capitalization, special terms), especially with the shortcuts for splitting and merging subtitles.

For transcripts with starting and ending timestamps per paragraph, I like using the merge shortcut to merge all the subtitles for a paragraph together. Here's a sample: https://emacsconf.org/2020/talks/05/

Tonight I edited automatically-generated subtitles for a screencast that was about 40 minutes long. The resulting file had 1157 captions, so about 2 seconds each. I finished it in about 80 minutes, pretty much the 2x speed that I've been seeing. I can probably get a little faster if I figure out good workflows for:

  • jumping: avy muscle memory, maybe?
  • splitting things into sentences and phrases
  • fixing common speech recognition errors (ex: emax -> Emacs, which I handle with regex replaces; maybe a list of them?)
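
The list-of-replacements idea from the last bullet could look something like this. It is a JavaScript sketch with made-up names (fixCommonErrors, and entries that put the correct spelling first, followed by the misrecognized variants), not something that exists in subed.el.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: apply each [correct, variant, variant...] entry to text.
function fixCommonErrors(text, edits) {
  return edits.reduce(function(result, entry) {
    var replacement = entry[0];
    var variants = entry.slice(1);
    if (variants.length === 0) return result; // nothing to look for
    // Match whole words only, ignoring case.
    var pattern = new RegExp(
      '\\b(?:' + variants.map(escapeRegExp).join('|') + ')\\b', 'gi');
    // A function replacement keeps "$" in the replacement text literal.
    return result.replace(pattern, function() { return replacement; });
  }, text);
}
```

With the single entry ['Emacs', 'emax', 'e max'], the text "i use emax and e max daily" becomes "i use Emacs and Emacs daily".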

I experimented with making a hydra for this before, but thinking about the keys to use slowed me down a bit and it didn't flow very well. Might be worth tinkering with.

Transcribing from scratch takes me about 4-5x playtime. I haven't tweaked my workflow for that one yet because I've only transcribed one talk with subed.el, and there's a backlog of talks that already have automatically generated subtitles to edit. Low-hanging fruit! =)

So that's another thing I (or other people) can occasionally do to help out even if I don't have enough focused time to think about a programming challenge or do a podcast myself. And I get to learn more in the process, too. Fun!