#YayEmacs 9: Trimming/adding silences to get to a target; subed-record-sum-time

Jan 9, 2025| audio, subed, yay-emacs, emacs, video

New in this video: subed-record-sum-time, #+PAD_LEFT and #+PAD_RIGHT

I like the constraints of a one-minute video, so I added a subed-record-sum-time command. That way, when I edit the video using Emacs, I can check how long the result will be. First, I split the subtitles, align it with the audio to fix the timestamps, and double check the times. Then I can skip my oopses. Sometimes WhisperX doesn't catch them, so I also look at waveforms and characters per second. I already talk quickly, so I'm not going to speed that up but I can trim the pauses in between phrases which is easy to do with waveforms. Sometimes, after reviewing a draft, I realize I need a little more time. If the original audio has some silence, I can just copy and paste it. If not, I can pad left or pad right to add some silence. I can try the flow of some sections and compile the video when I'm ready. Emacs can do almost anything. Yay Emacs!

You can watch this on YouTube, download the video, or download the audio.

Play by play:

I like the constraints of a one-minute video, so I added a subed-record-sum-time command. That way, when I edit the video using Emacs, I can check how long the result will be.
- subed-record uses subtitles and directives in comments in a VTT subtitle file to edit audio and video. subed-record-sum-time calculates the resulting duration and displays it in the minibuffer.
First, I split the subtitles, align it with the audio to fix the timestamps, and double check the times.
- I'm experimenting with an algorithmic way to combine the breaks from my script with the text from the transcript. subed-align calls the aeneas forced alignment tool to match up the text with the timestamps. I use subed-waveform-show-all to show all the waveforms.
Then I can skip my oopses.
- Adding a NOTE #+SKIP comment before a subtitle makes subed-record-compile-video and subed-record-compile-flow skip that part of the audio.
Sometimes WhisperX doesn't catch them,
- WhisperX sometimes doesn't transcribe my false starts if I repeat things quickly.
so I also look at waveforms
- subed-waveform-show-all adds waveforms for all the subtitles. If I notice there's a pause or a repeated shape in the waveform, or if I listen and notice the repetition, I can confirm by middle-clicking on the waveform to sample part of it.
and characters per second.
- Low characters per second is sometimes a sign that the timestamps are incorrect or there's a repetition that wasn't transcribed.
I already talk quickly, so I'm not going to speed that up
- Also, I already sound like a chipmunk; mechanically speeding up my recording to fit in a certain time will make that worse =)
but I can trim the pauses in between phrases which is easy to do with waveforms.
- left-click to set the start, right-click to set the stop. If I want to adjust the previous/next one at the same time, I would use shift-left-click or shift-right-click, but here I want to skip the gaps between phrases, so I adjust the current subtitle without making the previous/next one longer.
Sometimes, after reviewing a draft, I realize I need a little more time.
- I can specify visuals like a video, animated GIF, or an image by adding a [[file:...]] link in the comment for a subtitle. That visual will be used until the next visual is specified in a comment on a different subtitle. subed-record-compile-video can automatically speed up video clips to fit in the time for the current audio segment, which is the set of subtitles before the next visual is defined. After I compile and review the video, sometimes I notice that something goes by too quickly.
If the original audio has some silence, I can just copy and paste it.
- This can sometimes feel more natural than adding in complete silence.
If not, I can pad left or pad right to add some silence.
- I added a new feature so that I could specify something like #+PAD_RIGHT: 1.5 in a comment to add 1.5 seconds of silence after the audio specified by that subtitle.
I can try the flow of some sections
- I can select a region and then use M-x subed-record-compile-try-flow to play the audio or C-u M-x subed-record-compile-try-flow to play the audio+video for that region.
and compile the video when I'm ready.
- subed-record-compile-video compiles the video to the file specified in #+OUTPUT: filename. ffmpeg is very arcane, so I'm glad I can simplify my use of it with Emacs Lisp.
Emacs can do almost anything. Yay Emacs!
- Non-linear audio and video editing is actually pretty fun in a text editor, especially when I can just use M-x vundo to navigate my undo history.

Links:

View org source for this post

You can e-mail me at sacha@sachachua.com.