#YayEmacs 9: Trimming/adding silences to get to a target; subed-record-sum-time
| subed, yay-emacs, emacs, video
New in this video: subed-record-sum-time, #+PAD_LEFT, and #+PAD_RIGHT
I like the constraints of a one-minute video, so I added a subed-record-sum-time command. That way, when I edit the video using Emacs, I can check how long the result will be. First, I split the subtitles, align it with the audio to fix the timestamps, and double check the times. Then I can skip my oopses. Sometimes WhisperX doesn't catch them, so I also look at waveforms and characters per second. I already talk quickly, so I'm not going to speed that up but I can trim the pauses in between phrases which is easy to do with waveforms. Sometimes, after reviewing a draft, I realize I need a little more time. If the original audio has some silence, I can just copy and paste it. If not, I can pad left or pad right to add some silence. I can try the flow of some sections and compile the video when I'm ready. Emacs can do almost anything. Yay Emacs!
You can watch this on YouTube, download the video, or download the audio.
Play by play:
- I like the constraints of a one-minute video, so I added a subed-record-sum-time command. That way, when I edit the video using Emacs, I can check how long the result will be.
- subed-record uses subtitles and directives in comments in a VTT subtitle file to edit audio and video. subed-record-sum-time calculates the resulting duration and displays it in the minibuffer.
- First, I split the subtitles, align it with the audio to fix the timestamps, and double check the times.
- I'm experimenting with an algorithmic way to combine the breaks from my script with the text from the transcript. subed-align calls the aeneas forced alignment tool to match up the text with the timestamps. I use subed-waveform-show-all to show all the waveforms.
- Then I can skip my oopses.
- Adding a NOTE #+SKIP comment before a subtitle makes subed-record-compile-video and subed-record-compile-flow skip that part of the audio.
- Sometimes WhisperX doesn't catch them,
- WhisperX sometimes doesn't transcribe my false starts if I repeat things quickly.
- so I also look at waveforms
- subed-waveform-show-all adds waveforms for all the subtitles. If I notice there's a pause or a repeated shape in the waveform, or if I listen and notice the repetition, I can confirm by middle-clicking on the waveform to sample part of it.
- and characters per second.
- Low characters per second is sometimes a sign that the timestamps are incorrect or there's a repetition that wasn't transcribed.
- I already talk quickly, so I'm not going to speed that up
- Also, I already sound like a chipmunk; mechanically speeding up my recording to fit in a certain time will make that worse =)
- but I can trim the pauses in between phrases which is easy to do with waveforms.
- Left-click to set the start, right-click to set the stop. To adjust the previous/next subtitle at the same time, I would use shift-left-click or shift-right-click, but here I want to skip the gaps between phrases, so I adjust the current subtitle without making the previous/next one longer.
- Sometimes, after reviewing a draft, I realize I need a little more time.
- I can specify visuals like a video, animated GIF, or an image by adding a [[file:...]] link in the comment for a subtitle. That visual will be used until the next visual is specified in a comment on a different subtitle. subed-record-compile-video can automatically speed up video clips to fit in the time for the current audio segment, which is the set of subtitles before the next visual is defined. After I compile and review the video, sometimes I notice that something goes by too quickly.
- If the original audio has some silence, I can just copy and paste it.
- This can sometimes feel more natural than adding in complete silence.
- If not, I can pad left or pad right to add some silence.
- I added a new feature so that I could specify something like #+PAD_RIGHT: 1.5 in a comment to add 1.5 seconds of silence after the audio specified by that subtitle.
- I can try the flow of some sections
- I can select a region and then use M-x subed-record-compile-try-flow to play the audio or C-u M-x subed-record-compile-try-flow to play the audio+video for that region.
- and compile the video when I'm ready.
- subed-record-compile-video compiles the video to the file specified in #+OUTPUT: filename. ffmpeg is very arcane, so I'm glad I can simplify my use of it with Emacs Lisp.
- Emacs can do almost anything. Yay Emacs!
- Non-linear audio and video editing is actually pretty fun in a text editor, especially when I can just use M-x vundo to navigate my undo history.
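To make the arithmetic behind the duration and characters-per-second checks concrete, here is a rough Python sketch. It is not how subed-record does it (that code is Emacs Lisp). The sample file, the function names, and the assumption that each directive sits in a NOTE block directly before the cue it affects are all made up for illustration.

```python
import re

# Hypothetical sample file. The directive layout is an assumption
# based on the description above, not copied from a real project.
SAMPLE_VTT = """WEBVTT

00:00:00.000 --> 00:00:02.500
I like the constraints of a one-minute video,

NOTE #+SKIP

00:00:02.500 --> 00:00:04.000
so I, so I added

NOTE #+PAD_RIGHT: 1.5

00:00:04.000 --> 00:00:06.000
so I added a command.
"""

# WebVTT timestamps are [hh:]mm:ss.ttt, with the hours part optional.
TS = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
TIMING = re.compile(TS + r"\s+-->\s+" + TS)
PAD = re.compile(r"#\+PAD_(?:LEFT|RIGHT):\s*([0-9.]+)")


def to_ms(hours, minutes, seconds, millis):
    """Convert the captured timestamp parts to integer milliseconds."""
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_cues(vtt):
    """Return cues as dicts with start/end/pad in ms, text, and a skip flag."""
    cues = []
    comment = ""  # NOTE text collected since the previous cue
    for block in re.split(r"\n\s*\n", vtt.strip()):
        if block.startswith("NOTE"):
            comment += block + "\n"
            continue
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                break
        else:
            continue  # header or other block without a timing line
        parts = match.groups()
        cues.append({
            "start": to_ms(*parts[:4]),
            "end": to_ms(*parts[4:]),
            "text": " ".join(lines[index + 1:]).strip(),
            "skip": "#+SKIP" in comment,
            "pad": sum(round(float(value) * 1000) for value in PAD.findall(comment)),
        })
        comment = ""
    return cues


def total_ms(cues):
    """Sum the durations of the kept cues plus any padding they request."""
    return sum(c["end"] - c["start"] + c["pad"] for c in cues if not c["skip"])


def chars_per_second(cue):
    """Characters of text per second of audio for one cue."""
    return len(cue["text"]) / ((cue["end"] - cue["start"]) / 1000)
```

With the sample above, total_ms(parse_cues(SAMPLE_VTT)) comes to 6000 ms: 2.5 seconds for the first cue, nothing for the skipped one, and 2 seconds plus 1.5 seconds of padding for the last one. A cue with an unusually low chars_per_second value is one I would want to listen to again.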
Links:
- sachac/subed: subed is a subtitle editor for Emacs
- sachac/subed-record: Record audio in segments and compile it into a file
- m-bain/whisperX: WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
- readbeyond/aeneas: aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Related: