Using Emacs and Python to record an animation and synchronize it with audio
Tags: emacs, emacsconf, python, subed, video
Update: Removed my fork since upstream now has the :eval function.
The Q&A session for What I'd like to see in Emacs (Richard Stallman) from EmacsConf 2022 was done over Mumble. Amin pasted the questions into the Mumble chat buffer and I copied them into a larger buffer as the speaker answered them, but I didn't do it consistently. I figured it might be worth making another video with easier-to-read visuals.

At first, I thought about using LaTeX to create Beamer slides with the question text, which I could then turn into a video using ffmpeg. Then I decided to figure out how to animate the text in Emacs, because why not? I figured a straightforward typing animation would probably be less distracting than animate-string, and emacs-director seems to handle that nicely. I forked it to add a few things I wanted, like variables to make the typing speed slower (so that it could more reliably type things on my old laptop, since sometimes the timers seemed to have hiccups) and an :eval step for running things without needing to log them. (2023-01-14: Upstream has the :eval feature now.)
To make it easy to synchronize the resulting animation with the chapter markers I derived from the transcript of the audio file, I decided to beep between scenes. First step: make a beep file.
ffmpeg -y -f lavfi -i 'sine=frequency=1000:duration=0.1' beep.wav
Next, I animated the text, with a beep between scenes. I used subed-parse-file to read the question text directly from the chapter markers, and I used simplescreenrecorder to set up the recording settings (including audio).
(defun my-beep ()
  (interactive)
  (save-window-excursion
    (shell-command "aplay ~/recordings/beep.wav &" nil nil)))

(require 'director)
(defvar emacsconf-recording-process nil)
(shell-command "xdotool getwindowfocus windowsize 1282 720")
(progn
  (switch-to-buffer (get-buffer-create "*Questions*"))
  (erase-buffer)
  (org-mode)
  (face-remap-add-relative 'default :height 300)
  (setq-local mode-line-format " Q&A for EmacsConf 2022: What I'd like to see in Emacs (Richard M. Stallman) - emacsconf.org/2022/talks/rms")
  (sit-for 3)
  (delete-other-windows)
  (hl-line-mode -1)
  (when (process-live-p emacsconf-recording-process)
    (kill-process emacsconf-recording-process))
  (setq emacsconf-recording-process
        (start-process "ssr" (get-buffer-create "*ssr*")
                       "simplescreenrecorder" "--start-recording" "--start-hidden"))
  (sit-for 3)
  (director-run
   :version 1
   :log-target '(file . "/tmp/director.log")
   :before-start
   (lambda ()
     (switch-to-buffer (get-buffer-create "*Questions*"))
     (delete-other-windows))
   :steps
   (let ((subtitles (subed-parse-file "~/proj/emacsconf/rms/emacsconf-2022-rms--what-id-like-to-see-in-emacs--answers--chapters.vtt")))
     (apply #'append
            (list
             (list :eval '(my-beep))
             (list :type "* Q&A for Richard Stallman's EmacsConf 2022 talk: What I'd like to see in Emacs\nhttps://emacsconf.org/2022/talks/rms\n\n"))
            (mapcar
             (lambda (sub)
               (list
                (list :log (elt sub 3))
                (list :eval '(progn (org-end-of-subtree) (unless (bolp) (insert "\n"))))
                (list :type (concat "** " (elt sub 3) "\n\n"))
                (list :eval '(org-back-to-heading))
                (list :wait 5)
                (list :eval '(my-beep))))
             subtitles)))
   :typing-style 'human
   :delay-between-steps 0
   :after-end (lambda () (process-send-string emacsconf-recording-process "record-save\nwindow-show\nquit\n"))
   :on-failure (lambda () (process-send-string emacsconf-recording-process "record-save\nwindow-show\nquit\n"))
   :on-error (lambda () (process-send-string emacsconf-recording-process "record-save\nwindow-show\nquit\n"))))
I used the following code to copy the latest recording to animation.webm and extract the audio to animation.wav. my-latest-file and my-recordings-dir are in my Emacs config.
(let ((name "animation.webm"))
  (copy-file (my-latest-file my-recordings-dir) name t)
  (shell-command
   (format "ffmpeg -y -i %s -ar 8000 -ac 1 %s.wav"
           (shell-quote-argument name)
           (shell-quote-argument (file-name-sans-extension name)))))
Then I needed to get the timestamps of the beeps in the recording. I subtracted a little bit (0.82 seconds) based on comparing the waveform with the results.
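from scipy.io import wavfile
from scipy import signal
import numpy as np
import re

filename = "animation.wav"

# Read the mono 8 kHz audio extracted from the recording
rate, source = wavfile.read(filename)
# Loud samples at least 1000 samples apart mark the beeps
peaks = signal.find_peaks(source, height=1000, distance=1000)
# Convert sample indices to seconds, then apply the manual 0.82 s offset
base_times = (peaks[0] / rate) - 0.82
print(base_times)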
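To make the idea behind the beep detection easier to see, here's a rough standard-library sketch. It is not what scipy's find_peaks does internally: it uses a simpler threshold-and-gap rule, and the names beep_onsets and synth_beeps, the min_gap_s parameter, and the synthetic test signal are all made up for illustration.

```python
import math


def synth_beeps(rate, duration_s, beep_starts, beep_len_s=0.1, freq=1000, amp=8000):
    """Build a silent signal with short sine beeps (hypothetical test data)."""
    samples = [0] * int(rate * duration_s)
    for start in beep_starts:
        first = int(start * rate)
        for k in range(int(beep_len_s * rate)):
            samples[first + k] = int(amp * math.sin(2 * math.pi * freq * k / rate))
    return samples


def beep_onsets(samples, rate, threshold=1000, min_gap_s=0.5):
    """Return the time in seconds of the first loud sample of each beep.

    A sample is loud when its absolute value reaches `threshold`. A loud
    sample starts a new beep only if more than `min_gap_s` seconds have
    passed since the previous loud sample.
    """
    onsets = []
    last_loud = None
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            t = i / rate
            if last_loud is None or t - last_loud > min_gap_s:
                onsets.append(t)
            last_loud = t
    return onsets


# Assumed example: 3 seconds of 8 kHz audio with beeps at 0.5 s and 2.0 s
demo = synth_beeps(8000, 3.0, [0.5, 2.0])
demo_onsets = beep_onsets(demo, 8000)
print(demo_onsets)
```

A manual offset like the 0.82 seconds mentioned above would still be subtracted from these times afterwards.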
I noticed that the first question didn't seem to get beeped properly, so I tweaked the times. Then I wrote some code to generate a very long ffmpeg command that used trim and tpad to select the segments and extend them to the right durations. There was some drift when I did it without the audio track, but the timestamps seemed to work right when I included the Q&A audio track as well.
import math
import subprocess

import ffmpeg
import numpy as np  # also imported in the previous block; repeated so this one stands alone
import webvtt

chapters_filename = "emacsconf-2022-rms--what-id-like-to-see-in-emacs--answers--chapters.vtt"
answers_filename = "answers.wav"
animation_filename = "animation.webm"

def get_length(filename):
    result = subprocess.run(["ffprobe", "-v", "error", "-show_entries",
                             "format=duration", "-of",
                             "default=noprint_wrappers=1:nokey=1", filename],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    return float(result.stdout)

def get_frames(filename):
    result = subprocess.run(["ffprobe", "-v", "error", "-select_streams", "v:0", "-count_packets",
                             "-show_entries", "stream=nb_read_packets", "-of",
                             "csv=p=0", filename],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    return float(result.stdout)

answers_length = get_length(answers_filename)

# override base_times
times = np.asarray([
    1.515875, 13.50, 52.32125, 81.368625, 116.66625, 146.023125,
    161.904875, 182.820875, 209.92125, 226.51525, 247.93875, 260.971,
    270.87375, 278.23325, 303.166875, 327.44925, 351.616375, 372.39525,
    394.246625, 409.36325, 420.527875, 431.854, 440.608625, 473.86825,
    488.539, 518.751875, 544.1515, 555.006, 576.89225, 598.157375,
    627.795125, 647.187125, 661.10875, 695.87175, 709.750125, 717.359875])
fps = 30.0
times = np.append(times, get_length(animation_filename))
anim_spans = list(zip(times[:-1], times[1:]))
chapters = webvtt.read(chapters_filename)
if chapters[0].start_in_seconds == 0:
    vtt_times = [[c.start_in_seconds, c.text] for c in chapters]
else:
    vtt_times = [[0, "Introduction"]] + [[c.start_in_seconds, c.text] for c in chapters]
vtt_times = vtt_times + [[answers_length, "End"]]
# Add ending timestamps
vtt_times = [[x[0][0], x[1][0], x[0][1]] for x in zip(vtt_times[:-1], vtt_times[1:])]
test_rate = 1.0
i = 0
concat_list = ""
groups = list(zip(anim_spans, vtt_times))
animation = ffmpeg.input('animation.webm').video
audio = ffmpeg.input('rms.opus')
for_overlay = ffmpeg.input('color=color=black:size=1280x720:d=%f' % answers_length, f='lavfi')

# Low-quality settings for quick tests
params = {"b:v": "1k", "vcodec": "libvpx", "r": "30", "crf": "63"}
test_limit = 1
# Settings for the real render (these override the test settings above)
params = {"vcodec": "libvpx", "r": "30", "copyts": None, "b:v": "1M", "crf": 24}
test_limit = 0
anim_rate = 1
cursor = 0
if test_limit > 0:
    groups = groups[0:test_limit]
clips = []
# cursor is the current time
for anim, vtt in groups:
    padding = vtt[1] - cursor - (anim[1] - anim[0]) / anim_rate
    if (padding < 0):
        print("Squeezing", math.floor((anim[1] - anim[0]) / (anim_rate * 1.0)), 'into', vtt[1] - cursor, padding)
        clips.append(animation.trim(start=anim[0], end=anim[1]).setpts('PTS-STARTPTS'))
    elif padding == 0:
        clips.append(animation.trim(start=anim[0], end=anim[1]).setpts('PTS-STARTPTS'))
    else:
        print("%f to %f: Padding %f into %f - pad: %f" % (cursor, vtt[1], (anim[1] - anim[0]) / (anim_rate * 1.0), vtt[1] - cursor, padding))
        cursor = cursor + padding + (anim[1] - anim[0]) / anim_rate
        clips.append(animation.trim(start=anim[0], end=anim[1]).setpts('PTS-STARTPTS').filter('tpad', stop_mode="clone", stop_duration=padding))
    for_overlay = for_overlay.overlay(animation.trim(start=anim[0], end=anim[1]).setpts('PTS-STARTPTS+%f' % vtt[0]))
    clips.append(audio.filter('atrim', start=vtt[0], end=vtt[1]).filter('asetpts', 'PTS-STARTPTS'))
args = ffmpeg.concat(*clips, v=1, a=1).output('output.webm', **params).overwrite_output().compile()
print(' '.join(f'"{item}"' for item in args))
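The padding arithmetic is easier to follow on small numbers, so here's a stripped-down sketch of it without ffmpeg-python. The function names plan_segments and filter_for and the example times are hypothetical. Unlike the loop above, this version always advances the cursor, even when the animation is longer than its chapter, so treat it as one possible reading of the idea rather than a copy of the code I ran. The filter strings use ffmpeg's trim, setpts, and tpad filters with the same options as above.

```python
def plan_segments(anim_spans, chapter_ends):
    """Work out how much to pad each animation segment.

    anim_spans: (start, end) times of each scene in the recorded animation.
    chapter_ends: time in the audio at which each matching chapter ends.
    Returns a list of (anim_start, anim_end, pad_seconds) tuples.
    """
    plan = []
    cursor = 0.0  # current position on the output timeline, in seconds
    for (anim_start, anim_end), chapter_end in zip(anim_spans, chapter_ends):
        anim_len = anim_end - anim_start
        pad = chapter_end - cursor - anim_len
        if pad < 0:
            pad = 0.0  # animation overruns the chapter, so nothing to pad
        plan.append((anim_start, anim_end, pad))
        cursor += anim_len + pad
    return plan


def filter_for(anim_start, anim_end, pad):
    """Build an ffmpeg filter chain that trims a segment and holds its last frame."""
    parts = ["trim=start=%g:end=%g" % (anim_start, anim_end), "setpts=PTS-STARTPTS"]
    if pad > 0:
        parts.append("tpad=stop_mode=clone:stop_duration=%g" % pad)
    return ",".join(parts)


# Assumed example: three scenes, with chapters ending at 10 s, 15 s, and 30 s
demo_plan = plan_segments([(0.0, 4.0), (4.0, 10.0), (10.0, 12.0)], [10.0, 15.0, 30.0])
for segment in demo_plan:
    print(filter_for(*segment))
```

The second scene in the example is longer than its chapter, so it gets no padding and the third scene starts a second late. That's the kind of case the "Squeezing" message above flags.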
Anyway, it's here for future reference. =)