Categories: video

Updating YouTube videos via the YouTube Data API using Emacs Lisp and url-http-oauth

| elisp, emacs, emacsconf, youtube, video

We upload EmacsConf videos to both YouTube and Toobnix, which is a PeerTube instance. This makes it easier for people to come across them after the conference.

I can upload to Toobnix and set titles and descriptions using the peertube-cli tool. I tried a Python script for uploading to YouTube, but it was a bit annoying due to quota restrictions. Instead, I uploaded the videos by dragging and dropping them into YouTube Studio. This allowed me to upload 15 at a time.

The videos on YouTube had just the filenames. I wanted to rename the videos and set the descriptions. In 2022, I used xdotool, simulating mouse clicks and pasting in text for larger text blocks.

Xdotool script
(defun my-xdotool-insert-mouse-location ()
  (interactive)
  (let ((pos (shell-command-to-string "xdotool getmouselocation")))
    (when (string-match "x:\\([0-9]+\\) y:\\([0-9]+\\)" pos)
      (insert (format "(shell-command \"xdotool mousemove %s %s click 1\")\n" (match-string 1 pos) (match-string 2 pos))))))

(setq list (seq-filter (lambda (o)
                         (and
                          (file-exists-p
                           (expand-file-name
                            (concat (plist-get o :video-slug) "--final.webm")
                            emacsconf-cache-dir))
                          (null (plist-get o :youtube-url))))
            (emacsconf-publish-prepare-for-display (emacsconf-get-talk-info))))

(while list
  (progn
    (shell-command "xdotool mousemove 707 812 click 1 sleep 2")

    (setq talk (pop list))
    ;; click create
    (shell-command "xdotool mousemove 843 187 click 1 sleep 1")
    ;; video
    (shell-command "xdotool mousemove 833 217 click 1 sleep 1")
    ;; select files
    (shell-command (concat "xdotool mousemove 491 760 click 1 sleep 4 type "
                           (shell-quote-argument (concat (plist-get talk :video-slug) "--final.webm"))))
    ;; open
    (shell-command "xdotool mousemove 1318 847 click 1 sleep 5")

    (kill-new (concat
               emacsconf-name " "
               emacsconf-year ": "
               (plist-get talk :title)
               " - "
               (plist-get talk :speakers-with-pronouns)))
    (shell-command "xdotool sleep 1 mousemove 331 440 click :1 key Ctrl+a Delete sleep 1 key Ctrl+Shift+v sleep 2")

    (kill-new (emacsconf-publish-video-description talk t))
    (shell-command "xdotool mousemove 474 632 click 1 sleep 1 key Ctrl+a sleep 1 key Delete sleep 1 key Ctrl+Shift+v"))
  (read-string "Press a key once you've pasted in the description")

  ;; next
  (when (emacsconf-captions-edited-p (expand-file-name (concat (plist-get talk :video-slug) "--main.vtt") emacsconf-cache-dir))
    (shell-command "xdotool mousemove 352 285 click 1 sleep 1")

    ;; add captions
    (shell-command "xdotool mousemove 877 474 click 1 sleep 3")
    (shell-command "xdotool mousemove 165 408 click 1 sleep 1")
    (shell-command "xdotool mousemove 633 740 click 1 sleep 2")
    (shell-command (concat "xdotool mousemove 914 755  click 1 sleep 4 type "
                           (shell-quote-argument (concat (plist-get talk :video-slug) "--main.vtt"))))
    (read-string "Press a key once you've loaded the VTT")
    (shell-command "xdotool mousemove 910 1037 sleep 1 click 1 sleep 4")
    ;; done
    (shell-command "xdotool mousemove 890 297 click 1 sleep 3")
    )


  (progn
    ;; visibility
    (shell-command "xdotool mousemove 810 303 click 1 sleep 2")
    ;; public
    (shell-command "xdotool mousemove 119 614 click 1 sleep 2")
    ;; copy
    (shell-command "xdotool mousemove 882 669 click 1 sleep 1")
    ;; done
    (shell-command "xdotool mousemove 908 1089 click 1 sleep 5 key Alt+Tab")

    (emacsconf-with-talk-heading talk
      (org-entry-put (point) "YOUTUBE_URL" (read-string "URL: "))
      ))
  )

Using xdotool wasn't very elegant, since I needed to figure out the coordinates for each click. I tried using Spookfox to control Mozilla Firefox from Emacs, but YouTube's editing interface didn't seem to have any text boxes that I could set. I decided to use EmacsConf 2023 as an excuse to learn how to talk to the YouTube Data API, which required figuring out OAuth. Even though it was easy to find examples in Python and NodeJS, I wanted to see if I could stick with using Emacs Lisp so that I could add the code to the emacsconf-el repository.

After a quick search, I picked url-http-oauth as the library that I'd try first. I used the url-http-oauth-demo.el included in the package to figure out what to set for the YouTube Data API. I wrote a function to make getting the redirect URL easier (emacsconf-extract-oauth-browse-and-prompt). Once I authenticated successfully, I explored using alphapapa's plz library, which takes care of parsing the JSON response for me. With it, I updated videos to include titles and descriptions from my Emacs code, and I copied the video IDs into my Org properties.

emacsconf-extract.el code for YouTube renaming

;;; YouTube

;; When the token needs refreshing, delete the associated lines from
;; ~/.authinfo. This code just sets the title and description. Still
;; need to figure out how to properly set the license, visibility,
;; recording date, and captions.
;;
;; To avoid being prompted for the client secret, it's helpful to have a line in ~/.authinfo or ~/.authinfo.gpg with
;; machine https://oauth2.googleapis.com/token username CLIENT_ID password CLIENT_SECRET

(defvar emacsconf-extract-google-client-identifier nil)
(defvar emacsconf-extract-youtube-api-channels nil)
(defvar emacsconf-extract-youtube-api-categories nil)
(defvar emacsconf-extract-youtube-api-videos nil)

(defun emacsconf-extract-oauth-browse-and-prompt (url)
  "Open URL and wait for the redirected code URL."
  (browse-url url)
  (read-from-minibuffer "Paste the redirected code URL: "))

(defun emacsconf-extract-youtube-api-setup ()
  (interactive)
  (require 'plz)
  (require 'url-http-oauth)
  (when (getenv "GOOGLE_APPLICATION_CREDENTIALS")
    (let-alist (json-read-file (getenv "GOOGLE_APPLICATION_CREDENTIALS"))
      (setq emacsconf-extract-google-client-identifier .web.client_id)))
  (unless (url-http-oauth-interposed-p "https://youtube.googleapis.com/youtube/v3/")
    (url-http-oauth-interpose
     `(("client-identifier" . ,emacsconf-extract-google-client-identifier)
       ("resource-url" . "https://youtube.googleapis.com/youtube/v3/")
       ("authorization-code-function" . emacsconf-extract-oauth-browse-and-prompt)
       ("authorization-endpoint" . "https://accounts.google.com/o/oauth2/v2/auth")
       ("authorization-extra-arguments" .
        (("redirect_uri" . "http://localhost:8080")))
       ("access-token-endpoint" . "https://oauth2.googleapis.com/token")
       ("scope" . "https://www.googleapis.com/auth/youtube")
       ("client-secret-method" . prompt))))
  (setq emacsconf-extract-youtube-api-channels
        (plz 'get "https://youtube.googleapis.com/youtube/v3/channels?part=contentDetails&mine=true"
          :headers `(("Authorization" . ,(url-oauth-auth "https://youtube.googleapis.com/youtube/v3/")))
          :as #'json-read))
  (setq emacsconf-extract-youtube-api-categories
        (plz 'get "https://youtube.googleapis.com/youtube/v3/videoCategories?part=snippet&regionCode=CA"
          :headers `(("Authorization" . ,(url-oauth-auth "https://youtube.googleapis.com/youtube/v3/")))
          :as #'json-read))
  (setq emacsconf-extract-youtube-api-videos
        (plz 'get (concat "https://youtube.googleapis.com/youtube/v3/playlistItems?part=snippet,contentDetails,status&forMine=true&order=date&maxResults=50&playlistId="
                          (url-hexify-string
                           (let-alist (elt (assoc-default 'items emacsconf-extract-youtube-api-channels) 0)
                             .contentDetails.relatedPlaylists.uploads)
                           ))
          :headers `(("Authorization" . ,(url-oauth-auth "https://youtube.googleapis.com/youtube/v3/")))
          :as #'json-read)))

(defvar emacsconf-extract-youtube-tags '("emacs" "emacsconf"))
(defun emacsconf-extract-youtube-object (video-id talk &optional privacy-status)
  "Format the video object for VIDEO-ID using TALK details."
  (setq privacy-status (or privacy-status "unlisted"))
  (let ((properties (emacsconf-publish-talk-video-properties talk 'youtube)))
    `((id . ,video-id)
      (kind . "youtube#video")
      (snippet
       (categoryId . "28")
       (title . ,(plist-get properties :title))
       (tags . ,emacsconf-extract-youtube-tags)
       (description . ,(plist-get properties :description)))
      ;; Even though I set recordingDetails and status, it doesn't seem to stick.
      ;; I'll leave this in here in case someone else can figure it out.
      (recordingDetails (recordingDate . ,(format-time-string "%Y-%m-%dT%TZ" (plist-get talk :start-time) t)))
      (status (privacyStatus . ,privacy-status)
              (license . "creativeCommon")))))

(defun emacsconf-extract-youtube-api-update-video (video-object)
  "Update VIDEO-OBJECT."
  (let-alist video-object
    (let* ((slug (cond
                  ;; not yet renamed
                  ((string-match (rx (literal emacsconf-id) " " (literal emacsconf-year) " "
                                     (group (1+ (or (syntax word) "-")))
                                     "  ")
                                 .snippet.title)
                   (match-string 1 .snippet.title))
                  ;; renamed, match the description instead
                  ((string-match (rx (literal emacsconf-base-url) (literal emacsconf-year) "/talks/"
                                     (group (1+ (or (syntax word) "-"))))
                                 .snippet.description)
                   (match-string 1 .snippet.description))
                  ;; can't find, prompt
                  (t
                   (when (string-match (rx (literal emacsconf-id) " " (literal emacsconf-year))
                                       .snippet.title)
                     (completing-read (format "Slug for %s: "
                                              .snippet.title)
                                      (seq-map (lambda (o) (plist-get o :slug))
                                               (emacsconf-publish-prepare-for-display (emacsconf-get-talk-info))))))))
           (video-id .snippet.resourceId.videoId)
           (id .id)
           result)
      (when slug
        ;; set the YOUTUBE_URL property
        (emacsconf-with-talk-heading slug
          (org-entry-put (point) "YOUTUBE_URL" (concat "https://www.youtube.com/watch?v=" video-id))
          (org-entry-put (point) "YOUTUBE_ID" id))
        (plz 'put "https://www.googleapis.com/youtube/v3/videos?part=snippet,recordingDetails,status"
          :headers `(("Authorization" . ,(url-oauth-auth "https://youtube.googleapis.com/youtube/v3/"))
                     ("Accept" . "application/json")
                     ("Content-Type" . "application/json"))
          :body (json-encode (emacsconf-extract-youtube-object video-id (emacsconf-resolve-talk slug))))))))

(defun emacsconf-extract-youtube-rename-videos (&optional videos)
  "Rename videos and set the YOUTUBE_URL property in the Org heading."
  (mapc
   (lambda (video)
     (let-alist video
       (when (and .snippet.title
                  (string-match (rx (literal emacsconf-id) " " (literal emacsconf-year))
                                .snippet.title))
         (emacsconf-extract-youtube-api-update-video video))))
   (assoc-default 'items (or videos emacsconf-extract-youtube-api-videos))))

(provide 'emacsconf-extract)

I haven't quite figured out how to set status and recordingDetails properly. The code sets them, but they don't stick. That's okay. I think I can set those as a batch operation. It looks like I need to change visibility one by one, though, which might be a good opportunity to check the end of the video for anything that needs to be trimmed off.

I also want to figure out how to upload captions. I'm not entirely sure how to do multipart form data yet with the url library or plz. It might be nice to someday set up an HTTP server so that Emacs can handle OAuth redirects itself. I'll save that for another blog post and share my notes for now.
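
I haven't tested this, but the usual Google API pattern for small media uploads is a multipart/related request: a JSON metadata part followed by the file itself, separated by a boundary string. Here's a sketch of what that might look like with plz. The endpoint, parameters, and content types are assumptions to double-check against the captions.insert documentation, and none of this is in emacsconf-extract.el yet.

;; Untested sketch: build a multipart/related body by hand and POST it with plz.
;; Double-check the endpoint and parameters against the captions.insert docs.
(defun my-youtube-upload-caption-sketch (video-id vtt-file)
  "Try to attach VTT-FILE to VIDEO-ID as an English caption track."
  (let* ((boundary "caption_boundary")
         (metadata (json-encode
                    `((snippet (videoId . ,video-id)
                               (language . "en")
                               (name . "English captions")))))
         (body (concat "--" boundary "\r\n"
                       "Content-Type: application/json; charset=UTF-8\r\n\r\n"
                       metadata "\r\n"
                       "--" boundary "\r\n"
                       "Content-Type: text/vtt\r\n\r\n"
                       (with-temp-buffer
                         (insert-file-contents vtt-file)
                         (buffer-string))
                       "\r\n--" boundary "--")))
    (plz 'post "https://www.googleapis.com/upload/youtube/v3/captions?uploadType=multipart&part=snippet"
      :headers `(("Authorization" . ,(url-oauth-auth "https://youtube.googleapis.com/youtube/v3/"))
                 ("Content-Type" . ,(concat "multipart/related; boundary=" boundary)))
      :body body
      :as #'json-read)))

The body is just the JSON metadata and the caption file glued together with a boundary string, so no extra library should be strictly necessary.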

This code is in emacsconf-extract.el.

Figuring out how to use ffmpeg to mask a chroma-keyed video based on the differences between images

| linux, geek, ffmpeg, video

A- is really into Santa and Christmas because of the books she's read. Last year, she wanted to set up the GoPro to capture footage during Christmas Eve. I helped her set it up for a timelapse video. After she went to bed, we gradually positioned the presents. I extracted the frames from the video, removed the ones that caught us moving around, and then used Krita's new animation features to animate sparkles so that the presents magically appeared. She mentioned the sparkles a number of times during her deliberations about whether Santa exists or not.

This year, I want to see if I can use green-screen videos like this reversed-spin sparkle or this other sparkle video. I'm going to take a series of images, with each image adding one more gift. Then I'm going to make a mask in Krita with white covering the gift and a transparent background for the rest of the image. Then I'll use chroma-key to drop out the green screen of the sparkle video and mask it in so that the sparkles only happen within the boundaries of the gift that was added. I also want to fade one image into the other, and I want the sparkles to fade out as the gift appears.

Figuring things out

I didn't know how to do any of that yet with ffmpeg, so here's how I started figuring things out. First, I wanted to see how to fade test.jpg into test2.jpg over 4 seconds.

ffmpeg -y -loop 1 -i test.jpg -loop 1 -i test2.jpg -filter_complex "[1:v]fade=t=in:d=4:alpha=1[fadein];[0:v][fadein]overlay[out]" -map "[out]" -r 1 -t 4 -shortest test.webm

Here's another way using the blend filter:

ffmpeg -y -loop 1 -i test.jpg -loop 1 -i test2.jpg -filter_complex "[1:v][0:v]blend=all_expr='A*(if(gte(T,4),1,T/4))+B*(1-(if(gte(T,4),1,T/4)))'" -t 4 -r 1 test.webm

Then I looked into chromakeying in the other video. I used balloons instead of sparkles just in case she happened to look at my screen.

ffmpeg -y -i test.webm -i balloons.mp4 -filter_complex "[1:v]chromakey=0x00ff00:0.1:0.2[ckout];[0:v][ckout]overlay[out]" -map "[out]" -shortest -r 1 overlaid.webm

I experimented with the alphamerge filter.

ffmpeg -y -i test.jpg -i test2.jpg -i mask.png -filter_complex "[1:v][2:v]alphamerge[a];[0:v][a]overlay[out]" -map "[out]" masked.jpg

Okay! That overlaid test.jpg with a masked part of test2.jpg. How about alphamerging in a video? First, I need a mask video…

ffmpeg -y -loop 1 -i mask.png  -r 1 -t 4  mask.webm

Then I can combine that:

ffmpeg -loglevel 32 -y -i test.webm -i balloons.mp4 -i mask.webm -filter_complex "[1:v][2:v]alphamerge[masked];[0:v][masked]overlay[out]" -map "[out]" -r 1 -t 4 alphamerged.webm

Great, let's figure out how to combine chroma-key and alphamerge video. The naive approach doesn't work, probably because they're both messing with the alpha layer.

ffmpeg -loglevel 32 -y -i test.webm -i balloons.mp4 -i mask.webm -filter_complex "[1:v]chromakey=0x00ff00:0.1:0.2[ckout];[ckout][2:v]alphamerge[masked];[0:v][masked]overlay[out]" -map "[out]" -r 1 -t 4 masked.webm

So I probably need to blend the chromakey and the mask. Let's see if I can extract the chromakey alpha.

ffmpeg -loglevel 32 -y -i test.webm -i balloons.mp4 -i mask.webm -filter_complex "[1:v]chromakey=0x00ff00:0.1:0.2,format=rgba,alphaextract[out]" -map "[out]" -r 1 -t 4 chroma-alpha.webm

Now let's blend it with the mask.webm.

ffmpeg -loglevel 32 -y -i test.webm -i balloons.mp4 -i mask.webm -filter_complex "[1:v]chromakey=0x00ff00:0.1:0.2,format=rgba,alphaextract[ckalpha];[ckalpha][2:v]blend=all_mode=and[out]" -map "[out]" -r 1 -t 4 masked-alpha.webm

Then let's use it as the alpha:

ffmpeg -loglevel 32 -y -i test.webm -i balloons.mp4 -i masked-alpha.webm -filter_complex "[2:v]format=rgba[mask];[1:v][mask]alphamerge[masked];[0:v][masked]overlay[out]" -map "[out]" -r 1 -t 4 alphamerged.webm

Okay, that worked! Now how do I combine everything into one command? Hmm…

ffmpeg -loglevel 32 -y -loop 1 -i test.jpg -t 4 -loop 1 -i test2.jpg -t 4 -i balloons.mp4 -loop 1 -i mask.png -t 4 -filter_complex "[1:v][0:v]blend=all_expr='A*(if(gte(T,4),1,T/4))+B*(1-(if(gte(T,4),1,T/4)))'[fade];[2:v]chromakey=0x00ff00:0.1:0.2,format=rgba,alphaextract[ckalpha];[ckalpha][3:v]blend=all_mode=and,format=rgba[maskedalpha];[2:v][maskedalpha]alphamerge[masked];[fade][masked]overlay[out]" -map "[out]" -r 5 -t 4 alphamerged.webm

Then I wanted to fade the masked video out by the end.

ffmpeg -loglevel 32 -y -loop 1 -i test.jpg -t 4 -loop 1 -i test2.jpg -t 4 -i balloons.mp4 -loop 1 -i mask.png -t 4 -filter_complex "[1:v][0:v]blend=all_expr='A*(if(gte(T,4),1,T/4))+B*(1-(if(gte(T,4),1,T/4)))'[fade];[2:v]chromakey=0x00ff00:0.1:0.2,format=rgba,alphaextract[ckalpha];[ckalpha][3:v]blend=all_mode=and,format=rgba[maskedalpha];[2:v][maskedalpha]alphamerge[masked];[masked]fade=type=out:st=2:d=1:alpha=1[maskedfade];[fade][maskedfade]overlay[out]" -map "[out]" -r 10 -t 4 alphamerged.webm

Making the video

When A- finally went to bed, we arranged the presents, using the GoPro to take a picture at each step of the way. I cropped and resized the images, using Krita to figure out the cropping rectangle and offset.

for FILE in *.JPG; do convert $FILE -crop 1558x876+473+842 -resize 1280x720 cropped/$FILE; done

I used ImageMagick to calculate the masks automatically.

files=(*.JPG)
i=0
j=1
len="${#files[@]}"
while [ "$j" -lt $len ]; do
  compare -fuzz 15% cropped/${files[$i]} cropped/${files[$j]} -compose Src -highlight-color White -lowlight-color Black masks/${files[$j]}
  convert -morphology Open Disk -morphology Close Disk -blur 20x5 masks/${files[$j]} processed-masks/${files[$j]}
  i=$((i+1))
  j=$((j+1))
done

Then I faded the images together to make a video.

import ffmpeg
import glob
files = glob.glob("images/cropped/*.JPG")
files.sort()
fps = 15
crf = 32
out = ffmpeg.input(files[0], loop=1, r=fps)
duration = 3
for i in range(1, len(files)):
    out = ffmpeg.filter([out, ffmpeg.input(files[i], loop=1, r=fps).filter('fade', t='in', d=duration, st=i*duration, alpha=1)], 'overlay')
args = out.output('images.webm', t=len(files) * duration, r=fps, y=None, crf=crf).compile()
print(' '.join(f'"{item}"' for item in args))

"ffmpeg" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2317.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2318.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2319.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2320.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2321.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2322.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2323.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2324.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2325.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2326.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2327.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2328.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2329.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2330.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2331.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2332.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2333.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2334.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2335.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2336.JPG" "-loop" "1" "-r" "15" "-i" "images/cropped/GOPR2337.JPG" "-filter_complex" "[1]fade=alpha=1:d=3:st=3:t=in[s0];[0][s0]overlay[s1];[2]fade=alpha=1:d=3:st=6:t=in[s2];[s1][s2]overlay[s3];[3]fade=alpha=1:d=3:st=9:t=in[s4];[s3][s4]overlay[s5];[4]fade=alpha=1:d=3:st=12:t=in[s6];[s5][s6]overlay[s7];[5]fade=alpha=1:d=3:st=15:t=in[s8];[s7][s8]overlay[s9];[6]fade=alpha=1:d=3:st=18:t=in[s10];[s9][s10]overlay[s11];[7]fade=alpha=1:d=3:st=21:t=in[s12];[s11][s12]overlay[s13];[8]fade=alpha=1:d=3:st=24:t=in[s14];[s13][s14]overlay[s15];[9]fade=alpha=1:d=3:st=27:t=in[s16];[s15][s16]overlay[s17];[10]fade=alpha=1:d=3:st=30:t=in[s18];[s17][s18]overlay[s19];[11]fade=alpha=1:d=3:st=33:t=in[s20];[s19][s20]overlay[s21];[12]fade=alpha=1:d=3:st=36:t=in[s22];[s21][s22]overlay[s23];[13]fade=alpha=1:d=3:st=39:t=in[s24];[s23][s24]overlay[s25];[14]fade=alpha=1:d=3:st=42:t=in[s26];[s25][s26]overlay[s27];[15]fade=alpha=1:d=3:st=45:t=in[s28];[s27][s28]overlay[s29];[16]fade=alpha=1:d=3:st=48:t=in[s30];[s29][s30]overlay[s31];[17]fade=alpha=1:d=3:st=51:t=in[s32];[s31][s32]overlay[s33];[18]fade=alpha=1:d=3:st=54:t=in[s34];[s33][s34]overlay[s35];[19]fade=alpha=1:d=3:st=57:t=in[s36];[s35][s36]overlay[s37];[20]fade=alpha=1:d=3:st=60:t=in[s38];[s37][s38]overlay[s39]" "-map" "[s39]" "-crf" "32" "-r" "15" "-t" "63" "-y" "images.webm"

Next, I faded the masks together. These ones faded in and out so that only one mask was active at a time.

import ffmpeg
import glob
files = glob.glob("images/processed-masks/*.JPG")
files.sort()
files = files[:-2]  # Omit the last two, where I'm just turning off the lights
fps = 15
crf = 32
out = ffmpeg.input('color=black:s=1280x720', f='lavfi', r=fps)
duration = 3
for i in range(0, len(files)):
    out = ffmpeg.filter([out, ffmpeg.input(files[i], loop=1, r=fps).filter('fade', t='in', d=1, st=(i + 1)*duration, alpha=1).filter('fade', t='out', st=(i + 2)*duration - 1)], 'overlay')
args = out.output('processed-masks.webm', t=len(files) * duration, r=fps, y=None, crf=crf).compile()
print(' '.join(f'"{item}"' for item in args))

"ffmpeg" "-f" "lavfi" "-r" "15" "-i" "color=s=1280x720" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2318.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2319.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2320.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2321.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2322.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2323.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2324.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2325.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2326.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2327.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2328.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2329.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2330.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2331.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2332.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2333.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2334.JPG" "-loop" "1" "-r" "15" "-i" "images/processed-masks/GOPR2335.JPG" "-filter_complex" "[1]fade=alpha=1:d=1:st=3:t=in[s0];[s0]fade=st=5:t=out[s1];[0][s1]overlay[s2];[2]fade=alpha=1:d=1:st=6:t=in[s3];[s3]fade=st=8:t=out[s4];[s2][s4]overlay[s5];[3]fade=alpha=1:d=1:st=9:t=in[s6];[s6]fade=st=11:t=out[s7];[s5][s7]overlay[s8];[4]fade=alpha=1:d=1:st=12:t=in[s9];[s9]fade=st=14:t=out[s10];[s8][s10]overlay[s11];[5]fade=alpha=1:d=1:st=15:t=in[s12];[s12]fade=st=17:t=out[s13];[s11][s13]overlay[s14];[6]fade=alpha=1:d=1:st=18:t=in[s15];[s15]fade=st=20:t=out[s16];[s14][s16]overlay[s17];[7]fade=alpha=1:d=1:st=21:t=in[s18];[s18]fade=st=23:t=out[s19];[s17][s19]overlay[s20];[8]fade=alpha=1:d=1:st=24:t=in[s21];[s21]fade=st=26:t=out[s22];[s20][s22]overlay[s23];[9]fade=alpha=1:d=1:st=27:t=in[s24];[s24]fade=st=29:t=out[s25];[s23][s25]overlay[s26];[10]fade=alpha=1:d=1:st=30:t=in[s27];[s27]fade=st=32:t=out[s28];[s26][s28]overlay[s29];[11]fade=alpha=1:d=1:st=33:t=in[s30];[s30]fade=st=35:t=out[s31];[s29][s31]overlay[s32];[12]fade=alpha=1:d=1:st=36:t=in[s33];[s33]fade=st=38:t=out[s34];[s32][s34]overlay[s35];[13]fade=alpha=1:d=1:st=39:t=in[s36];[s36]fade=st=41:t=out[s37];[s35][s37]overlay[s38];[14]fade=alpha=1:d=1:st=42:t=in[s39];[s39]fade=st=44:t=out[s40];[s38][s40]overlay[s41];[15]fade=alpha=1:d=1:st=45:t=in[s42];[s42]fade=st=47:t=out[s43];[s41][s43]overlay[s44];[16]fade=alpha=1:d=1:st=48:t=in[s45];[s45]fade=st=50:t=out[s46];[s44][s46]overlay[s47];[17]fade=alpha=1:d=1:st=51:t=in[s48];[s48]fade=st=53:t=out[s49];[s47][s49]overlay[s50];[18]fade=alpha=1:d=1:st=54:t=in[s51];[s51]fade=st=56:t=out[s52];[s50][s52]overlay[s53]" "-map" "[s53]" "-crf" "32" "-r" "15" "-t" "54" "-y" "processed-masks.webm"

I ended up using this particle glitter video because the gifts were small, so I wanted a video that was dense with sparkly things. I also wanted the sparkles to be more concentrated on the area where the gifts were, so I resized it and positioned it.

ffmpeg -loglevel 32 -y -f lavfi -i color=black:s=1280x720 -i sparkles4.webm -ss 13 -filter_complex "[1:v]scale=700:392[sparkles];[0:v][sparkles]overlay=x=582:y=194,setpts=(PTS-STARTPTS)*1.05[out]" -map "[out]" -r 15 -t 53 -shortest sparkles-trimmed.webm
ffmpeg -y -stream_loop 2 -i sparkles-trimmed.webm -t 57 sparkles-looped.webm              

Lastly, I combined the videos with the sparkles.

ffmpeg -loglevel 32 -y -i images.webm -i sparkles-looped.webm -i processed-masks.webm -filter_complex "[1:v]chromakey=0x0a9d06:0.1:0.2,format=rgba,alphaextract[ckalpha];[ckalpha][2:v]blend=all_mode=and,format=rgba[maskedalpha];[1:v][maskedalpha]alphamerge[masked];[masked]fade=t=out:st=57:d=1:alpha=1[maskedfaded];[0:v][maskedfaded]overlay[combined];[combined]tpad=start_mode=clone:start_duration=4:stop_mode=clone:stop_duration=4[out]" -map "[out]" -r 15 -crf 32 output.webm

After many iterations and a very late night, I got (roughly) the video I wanted, which I'm not posting here because of reasons. But it worked, yay! Now I don't have to manually place stars frame-by-frame in Krita, and I can just have all that magic happen semi-automatically.

Using Emacs and Python to record an animation and synchronize it with audio

| emacs, emacsconf, python, subed, video

[2023-01-14 Sat]: Removed my fork since upstream now has the :eval function.

The Q&A session for Things I'd like to see in Emacs (Richard Stallman) from EmacsConf 2022 was done over Mumble. Amin pasted the questions into the Mumble chat buffer and I copied them into a larger buffer as the speaker answered them, but I didn't do it consistently. I figured it might be worth making another video with easier-to-read visuals. At first, I thought about using LaTeX to create Beamer slides with the question text, which I could then turn into a video using ffmpeg. Then I decided to figure out how to animate the text in Emacs, because why not? I figured a straightforward typing animation would probably be less distracting than animate-string, and emacs-director seems to handle that nicely. I forked it to add a few things I wanted, like variables to make the typing speed slower (so that it could more reliably type things on my old laptop, since sometimes the timers seemed to have hiccups) and an :eval step for running things without needing to log them. (2023-01-14: Upstream has the :eval feature now.)

To make it easy to synchronize the resulting animation with the chapter markers I derived from the transcript of the audio file, I decided to beep between scenes. First step: make a beep file.

ffmpeg -y -f lavfi -i 'sine=frequency=1000:duration=0.1' beep.wav

Next, I animated the text, with a beep between scenes. I used subed-parse-file to read the question text directly from the chapter markers, and I used simplescreenrecorder to set up the recording settings (including audio).

(defun my-beep ()
  (interactive)
  (save-window-excursion
    (shell-command "aplay ~/recordings/beep.wav &" nil nil)))

(require 'director)
(defvar emacsconf-recording-process nil)
(shell-command "xdotool getwindowfocus windowsize 1282 720")
(progn
  (switch-to-buffer (get-buffer-create "*Questions*"))
  (erase-buffer)
  (org-mode)
  (face-remap-add-relative 'default :height 300)
  (setq-local mode-line-format "   Q&A for EmacsConf 2022: What I'd like to see in Emacs (Richard M. Stallman) - emacsconf.org/2022/talks/rms")
  (sit-for 3)
  (delete-other-windows)
  (hl-line-mode -1)
  (when (process-live-p emacsconf-recording-process) (kill-process emacsconf-recording-process))
  (setq emacsconf-recording-process (start-process "ssr" (get-buffer-create "*ssr*")
                                                   "simplescreenrecorder"
                                                   "--start-recording"
                                                   "--start-hidden"))
  (sit-for 3)
  (director-run
   :version 1
   :log-target '(file . "/tmp/director.log")
   :before-start
   (lambda ()
     (switch-to-buffer (get-buffer-create "*Questions*"))
     (delete-other-windows))
   :steps
   (let ((subtitles (subed-parse-file "~/proj/emacsconf/rms/emacsconf-2022-rms--what-id-like-to-see-in-emacs--answers--chapters.vtt")))
     (apply #'append
            (list
             (list :eval '(my-beep))
             (list :type "* Q&A for Richard Stallman's EmacsConf 2022 talk: What I'd like to see in Emacs\nhttps://emacsconf.org/2022/talks/rms\n\n"))
            (mapcar
             (lambda (sub)
               (list
                (list :log (elt sub 3))
                (list :eval '(progn (org-end-of-subtree)
                                    (unless (bolp) (insert "\n"))))
                (list :type (concat "** " (elt sub 3) "\n\n"))
                (list :eval '(org-back-to-heading))
                (list :wait 5)
                (list :eval '(my-beep))))
             subtitles)))
   :typing-style 'human
   :delay-between-steps 0
   :after-end (lambda ()
                (process-send-string emacsconf-recording-process "record-save\nwindow-show\nquit\n"))
   :on-failure (lambda ()
                 (process-send-string emacsconf-recording-process "record-save\nwindow-show\nquit\n"))
   :on-error (lambda ()
               (process-send-string emacsconf-recording-process "record-save\nwindow-show\nquit\n"))))

I used the following code to copy the latest recording to animation.webm and extract the audio to animation.wav. my-latest-file and my-recordings-dir are in my Emacs config.

(let ((name "animation.webm"))
  (copy-file (my-latest-file my-recordings-dir) name t)
  (shell-command
   (format "ffmpeg -y -i %s -ar 8000 -ac 1 %s.wav"
           (shell-quote-argument name)
           (shell-quote-argument (file-name-sans-extension name)))))

Then I needed to get the timestamps of the beeps in the recording. I subtracted a little bit (0.82 seconds) based on comparing the waveform with the results.

filename = "animation.wav"
from scipy.io import wavfile
from scipy import signal
import numpy as np
import re
rate, source = wavfile.read(filename)
peaks = signal.find_peaks(source, height=1000, distance=1000)
base_times = (peaks[0] / rate) - 0.82
print(base_times)

I noticed that the first question didn't seem to get beeped properly, so I tweaked the times. Then I wrote some code to generate a very long ffmpeg command that used trim and tpad to select the segments and extend them to the right durations. There was some drift when I did it without the audio track, but the timestamps seemed to work right when I included the Q&A audio track as well.

import webvtt
import subprocess
import numpy as np
chapters_filename =  "emacsconf-2022-rms--what-id-like-to-see-in-emacs--answers--chapters.vtt"
answers_filename = "answers.wav"
animation_filename = "animation.webm"
def get_length(filename):
    result = subprocess.run(["ffprobe", "-v", "error", "-show_entries",
                             "format=duration", "-of",
                             "default=noprint_wrappers=1:nokey=1", filename],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT)
    return float(result.stdout)

def get_frames(filename):
    result = subprocess.run(["ffprobe", "-v", "error", "-select_streams", "v:0", "-count_packets",
                             "-show_entries", "stream=nb_read_packets", "-of",
                             "csv=p=0", filename],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT)
    return float(result.stdout)

answers_length = get_length(answers_filename)
# override base_times
times = np.asarray([  1.515875,  13.50, 52.32125 ,  81.368625, 116.66625 , 146.023125,
       161.904875, 182.820875, 209.92125 , 226.51525 , 247.93875 ,
       260.971   , 270.87375 , 278.23325 , 303.166875, 327.44925 ,
       351.616375, 372.39525 , 394.246625, 409.36325 , 420.527875,
       431.854   , 440.608625, 473.86825 , 488.539   , 518.751875,
       544.1515  , 555.006   , 576.89225 , 598.157375, 627.795125,
       647.187125, 661.10875 , 695.87175 , 709.750125, 717.359875])
fps = 30.0
times = np.append(times, get_length(animation_filename))
anim_spans = list(zip(times[:-1], times[1:]))
chapters = webvtt.read(chapters_filename)
if chapters[0].start_in_seconds == 0:
    vtt_times = [[c.start_in_seconds, c.text] for c in chapters]
else:
    vtt_times = [[0, "Introduction"]] + [[c.start_in_seconds, c.text] for c in chapters] 
vtt_times = vtt_times + [[answers_length, "End"]]
# Add ending timestamps
vtt_times = [[x[0][0], x[1][0], x[0][1]] for x in zip(vtt_times[:-1], vtt_times[1:])]
test_rate = 1.0

i = 0
concat_list = ""
groups = list(zip(anim_spans, vtt_times))
import ffmpeg
animation = ffmpeg.input('animation.webm').video
audio = ffmpeg.input('rms.opus')

for_overlay = ffmpeg.input('color=color=black:size=1280x720:d=%f' % answers_length, f='lavfi')
params = {"b:v": "1k", "vcodec": "libvpx", "r": "30", "crf": "63"}
test_limit = 1
params = {"vcodec": "libvpx", "r": "30", "copyts": None, "b:v": "1M", "crf": 24}
test_limit = 0
anim_rate = 1
import math
cursor = 0
if test_limit > 0:
    groups = groups[0:test_limit]
clips = []

# cursor is the current time
for anim, vtt in groups:
    padding = vtt[1] - cursor - (anim[1] - anim[0]) / anim_rate
    if (padding < 0):
        print("Squeezing", math.floor((anim[1] - anim[0]) / (anim_rate * 1.0)), 'into', vtt[1] - cursor, padding)
        clips.append(animation.trim(start=anim[0], end=anim[1]).setpts('PTS-STARTPTS')) 
    elif padding == 0:
        clips.append(animation.trim(start=anim[0], end=anim[1]).setpts('PTS-STARTPTS'))
    else:
        print("%f to %f: Padding %f into %f - pad: %f" % (cursor, vtt[1], (anim[1] - anim[0]) / (anim_rate * 1.0), vtt[1] - cursor, padding))
        cursor = cursor + padding + (anim[1] - anim[0]) / anim_rate
        clips.append(animation.trim(start=anim[0], end=anim[1]).setpts('PTS-STARTPTS').filter('tpad', stop_mode="clone", stop_duration=padding))
    for_overlay = for_overlay.overlay(animation.trim(start=anim[0], end=anim[1]).setpts('PTS-STARTPTS+%f' % vtt[0]))
    clips.append(audio.filter('atrim', start=vtt[0], end=vtt[1]).filter('asetpts', 'PTS-STARTPTS'))
args = ffmpeg.concat(*clips, v=1, a=1).output('output.webm', **params).overwrite_output().compile()
print(' '.join(f'"{item}"' for item in args))

Anyway, it's here for future reference. =)

Re-encoding the EmacsConf videos with FFmpeg and GNU Parallel

| geek, linux, emacsconf, ffmpeg, video

It turns out that using -crf 56 compressed the EmacsConf videos a little too aggressively, losing too much detail. We wanted to re-encode everything, maybe going back to the default value of -crf 32. My laptop would have taken a long time to do all of those videos. Fortunately, one of the other volunteers shared a VM on a machine with 12 cores, and I had access to a few other systems. It was a good opportunity to learn how to use GNU Parallel to send jobs to different machines and retrieve the results.

First, I updated the compression script, compress-video-low.sh:

Q=$1
WIDTH=1280
HEIGHT=720
AUDIO_RATE=48000
VIDEO_FILTER="scale=w=${WIDTH}:h=${HEIGHT}:force_original_aspect_ratio=1,pad=${WIDTH}:${HEIGHT}:(ow-iw)/2:(oh-ih)/2,fps=25,colorspace=all=bt709:iall=bt601-6-625:fast=1"
FILE=$2
SUFFIX=$Q
shift
shift
ffmpeg -y -i "$FILE"  -pixel_format yuv420p -vf $VIDEO_FILTER -colorspace 1 -color_primaries 1 -color_trc 1 -c:v libvpx-vp9 -b:v 0 -crf $Q -aq-mode 2 -tile-columns 0 -tile-rows 0 -frame-parallel 0 -cpu-used 8 -auto-alt-ref 1 -lag-in-frames 25 -g 240 -pass 1 -f webm -an -threads 8 /dev/null &&
if [[ $FILE =~ "webm" ]]; then
    ffmpeg -y -i "$FILE" $*  -pixel_format yuv420p -vf $VIDEO_FILTER -colorspace 1 -color_primaries 1 -color_trc 1 -c:v libvpx-vp9 -b:v 0 -crf $Q -tile-columns 2 -tile-rows 2 -frame-parallel 0 -cpu-used -5 -auto-alt-ref 1 -lag-in-frames 25 -pass 2 -g 240 -ac 2 -threads 8 -c:a copy "${FILE%.*}--compressed$SUFFIX.webm"
else
    ffmpeg -y -i "$FILE" $*  -pixel_format yuv420p -vf $VIDEO_FILTER -colorspace 1 -color_primaries 1 -color_trc 1 -c:v libvpx-vp9 -b:v 0 -crf $Q -tile-columns 2 -tile-rows 2 -frame-parallel 0 -cpu-used -5 -auto-alt-ref 1 -lag-in-frames 25 -pass 2 -g 240 -ac 2 -threads 8 -c:a libvorbis "${FILE%.*}--compressed$SUFFIX.webm"
fi

I made an originals.txt file with all the original filenames. It looked like this:

emacsconf-2020-frownies--the-true-frownies-are-the-friends-we-made-along-the-way-an-anecdote-of-emacs-s-malleability--case-duckworth.mkv
emacsconf-2021-montessori--emacs-and-montessori-philosophy--grant-shangreaux.webm
emacsconf-2021-pattern--emacs-as-design-pattern-learning--greta-goetz.mp4
...

I set up a ~/.parallel/emacsconf profile with something like this so that I could use three computers and my laptop, sending one job each and displaying progress:

--sshlogin computer1 --sshlogin computer2 --sshlogin computer3 --sshlogin : -j 1 --progress --verbose --joblog parallel.log

I already had SSH key-based authentication set up so that I could connect to the three remote computers.

Then I spread the jobs over four computers with the following command:

cat originals.txt | parallel -J emacsconf \
                             --transferfile {} \
                             --return '{=$_ =~ s/\..*?$/--compressed32.webm/=}' \
                             --cleanup \
                             --basefile compress-video-low.sh \
                             bash compress-video-low.sh 32 {}

It copied each file over to the computer it was assigned to, processed the file, and then copied the file back.

It was also helpful to occasionally do echo 'killall -9 ffmpeg' | parallel -J emacsconf -j 1 --onall if I cancelled a run.

It still took a long time, but less than it would have if any one computer had to crunch through everything on its own.

This was much better than my previous way of doing things, which involved copying the files over, running ffmpeg commands, copying the files back, and getting somewhat confused about which directory I was in and which file I assigned where and what to do about incompletely-encoded files.

I sometimes ran into problems with incompletely-encoded files because I'd cancelled the FFmpeg process partway through. Even though ffprobe reported the expected duration, the files were missing a large chunk of video at the end. I added a compile-media-verify-video-frames function to compile-media.el so that I could get the timestamps of the last few frames, compare them against the reported duration, and report an error if there was a big gap.
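
Here's a rough sketch of that check. It's illustrative only: the real compile-media-verify-video-frames in compile-media.el may differ, and the function name and threshold below are made up. The idea is to ask ffprobe for the container duration, read the packet timestamps near the end of the video stream, and complain if the last frame stops well short of the reported duration.

;; Illustrative sketch only; the real compile-media-verify-video-frames in
;; compile-media.el may differ. Assumes ffprobe is available.
(defun my-verify-video-frames (file &optional max-gap)
  "Signal an error if FILE's last video frame is more than MAX-GAP seconds
before the duration reported by the container."
  (let* ((max-gap (or max-gap 2))
         (duration (string-to-number
                    (shell-command-to-string
                     (format "ffprobe -v error -show_entries format=duration -of csv=p=0 %s"
                             (shell-quote-argument file)))))
         ;; Read packet timestamps starting shortly before the reported end.
         (timestamps (seq-filter (lambda (s) (string-match "[0-9]" s))
                                 (split-string
                                  (shell-command-to-string
                                   (format "ffprobe -v error -select_streams v:0 -show_entries packet=pts_time -of csv=p=0 -read_intervals %d%% %s"
                                           (max 0 (floor (- duration 10)))
                                           (shell-quote-argument file)))
                                  "\n" t)))
         (last-frame (and timestamps (string-to-number (car (last timestamps))))))
    (when (or (null last-frame) (> (- duration last-frame) max-gap))
      (error "%s: last frame at %s but the container claims %.1f seconds"
             file (or last-frame "none") duration))))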

Then I changed emacsconf-publish.el to use the new filenames, and I regenerated all the pages. For EmacsConf 2020, I used some Emacs Lisp to update the files. I'm not particularly fond of wrangling video files (lots of waiting, high chance of error), but I'm glad I got the computers to work together.

Using word-level timing information when editing subtitles or captions in Emacs

| emacs, subed, video

2022-10-26: Merged word-level timing support into subed.el, so I don't need my old caption functions.

2022-04-18: Switched to using yt-dlp.

I like to split captions at logical points, such as at the end of a phrase or sentence. At first, I used subed.el to play the video for the caption, pausing it at the appropriate point and then calling subed-split-subtitle to split at the playback position. Then I modified subed-split-subtitle to split at the video position that's proportional to the text position, so that it's roughly in the right spot even if I'm not currently listening. That got me most of the way to being able to quickly edit subtitles.
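
The proportional guess is simple arithmetic: if point is 40% of the way through the subtitle's text, split 40% of the way between the subtitle's start and stop times. Here's a rough sketch of the idea (illustrative only; the function name is made up, and subed-split-subtitle does this itself now):

;; Illustrative sketch; subed-split-subtitle handles this now.
(defun my-subed-proportional-split-msecs ()
  "Guess a split time based on how far point is into the subtitle text."
  (let* ((start (subed-subtitle-msecs-start))
         (stop (subed-subtitle-msecs-stop))
         (text-start (save-excursion (subed-jump-to-subtitle-text) (point)))
         (text-end (save-excursion (subed-jump-to-subtitle-end) (point)))
         (fraction (if (> text-end text-start)
                       (/ (float (- (point) text-start)) (- text-end text-start))
                     0.5)))
    (+ start (round (* fraction (- stop start))))))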

It turns out that word-level timing is actually available from YouTube if I download the autogenerated SRV2 file using yt-dlp, which I can do with the following function:

(defun my-caption-download-srv2 (id)
  (interactive "MID: ")
  (require 'subed-word-data)
  (when (string-match "v=\\([^&]+\\)" id) (setq id (match-string 1 id)))
  (let ((default-directory "/tmp"))
    (call-process "yt-dlp" nil nil nil "--write-auto-sub" "--write-sub" "--no-warnings" "--sub-lang" "en" "--skip-download" "--sub-format" "srv2"
                  (concat "https://youtu.be/" id))
    (subed-word-data-load-from-file (my-latest-file "/tmp" "\\.srv2\\'"))))

2022-10-26: I can also generate a SRV2-ish file using torchaudio, which I can then load with subed-word-data-load-from-file.

(defun my-caption-fix-common-errors (data)
  (mapc (lambda (o)
          (mapc (lambda (e)
                  (when (string-match (concat "\\<" (regexp-opt (if (listp e) (seq-remove (lambda (s) (string= "" s)) e)
                                                                  (list e)))
                                              "\\>")
                                      (alist-get 'text o))
                    (map-put! o 'text (replace-match (car (if (listp e) e (list e))) t t (alist-get 'text o)))))
                my-subed-common-edits))
        data))

Assuming I start editing from the beginning of the file, then the part of the captions file after point is mostly unedited. That means I can match the remainder of the current caption with the word-level timing to try to figure out the time to use when splitting the subtitle, falling back to the proportional method if the data is not available.
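
A simplified sketch of that lookup might look something like this. The function name and the word-data layout (a list of alists with 'start and 'text entries) are assumptions on my part; subed-word-data does the real work and stores its data its own way.

;; Simplified, assumed sketch; subed-word-data implements the real version.
(defun my-word-timing-for-split (words)
  "Guess the time to split at by matching the word after point against WORDS."
  (let ((sub-start (subed-subtitle-msecs-start))
        (target (car (split-string
                      (buffer-substring-no-properties
                       (point)
                       (save-excursion (subed-jump-to-subtitle-end) (point)))))))
    (when target
      (seq-some (lambda (word)
                  (and (>= (or (alist-get 'start word) 0) sub-start)
                       (string= (downcase (string-trim (alist-get 'text word)))
                                (downcase target))
                       (alist-get 'start word)))
                words))))

My actual setup just wires subed-split-subtitle into some keybindings and avy actions: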

(defun subed-avy-set-up-actions ()
  (interactive)
  (make-local-variable 'avy-dispatch-alist)
  (add-to-list
   'avy-dispatch-alist
   (cons ?, 'subed-split-subtitle)))

(use-package subed
  :if my-laptop-p
  :load-path "~/vendor/subed/subed"
  :hook
  (subed-mode . display-fill-column-indicator-mode)
  (subed-mode . subed-avy-set-up-actions)
  :bind
  (:map subed-mode-map
        ("M-," . subed-split-subtitle)
        ("M-." . subed-merge-with-next)
        ("M-p" . avy-goto-char-timer)
        ("M-e" . avy-goto-char-timer)))

That way, I can use the word-level timing information for most of the reformatting, but I can easily replay segments of the video if I'm unsure about a word that needs to be changed.

If I want to generate a VTT based on the caption data, breaking it at certain words, these functions help:

(defvar my-caption-breaks
  '("the" "this" "we" "we're" "I" "finally" "but" "and" "when")
  "List of words to try to break at.")
(defun my-caption-make-groups (list &optional threshold)
  (let (result
        current-item
        done
        (current-length 0)
        (limit (or threshold 70))
        (lower-limit 30)
        (break-regexp (concat "\\<" (regexp-opt my-caption-breaks) "\\>")))
    (while list
      (cond
       ((null (car list)))
       ((string-match "^\n*$" (alist-get 'text (car list)))
        (push (cons '(text . " ") (car list)) current-item)
        (setq current-length (1+ current-length)))
       ((< (+ current-length (length (alist-get 'text (car list)))) limit)
        (setq current-item (cons (car list) current-item)
              current-length (+ current-length (length (alist-get 'text (car list))) 1)))
       (t (setq done nil)
          (while (not done)
            (cond
             ((< current-length lower-limit)
              (setq done t))
             ((and (string-match break-regexp (alist-get 'text (car current-item)))
                   (not (string-match break-regexp (alist-get 'text (cadr current-item)))))
              (setq current-length (- current-length (length (alist-get 'text (car current-item)))))
              (push (pop current-item) list)
              (setq done t))
             (t
              (setq current-length (- current-length (length (alist-get 'text (car current-item)))))
              (push (pop current-item) list))))
          (push nil list)
          (setq result (cons (reverse current-item) result) current-item nil current-length 0)))
      (setq list (cdr list)))
    (reverse result)))

(defun my-caption-format-as-subtitle (list &optional word-timing)
  "Turn a LIST of the form (((start . ms) (end . ms) (text . s)) ...) into VTT.
If WORD-TIMING is non-nil, include word-level timestamps."
  (format "%s --> %s\n%s\n\n"
          (subed-vtt--msecs-to-timestamp (alist-get 'start (car list)))
          (subed-vtt--msecs-to-timestamp (alist-get 'end (car (last list))))
          (s-trim (mapconcat (lambda (entry)
                               (if word-timing
                                   (format " <%s>%s"
                                           (subed-vtt--msecs-to-timestamp (alist-get 'start entry))
                                           (string-trim (alist-get 'text entry)))
                                 (alist-get 'text entry)))
                             list ""))))

(defun my-caption-to-vtt (&optional data)
  (interactive)
  (with-temp-file "captions.vtt"
    (insert "WEBVTT\n\n"
            (mapconcat
             (lambda (entry) (my-caption-format-as-subtitle entry))
             (my-caption-make-groups
              (or data (my-caption-fix-common-errors subed-word-data--cache)))
             ""))))
This is part of my Emacs configuration.

How I animate sketches with Autodesk Sketchbook Pro and Camtasia Studio

| drawing, process, video

Spoken words can be much more effective when accompanied with animation, so my clients have been asking me to put together short animations for them. Here's my workflow in case you're interested in doing this too.

Step 1: Draw the images and get them approved.

Make your canvas roughly the same size as your final image so that you can save frames if needed. The bottom layer should be your background colour (ex: white). You can use a grid to line things up, then hide the grid when you're ready to export. Use one layer per scene in your animation.

Step 2: Prepare for animation.

Hide everything but the first scene and your background layer. Add a white layer at 90% opacity above your sketch. This allows you to trace over your sketch while making it easy to remove the pre-sketch in Camtasia Studio. Using a translucent white layer allows you to fade your other scenes without adjusting the opacity for each of them.

Step 3: Lay out your screen.

Zoom in as close to 100% as possible. Use TAB to hide the Autodesk Sketchbook interface and position your sketch so that the important parts are not obscured by the little lagoon controller on the left side. You can turn the title bar off, too. Set Camtasia Recorder to record your screen without that little controller – you can either record only part of your screen, or add a white callout afterwards.

If you need to create HD video, a high-resolution monitor will give you the space you need. My Cintiq 12WX has a resolution of 1280×800, and my laptop has a resolution of 1366×768. When I need to record at 1920×1080, I use my Cintiq as a graphics tablet for an external monitor instead.

It's probably a good idea to turn audio off so that you don't have to split it out and remove it later.

This is also a good time to set up convenient keyboard shortcuts or buttons. The Cintiq 12WX has some programmable buttons, so here's how I set mine up:

  • Left button: Ctrl-z – handy for quickly undoing things instead of flipping over to the eraser.
  • Middle right button: TAB – hides and shows the interface.
  • Bottom button: Ctrl-Shift-F8 – the keyboard shortcut I set up in Camtasia Studio so that I can pause and resume recording.

This makes it easier for me to pause (bottom), show the interface (middle right), change colours or brushes, hide the interface (middle right), and resume (bottom). That reduces the editing I need to do afterwards.

Step 4: Record!

Because the pre-sketch shows you where things should go and you've already fiddled with the layout to make sure things fit, it's easy to draw quickly and confidently. Use TAB to hide or show the interface. When you're starting out, you may find it easier to record in one go and then edit out the segments where you're switching brushes or colours. As you become more comfortable with switching back and forth between full-screen drawing and the Autodesk Sketchbook Pro interface, try the workflow of pausing the recording, showing the interface, hiding it again, and then resuming.

Step 5: Edit and synchronize in Camtasia Studio.

Save and edit the video. Set the project dimensions to match your final output, and set the background colour to white.

Use Visual Effects > Remove a Color to remove the pre-sketch. Now it looks like you're drawing on a blank canvas. See my previous notes for a demo.

Now synchronize the video with the audio. You may want to add markers to your audio so that you can easily tell where the significant points are. Use the timeline to find out the duration between markers. Split your video at the appropriate points by selecting the video and typing s. Use clip speed (right-click on the segment) to adjust the speed until the video duration matches what you need.

Note that at high clip speeds, Camtasia drops a lot of frames. If this bothers you, you can render the sketch at 400% speed using Camtasia or Movie Maker, produce that as an AVI or MP4, re-import that media, and continue compressing it at a maximum of 400% speed each time until you get the speed you want.
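
To make the arithmetic concrete: a 60-second drawing segment that has to match a 15-second stretch of audio needs a clip speed of 400% (60 ÷ 15). To squeeze it into 5 seconds (1200%), render it once at 400%, re-import the result, and speed that up by another 300%.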

If you need to cover up a mistake, a simple white rectangular callout can hide that effectively. If you need to make something longer, extend the frame. Because you can't extend frames into video that's already there, you may want to drag the segment onto a different track, and then split or cut the excess.

Produce the synchronized video in your required output format (ex: MP4, MOV…) and you're done!

Hope this workflow helps you get into doing more animated sketches with Autodesk Sketchbook Pro and Camtasia Studio on a laptop or desktop computer. Do you use other tools or other workflows? Please share!

Things I’m learning about sharing other people’s knowledge, or why you should show me what you’ve been meaning to teach others

| kaizen, learning, video

Many conferences don’t record sessions or share videos promptly, so I was delighted to find that the Emacs Conference 2013 was not only going to be recorded but also livestreamed. Jon (the venue contact) even brought a small camera for recording close-ups. Since the zero-budget conference didn’t have a professional videographer, I volunteered to process the videos and get them out there. I also took sketchnotes and shared them during the conference itself.

It’s important to me that people who weren’t able to make it to the conference can still learn from it. So much knowledge evaporates into nothingness if not shared. Besides, it  would be wonderful for people to get a sense of the people in the Emacs community, and that’s something that’s hard to pick up from just slides or transcripts. I had selfish reasons, too. I wanted to be able to go back and remember what being around a hundred Emacs geeks is like. (It was awesome!)

It took me 8.5 hours spread over a week to process and upload the videos from the conference. It was an excellent use of that time, and people have been super-appreciative. I’m planning to transcribe John Wiegley’s talk on Emacs Lisp development because it was full of great tips. I may transcribe the other talks (or coordinate with other people?) if that’s something people would find really, really useful too.

There’s a lot of good stuff in people’s heads, and most people are really bad at getting things out there where other people can learn from them. There’s the fear of writing or public speaking, of being wrong, of not being an expert, of embarrassing yourself. I write a ton, and I’m comfortable giving presentations. (Both skills are really useful introvert hacks.) It’s easy for me to share what I know, and I’m learning even more each day. So that’s good – but it might be even more interesting to pick other people’s brains and help them get their thoughts out there. I suspect that even if I spend the rest of my life sharing just what other people know, that would still be a great way to make life better.

I’m getting the hang of amplifying the good ideas that people have, helping them reach more people. Sketchnotes, videos, transcription, writing, podcasts and video chats, screencasts, blogging, visual book reviews… I get to indulge my curiosity, help other people learn, get conversations going.

This is good. This means I don’t have to stress out about being original or being an expert. I can be a conduit for other people’s ideas and lessons, while inevitably creating something of my own along the way. I’m sometimes divided on this. Shouldn’t I use my 5-year experiment time to pursue my own ideas instead of just channeling other people’s thoughts? But I learn so much by helping people share, and I get to see the interconnections among so many different things. And then ideas bubble up – things I haven’t read or heard, things that I do differently that I notice only when people ask – and these ideas demand to be created and shared. The choice isn’t one or the other. By helping people share what they know, I can get even better at making new things. =)

Anyway, on to lessons learned:

What worked well?

  • Sketchnoting and sharing during the conference itself: Great way to help people in person and online. Because there were lots of abstract topics to cover and I was helping with technical issues as well, my live notes were pretty text-heavy. I edited the sketchnotes after the event in order to add highlights and extra information. Tech-wise, I used WinSCP to upload the images in the background, and then used NextGen Gallery’s rescan folder feature to pull them in. This meant that I didn’t have to fuss with web server errors.
  • Using multiple tools for recording my presentation: I remembered to set up recording audio on my phone, recording video on my tablet, and recording my screen using Camtasia Studio. The audio recording worked, but both the video recording and the screen recording failed. (Sigh.) But at least there's audio of the keynote! I might recreate the presentations if people think that's valuable.
  • Copying the conference videos before leaving the venue: Soooooo much faster than downloading them over the Internet
  • Volunteering to handle the videos: Because otherwise it could take forever (or it might not have happened). Besides, I really like Emacs, and helping out with this is a good way to build the community.
  • Setting aside time to follow up: It was great to have the space to work on this here and there instead of getting caught up in other work.
  • Splicing in secondary video: Jon took close-up videos of many of the presentations, which I added using Camtasia. This was great because the screen was difficult or impossible to read over the livestream.
  • Separating rendering from publishing: In the beginning, I used Camtasia Studio’s YouTube support to publish videos directly to the Internet. This broke after the first few videos, so I saved the videos from the error dialog and uploaded them myself. When I switched to producing the MP4s directly and then uploading them to YouTube through my browser, uploading was around five times faster. Uploading videos through my browser also allowed me to process the next video instead of tying up Camtasia Studio during the publishing process.

What would make this even better in terms of sharing knowledge from conferences?

  • Doing a livestream tech check and having guidance for speakers: The keynote wasn’t livestreamed because we had technical issues, and many of the presentations were unreadable because of the glare from a white background. Coordinating with the venue to do a technology check beforehand might help us avoid these issues in the future, and it’ll also tell us what we need to work around when we prepare our presentations.
  • Asking the venue organizer which files had the livestream video: The livestream videos were confusingly named with a .ps extension, but Alex found them by using the file command.
  • Bringing a personal video camera and a tripod: That might make travel a little more difficult, but it’s good to have more video backups, and the quality might be better too.
  • Editing the videos using a proper video editing tool instead of Camtasia Studio and Windows Movie Maker: Might be more reliable, as Camtasia occasionally crashed.
  • More hard disk space: I can move processed videos to secondary storage knowing that I have YouTube or Vimeo as a backup.
  • Bringing a large USB drive to conferences: Great for efficiently transferring files between computers. (Good old-fashioned sneakernet!)
  • Making sure Camtasia Studio doesn’t crash next time I want to record my presentation: This probably had something to do with not having audio sources. If I can reliably reproduce this and figure out how not to reproduce it, that should be good.
  • Learning how to cut: Editing to pick out highlights or make things flow more smoothly can help me save other people time and make information more accessible to people who can’t sit down and listen to something for an hour. I’ve done a little audio editing to remove ums and ahs before, but it might be interesting to do more radical cuts. I don’t particularly enjoy doing this yet because I vastly prefer visual/verbal learning over auditory learning (and used to regularly fall asleep in class, although I managed to graduate somehow!), but maybe that’s just a matter of practice, familiarity, and material. We’ll see. After I learn how to cut, maybe I can learn how to make audio and video even more engaging with music and effects. Someday!

I love it when evolving skills and interests come together coherently and become a platform for going from strength to strength. I started blogging almost eleven years ago as a way to learn more effectively, and now I see how I can scale that up even further. I wonder what this will look like in a decade.

Here are a few ways you can help me get even better at sharing what you and other people know:

  • Ask me questions. =)
  • Teach me what I should ask you so that I can learn a lot from you.
  • Suggest ways I can organize or share things even more effectively.
  • Tell me where I’m on the right track, and what “even better” might look like.

This is fun!
