Categories: geek

RSS - Atom - Subscribe via email

Hyperlinking SVGs

| drawing, supernote, emacs
Text and links from sketch

Hyperlinking SVGs - 2025-01-17-01

I like drawing my notes. I can jump around, draw connections, doodle for fun.

A sketch can only fit so much, though. (even if I write really small)

Idea: Links: They can be signposts for other trails.

Process:

I want to make maps for myself and other people.

This is easy to do because:

  • SVGs are XML, a text format
  • Emacs has code for XML and SVG manipulation, display
  • You can use Emacs to build a simple user interface.
  • Ideas
  • TO-DO: update sketch viewer
    • prioritize SVG
    • display Org

SuperNote also has its own hyperlinks, but:

  • typing long URLS on on-screen keyboards is not fun
  • I can't figure out how to convert those links to SVG
  • Rects are more compact

Preprocessing the image

This isn't the focus of this blog post, but I thought I'd include the code anyway in case someone might find it useful.

The fastest way to get a single file off the Supernote is to enable Browse & Access by swiping down from the top. It's the icon that looks like a two-way arrow between waves.

2025-01-21_10-28-16.png
Figure 1: Browse and Access

I have some Emacs Lisp code for downloading the latest exported file using the Supernote's web server.

my-supernote-get-exported-files
(defvar my-supernote-ip-address "192.168.1.221")
(defun my-supernote-get-exported-files ()
  (condition-case nil
      (let ((data (plz 'get (format "http://%s:8089/EXPORT" my-supernote-ip-address)))
            (list))
        (when (string-match "const json = '\\(.*\\)'" data)
          (sort
           (alist-get 'fileList (json-parse-string (match-string 1 data) :object-type 'alist :array-type 'list))
           :key (lambda (o) (alist-get 'date o))
           :lessp 'string<
           :reverse t)))
    (error nil)))

my-supernote-download-latest-exported-file: Save exported file in downloads dir.
(defun my-supernote-download-latest-exported-file ()
  "Save exported file in downloads dir."
  (interactive)
  (let* ((info (car (my-supernote-get-exported-files)))
         (dest-dir my-download-dir)
         (new-file (and info (expand-file-name (file-name-nondirectory (alist-get 'name info)) dest-dir)))
         renamed)
    (when info
      (copy-file
       (plz 'get (format "http://%s:8089%s" my-supernote-ip-address
                         (alist-get 'uri info))
         :as 'file)
       new-file
       t)
      new-file)))

Once I've downloaded the file, I process it:

  1. my-image-recognize: use Google Cloud Vision to recognize the text, rename it based on the ID
  2. my-sketch-rename: rename the file based on the ID if I've written one on the sketch
  3. my-sketch-convert-pdf: convert to SVG, copying over the links from the previous SVG if one exists
  4. my-sketch-clean: remove any images or templates
  5. my-sketch-color-to-hex: change the hex values for easier replacement and tinkering
  6. my-sketch-add-bg: add a plain white background rectangle
  7. my-sketch-change-fill-to-style: make the attributes more consistent
  8. my-sketch-recolor: change the highlight colour from gray to light yellow
  9. my-image-store: store it in either my private-sketches directory or my sketches directory, depending on the tags in the filename; leave untitled sketches in the same directory

my-supernote-process-sketch
(defun my-supernote-process-sketch (file)
  (interactive "FFile: ")
  (my-image-recognize file)
  (setq file (my-sketch-rename file))
  (pcase (file-name-extension file)
    ("pdf"
     (setq file
           (my-image-store
            (my-sketch-svg-prepare file))))
    ("png"
     (setq file
           (my-image-store
            (my-image-autorotate
             (my-image-autocrop
              (my-sketch-recolor-png
               file)))))))
  file)

my-sketch-svg-prepare: Clean up SVG for publishing.
(defvar my-debug-buffer (get-buffer-create "*temp*"))
(defun my-sketch-convert-pdf (pdf-file)
  "Returns the SVG filename."
  (interactive "FPDF: ")
  (if-let ((links (and (file-exists-p (concat (file-name-sans-extension pdf-file) ".svg"))
                       (dom-by-tag
                        (car (xml-parse-file (concat (file-name-sans-extension pdf-file) ".svg")))
                        'a))))
      ;; copy links over
      (let ((temp-file (concat (make-temp-name "svg-conversion") ".svg"))
            new-file)
        (unwind-protect
            (progn
              (call-process "pdftocairo" nil my-debug-buffer nil "-svg" (expand-file-name pdf-file)
                            temp-file)
              (setq new-file (car (xml-parse-file temp-file)))
              (dolist (link links)
                (dom-append-child new-file link))
              (with-temp-file (file-exists-p (concat (file-name-sans-extension pdf-file) ".svg"))
                (svg-print new-file)))
          (error
           (delete-file temp-file))))
    (delete-file (concat (file-name-sans-extension pdf-file) ".svg"))
    (call-process "pdftocairo" nil my-debug-buffer nil "-svg" (expand-file-name pdf-file)
                  (expand-file-name (concat (file-name-sans-extension pdf-file) ".svg"))))
  (concat (file-name-sans-extension pdf-file) ".svg"))

(defun my-sketch-change-fill-to-style (dom)
  "Inkscape handles these better when we split paths."
  (dolist (path (dom-by-tag dom 'path))
    (when (dom-attr path 'fill)
      (dom-set-attribute
       path 'style
       (if (dom-attr path 'style)
           (concat (dom-attr path 'style) ";fill:" (dom-attr path 'fill))
         (concat "fill:" (dom-attr path 'fill))))
      (dom-remove-attribute path 'fill)))
  dom)

(defun my-sketch-recolor (dom color-map &optional selector)
  "Colors are specified as ((\"#input\" . \"#output\") ...)."
  (if (symbolp color-map)
      (setq color-map
            (assoc-default color-map my-sketch-color-map)))
  (let ((map-re (regexp-opt (mapcar 'car color-map))))
    (dolist (path (if selector (dom-search dom selector)
                    (dom-by-tag dom 'path)))
      (dolist (attr '(style fill))
        (when (and (dom-attr path attr)
                   (string-match map-re (dom-attr path attr)))
          (dom-set-attribute
           path attr
           (replace-regexp-in-string
            map-re
            (lambda (match)
              (assoc-default match color-map))
            (or (dom-attr path attr) "")))))))
  dom)

(defun my-sketch-add-bg (dom)
  ;; add background rectangle
  (unless (dom-search dom (lambda (elem) (and (dom-attr elem 'class) (string-match "\\<background\\>" (dom-attr elem 'class)))))
    (let* ((view-box (mapcar 'string-to-number (split-string (dom-attr dom 'viewBox))))
           (bg-node (dom-node 'rect `((x . 0)
                                      (y . 0)
                                      (class . "background")
                                      (width . ,(elt view-box 2))
                                      (height . ,(elt view-box 3))
                                      (fill . "#ffffff")))))
      (if (dom-by-id dom "surface1")
          (push bg-node (cddr (car (dom-by-id dom "surface1"))))
        (push bg-node (cddr (car dom))))))
  dom)

(defun my-sketch-clean (dom)
  "Remove USE and IMAGE tags."
  (dolist (use (dom-by-tag dom 'use))
    (dom-remove-node dom use))
  (dolist (use (dom-by-tag dom 'image))
    (dom-remove-node dom use))
  dom)

(defun my-sketch-rotate (dom)
  (let* ((old-width (dom-attr dom 'width))
         (old-height (dom-attr dom 'height))
         (view-box (mapcar 'string-to-number (split-string (dom-attr dom 'viewBox))))
         (rotate (format "rotate(90) translate(0 %s)" (- (elt view-box 3)))))
    (dom-set-attribute dom 'width old-height)
    (dom-set-attribute dom 'height old-width)
    (dom-set-attribute dom 'viewBox (format "0 0 %d %d" (elt view-box 3) (elt view-box 2)))
    (dolist (g (dom-by-tag dom 'g))
      (dom-set-attribute g 'transform rotate)))
  dom)

(defun my-sketch-mix-blend-mode-darken (dom &optional selector)
  (dolist (p (if (functionp selector) (dom-search dom selector) (or selector (dom-by-tag dom 'path))))
    (when (and (dom-attr p 'style)
               (not (string-match "mix-blend-mode" (dom-attr p 'style))))
      (dom-set-attribute
       p 'style
       (replace-regexp-in-string ";;\\|^;" ""
                                 (concat
                                  (or (dom-attr p 'style) "")
                                  ";mix-blend-mode:darken")))))
  dom)

(defun my-sketch-color-to-hex (dom &optional selector)
  (dolist (p (if (functionp selector) (dom-search dom selector)
               (or selector (dom-search dom
                                        (lambda (p) (or (dom-attr p 'style)
                                                        (dom-attr p 'fill)))))))
    (dolist (attr '(style fill))
      (when (dom-attr p attr)
        (dom-set-attribute
         p attr
         (replace-regexp-in-string
          "rgb(\\([0-9\\.]+\\)%, *\\([0-9\\.%]+\\)%, *\\([0-9\\.]+\\)%)"
          (lambda (s)
            (color-rgb-to-hex
             (* 0.01 (string-to-number (match-string 1 s)))
             (* 0.01 (string-to-number (match-string 2 s)))
             (* 0.01 (string-to-number (match-string 3 s)))
             2))
          (dom-attr p attr))))))
  dom)

;; default for now, but will support more colour schemes someday
(defvar my-sketch-color-map
  '((blue
     ("#9d9d9d" . "#2b64a9")
     ("#9c9c9c" . "#2b64a9")
     ("#c9c9c9" . "#b3e3f1")
     ("#c8c8c8" . "#b3e3f1")
     ("#cacaca" . "#b3e3f1")
     ("#a6d2ff" . "#ffffff"))
    (t
     ("#9d9d9d" . "#888888")
     ("#9c9c9c" . "#888888")
     ("#cacaca" . "#f6f396")
     ("#c8c8c8" . "#f6f396")
     ("#a6d2ff" . "#ffffff")
     ("#c9c9c9" . "#f6f396"))))

(cl-defun my-sketch-svg-prepare (file &key color-map color-scheme new-file)
  "Clean up SVG for publishing."
  (when (string= (file-name-extension file) "pdf")
    (setq file (my-sketch-convert-pdf file)))
  (let ((dom (xml-parse-file file)))
    (setq dom (my-sketch-clean dom))
    (setq dom (my-sketch-color-to-hex dom))
    (setq dom (my-sketch-add-bg dom))
    (setq dom (my-sketch-change-fill-to-style dom))
    (setq dom (my-sketch-recolor dom
                                 (or color-map
                                     color-scheme
                                     t)))
    (with-temp-file (or new-file file) (svg-print (car dom)))
    (or new-file file)))

Editing and linking text

I've started keeping the text of the sketch in the same directory so that I can someday have full-text search for images. I have a keyboard shortcut for jumping to the text file. I like to open it in Org Mode.

my-org-sketch-open-text-file
(defun my-org-sketch-open-text-file (sketch)
  (interactive (list (my-complete-sketch-filename)))
  (find-file (concat (file-name-sans-extension sketch) ".txt"))
  (with-current-buffer (find-file-noselect sketch)
    (display-buffer-in-side-window
     (current-buffer)
     '((window-width . 0.5)
       (side . right)))))

The raw text from Google Cloud Vision is reasonably accurate but jumbled. I can move lines around with M-S-up and M-S-down in Org (org-shiftmetaup and org-shiftmetadown), which drag lines around. Once I add newlines, I can reorganize paragraphs with M-up and M-down (org-metaup and org-metadown). I can move list elements with M-S-right and M-S-left. (Idea: Avy probably has some awesome line-management functions I could get the hang of using.)

Once I've reorganized and cleaned up the text, I add links. Between my consult-omni shortcut and the new bookmarks I'm trying out (I should make a post about that), it's pretty easy.

Prompting for rectangles

Then it's a quick trip to Inkscape to draw rectangles over the things I want to link. It's easy to see where to draw the links because Org Mode highlights the links in the text. The style of the rectangles doesn't matter. After I save the SVG, I hop back into Emacs to turn them into links. This is the fun new part I just added.

Linkify rects

I like this because I got to reuse some code I'd written before to identify and reorder paths for easier animation of SVG topic maps. Using the links I defined in the previous step, all I needed to do was go through the rects (excluding the background rectangle) and offer completing-read on the titles and URLs. Then I createed the link elements and restyled the rectangles.

my-svg-linkify-rects
(defun my-svg-display (buffer-name svg &optional highlight-id full-window)
  "HIGHLIGHT-ID is a string ID or a node."
  (with-current-buffer (get-buffer-create buffer-name)
    (when highlight-id
      ;; make a copy
      (setq svg (with-temp-buffer (svg-print svg) (car (xml-parse-region (point-min) (point-max)))))
      (if-let* ((path (if (stringp highlight-id) (dom-by-id svg highlight-id) highlight-id))
                (view-box (split-string (dom-attr svg 'viewBox)))
                (box (my-svg-bounding-box path))
                (parent (car path)))
          (progn
            ;; find parents for possible rotation
            (while (and parent (not (dom-attr parent 'transform)))
              (setq parent (dom-parent svg parent)))
            (dom-set-attribute path 'style
                               (concat (dom-attr path 'style) "; stroke: 1px red; fill: #ff0000 !important"))
            ;; add a crosshair
            (dom-append-child
             (or parent svg)
             (dom-node 'path
                       `((d .
                            ,(format "M %f,0 V %s M %f,0 V %s M 0,%f H %s M 0,%f H %s"
                                     (elt box 0)
                                     (elt view-box 3)
                                     (elt box 2)
                                     (elt view-box 3)
                                     (elt box 1)
                                     (elt view-box 2)
                                     (elt box 3)
                                     (elt view-box 2)))
                         (stroke-dasharray . "5,5")
                         (style . "fill:none;stroke:gray;stroke-width:3px")))))
        (error "Could not find %s" highlight-id)))
    (let* ((inhibit-read-only t)
           (image (svg-image svg))
           (edges (window-inside-pixel-edges (get-buffer-window))))
      (erase-buffer)
      (if full-window
          (progn
            (delete-other-windows)
            (switch-to-buffer (current-buffer)))
        (display-buffer (current-buffer)))
      (insert-image (append image
                            (list :max-width
                                  (floor (* 0.8 (- (nth 2 edges) (nth 0 edges))))
                                  :max-height
                                  (floor (* 0.8 (- (nth 3 edges) (nth 1 edges)))) )))
      ;; (my-svg-resize-with-window (selected-window))
      ;; (add-hook 'window-state-change-functions #'my-svg-resize-with-window t)
      (current-buffer))))

(cl-defun my-svg-identify-paths (filename &key selector node-func dom)
  "Prompt for IDs for each path in FILENAME."
  (interactive (list (read-file-name "SVG: " nil nil
                                     (lambda (f)
                                       (or (string-match "\\.svg$" f)
                                           (file-directory-p f))))))
  (let* ((dom (or dom (car (xml-parse-file filename))))
         (paths (if (functionp selector)
                    (dom-search dom selector)
                  (or selector
                      (dom-by-tag dom 'path))))
         (vertico-count 3)
         (ids (seq-keep (lambda (path)
                          (and (dom-attr path 'id)
                               (unless (string-match "\\(path\\|rect\\)[0-9]+"
                                                     (or (dom-attr path 'id) "path0"))
                                 (dom-attr path 'id))))
                        paths))
         (edges (window-inside-pixel-edges (get-buffer-window)))
         id)
    (my-svg-display "*image*" dom nil t)
    (dolist (path paths)
      ;; display the image with an outline
      (unwind-protect
          (progn
            (my-svg-display "*image*" dom (dom-attr path 'id) t)
            (if (functionp node-func)
                (funcall node-func path dom)
              (setq id (completing-read
                        (format "ID (%s): " (dom-attr path 'id))
                        ids))
              ;; already exists, merge with existing element
              (if-let* ((old (dom-by-id dom id)))
                  (progn
                    (dom-set-attribute
                     old
                     'd
                     (concat (dom-attr (dom-by-id dom id) 'd)
                             " "
                             ;; change relative to absolute
                             (replace-regexp-in-string "^m" "M"
                                                       (dom-attr path 'd))))
                    (dom-remove-node dom path)
                    (setq id nil))
                (dom-set-attribute path 'id id)
                (add-to-list 'ids id)))))
      ;; save the image just in case we get interrupted halfway through
      (with-temp-file filename
        (svg-print dom)))))

(defun my-svg-identify-rects (filename)
  (interactive (list (read-file-name "SVG: " nil nil
                                     (lambda (f)
                                       (or (string-match "\\.svg$" f)
                                           (file-directory-p f))))))
  (my-svg-identify-paths
   filename
   :selector
   (lambda (elem)
     (and (eq (dom-tag elem) 'rect)
          (not (and (dom-attr elem 'class)
                    (string-match "\\<background\\>" (dom-attr elem 'class))))))))

(defun my-org-links-from-file (filename)
  "Return a list of (description . link) of the Org links in FILENAME."
  (when (file-exists-p filename)
    (let (results)
      (with-temp-buffer
        (insert-file-contents filename)
        (goto-char (point-min))
        (while (re-search-forward org-link-any-re nil t)
          (push (cons (match-string-no-properties 3)
                      (or (match-string-no-properties 2)
                          (match-string-no-properties 0)))
                results)))
      (reverse results))))

(defun my-svg-linkify-rects (filename)
  (interactive (list (read-file-name "SVG: " nil nil
                                     (lambda (f)
                                       (or (string-match "\\.svg$" f)
                                           (file-directory-p f))))))
  (let ((dom (car (xml-parse-file filename)))
        (links-from-text (my-org-links-from-file (concat (file-name-sans-extension filename) ".txt"))))
    (my-svg-identify-paths
     filename
     :dom
     dom
     :selector
     (append
      ;; not yet linked
      (dom-search dom
                  (lambda (elem)
                    (and (eq (dom-tag elem) 'rect)
                         (not (and (dom-attr elem 'class)
                                   (string-match "\\<background\\|link-rect\\>" (dom-attr elem 'class)))))))
      ;; linked
      (dom-search dom
                  (lambda (elem)
                    (and (eq (dom-tag elem) 'rect)
                         (string-match "\\<link-rect\\>" (or (dom-attr elem 'class) ""))))))

     :node-func
     (lambda (elem dom)
       (let* ((current-link-node (my-dom-closest dom elem 'a))
              (current-title-node (or (dom-by-tag elem 'title)
                                      (dom-by-tag current-link-node 'title)))
              (title (string-trim
                      (completing-read
                      "Title: "
                      (mapcar 'car links-from-text)
                      nil nil
                      (dom-text current-title-node))))
              (link (string-trim
                     (read-string
                       "URL: "
                       (or (dom-attr current-link-node 'href)
                           (assoc-default title links-from-text 'string=)))
                     )))
         (cond
          ((and current-link-node (not (string= link "")))
           (dom-set-attribute elem
                              'style
                              "stroke: blue; stroke-dasharray: 4; fill: #006fff; fill-opacity: 0.25")
           (dom-set-attribute current-link-node 'href link))
          ((and current-link-node (string= link ""))
           (dom-add-child-before
            (dom-parent dom current-link-node)
            elem)
           (dom-remove-node current-link-node))
          ((and (null current-link-node) (not (string= link "")))
           (setq current-link-node (dom-node
                                    'a
                                    `((href . ,link)
                                      (class . "link"))))
           (dom-add-child-before (dom-parent dom elem) current-link-node elem)
           (dom-remove-node dom elem)
           (dom-append-child current-link-node elem)
           (dom-remove-attribute elem 'fill)
           (dom-set-attribute elem
                              'style
                              "stroke: blue; stroke-dasharray: 4; fill: #006fff; fill-opacity: 0.25")
           (dom-set-attribute
            elem
            'class
            (if (dom-attr elem 'class)
                (concat (dom-attr elem 'class) " link-rect")
              "link-rect"))))
         (cond
          ((and (string= title "") current-title-node)
           (dom-remove-node current-title-node))
          ((and (not (string= title "")) (not current-title-node))
           (dom-append-child current-link-node (dom-node 'title nil title)))
          ((and (not (string= title "")) current-title-node)
           (setf (car (dom-children current-title-node))
                 title))))))))

(defun my-svg-update-links-from-text (filename)
  (interactive (list (read-file-name
                      "SVG: " nil
                      (if (file-exists-p (concat (file-name-sans-extension (buffer-file-name)) ".svg"))
                          (concat (file-name-sans-extension (buffer-file-name)) ".svg")
                        (cdr (my-embark-image)))
                      (lambda (f)
                        (or (string-match "\\.svg$" f)
                            (file-directory-p f))))))
  (let ((dom (car (xml-parse-file filename)))
        (links-from-text (my-org-links-from-file (concat (file-name-sans-extension filename) ".txt"))))
    (dolist (link (dom-by-tag dom 'a))
      (when (and
             (assoc-default (dom-text (dom-by-tag link 'title))
                            links-from-text)
             (not (string=
                   (dom-attr link 'href)
                   (assoc-default (dom-text (dom-by-tag link 'title))
                                  links-from-text))))
        (dom-set-attribute
         link
         'href
         (assoc-default (dom-text (dom-by-tag link 'title))
                        links-from-text))))
    (with-temp-file filename
      (svg-print dom))))



Writing about the sketch

I tweaked my function for drafting a blog post about a sketch. I added panning and zooming capabilities using Javascript, included the sketch text, and added any sections that I referred to using anchors. (TODO: Come to think of it, I should rewrite those to be absolute links using the permalink so that they'll still make sense even if people bookmark them from the main page of my blog.)

my-write-about-sketch
(defun my-insert-sketch-and-text (sketch)
  (interactive (list (my-complete-sketch-filename)))
  (insert
   (format "#+begin_panzoom\n%s\n#+end_panzoom\n\n"
           (if (string= (file-name-extension sketch) "svg")
               (org-link-make-string (concat "file:" sketch))
             (org-link-make-string (concat "sketchFull:" sketch)))))
  (let ((links (my-org-links-from-file (concat (file-name-sans-extension sketch) ".txt")))
        (subheading-level (1+ (org-current-level))))
    (insert (if links
                "#+begin_my_details Text and links from sketch\n"
              "#+begin_my_details Text from sketch\n"))
    (my-sketch-insert-text sketch)
    (unless (bolp) (insert "\n"))
    (insert "#+end_my_details")
    (dolist (section (seq-filter (lambda (entry) (string-match "^#" (cdr entry)))
                                 links))
      (org-end-of-subtree)
      (insert "\n\n")
      (org-insert-heading nil nil subheading-level)
      (insert (car section))
      (org-entry-put (point) "CUSTOM_ID" (substring (cdr section) 1)))))

(defun my-write-about-sketch (sketch)
  (interactive (list (my-complete-sketch-filename)))
  (shell-command "make-sketch-thumbnails")
  (find-file "~/sync/orgzly/posts.org")
  (goto-char (point-min))
  (unless (org-at-heading-p) (outline-next-heading))
  (org-insert-heading nil nil t)
  (insert (file-name-base sketch) "\n\n")
  (my-insert-sketch-and-text sketch)
  (delete-other-windows)
  (save-excursion
    (with-selected-window (split-window-horizontally)
      (find-file sketch))))

And then I can export the image as an inline SVGs in Org Mode HTML and Markdown exports, yay!

Other functions not included above are probably somewhere in my Emacs config.

Using an SVG as a sticky table of contents

… and now I can make the image a sticky table of contents as you scroll down, by wrapping it in something like this:

#+begin_sticky-toc-after-scrolling
#+begin_panzoom
file:/home/sacha/sync/sketches/2025-01-17-01 Hyperlinking SVGs -- drawing supernote inkscape svg.svg
#+end_panzoom
#+end_sticky-toc-after-scrolling

Mwahahaha! (Now I just need to make it highlight different sections as we scroll…)

Here's the snippet from my misc.js:

Sticky table of contents after scrolling
function stickyTocAfterScrolling() {
  const elements = document.querySelectorAll('.sticky-toc-after-scrolling');
  let lastScroll = window.scrollY;
  const cloneMap = new WeakMap();

  elements.forEach(element => {
    const clone = element.cloneNode(true);
    clone.setAttribute('class', 'sticky-toc');
    cloneMap.set(element, clone);
    element.parentNode.insertBefore(clone, element.nextSibling);
    const zoom = panZoom = svgPanZoom(clone.querySelector('svg'));
    zoom.resetZoom();
  });

  const observer = new IntersectionObserver(
    (entries) => {
      const currentScroll = window.scrollY;
      const scrollingDown = currentScroll > lastScroll;
      lastScroll = currentScroll;

      entries.forEach(entry => {
        const element = entry.target;
        const clone = cloneMap.get(element);

        if (!entry.isIntersecting && scrollingDown) {
          clone.setAttribute('class', 'sticky-toc');
          clone.style.display = 'block';
        } else if (entry.isIntersecting && !scrollingDown) {
          element.style.visibility = 'visible';
          clone.style.display = 'none';
        }
      });
    },
    {
      root: null,
      threshold: 0,
      rootMargin: '-10px 0px 0px 0px'
    }
  );

  elements.forEach(element => {
    observer.observe(element);
  });

  window.addEventListener('resize', () => {
    elements.forEach(element => {
      const clone = cloneMap.get(element);
      if (clone.style.display != 'none') {
        // reset didn't seem to work
        svgPanZoom(clone.querySelector('svg')).destroy();
        addPanZoomToElement(clone.querySelector('svg'));
      }
    });
  }, { passive: true });
}

stickyTocAfterScrolling();
View org source for this post

Revisiting wearable computing

Posted: - Modified: | geek

[2025-01-21 Tue]: As it turns out, I can still use the voice assistant when I'm recording audio, which is great. If I do that with my earbuds, I can hear the audio output. If it's just the lapel mic plugged in, I can look at my phone for the visual output. That means I can still set timers and possibly do quick conversions or lookups even when I'm doing a braindump. That also means that I probably don't have to use a separate voice recorder, which is good because I've been finding it hard to keep track of an extra device.

Also, it turns out that my Pixel 8 has some kind of Voice Access tool, which I'm looking forward to digging into.

Text and links from sketch

Revisiting wearable computing 2025-01-18-01

  • I experimented with wearable computing in 2002 because I wanted to explore life-logging and access to more info on the go.
  • I started with an M1 display, but I eventually switched to an earphone plugged into a laptop running Emacspeak in my backpack. Much more discreet and still pretty powerful.
  • The iPhone was launched in 2007. Now smartphones are totally normal. Wireless earbuds and and hands-free talking, too.
  • My life has also changed. I spend more time off my laptop: waiting at playdates, walking on errands, squeezing. in thoughts here and there.

What do I want to explore now?

Things I want to do:

and someday…

  • I want to learn and review.
  • I want to compensate for cognitive weaknesses (attentional hiccups, tip-of-the-tongue, memory)
    • Audio might be a good starting point
    • Maybe this is where LLMs might be worthwhile

Also, audio & alternative input may be useful for adapting to age-related changes, so that makes sense to me.

How to do it:

For now:

Then and now

Nudged by @cweber's post about The DIY FOSS cyborg (Emacs and Guix, yay!), and this Mastodon conversation, I wanted to revisit the ideas of wearable computing now that tech is a lot more wearable.

It was a very different world when I first experimented with wearable computing in 2002, and my life was also very different from what it is now. I remember mostly wanting to:

  • Capture notes on the go
    • I can jot things down on my smartphone now. Also, with WhisperX speech recognition being pretty accurate and fast, dictation is now a way I can get lots of stuff down.
  • Learn about interesting things, even when I can't read
    • Podcasts are still fairly niche even today, but YouTube was founded in 2005, and there are plenty of videos now. Back then, it wasn't as easy to find interesting things to listen to, so that's why running text-to-speech on info manuals and text files was helpful. I liked the ease of navigating between pages and chapters, and sometimes I even searched for specific text. Podcasts and videos aren't quite like that yet, but maybe someday.
  • Look up stuff
    • … and now we've got smartphones that respond to voice queries (unless I'm recording, in which case I just add an audio placeholder and then look things up on my phone or at home). Finding stuff on the Web is still relatively straightforward, although AI slop might make that iffy. Publishing as many of my notes as possible makes them a little easier to find if I search manually. Searching private notes can usually wait until I get home. It'd be nice to have good private search someday.

Here are my current off-laptop computing opportunities:

  • Maybe a 30-minute solo walk, or some audio braindumping if I'm doing chores by myself, or a listening opportunity if I'm doing chores while other people are in the same room. This needs to be mostly hands-free.
  • Possibly an hour or two of waiting at playdates, where I usually like to stand and listen to the other grown-ups chat or walk around to get my own exercise; frequently interrupted by A+'s desires for hugs, especially if the other kids are playing tag or other games she doesn't enjoy. I might be able to use a small input device in my pocket. Alternatively, I can knit or crochet at this time.
  • An unknown length of time when I'm waiting for A+. Sometimes we're at home and I can squeeze in some writing or coding on my laptop or some drawing on my Supernote. Sometimes we're outside and I can record or write on my phone.
  • Doing an audio braindump or a quick review as part of my evening routines or before I go to bed.

Thinking out loud, capturing notes

When I have a moment, I tend to prioritize untangling my thoughts or writing more than listening. Coming up with a list of things to think about is fairly easy. I can think of many things off the top of my head, and if I somehow manage to exhaust that list, a quick browse of my inbox or a search for the todump tag will turn up more. Some things that I want to think about are best thought of in front of a computer where I can try out source code and look up manuals. Still, even if I'm out walking on my own, there are things I can think through.

Audio braindumps help me figure out what I care about enough to write more about, so that when I actually get some writing or drawing time, I can focus on the thoughts that have some depth to them, where I have something to say.

I'm not very smooth at thinking out loud. I meander. I rephrase. I interrupt myself and go on tangents. I'm slowly getting better at collecting a phrase in my head before I say it. I'm working my way up to sentences and eventually paragraphs. But I definitely don't talk in a straightforward and coherent manner. Then again, I don't write or draw in a linear manner either. I jump around. That's what editing is for. I'm working on learning how to structure things better up front (figuring out a quick outline aloud, or maybe jotting things down in my Orgzly inbox; declaring the sections using keywords; mentally rehearsing my sentences and phrases; marking new paragraphs with "new paragraph" to add line breaks) so that I need to do less editing.

Sometimes an idea bounces back and forth between audio, words, and drawings. An audio breakdown might help me identify the rough outline of things that I want to talk about, the drawing helps me figure out the order and the relationships between parts, and then looking at the drawing while I braindump helps me then flesh it out in even more detail. Then I can edit large parts of that transcript into the text that will go along with the sketchnote. (Example: How do I want to get better at learning out loud? Part 1 of 4: Starting, of which I've only managed to do one part so far…)

Recording

When I'm recording audio, I'm not looking at the recording or the lapel mic, so it can be hard to tell if I've run out of battery. I haven't lost anything I missed – mostly just ephemera – but it would be nice to be able to trust in the system. I'm considering upgrading to a lapel mic set that comes with a charging case, two microphones, and a battery level indicator so I can be reasonably sure I'll have a charged lapel mic.

Update: As it turns out, I can record audio and use the voice assistant at the same time, so you can ignore this part below. =) If I have my earbuds in, I can hear the audio feedback from the request. If I don't, the response is just on my phone screen, which is still manageable. This means I probably don't have to fuss around with a separate voice recorder, which is good because I was having a hard time keeping track of extra devices.

If I use the lapel mic to record on my phone, the Google voice assistant doesn't work, so it's a bit of a tradeoff. I mostly use the voice assistant to set timers, check the time or weather, and do quick conversions. If I shift audio recording to a separate device, I can still use voice interactions with my phone. I have an Olympus WS-311M digital voice recorder that uses an AAA battery and transfers the recorded WMA files via USB. I can try putting that in my overalls pocket or on a lanyard, and I can see if the recording quality is good enough to be transcribed. If there's too much noise from clothing, I can get a lapel mic with an audio jack. A few wireless lapel mics with charging cases also come with a combination receiver that can plug into either an audio jack, USB C, or a Lightning port. Since I'm not sure yet, I'm going to hold off on upgrading my lapel mic until I get a sense of whether the external recorder works for me.

I've also been contemplating getting a new audio recorder. Many audio recorders can now also connect to phones via Bluetooth in order to record calls. If I get one, it'll be easier to interview my mom and collect stories. This is low priority at the moment, though.

Anyway, I think I record more than I can efficiently process at this point. The bottleneck isn't really getting things in, it's processing things and turning them into things I can share.

Looking things up

Most of the things I want to look up can be handled on my phone or saved for when I get back to my laptop.

These days, looking things up in my personal notes on the go is mostly a matter of checking my people.org file to help me remember the names of grown-ups and kids at the playdates, and to keep some notes on their interests. I still have a hard time remembering names or telling some people apart, so I try to jot down details that can help me remember. If I want to get even fancier (and less stick-figure-y), it could be interesting to try using those police-sketch-type apps to capture a sense of someone's likeness without taking a picture of them with my phone.

Learning and input

It's easy to find things to listen to these days, but it's also easy to think that I'm learning something but not actually absorb it. I want to think about stuff while I'm listening to it or shortly afterwards, connecting the ideas with other things I've learned and things I want to try. I like braindumping after a podcast or book, but I'd like to get better at capturing quotations and more specific thoughts. u/LordLuvMuffin recommends recording audio while listening to a podcast, which should work pretty well with my current workflow for braindumping. If I rewind a bit or pause, I can note some words to find in the transcript.

Reviewing might be a better use of my time compared to listening to podcasts. I've been recording 1-minute Emacs notes for myself so that I can shuffle them and remember things I've been tinkering with.

Spaced repetition is even more effective than random shuffling. That's where some kind of reliable eyes-free input might be helpful. For that, I might be able to use Anki (possibly with org-anki), org-drill, or org-fc. People seem to like using the 8BitDo gamepad together with Anki, and the 8BitDo Micro looks small enough to tuck into a pocket. I might even be able to use the buttons with thin gloves. I can imagine having speech-synthesized questions to jog my memory, pressing a key to hear the answer, and then noting whether I got that right and how easily I answered. (Apparently there are also multi-button rings people like to use as Tiktok remotes. Another possibility…)

I can start with the hardware I've got. I like the idea of the triple-tap headphone gesture that snipd uses to save key moments. The Indy Evo earbuds I'm using support triple-taps and apparently I can assign Tasker to handle that, so I can probably figure out some kind of Tasker profile that does some useful thing. Maybe I can triple-tap to hear a random speech-synthesized review/thinking prompt extracted from my Org files. Tasker is a little cumbersome to work with, so if I can get Tasker to send an intent to Emacs (maybe via Termux, if I set up Emacs on Android and Termux correctly?), maybe I can do more of my automation within Emacs.

Also, the micro:bit can act as a Bluetooth human interface device (keyboard, mouse, media keys, or gamepad) for my Android phone and maybe the laptop, which is promising. I have a gamepad and a board that has a bunch of buttons on it. I think tinkering with hardware would be a good skill to develop (and model for A+), so that's probably a good step. If I can get that working, then I can see if I can make it pocketable enough, and whatever UI I come up with along the way will probably still be translatable to something like the 8BitDo Micro if I decide that's smoother.

There's also this interesting Reddit thread about hand-held devices that people run Emacs on, if I want something more easily tweakable than my phone.

I'd love to have Emacspeak working on my phone (or failing that, a device I can tuck into my vest). Espeak works on Termux, so it might be doable. I think that could lend itself to lots of fun experiments. Hah, worst-case scenario, I could maybe SSH to my virtual private server, get Emacspeak running on it, and maybe use icecast to stream the audio from it to my phone. Wouldn't that be crazy?

Working around attentional hiccups and memory

If I leave the recorder running while I'm doing chores, I can try to build a habit of narrating a quick note when I'm putting things away, especially for things that are infrequently used or are not in their usual place. Alternatively, I can make a quick voice-to-text note in Orgzly so that it's more searchable and so that it doesn't get lost in my transcript inbox.

For stuff, I should probably do an inventory of the house anyway. It could be a good opportunity to declutter.

Smart glasses

Getting smart glasses with speakers/microphones is tempting, because then I don't have to fuss around with a lapel mic or earbuds. The ones with cameras might be fun for capturing quick moments, too. I have a small face, though, so it might be the sort of thing to try in person. Also, privacy and tweakability, hmm…

Next steps for me

  • Practise thinking in phrases, then sentences, then paragraphs.
  • Experiment with using a separate voice recorder so that I can also try out voice interfaces.
  • Slow down and capture more notes. Do an inventory and declutter along the way.
  • Improve workflows for braindumping and review
  • Experiment with simple speech synthesis and input:
    • Triple-tap, Tasker profiles
    • micro:bit as Bluetooth keyboard
    • Consider Bluetooth gamepad if I want something more polished/pocketable
  • See if I can figure out Emacspeak on my phone.

One of the things I'm trying to practise is limiting the scope of a blog post when I notice I'm starting to get carried away. =) There's plenty for me to think about and follow up on in this one!

Related:

View org source for this post

2025-01-20 Emacs news

| emacs, emacs-news

Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, r/planetemacs, Mastodon #emacs, Bluesky #emacs, Hacker News, lobste.rs, kbin, programming.dev, lemmy.world, lemmy.ml, communick.news, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

View org source for this post

Changing planet.emacslife.com

| emacs

For the longest time, planet.emacslife.com was generated with the Planet Venus aggregator, which required Python 2.7. My recent blog post about Yay Emacs 8: which-key-replacement-alist had some emojis that the parser couldn't handle, though (unichr() arg not in range(0x10000) (narrow Python build)). I decided to rewrite the aggregator using NodeJS.

Here are some differences between the current implementation and the previous ones:

  • The website shows the last two weeks of posts, since I can filter by date. It should also ignore future-dated posts.
  • The list of feeds on the right side is now sorted by last post date, so it's easier to see active blogs.
  • I can now filter a general feed by a regular expression.
  • I've removed a number of unreachable blogs.
  • The feed list is loaded from a JSON instead of an INI.

The Atom feed and the OPML file should validate, but let me know at sacha@sachachua.com if there are any hiccups (or if you have an Atom/RSS feed we can add to the aggregator =) ).

View org source for this post

2025-01-13 Emacs news

| emacs, emacs-news

Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, r/planetemacs, Mastodon #emacs, Bluesky #emacs, Hacker News, lobste.rs, programming.dev, lemmy.world, lemmy.ml, communick.news, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

View org source for this post

Treemap visualization of an Org Mode file

| org, visualization

[2025-01-12 Sun]: u/dr-timeous posted a treemap_org.py · GitHub that makes a coloured treemap that displays the body on hover. (Reddit) Also, I think librsvg doesn't support wrapped text, so that might mean manually wrapping if I want to figure out the kind of text density that webtreemap has.

One of the challenges with digital notes is that it's hard to get a sense of volume, of mass, of accumulation. Especially with Org Mode, everything gets folded away so neatly and I can jump around so readily with C-c j (org-goto) or C-u C-c C-w (org-refile) that I often don't stumble across the sorts of things I might encounter in a physical notebook.

Treemaps are a quick way to visualize hierarchical data using nested rectangles or squares, giving a sense of relative sizes. I was curious about what my main organizer.org file would look like as a treemap, so I wrote some code to transform it into the kind of data that https://github.com/danvk/webtreemap wants as input. webtreemap creates an HTML file that uses Javascript to let me click on nodes to navigate within them.

For this treemap prototype, I used org-map-entries to go over all the headings and make a report with the outline path and the size of the heading. To keep the tree visualization manageable, I excluded done/cancelled tasks and archived headings. I also wanted to exclude some headings from the visualization, like the way my Parenting subheading has lots of personal information underneath it. I added a :notree: tag to indicate that a tree should not be included.

Screencast of exploring a treemap

Reflections

2025-01-11_19-54-47.png
Figure 1: Screenshot of the treemap for my organizer.org

The video and the screenshot above show the treemap for my main Org Mode file, organizer.org. I feel like the treemap makes it easier to see projects and clusters where I'd accumulated notes, both in terms of length and quantity. (I've omitted some trees like "Parenting" which take up a fairly large chunk of space.)

To no one's surprise, Emacs takes up a large part of my notes and ideas. =)

When I look at this treemap, I notice a bunch of nodes I need to mark as DONE or CANCELLED because I forgot to update my organizer.org. That usually happens when I come up with an idea, don't remember that I'd come up with it before, put it in my inbox.org file, and do it from there or from the organizer.org location I've refiled it to without bumping into the first idea. Once in a blue moon, I go through my whole organizer.org file and clean out the cruft. Maybe a treemap like this will make it easier to quickly scan things.

Interestingly, "Explore AI" takes up a disproportionately large chunk of my "Inactive Projects" visualization, even though I spend more time and attention on other things. Large language models make it easy to generate a lot of text, but I haven't really done the work to process those. I've also collected a lot of links that I haven't done much with.

It might be neat to filter the headings by timestamp so that I can see things I've touched in the last 6 months.

Hmm, looking at this treemap reminds me that I've got "organizer.org/Areas/Ideas for things to do with focused time/Writing/", which probably should get moved to the posts.org file that I tend to use for drafts. Let's take look at the treemap for that file. (Updated: cleared it out!)

2025-01-11_20-10-18.png
Figure 2: Drafts in my posts.org

Unlike my organizer.org file, my posts.org file tends to be fairly flat in terms of hierarchy. It's just a staging ground for ideas before I put them on my blog. I usually try to keep posts short, but a few of my posts have sub-headings. Since the treemap makes it easy to see nodes that are larger or more complex, that could be a good nudge to focus on getting those out the door. Looking at this treemap reminds me that I've got a bunch of EmacsConf posts that I want to finish so that I can document more of our processes and tools.

2025-01-11_14-52-28.png
Figure 3: Treemap of my inbox

My inbox.org is pretty flat too, since it's really just captured top-level notes that I'll either mark as done or move somewhere else (usually organizer.org). Because the treemap visualization tool uses / as a path separator, the treemap groups headings that are plain URLs together, grouped by domain and path.

2025-01-12_08-30-44.png
Figure 4: Treemap of my Emacs configuration

My Emacs configuration is organized as a hierarchy. I usually embed the explanatory blog posts in it, which explains the larger nodes. I like how the treemap makes it easy to see the major components of my configuration and where I might have a lot of notes/custom code. For example, my config has a surprising amount to do with multimedia considering Emacs is a text editor, and that's mostly because I like to tinker with my workflow for sketchnotes and subtitles. This treemap would be interesting to colour based on whether something has been described in a blog post, and it would be great to link the nodes in a published SVG to the blog post URLs. That way, I can more easily spot things that might be fun to write about.

The code

This assumes https://github.com/danvk/webtreemap is installed with npm install -g webtreemap-cli.

(defvar my-org-treemap-temp-file "~/treemap.html") ; Firefox inside Snap can't access /tmp
(defvar my-org-treemap-command "treemap" "Executable to generate a treemap.")

(defun my-org-treemap-include-p (node)
  (not (or (eq (org-element-property :todo-type node) 'done)
           (member "notree" (org-element-property :tags node))
           (org-element-property-inherited :archivedp node 'with-self))))

(defun my-org-treemap-data (node &optional path)
  "Output the size of headings underneath this one."
  (let ((sub
         (apply
          'append
          (org-element-map
              (org-element-contents node)
              '(headline)
            (lambda (child)
              (if (my-org-treemap-include-p child)
                  (my-org-treemap-data
                   child
                   (append path
                           (list
                            (org-no-properties
                             (org-element-property :raw-value node)))))
                (list
                 (list
                  (-
                   (org-element-end child)
                   (org-element-begin child))
                  (string-join
                   (cdr
                    (append path
                            (list
                             (org-no-properties
                              (org-element-property :raw-value node))
                             (org-no-properties
                              (org-element-property :raw-value child)))))
                   "/")
                  nil))))
            nil nil 'headline))))
    (append
     (list
      (list
       (-
        (org-element-end node)
        (org-element-begin node)
        (apply '+ (mapcar 'car sub))
        )
       (string-join
        (cdr
         (append path
                 (list
                  (org-no-properties (org-element-property :raw-value node)))))
        "/")
       (my-org-treemap-include-p node)))
     sub)))

(defun my-org-treemap ()
  "Generate a treemap."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (let ((file (expand-file-name (expand-file-name my-org-treemap-temp-file)))
          (data (cdr (my-org-treemap-data (org-element-parse-buffer)))))
      (with-temp-file file
        (call-process-region
         (mapconcat
          (lambda (entry)
            (if (elt entry 2)
                (format "%d %s\n" (car entry)
                        (replace-regexp-in-string org-link-bracket-re "\\2" (cadr entry)))
              ""))
          data
          "")
         nil
         my-org-treemap-command nil t t))
      (browse-url (concat "file://" (expand-file-name my-org-treemap-temp-file))))))

There's another treemap visualization tool that can produce squarified treemaps as coloured SVGs, so that style might be interesting to explore too.

Next steps

I think there's some value in being able to look at and think about my outline headings with a sense of scale. I can imagine a command that shows the treemap for the current subtree and allows people to click on a node to jump to it (or maybe shift-click to mark something for bulk action), or one that shows subtrees summing up :EFFORT: estimates or maybe clock times from the logbook, or one limited by a timestamp range, or one that highlights matching entries as you type in a query, or one that visualizes s-exps or JSON or project files or test coverage.

It would probably be more helpful if the treemap were in Emacs itself, so I could quickly jump to the Org nodes and read more or mark something as done when I notice it. boxy-headings uses text to show the spatial relationships of nested headings, which is neat but probably not up to handling this kind of information density. Emacs can also display SVG images in a buffer, animate them, and handle mouse-clicks, so it could be interesting to implement a general treemap visualization which could then be used for all sorts of things like disk space usage, files in project modules, etc. SVGs would probably be a better fit for this because that allows increased text density and more layout flexibility.

It would be useful to browse the treemap within Emacs, export it as an SVG so that I can include it in a webpage or blog post, and add some Javascript for web-based navigation.

The Emacs community being what it is (which is awesome!), I wouldn't be surprised if someone's already figured it out. Since a quick search for treemap in the package archives and various places doesn't seem to turn anything up, I thought I'd share these quick experiments in case they resonate with other people. I guess I (or someone) could figure out the squarified treemapping algorithm or the ordered treemap algorithm in Emacs Lisp, and then we can see what we can do with it.

I've also thought about other visualizations that can help me see my Org files a different way. Network graphs are pretty popular among the org-roam crew because org-roam-ui makes them. Aside from a few process checklists that link to headings that go into step-by-step detail and things that are meant to graph connections between concepts, most of my Org Mode notes don't intentionally link to other Org Mode notes. (There are also a bunch of random org-capture context annotations I haven't bothered removing.) I tend to link to my public blog posts, sketches, and source code rather than to other headings, so that's a layer of indirection that I'd have to custom-code. Treemaps might be a good start, though, as they take advantage of the built-in hierarchy. Hmm…

View org source for this post

Automatically correcting phrasing and misrecognized words in speech-to-text captions by using a script

| speechtotext, subed, emacs

I usually write my scripts with phrases that could be turned into the subtitles. I figured I might as well combine that information with the WhisperX transcripts which I use to cut out my false starts and oopses. To do that, I use the string-distance function, which calculates how similar strings are, based on the Levenshtein [distance] algorithm. If I take each line of the script and compare it with the list of words in the transcription, I can add one transcribed word at a time, until I find the number with the minimum distance from my current script phrase. This lets me approximately match strings despite misrecognized words. I use oopses to signal mistakes. When I detect those, I look for the previous script line that is closest to the words I restart with. I can then skip the previous lines automatically. When the script and the transcript are close, I can automatically correct the words. If not, I can use comments to easily compare them at that point. Even though I haven't optimized anything, it runs well enough for my short videos. With these subtitles as a base, I can get timestamps with subed-align and then there's just the matter of tweaking the times and adding the visuals.

Text from sketch

Matching a script with a transcript 2025-01-09-01

  • script
  • record on my phone
  • WhisperX transcript (with false starts and recognition errors)

My current implementation is totally unoptimized (n²) but it's fine for short videos.

Process:

  • While there are transcript words to process
    • Find the script line that has the minimum distance to the words left in the transcript. restart after oopses
  • Script
  • Transcript: min. distance between script phrase & transcript
  • Restarting after oops: find script phrase with minimum distance
  • Ex. script phrase: The Emacs text editor
  • Transcript: The Emax text editor is a…
  • Bar graph of distance decreasing, and then increasing again
  • Minimum distance
  • Oops?
    • N: Use transcript words, or diff > threshold?
      • Y: Add script words as comment
      • N: Correct minor errors
    • Y: Mark caption for skipping and look for the previous script line with minimum distance.

Result:

  • Untimed captions with comments
  • Aeneas
  • Timed captions for editing

This means I can edit a nicely-split, mostly-corrected file.

I've included the links to various files below so you can get a sense of how it works. Let's focus on an excerpt from the middle of my script file.

it runs well enough for my short videos.
With these subtitles as a base,
I can get timestamps with subed-align

When I call WhisperX with large-v2 as the model and --max_line_width 50 --segment_resolution chunk --max_line_count 1 as the options, it produces these captions corresponding to that part of the script.

01:25.087 --> 01:29.069
runs well enough for my short videos. With these subtitles

01:29.649 --> 01:32.431
as a base, I can get... Oops. With these subtitles as a base, I

01:33.939 --> 01:41.205
can get timestamps with subedeline, and then there's just

Running subed-word-data-use-script-file results in a VTT file containing this excerpt:

00:00:00.000 --> 00:00:00.000
it runs well enough for my short videos.

NOTE #+SKIP

00:00:00.000 --> 00:00:00.000
With these subtitles as a base,

NOTE #+SKIP

00:00:00.000 --> 00:00:00.000
I can get... Oops.

00:00:00.000 --> 00:00:00.000
With these subtitles as a base,

NOTE
#+TRANSCRIPT: I can get timestamps with subedeline,
#+DISTANCE: 0.14

00:00:00.000 --> 00:00:00.000
I can get timestamps with subed-align

There are no timestamps yet, but subed-align can add them. Because subed-align uses the Aeneas forced alignment tool to figure out timestamps by lining up waveforms for speech-synthesized text with the recorded audio, it's important to keep the false starts in the subtitle file. Once subed-align has filled in the timestamps and I've tweaked the timestamps by using the waveforms, I can use subed-record to create an audio file that omits the subtitles that have #+SKIP comments.

The code is available as subed-word-data-use-script-file in subed-word-data.el. I haven't released a new version of subed.el yet, but you can get it from the repository.

In addition to making my editing workflow a little more convenient, I think it might also come in handy for applying the segmentation from tools like sub-seg or lachesis to captions that might already have been edited by volunteers. (I got sub-seg working on my system, but I haven't figured out lachesis.) If I call subed-word-data-use-script-file with the universal prefix arg C-u, it should set keep-transcript-words to true and keep any corrections we've already made to the caption text while still approximately matching and using the other file's segments. Neatly-segmented captions might be more pleasant to read and may require less cognitive load.

There's probably some kind of fancy Python project that already does this kind of false start identification and script reconciliation. I just did it in Emacs Lisp because that was handy and because that way, I can make it part of subed. If you know of a more robust or full-featured approach, please let me know!

View org source for this post