Categories: geek

RSS - Atom - Subscribe via email

Using Emacs Lisp to export TXT/EPUB/PDF from Org Mode to the Supernote via Browse and Access

| supernote, org, emacs

I've been experimenting with the Supernote's Browse and Access feature because I want to be able to upload files quickly instead of waiting for Dropbox to synchronize. First, I want to store the IP address in a variable:

my-supernote-ip-address
(defvar my-supernote-ip-address "192.168.1.221")

Here's how to upload:

(defun my-supernote-upload (filename &optional supernote-path)
  (interactive "FFile: ")
  (setq supernote-path (or supernote-path "/INBOX"))
  (let* ((boundary (mml-compute-boundary '()))
         (url-request-method "POST")
         (url-request-extra-headers
          `(("Content-Type" . ,(format "multipart/form-data; boundary=%s" boundary))))
         (url-request-data
          (mm-url-encode-multipart-form-data
           `(("file" . (("name" . "file")
                        ("filename" . ,(file-name-nondirectory filename))
                        ("content-type" . "application/octet-stream")
                        ("filedata" . ,(with-temp-buffer
                                         (insert-file-contents-literally filename)
                                         (buffer-substring-no-properties (point-min) (point-max)))))))
           boundary)))
    (with-current-buffer
        (url-retrieve-synchronously
         (format "http://%s:8089%s" my-supernote-ip-address supernote-path))
      (re-search-backward "^$")
      (prog1 (json-read)
        (kill-buffer)))))

HTML isn't supported. Text works, but it doesn't support annotation. PDF or EPUB could work. It would make sense to register this as an export backend so that I can call it as part of the usual export process.

(defun my-supernote-org-upload-as-text (&optional async subtree visible-only body-only ext-plist)
  "Export Org format, but save it with a .txt extension."
  (interactive (list nil current-prefix-arg))
  (let ((filename (org-export-output-file-name ".txt" subtree))
        (text (org-export-as 'org subtree visible-only body-only ext-plist)))
    ;; consider copying instead of exporting so that #+begin_export html etc. is preserved
    (with-temp-file filename
      (insert text))
    (my-supernote-upload filename)))

(defun my-supernote-org-upload-as-pdf (&optional async subtree visible-only body-only ext-plist)
  (interactive (list nil current-prefix-arg))
  (my-supernote-upload (org-latex-export-to-pdf async subtree visible-only body-only ext-plist)))

(defun my-supernote-org-upload-as-epub (&optional async subtree visible-only body-only ext-plist)
  (interactive (list nil current-prefix-arg))
  (my-supernote-upload (org-epub-export-to-epub async subtree visible-only ext-plist)))

(org-export-define-backend
    'supernote nil
    :menu-entry '(?s "Supernote"
                     ((?s "as PDF" my-supernote-org-upload-as-pdf)
                      (?e "as EPUB" my-supernote-org-upload-as-epub)
                      (?o "as Org" my-supernote-org-upload-as-text))))

Adding this line to my Org file allows me to use \spacing{1.5} for 1.5 line spacing, so I can write in more annotations..

#+LATEX_HEADER+: \usepackage{setspace}

Sometimes I use custom blocks for HTML classes. When LaTeX complains about undefined environments, I can define them like this:

#+LATEX_HEADER+: \newenvironment{whatever_my_custom_environment_is_called}

Now I can export a subtree or file to my Supernote for easy review.

I wonder if multimodal AI models can handle annotated images with editing marks…

This is part of my Emacs configuration.
View org source for this post

Include inline SVGs in Org Mode HTML and Markdown exports

Posted: - Modified: | emacs, org
  • [2024-10-14 Mon]: Fixed path when inlining file URLs.
  • [2024-10-07 Mon]: Now I can specify #+ATTR_HTML :data-link t to make it link instead of include.
  • [2024-09-26 Thu]: Whoops, forgot to make sure ox-11ty is also covered.

In my Org Mode HTML and Markdown exports, I usually want to include SVGs inline so that I can use links. Sometimes I also want to use Javascript and CSS to modify elements within the images. I used to use a my-include: link to do this, but I realized that I can also modify this behaviour by making my own functions that call org-html-link or org-md-link and then put those functions in org-export-backend-transcoders.

Here is an example of an SVG:

g Graphviz Graphviz Org Mode Org Mode Graphviz->Org Mode SVG HTML HTML Org Mode->HTML Markdown Markdown Org Mode->Markdown

The following code overrides HTML and Markdown exports to include SVGs.

(defun my-ox-link-path (link _ info)
  (let* ((raw-path (org-element-property :path link)))
    (setq raw-path
          (org-export-file-uri
           (org-publish-file-relative-name raw-path info)))
    ;; Possibly append `:html-link-home' to relative file
    ;; name.
    (let ((home (and (plist-get info :html-link-home)
                     (org-trim (plist-get info :html-link-home)))))
      (when (and home
                 (plist-get info :html-link-use-abs-url)
                 (not (file-name-absolute-p raw-path)))
        (setq raw-path (concat (file-name-as-directory home) raw-path))))
    raw-path))

(defun my-org-html-link (link desc info)
  (if (and
       (string= (org-element-property :type link) "file")
       (not (plist-get (org-export-read-attribute :attr_html (org-element-parent-element link))
                       :data-link))
       (org-export-inline-image-p link (plist-get info :html-inline-image-rules)))
      (let ((path (org-element-property :path link)))
        (if (string= (file-name-extension path) "svg")
            (with-temp-buffer
              (insert-file-contents-literally path)
              (buffer-string))
          (org-html-link link desc info)))
    (org-html-link link desc info)))

(defun my-org-md-link (link desc info)
  (if (and (string= (org-element-property :type link) "file")
           (not (plist-get (org-export-read-attribute :attr_html (org-element-parent-element link))
                       :data-link)))
      (let ((path (org-element-property :path link)))
        (if (string= (file-name-extension path) "svg")
            (with-temp-buffer
              (insert-file-contents-literally path)
              (buffer-string))
          (org-md-link link desc info)))
    (org-md-link link desc info)))

(defun my-org-11ty-link (link desc info)
  (if (and (string= (org-element-property :type link) "file")
           (not (plist-get (org-export-read-attribute :attr_html (org-element-parent-element link))
                       :data-link)))
      (let ((path (org-element-property :path link)))
        (if (string= (file-name-extension path) "svg")
            (with-temp-buffer
              (insert-file-contents-literally path)
              (buffer-string))
          (org-11ty-link link desc info)))
    (org-11ty-link link desc info)))

(with-eval-after-load 'ox-html
  (setf
   (alist-get 'link (org-export-backend-transcoders (org-export-get-backend 'html)))
   'my-org-html-link))
(with-eval-after-load 'ox-md
  (setf
   (alist-get 'link (org-export-backend-transcoders (org-export-get-backend 'md)))
   'my-org-md-link))
(with-eval-after-load 'ox-11ty
  (setf
   (alist-get 'link (org-export-backend-transcoders (org-export-get-backend '11ty)))
   'my-org-11ty-link))
This is part of my Emacs configuration.
View org source for this post

org-attaching the latest image from my Supernote via Browse and Access

Posted: - Modified: | emacs, supernote, org

[2024-09-29 Sun]: Use sketch links when possible. Recolor before cropping so that the grid is removed.

2024-09-26-01 Supernote A5X Browse and Access %23supernote.png
Figure 1: Diagram of different ways to get drawings off my Supernote A5X
Text from sketch

Supernote A5X

  • Screen mirroring (pixelated) -> Puppeteer screenshot (or maybe .mjpeg?)
  • Browse & Access (HTTP) -> latest file: recognize text, recolor, crop, upload?
  • Dropbox/Google Drive (slow) -> batch process: recognize text, recolor, upload

Bonus: Autocropping encourages me to just get stuff out there even if I haven't filled a page

ideas: remove template automatically? I wonder if I can use another color…

2024-09-26-01

I want to quickly get drawings from my Supernote A5X into Emacs so that I can include them in blog posts. Dropbox/Google Drive sync is slow because it synchronizes all the files. The Supernote can mirror its screen as an .mjpeg stream. I couldn't figure out how to grab a frame from that, but I did find out how to use Puppeteer to take an screenshot of the Supernote's screen mirror. Still, the resulting image is a little pixelated. If I turn on Browse and Access, the Supernote can serve directories and files as webpages. This lets me grab the latest file and process it. I don't often have time to fill a full A5 page with thoughts, so autocropping the image encourages me to get stuff out there instead of holding on to things.

(defvar my-supernote-ip-address "192.168.1.221")
(defun my-supernote-get-exported-files ()
  (let ((data (plz 'get (format "http://%s:8089/EXPORT" my-supernote-ip-address)))
        (list))
    (when (string-match "const json = '\\(.*\\)'" data)
      (sort
       (alist-get 'fileList (json-parse-string (match-string 1 data) :object-type 'alist :array-type 'list))
       :key (lambda (o) (alist-get 'date o))
       :lessp 'string<
       :reverse t))))

(defun my-supernote-org-attach-latest-exported-file ()
  (interactive)
  ;; save the file to the screenshot directory
  (let ((info (car (my-supernote-get-exported-files)))
        new-file
        renamed)
    ;; delete matching files
    (setq new-file (expand-file-name
                    (replace-regexp-in-string " " "%20" (alist-get 'name info) (org-attach-dir))))
    (when (file-exists-p new-file)
      (delete-file new-file))
    (org-attach-attach
     (format "http://%s:8089%s" my-supernote-ip-address
             (alist-get 'uri info))
     nil
     'url)
    (setq new-file (my-latest-file (org-attach-dir)))
    ;; recolor
    (my-sketch-recolor-png new-file)
    ;; autocrop that image
    (my-image-autocrop new-file)
    ;; possibly rename
    (setq renamed (my-image-recognize-get-new-filename new-file))
    (when renamed
      (setq renamed (expand-file-name renamed (org-attach-dir)))
      (rename-file new-file renamed t)
      (my-image-store renamed) ; file it in my archive
      (setq new-file renamed))
    ;; use a sketch link if it has an ID
    (if (string-match "^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9] "
                      (file-name-base renamed))
        (org-insert-link nil (concat "sketchFull:" (file-name-base renamed)))
      ;; insert the link
      (org-insert-link nil (concat "attachment:" (replace-regexp-in-string "#" "%23" (file-name-nondirectory new-file)))))
    (org-redisplay-inline-images)))
This is part of my Emacs configuration.
View org source for this post

2024-09-23 Emacs news

| emacs, emacs-news

Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, r/planetemacs, Mastodon #emacs, Hacker News, lobste.rs, programming.dev, lemmy.world, lemmy.ml, communick.news, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

View org source for this post

Quickly adding face properties to regions

Posted: - Modified: | emacs
  • [2024-09-20 Fri]: Set the first frame of the animated GIF to a reasonable backup image.
  • [2024-09-20 Fri]: Add :init-value nil to the mode.
output-2024-09-20-13:09:17.gif
Figure 1: Screencast of modifying face properties

Sometimes I just want to make some text look a little fancier in the buffer so that I can make a thumbnail or display a message. This my-add-face-text-property function lets me select a region and temporarily change its height, make it bold, or do other things. It will work in text-mode or enriched-mode buffers (not Org Mode or programming buffers like *scratch*, as those do a lot of font-locking).

(defun my-add-face-text-property (start end attribute value)
  (interactive
   (let ((attribute (intern
                     (completing-read
                      "Attribute: "
                      (mapcar (lambda (o) (symbol-name (car o)))
                              face-attribute-name-alist)))))
     (list (point)
           (mark)
           attribute
           (read-face-attribute '(()) attribute))))
  (add-face-text-property start end (list attribute value)))

enriched-mode has some keyboard shortcuts for face attributes (M-o b for bold, M-o i for italic). I can add some keyboard shortcuts for other properties even if they can't be saved in text/enriched format.

(defun my-face-text-larger (start end)
  (interactive "r")
  (add-face-text-property
   start end
   (list :height (floor (+ 50 (car (alist-get :height (get-text-property start 'face) '(100))))))))
(defun my-face-text-smaller (start end)
  (interactive "r")
  (add-face-text-property
   start end
   (list :height (floor (- (car (alist-get :height (get-text-property start 'face) '(100))) 50)))))

What's an easy way to make this keyboard shortcut available during the rare times I want it? I know, maybe I'll make a quick minor mode so I don't have to dedicate those keyboard shortcuts all the time. repeat-mode lets me change the size by repeating just the last keystroke.

  (defvar-keymap my-face-text-property-mode-map
    "M-o p" #'my-add-face-text-property
    "M-o +" #'my-face-text-larger
    "M-o -" #'my-face-text-smaller)
  (define-minor-mode my-face-text-property-mode
    "Make it easy to modify face properties."
    :init-value nil
    (repeat-mode 1))
  (defvar-keymap my-face-text-property-mode-repeat-map
    :repeat t
    "+" #'my-face-text-larger
    "-" #'my-face-text-smaller)
  (dolist (cmd '(my-face-text-larger my-face-text-smaller))
    (put cmd 'repeat-map 'my-face-text-property-mode-repeat-map))
This is part of my Emacs configuration.
View org source for this post

Archiving public toots on my blog

| mastodon, emacs, org

I want to compile my global microblog posts into weekly posts so that they're archived on my blog. It might make sense to make them list items so that I can move them around easily.

my-mastodon-insert-my-statuses-since
(defun my-mastodon-insert-my-statuses-since (date)
  (interactive (list (org-read-date "Since date: ")))
  (insert
   (format "#+begin_toot_archive\n%s\n#+end_toot_archive\n"
           (mapconcat
            (lambda (o)
              (format "- %s\n  #+begin_quote\n  #+begin_export html\n%s\n  #+end_export\n  #+end_quote\n\n"
                      (org-link-make-string (assoc-default 'url o) (assoc-default 'created_at o))
                      (org-ascii--indent-string (assoc-default 'content o) 2))
              ;; (format "#+begin_quote\n#+begin_export html\n%s\n#+end_export\n#+end_quote\n\n%s\n\n"
              ;;        (assoc-default 'content o)
              ;;        (org-link-make-string (assoc-default 'url o) (assoc-default 'created_at o)))
              )
            (seq-filter
             (lambda (o)
               (string= (assoc-default 'visibility o) "public"))
             (my-mastodon-fetch-posts-after
              (format "accounts/%s/statuses?count=40&exclude_reblogs=t&exclude_replies=t" (mastodon-auth--get-account-id))
              date))
            ""))))

Here's a little thing I used to convert a two-level list into my collapsible sections:

my-org-convert-list-to-collapsible-details
(defun my-org-convert-list-to-collapsible-details ()
  (interactive)
  (let ((list (org-list-to-lisp t)))
    (mapc (lambda (o)
            (when (stringp (car o))
              (insert
               (format
                "#+begin_my_details %s :open t\n%s#+end_my_details\n"
                (car o)
                (mapconcat
                 (lambda (s)
                   (concat "- " (string-trim (org-ascii--indent-string (car s) 2)) "\n"))
                 (cdr (cadr o)))))))
          (cdr list))))

And here are my toots from the past week, roughly categorized into collapsible sections:

EmacsConf
  • 2024-09-17T21:55:39.065Z CFP, draft schedule

    The #EmacsConf call for proposals (https://emacsconf.org/2024/cfp/) target date is this Friday (Sept 20), so I've started drafting the schedule. Thanks to my SVG schedule visualization code and function for checking availability constraints from last year ( https://sachachua.com/blog/2023/09/emacsconf-backstage-scheduling-with-svgs/), it took only about an hour to sketch this out: https://emacsconf.org/2024/organizers-notebook/#draft-schedule . We can run two tracks simultaneously and I can also slightly reduce the buffer between talks, so there's plenty of space for more #Emacs talks if people want to propose them or nudge people to propose them. =)

  • 2024-09-15T17:07:41.935Z diversity

    I feel complicated feelings about #EmacsConf and diversity. On one hand, yes, I would love to have a mix of speakers that reflects the mix of interesting stories and people I come across in the #Emacs community. (I wouldn't get rid of or discourage anyone; I just want more! :) )

    On the other hand, preparing and giving a presentation is a lot of work, and I have first-hand appreciation of how difficult it can be to find time to think - much less predict a specific time to have a conversation. (I'm only just beginning to be able to have some thinking time that isn't accompanied by the guilt of letting my kiddo binge-watch YouTube videos or the uncertainties of sacrificing my sleep, and I still rarely schedule anything for myself.)

    In addition, there are little risks that other people might not even have on their radar. All it takes is one person developing a parasocial relationship or fixation, or someone getting grumpy about someone's pronouns or personal characteristics or opinions, and then deciding to go and ruin someone's day (or life)... I'd hate to encourage someone to put themselves out there and end up with that happening to them, even if it's not at all their fault or mine.

    So yeah, it's a little hard for me to reach out. I can deal with impostor syndrome making people feel like they might not have much to say (share what you're learning! We're all figuring things out together), but I'm not so sure about the other concerns. While I'd like to think that in the Emacs community we often have a convivial atmosphere, sometimes it gets weird here too.

    I'm not sure what to do here aside from thinking out loud. I wish I could wave a magic wand and solve some structural issues that could make things more equitable, but that's waaay above my paygrade. I can keep working on figuring out how to make use of fragmented time, and maybe that will help other people too. I like working on the captions for EmacsConf; they help me a lot, too. I can experiment with workflows for sharing what I'm learning in a way that doesn't require a lot of focus time, speech fluency (I occasionally stutter and have to redo things), or a powerful computer. (Emacs is totally my nonlinear video editor.) I can make an indirect request for more people to consider proposing a talk for https://emacsconf.org/2024/cfp/ (target date is Sept 20, but I think the other organizers are considering extending it too), even with all the caveats my anxious brain suggests. (I know, I'm terrible at sales. :) ) And really, EmacsConf isn't important in the grand scheme of things, it's just a fun excuse to get together and connect with other people who like this stuff too. :)

    I wonder how this can be better. Thoughts?

Emacs
  • 2024-09-17T15:07:41.153Z consult-omni

    All right, I just got consult-omni and a Google custom search JSON API key set up so that I can call consult-omni-google, type keywords, pick the correct match, and insert it as an Org Mode link (or linkify the current region). I can think of more tweaks (embark-act on the current word or region to linkify it), but this is already pretty neat.

  • 2024-09-16T14:05:06.263Z - user-init-file

    Is there already an interactive #emacs command for opening user-init-file? I think that could be handy for newbies if we could just tell them to use "M-x visit-user-init-file" or even "Select 'Open init file' from the menu", although I suppose by the time we ask them to fiddle with the init file to add stuff to it, it's fine to encourage them to be comfortable with C-h v user-init-file and then maybe even teach them about M-x ffap at that point. Hmm...

  • 2024-09-16T13:51:12.091Z - casual-symbol-overlay

    Trying out casual-symbol-overlay (http://yummymelon.com/devnull/announcing-casual-symbol-overlay.html) by hooking it into embark-act, which I've bound to `C-.`:

    ```emacs-lisp
    (use-package casual-symbol-overlay
    :if my-laptop-p
    :init
    (with-eval-after-load 'embark
    (keymap-set embark-symbol-map "z" #'casual-symbol-overlay-tmenu)))
    ```

  • 2024-09-11T17:12:03.791Z no nested lists for Org Babel

    TIL that #OrgMode Babel only takes the top level of nested lists passed in via :var (https://orgmode.org/manual/Environment-of-a-Code-Block.html - Note that only the top-level list items are passed along. Nested list items are ignored.) When I try the manual example on my computer, I do indeed get only the top-level list items, unlike the nested data from https://mail.gnu.org/archive/html/emacs-orgmode/2020-10/msg00536.html . Of course, now that makes me want nested lists for both input and output...

Mastodon
Moving to P52
  • 2024-09-18T00:03:09.981Z WhisperX

    Now that I have word-level timestamps from WhisperX (https://sachachua.com/blog/2024/09/using-whisperx-to-get-word-level-timestamps-for-audio-editing-with-emacs-and-subed-record/), I think I'll be able to write an elisp equivalent of the merging/splitting strategies of https://github.com/jianfch/stable-ts?tab=readme-ov-file#regrouping-words to merge subtitles considering gap, duration, length, and maximum number of words.

  • 2024-09-15T17:08:17.757Z upgrade

    (reposting, forgot to make it public)

    I installed a 2TB Crucial T500 NVMe into my Lenovo P52 so that I can try dual-booting into Linux, since it was hard to figure out how I could get all my usual conveniences in WSL.

    A preliminary test with a fresh Kubuntu install showed that my 11ty static blog generation takes about the same time as it does on the X230T, which is a little surprising considering the newer processor and the faster SSD, but maybe I'll have to look for speed gains elsewhere there. I think whisper.cpp is a lot more usable on this computer though, so I'm looking forward to taking advantage of that. The P52 might also make video editing possible, and it might support more modern monitors. It is a fair bit larger and heavier, though. I might end up still using both.

    Anyway, I decided to redo the install by cloning my previous SSD. I want to see if I can skip the step of setting all those things up (although I'll need to redo the Syncthing config, of course). I don't have the extra parts that would let me install the 2.5" SSD from my X230T directly into the P52, but W- has a drive dock that works off USB 2.0. Slow and steady, but that's fine, I can run things overnight. I woke up today to find out that dd doesn't handle extended partitions and needs me to dd them one by one. That's cool, I'll just have that running in the background today.

    If the clone doesn't work or if it's too much trouble to take the clone and give it its own identity, I'll probably wipe it and do another install. Since the X230T is on Kubuntu, I think I'll keep it on Kubuntu as well, to minimize the things I need to keep in my head as I switch between computers. My home directory is in a separate partition, so I can keep it if I want to try something different.

    Now I just have to wait a few hours for these dd commands...

  • 2024-09-11T14:14:55.602Z static blog

    My "plugins is not iterable" issue got fixed when I downgraded `@11ty/eleventy` from `@beta` to `@2.0.1`. Yay, that's one thing off my list!

Other tech stuff
Parenting
  • 2024-09-17T12:28:29.082Z emotion check-in

    I appreciate my kiddo's grade 3 teacher. =) She's currently doing the morning check-in of emotions (how's everyone feeling) using 9 images of Grogu with different facial expressions, which gets the kids (1) laughing, (2) interpreting facial expressions that aren't explicitly labeled, and (3) figuring out what they're feeling.

  • 2024-09-12T12:50:02.014Z pull system

    The kiddo is 8 and I'm developing a better understanding of what "fiercely independent" means. One of the things I'm working on learning is how to shut up and trust the process. =) I've started thinking of it like the pull system of Lean manufacturing principles. Things work out better when I wait for her to ask a question (to pull from me) because at that point, she's ready to hear the answer.

As it turns out, org-list-to-org uses the Org export mechanism, so it quietly discards things like #+begin_export html blocks. I decided to hard-code assumptions about the list's structure instead, which works for now.

View org source for this post

Using WhisperX to get word-level timestamps for audio editing with Emacs and subed-record

Posted: - Modified: | speechtotext, audio, emacs, subed
  • [2024-11-16 Sat]: Removed highlight_words, made max_line_width use the environment variable if specified.
  • [2024-10-14 Mon]: Actually, WhisperX makes a JSON with word-level timing data, so let's use that instead.

I'm gradually shifting more things to this Lenovo P52 to take advantage of its newer processor, 64 GB of RAM, and 2 TB drive. (Whee!) One of the things I'm curious about is how I can make better use of multimedia. I couldn't get whisper.cpp to work on my Lenovo X230T, so I mostly relied on the automatic transcripts from Google Recorder (with timestamps generated by aeneas) or cloud-based transcription services like Deepgram.

I have a lot of silences in my voice notes when I think out loud. whisper.cpp got stuck in loops during silent parts, but WhisperX handles them perfectly. WhisperX is also fast enough for me to handle audio files locally instead of relying on Deepgram. With the default model, I can process the files faster than real-time:

File length Transcription time
42s 17s
7m48s 1m41s

I used this command to get word-level timing data. (Experimenting with options from this post)

MAX_LINE_WIDTH="${MAX_LINE_WIDTH:-50}"
~/vendor/whisperx/.venv/bin/whisperx --model large-v2 --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --compute_type int8 --print_progress True --max_line_width $MAX_LINE_WIDTH --segment_resolution chunk --max_line_count 1 --language en "$@"

Among other things, it makes a text file that looks like this:

I often need to... I sometimes need to replace or navigate by symbols.
Casual symbol overlays a new package that adds those shortcuts so that I don't have to remember the other keywords for them.

and a JSON file that looks like this:

{"segments": [{"start": 0.427, "end": 7.751, "text": " I often need to... I sometimes need to replace or navigate by symbols.", "words": [{"word": "I", "start": 0.427, "end": 0.507, "score": 0.994}, {"word": "often", "start": 0.587, "end": 0.887, "score": 0.856}, {"word": "need", "start": 0.987, "end": 1.227, "score": 0.851}, {"word": "to...", "start": 1.267, "end": 1.508, "score": 0.738}, {"word": "I", "start": 4.329, "end": 4.429, "score": 0.778}, ...]}, ...]}

Sometimes I just want the text so that I can use an audio braindump as the starting point for a blog post or for notes. WhisperX is way more accurate than Google Recorder, so that will probably be easier once I update my workflow for that.

Sometimes I want to make an edited audio file that sounds smooth so that I can use it in a podcast, a video, or some audio notes. For that, I'd like word-level timing data so that I can cut out words or sections. Aeneas didn't give me word-level timestamps, but WhisperX does, so I can get the time information before I start editing. I can extract the word timestamps from the JSON like this:

(defun my-subed-word-tsv-from-whisperx-json (file)
  (interactive "FJSON: ")
  (let* ((json-array-type 'list)
         (json-object-type 'alist)
         (data (json-read-file file))
         (filename (concat (file-name-sans-extension file) ".tsv"))
         (base (seq-mapcat
                (lambda (segment)
                  (seq-map (lambda (word)
                             (let-alist word
                               (list nil
                                     (and .start (* 1000 .start))
                                     (and .end (* 1000 .end))
                                     .word)))
                           (alist-get 'words segment)))
                (alist-get 'segments data)))
         (current base)
         (last-end 0))
     ;; numbers at the end of a sentence sometimes don't end up with times
     ;; so we need to fix them
    (while current
      (unless (elt (car current) 1)           ; start
        (setf (elt (car current) 1) (1+ last-end)))
      (unless (elt (car current) 2)
        (setf (elt (car current) 2) (1- (elt (cadr current) 1))))
      (setq
       last-end (elt (car current) 2)
       current (cdr current)))
    (subed-create-file
     filename
     base
     t
     'subed-tsv-mode)
    (find-file filename)))

Here's my old code for parsing the highlighted VTT or SRT files that underline each word:

(defun my-subed-load-word-data-from-whisperx-highlights (file)
  "Return a list of word cues from FILE.
FILE should be a VTT or SRT file produced by whisperx with the
--highlight_words True option."
  (seq-keep (lambda (sub)
              (when (string-match "<u>\\(.+?\\)</u>" (elt sub 3))
                (setf (elt sub 3) (match-string 1 (elt sub 3)))
                sub))
            (subed-parse-file file)))

(defun my-subed-word-tsv-from-whisperx-highlights (file)
  (interactive "FVTT: ")
  (with-current-buffer (find-file-noselect (concat (file-name-nondirectory file) ".tsv"))
    (erase-buffer)
    (subed-tsv-mode)
    (subed-auto-insert)
    (mapc (lambda (sub) (apply #'subed-append-subtitle nil (cdr sub)))
          (my-subed-load-word-data-from-whisperx-highlights file))
    (switch-to-buffer (current-buffer))))

I like to use the TSV format for this one because it's easy to scan down the right side. Incidentally, this format is compatible with Audacity labels, so I could import that there if I wanted. I like Emacs much more, though. I'm used to having all my keyboard shortcuts at hand.

0.427000	0.507000	I
0.587000	0.887000	often
0.987000	1.227000	need
1.267000	1.508000	to...
4.329000	4.429000	I
4.469000	4.869000	sometimes
4.950000	5.170000	need
5.210000	5.410000	to
5.530000	6.090000	replace

Once I've deleted the words I don't want to include, I can merge subtitles for phrases so that I can keep the pauses between words. A quick heuristic is to merge subtitles if they don't have much of a pause between them.

(defvar my-subed-merge-close-subtitles-threshold 500)
(defun my-subed-merge-close-subtitles (threshold)
  "Merge subtitles with the following one if there is less than THRESHOLD msecs gap between them."
  (interactive (list (read-number "Threshold in msecs: " my-subed-merge-close-subtitles-threshold)))
  (goto-char (point-min))
  (while (not (eobp))
    (let ((end (subed-subtitle-msecs-stop))
          (next-start (save-excursion
                        (and (subed-forward-subtitle-time-start)
                             (subed-subtitle-msecs-stop)))))
      (if (and end next-start (< (- next-start end) threshold))
          (subed-merge-with-next)
        (or (subed-forward-subtitle-end) (goto-char (point-max)))))))

Then I can use subed-waveform-show-all to tweak the start and end timestamps. Here I switch to another file I've been editing…

2024-09-17-12-06-12.svg
Figure 1: Screenshot of subed-waveform

After that, I can use subed-record to compile the audio into an .opus file that sounds reasonably smooth.

I sometimes need to replace or navigate by symbols. casual-symbol-overlay is a package that adds a transient menu so that I don't have to remember the keyboard shortcuts for them. I've added it to my embark-symbol-keymap so I can call it with embark-act. That way it's just a C-. z away.

I want to make lots of quick audio notes that I can shuffle and listen to in order to remember things I'm learning about Emacs (might even come up with some kind of spaced repetition system), and I'd like to make more videos someday too. I think WhisperX, subed, and Org Mode will be fun parts of my workflow.

This is part of my Emacs configuration.
View org source for this post