Categories: geek

RSS - Atom - Subscribe via email

Small steps towards using OpenAI-compatible text-to-speech services with speechd-el or emacspeak

| emacs

Speech synthesis has come a long way since I first tried out Emacspeak in 2002. Kokoro TTS and Piper offer more natural-sounding voices now, although the initial delay in loading the models and generating speech mean that they aren't quite ready to completely replace espeak, which is faster but more robotic. I've been using the Kokoro FastAPI through my own functions for working with various speech systems. I wanted to see if I could get Kokoro and other OpenAI-compatible text-to-speech services to work with either speechd-el or Emacspeak just in case I could take advantage of the rich functionality either provides for speech-synthesized Emacs use. speechd-el is easier to layer on top of an existing Emacs if you only want occasional speech, while emacspeak voice-enables many packages to an extent beyond speaking simply what's on the screen.

Speech synthesis is particularly helpful when I'm learning French because I can use it as a reference for what a paragraph or sentence should sound like. It's not perfect. Sometimes it uses liaisons that my tutor and Google Translate don't use. But it's a decent enough starting point. I also used it before to read out IRC mentions and compile notifications so that I could hear them even if I was paying attention to a different activity.

Here's a demonstration of speechd reading out the following lines using the code I've just uploaded to https://codeberg.org/sachac/speechd-ai:

  • The quick brown fox jumps over the lazy dog.
  • Now let's set the language to French so we can read the next line.
  • Bonjour, je m'appelle Emacs.

Screencast showing speechd-el

There's about a 2-second delay between the command and the start of the audio for the sentence.

Note that speechd-speak-read-sentence fails in some cases where (forward-sentence 1) isn't the same place as (backward-sentence 1) (forward-sentence 1), which can happen when you're in an Org Mode list. I've submitted a patch upstream.

Aside from that, speechd-speak-set-language, speechd-speak-read-paragraph and speechd-speak-read-region are also useful commands. I think the latency makes this best-suited for reading paragraphs, or for shadowing sentences for language learning.

I'm still trying to figure out how to get speechd-speak to work as smoothly as I'd like. I think I've got it set up so that the server falls back to espeak for short texts so that it can handle words or characters better, and uses the specified server for longer ones. I'd like to get to the point where it can handle all the things that speechd usually does, like saying lines as I navigate through them or giving me feedback as I'm typing. Maybe it can use espeak for fast feedback character by character and word by word, and then use Kokoro TTS for the full sentence when I finish. Then it will be possible to use it to type things without looking at the screen.

After putting this together, I still find myself leaning towards my own functions because they make it easy to see the generated speech output to a file, which is handy for saving reference audio that I can play on my phone and for making replays almost instant. That could also be useful for pre-generating the next paragraph to make it flow more smoothly. Still, it was interesting making something that is compatible with existing protocols and libraries.

Posting it in case anyone else wants to use it as a starting point. The repository also contains the starting point for an Emacspeak-compatible speech server. See See speechd-ai/README.org for more details.

https://codeberg.org/sachac/speechd-ai

View Org source for this post

Emacs Lisp and NodeJS: Getting the bolded words from a section of a Google Document

Posted: - Modified: | french, js, emacs
  • : Cleaned up links from Google
  • : Simplified getting a section or finding the bolded text by using the Org Mode format instead.

During the sessions with my French tutor, I share a Google document so that we can mark the words where I need to practice my pronunciation some more or tweak the wording. Using Ctrl+B to make the word as bold is an easy way to make it jump out.

I used to copy these changes into my Org Mode notes manually, but today I thought I'd try automating some of it.

First, I need a script to download the HTML for a specified Google document. This is probably easier to do with the NodeJS library rather than with oauth2.el and url-retrieve-synchronously because of various authentication things.

require('dotenv').config();
const { google } = require('googleapis');

async function download(fileId) {
  const auth = new google.auth.GoogleAuth({
    scopes: ['https://www.googleapis.com/auth/drive.readonly'],
  });
  const drive = google.drive({ version: 'v3', auth });
  const htmlRes = await drive.files.export({
    fileId: fileId,
    mimeType: 'text/html'
  });
  return htmlRes.data;
}

async function main() {
  console.log(await download(process.argv.length > 2 ? process.argv[2] : process.env['DOC_ID']));
}

main();

Then I can wrap a little bit of Emacs Lisp around it.

(defvar my-google-doc-download-command
  (list "nodejs" (expand-file-name "~/bin/download-google-doc-html.cjs")))

(defun my-google-doc-html (doc-id)
  (when (string-match "https://docs\\.google\\.com/document/d/\\(.+?\\)/" doc-id)
    (setq doc-id (match-string 1 doc-id)))
  (with-temp-buffer
    (apply #'call-process (car my-google-doc-download-command)
           nil t nil (append (cdr my-google-doc-download-command) (list doc-id)))
    (buffer-string)))

(defun my-google-doc-clean-html (html)
  "Remove links on spaces, replace Google links."
  (let ((dom (with-temp-buffer
               (insert html)
               (libxml-parse-html-region))))
    (dom-search
     dom
     (lambda (o)
       (when (eq (dom-tag o) 'a)
         (when (and (dom-attr o 'href)
                    (string-match "https://\\(www\\.\\)?google\\.com/url\\?q=" (dom-attr o 'href)))
           (let* ((parsed (url-path-and-query
                           (url-generic-parse-url (dom-attr o 'href))))
                  (params (url-parse-query-string (cdr parsed))))
             (dom-set-attribute o 'href (car (assoc-default "q" params #'string=)))))
         (let ((text (string= (string-trim (dom-text o)) "")))
           (when (string= text "")
             (setf (car o) 'span))))
       (when (and
              (string-match "font-weight:700" (or (dom-attr o 'style) ""))
              (not (string-match "font-style:normal" (or (dom-attr o 'style) ""))))
         (setf (car o) 'strong))
       (when (dom-attr o 'style)
         (dom-remove-attribute o 'style))))
    ;; bold text is actually represented as font-weight:700 instead
    (with-temp-buffer
      (svg-print dom)
      (buffer-string))))

(defun my-google-doc-org (doc-id)
  "Return DOC-ID in Org Mode format."
  (pandoc-convert-stdio (my-google-doc-clean-html (my-google-doc-html doc-id)) "html" "org"))

I have lots of sections in that document, including past journal entries, so I want to get a specific section by name.

(defun my-org-get-subtree-by-name (org-text heading-name)
  "Return ORG-TEXT subtree for HEADING-NAME."
  (with-temp-buffer
    (insert org-text)
    (org-mode)
    (goto-char (point-min))
    (let ((org-trust-scanner-tags t))
      (car (delq nil
                 (org-map-entries
                  (lambda ()
                    (when (string= (org-entry-get (point) "ITEM") heading-name)
                      (buffer-substring (point) (org-end-of-subtree))))))))))

Now I can get the bolded words from a section of my notes, with just a sentence for context. I use pandoc to convert it to Org Mode syntax.

(defvar my-lang-words-for-review-context-function 'sentence-at-point)
(defvar my-lang-tutor-notes-url nil)
(defun my-lang-tutor-notes (section-name)
  (my-org-get-subtree-by-name
   (my-google-doc-org my-lang-tutor-notes-url)
   section-name))

(defun my-lang-words-for-review (section)
  "List the bolded words for review in SECTION."
  (let* ((section (my-lang-tutor-notes section))
         results)
    (with-temp-buffer
      (insert section)
      (org-mode)
      (goto-char (point-min))
      (org-map-entries
       (lambda ()
         (org-end-of-meta-data t)
         (while (re-search-forward "\\*[^* ].*?\\*" nil t)
           (cl-pushnew
            (replace-regexp-in-string
             "[ \n ]+" " "
             (funcall my-lang-words-for-review-context-function))
            results
            :test 'string=)))))
    (nreverse results)))

For example, when I run it on my notes on artificial intelligence, this is the list of bolded words and the sentences that contain them.

(my-lang-words-for-review "Sur l'intelligence artificielle")
  • Je l'ai aussi utilisée pour faire des recherches.
  • Je peux consacrer une petite partie de mon budget à des essais, mais je ne veux pas travailler davantage pour rentabiliser une dépense plus importante.
  • Je n'ai pas le temps de concentration nécessaire pour justifier l'investissement dans mon propre matériel, et sinon, les progrès sont trop rapides pour m'engager dans une configuration spécifique.
  • J'ai une conscience aiguë des limites cognitives ou physiques à cause des difficultés de santé de ma mère et de ma sœur, et de mes expériences avec mes limitations à cause du fait que je suis la personne principalement en charge de ma fille.
  • Je lis très vite, mais je n'ai pas assez de patience pour les longs contenus vidéo ou audio.
  • Je n'aime pas les textes qui contiennent beaucoup de remplissage.
  • Beaucoup de gens ont une réaction forte contre l'IA pour plusieurs raisons qui incluent le battage médiatique excessif dont elle fait l'objet, son utilisation à mauvais escient, et l'inondation de banalité qu'elle produit.
  • Je réécris souvent la majorité du logiciel à l'exception d'un ou deux morceaux parce que ce code ne me convient pas.
  • Je ne veux pas l'utiliser pour les correctifs que je veux soumettre à d'autres projets parce que le code ne me semble pas correct et je ne veux pas gaspiller le temps d'autres bénévoles.
  • J'aime pouvoir lui donner trois dépôts git et des instructions pour générer un logiciel à partir d'un dépôt pour un autre via le troisième dépôt.
  • Mais je ne veux pas le publier avant de réécrire et tout comprendre.
  • Sans l'IA, je pourrais peut-être apprendre plus lentement avec l'aide d'Internet, qui a beaucoup de ressources commehttps://vitrinelinguistique.oqlf.gouv.qc.ca/Vitrine linguistique.
  • Je veux profiter davantage, apprendre davantage avec l'aide de vraies personnes, complétée par l'aide de l'IA.
  • J'adore les sous-titres simultanés, mais je n'ai pas toujours trouvé une méthode ou un système qui me convienne.

I can then go into the WhisperX transcription JSON file and replay those parts for closer review.

I can also tweak the context function to give me less information. For example, to limit it to the containing phrase, I can do this:

(defun my-split-string-keep-delimiters (string delimiter)
  (when string
    (let (results pos)
      (with-temp-buffer
        (insert string)
        (goto-char (point-min))
        (setq pos (point-min))
        (while (re-search-forward delimiter nil t)
          (push (buffer-substring pos (match-beginning 0)) results)
          (setq pos (match-beginning 0)))
        (push (buffer-substring pos (point-max)) results)
        (nreverse results)))))

(ert-deftest my-split-string-keep-delimiters ()
 (should
  (equal (my-split-string-keep-delimiters
          "Beaucoup de gens ont une réaction forte contre l'IA pour plusieurs raisons qui *incluent* le battage médiatique excessif dont elle fait l'objet, son utilisation à mauvais escient, et *l'inondation de banalité* qu'elle produit."
          ", \\| que \\| qui \\| qu'ils? \\| qu'elles? \\| qu'on "
          )
 )))

(defun my-lang-words-for-review-phrase-context (&optional s)
  (setq s (replace-regexp-in-string " " " " (or s (sentence-at-point))))
  (string-join
   (seq-filter (lambda (s) (string-match "\\*" s))
               (my-split-string-keep-delimiters s ", \\| parce que \\| que \\| qui \\| qu'ils? \\| qu'elles? \\| qu'on \\| pour "))
   " ... "))

(ert-deftest my-lang-words-for-review-phrase-context ()
  (should
   (equal (my-lang-words-for-review-phrase-context
           "Je peux consacrer une petite partie de mon *budget* à des essais, mais je ne veux pas travailler davantage pour rentabiliser une dépense plus importante.")
          "Je peux consacrer une petite partie de mon *budget* à des essais")))
(let ((my-lang-words-for-review-context-function 'my-lang-words-for-review-phrase-context))
  (my-lang-words-for-review "Sur l'intelligence artificielle"))
  • pour faire des recherches.
  • Je peux consacrer une petite partie de mon budget à des essais
  • , et sinon
  • J'ai une conscience aiguë des limites cognitives ou physiques à cause des difficultés de santé de ma mère et de ma sœur
  • pour les longs contenus vidéo ou audio.
  • Je n'aime pas les textes qui contiennent beaucoup de remplissage.
  • qui incluent le battage médiatique excessif dont elle fait l'objet … , et l'inondation de banalité
  • Je réécris souvent la majorité du logiciel à l'exception d'un ou deux morceaux
  • pour les correctifs … parce que le code ne me semble pas correct et je ne veux pas gaspiller le temps d'autres bénévoles.
  • pour un autre via le troisième dépôt.
  • Mais je ne veux pas le publier avant de réécrire et tout comprendre.
  • , je pourrais peut-être apprendre plus lentement avec l'aide d'Internet
  • , apprendre davantage avec l'aide de vraies personnes, complétée par l'aide de l'IA.
  • qui me convienne.

Now that I have a function for retrieving the HTML or Org Mode for a section, I can use that to wdiff against my current text to more easily spot wording changes.

(defun my-lang-tutor-notes-wdiff-org ()
  (interactive)
  (let ((section (org-entry-get (point) "ITEM")))
    (my-wdiff-strings
     (replace-regexp-in-string
      " " " "
      (my-org-subtree-text-without-blocks))
     (replace-regexp-in-string
      " " " "
      (my-lang-tutor-notes section)))))

Related:

Screenshot:

2026-03-12_11-28-24.png
Figure 1: wdiff
This is part of my Emacs configuration.
View Org source for this post

2026-03-09 Emacs news

| emacs, emacs-news

If you use kubernetes-el, don't update for now, and you might want to check your installation if you updated it recently. The repo was compromised a few days ago.

I've occasionally wanted to tangle a single Org Mode source block to multiple places, so I'm glad to hear that ob-tangle has just added support for multiple targets. Niche, but could be handy. I'm also curious about using clime to write command-line tools in Emacs Lisp that handle argument parsing and all the usual stuff.

If you're looking for something to write about, why not try this month's Emacs Carnival theme of mistakes and misconceptions?

Enjoy!

Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, Mastodon #emacs, Bluesky #emacs, Hacker News, lobste.rs, programming.dev, lemmy.world, lemmy.ml, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

View Org source for this post

Expanding yasnippets by voice in Emacs and other applications

| emacs, audio, speech-recognition

Yasnippet is a template system for Emacs. I want to use it by voice. I'd like to be able to say things like "Okay, define interactive function" and have that expand to a matching snippet in Emacs or other applications. Here's a quick demonstration of expanding simple snippets:

Screencast of expanding snippets by voice in Emacs and in other applications

Transcript
  • 00:00 So I've defined some yasnippets with names that I can say. Here, for example, in this menu, you can see I've got "define interactive function" and "with a buffer that I'll display." And in fundamental mode, I have some other things too. Let's give it a try.
  • 00:19 I press my shortcut. "Okay, define an interactive function." You can see that this is a yasnippet. Tab navigation still works.
  • 00:33 I can say, "OK, with a buffer that I'll display," and it expands that also.
  • 00:45 I can expand snippets in other applications as well, thanks to a global keyboard shortcut.
  • 00:50 Here, for example, I can say, "OK, my email." It inserts my email address.
  • 01:02 Yasnippet definitions can also execute Emacs Lisp. So I can say, "OK, date today," and have that evaluated to the actual date.
  • 01:21 So that's an example of using voice to expand snippets.

This is handled by the following code:

(defun my-whisper-maybe-expand-snippet (text)
  "Add to `whisper-insert-text-at-point'."
  (if (and text
           (string-match
            "^ok\\(?:ay\\)?[,\\.]? \\(.+\\)" text))
    (let* ((name
            (downcase
             (string-trim
              (replace-regexp-in-string "[,\\.]" "" (match-string 1 text)))))
           (matching
            (seq-find (lambda (o)
                        (subed-word-data-compare-normalized-string-distance
                         name
                         (downcase (yas--template-name o))))
                      (yas--all-templates (yas--get-snippet-tables)))))
      (if matching
          (progn
            (if (frame-focus-state)
                (progn
                  (yas-expand-snippet matching)
                  nil)
              ;; In another application
              (with-temp-buffer
                (yas-minor-mode)
                (yas-expand-snippet matching)
                (buffer-string))))
        text))
    text))

This code relies on my fork of whisper.el, which lets me specify a list of functions for whisper-insert-text-at-point. (I haven't asked for upstream review yet because I'm still testing things, and I don't know if it actually works for anyone else yet.) It does approximate matching on the snippet name using a function from subed-word-data.el which just uses string-distance. I could probably duplicate the function in my config, but then I'd have to update it in two places if I come up with more ideas.

The code for inserting into other functions is defined in my-whisper-maybe-type, which is very simple:

(defun my-whisper-maybe-type (text)
  "If Emacs is not the focused app, simulate typing TEXT.
Add this function to `whisper-insert-text-at-point'."
  (when text
    (if (frame-focus-state)
        text
      (make-process :name "xdotool" :command
                    (list "xdotool" "type"
                          text))
      nil)))

Someday I'd like to provide alternative names for snippets. I also want to make it easy to fill in snippet fields by voice. I'd love to be able to answer minibuffer questions from yas-choose-value, yas-completing-read, and other functions by voice too. Could be fun!

Related:

This is part of my Emacs configuration.
View Org source for this post

Emacs Lisp: defvar-keymap hints for which-key

| emacs

Emacs has far too many keyboard shortcuts for me to remember, so I use which-key to show me a menu if I pause for too long and which-key-posframe to put it somewhere close to my cursor.

(use-package which-key :init (which-key-mode 1))
(use-package which-key-posframe :init (which-key-posframe-mode 1))

I've used which-key-replacement-alist to rewrite the function names and re-sort the order to make them a little easier to scan, but that doesn't cover the case where you've defined an anonymous function ((lambda ...)) for those quick one-off commands. It just displays "function".

Pedro A. Aranda Gutiérrez wanted to share this tip about defining hints by using cons. Here's his example:

(defun insert-surround (opening &optional closing)
 "Insert OPENING and CLOSING and place the cursor before CLOSING.

Default CLOSING is \"}\"."
  (insert opening)
  (save-excursion
    (insert (or closing "}"))))

(defvar-keymap tex-format-map
  :doc "My keymap for text formatting"
  "-"  (cons "under" (lambda() (interactive) (insert-surround
"\\underline{")))
  "b"  (cons "bold"  (lambda() (interactive) (insert-surround "\\textbf{")))
  "e"  (cons "emph"  (lambda() (interactive) (insert-surround "\\emph{")))
  "i"  (cons "ital"  (lambda() (interactive) (insert-surround "\\textit{")))
  "m"  (cons "math"  (lambda() (interactive) (insert-surround "$" "$")))
  "s"  (cons "sans"  (lambda() (interactive) (insert-surround "\\textsf{")))
  "t"  (cons "tty" (lambda() (interactive) (insert-surround "\\texttt{")))
  "v"  (cons "Verb"  (lambda() (interactive) (insert-surround "\\Verb{")))
  "S"  (cons "small" (lambda() (interactive) (insert-surround "\\small{"))))
(fset 'tex-format-map tex-format-map)

Let's try it out:

(with-eval-after-load 'tex-mode
  (keymap-set tex-mode-map "C-c t" 'tex-format-map))
2026-03-02_14-45-51.png
Figure 1: Screenshot with hints

This works for named functions as well. Here's how I've updated my config:

(defvar-keymap my-french-map
  "l" (cons "🔍 lookup" #'my-french-lexique-complete-word)
  "w" (cons "📚 wordref" #'my-french-wordreference-lookup)
  "c" (cons "✏️ conj" #'my-french-conjugate)
  "f" (cons "🇫🇷 fr" #'my-french-consult-en-fr)
  "s" (cons "🗨️ say" #'my-french-say-word-at-point)
  "t" (cons "🇬🇧 en" #'my-french-translate-dwim))
(fset 'my-french-map my-french-map)

(with-eval-after-load 'org
  (keymap-set org-mode-map "C-," 'my-french-map)
  (keymap-set org-mode-map "C-c u" 'my-french-map))
2026-03-02_13-57-23.png
Figure 2: Before: Without the cons, which-key uses the full function name
2026-03-02_14-42-42.png
Figure 3: After: Might be easier to skim?

In case you're adding to an existing keymap, you can use keymap-set with cons.

(keymap-set my-french-map "S" (cons "sentence" #'my-french-say-sentence-at-point))

This is also different from the :hints that show up in the minibuffer when you have a repeating map. Those are defined like this:

(defvar-keymap my-french-map
  :repeat (:hints ((my-french-lexique-complete-word . "lookup")
                   (my-french-consult-en-fr . "fr")
                   (my-french-translate-dwim . "en")
                   (my-french-say-word-at-point . "say")))
  "l" (cons "🔍 lookup" #'my-french-lexique-complete-word)
  "w" (cons "📚 wordref" #'my-french-wordreference-lookup)
  "c" (cons "✏️ conj" #'my-french-conjugate)
  "f" (cons "🇫🇷 fr" #'my-french-consult-en-fr)
  "s" (cons "🗨️ say" #'my-french-say-word-at-point)
  "t" (cons "🇬🇧 en" #'my-french-translate-dwim))

and those appear in the minibuffer like this:

2026-03-02_13-59-42.png
Figure 4: Minibuffer repeat hints

Menus in Emacs are also keymaps, but the labels work differently. These ones are defined with easymenu.

(with-eval-after-load 'org
(define-key-after
 org-mode-map
 [menu-bar french-menu]
 (cons "French"
       (easy-menu-create-menu
        "French"
        '(["🕮Grammalecte" my-flycheck-grammalecte-setup t]
          ["✓Gptel" my-lang-gptel-flycheck-setup t]
          ["🎤Subed-record" my-french-prepare-subed-record t])))
 'org))

Using your own hints is like leaving little breadcrumbs for yourself.

Thanks to Pedro for the tip!

View Org source for this post

2026-03-02 Emacs news

| emacs, emacs-news

Hello folks! Last month's Emacs Carnival about completion had 17 posts (nice!), and Philip Kaludercic is hosting this month's Emacs Carnival: Mistakes and Misconceptions. Looking forward to reading your thoughts!

Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, Mastodon #emacs, Bluesky #emacs, Hacker News, lobste.rs, programming.dev, lemmy.world, lemmy.ml, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

View Org source for this post

Emacs Carnival Feb 2026 wrap-up: Completion

| emacs

Check out all the wonderful entries people sent in for the Emacs Carnival Feb 2026 theme of Completion:

Also, this one about completing the loop:

Sometimes I miss things, so if you wrote something and you don't see it here, please let me know! Please e-mail me at sacha@sachachua.com or DM me via Mastodon with a link to your post(s). If you like the idea but didn't get something together in time for February, it's never too late. Even if you come across this years later, feel free to write about the topic if it inspires you. I'd love to include a link to your notes in Emacs News.

I added a ton of links from the Emacs News archives to the Resources and Ideas section, so check those out too.

I had a lot of fun learning together with everyone. I already have a couple of ideas for March's Emacs Carnival theme of Mistakes and Misconceptions (thanks to Philip Kaludercic for hosting!), and I can't wait to see what people will come up with next!

View Org source for this post