Speech synthesis has come a long way since I first tried out Emacspeak in 2002. Kokoro TTS and Piper offer more natural-sounding voices now, although the initial delay in loading the models and generating speech means that they aren't quite ready to completely replace espeak, which is faster but more robotic. I've been using the Kokoro FastAPI through my own functions for working with various speech systems. I wanted to see if I could get Kokoro and other OpenAI-compatible text-to-speech services to work with either speechd-el or Emacspeak, in case I could take advantage of the rich functionality either provides for speech-synthesized Emacs use. speechd-el is easier to layer on top of an existing Emacs if you only want occasional speech, while Emacspeak voice-enables many packages to an extent that goes beyond simply speaking what's on the screen.
Speech synthesis is particularly helpful when I'm
learning French because I can use it as a
reference for what a paragraph or sentence should sound
like. It's not perfect. Sometimes it uses liaisons
that my tutor and Google Translate don't use. But
it's a decent enough starting point. I also used
it before to read out IRC mentions and compile
notifications so that I could hear them even if I
was paying attention to a different activity.
Now let's set the language to French so we can read the next line.
Bonjour, je m'appelle Emacs.
Screencast showing speechd-el
There's about a 2-second delay between the command and the start of the audio for the sentence.
Note that speechd-speak-read-sentence fails in some cases where (forward-sentence 1) isn't the same place as (backward-sentence 1) (forward-sentence 1), which can happen when you're in an Org Mode list. I've submitted a patch upstream.
Aside from that, speechd-speak-set-language, speechd-speak-read-paragraph and
speechd-speak-read-region are also useful
commands. I think the latency makes this best-suited for reading paragraphs, or for shadowing sentences for language learning.
I'm still trying to figure out how to get speechd-speak to work as smoothly as I'd like. I think I've got it set up so that the server falls back to espeak for short texts so that it can handle words or characters better, and uses the specified server for longer ones. I'd like to get to the point where it can handle all the things that speechd usually does, like saying lines as I navigate through them or giving me feedback as I'm typing. Maybe it can use espeak for fast feedback character by character and word by word, and then use Kokoro TTS for the full sentence when I finish. Then it will be possible to use it to type things without looking at the screen.
After putting this together, I still find myself leaning towards my own functions because they make it easy to save the generated speech to a file, which is handy for keeping reference audio that I can play on my phone and for making replays almost instant. That could also be useful for pre-generating the next paragraph to make it flow more smoothly. Still, it was interesting making something that is compatible with existing protocols and libraries.
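To make the file-based idea concrete, here is a minimal sketch of caching generated speech by a hash of the text. The names `my-tts-cache-directory`, `my-tts-cache-file`, and `my-tts-ensure-audio` are hypothetical and are not the functions from my config. The actual synthesis step is passed in as GENERATE-FN, so the sketch makes no assumptions about which speech server is used.

```emacs-lisp
;; Sketch only: all names here are hypothetical, for illustration.
(defvar my-tts-cache-directory
  (expand-file-name "tts-cache" temporary-file-directory)
  "Directory for cached speech audio (assumed location).")

(defun my-tts-cache-file (text &optional voice)
  "Return the cache filename for TEXT spoken with VOICE.
The name is a SHA-1 hash of VOICE and TEXT, so the same text
always maps to the same file."
  (expand-file-name
   (concat (secure-hash 'sha1 (concat (or voice "default") "\n" text))
           ".wav")
   my-tts-cache-directory))

(defun my-tts-ensure-audio (text generate-fn &optional voice)
  "Return an audio file for TEXT, generating it only if it is missing.
GENERATE-FN is called with TEXT and the output filename, and
should write the audio to that file."
  (let ((file (my-tts-cache-file text voice)))
    (unless (file-exists-p file)
      (make-directory (file-name-directory file) t)
      (funcall generate-fn text file))
    file))
```

Calling `my-tts-ensure-audio` a second time with the same text returns the existing file without calling GENERATE-FN again, which is what would make replays nearly instant. The same call could be used ahead of time on the next paragraph.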
Posting it in case anyone else wants to use it as a starting point. The repository also contains the starting point for an Emacspeak-compatible speech server. See speechd-ai/README.org for more details.
Update: Simplified getting a section or finding the bolded text by using the Org Mode format instead.
During the sessions with my French tutor, I share a Google document so that we can mark the words where I need to practice my pronunciation some more or tweak the wording. Using Ctrl+B to make the word bold is an easy way to make it jump out.
I used to copy these changes into my Org Mode notes manually, but today I thought I'd try automating some of it.
First, I need a script to download the HTML for a specified Google document. This is probably easier to do with the NodeJS library rather than with oauth2.el and url-retrieve-synchronously because of various authentication things.
Je peux consacrer une petite partie de mon budget à des essais, mais je ne veux pas travailler davantage pour rentabiliser une dépense plus importante.
Je n'ai pas le temps de concentration nécessaire pour justifier l'investissement dans mon propre matériel, et sinon, les progrès sont trop rapides pour m'engager dans une configuration spécifique.
J'ai une conscience aiguë des limites cognitives ou physiques à cause des difficultés de santé de ma mère et de ma sœur, et de mes expériences avec mes limitations à cause du fait que je suis la personne principalement en charge de ma fille.
Je lis très vite, mais je n'ai pas assez de patience pour les longs contenus vidéo ou audio.
Je n'aime pas les textes qui contiennent beaucoup de remplissage.
Beaucoup de gens ont une réaction forte contre l'IA pour plusieurs raisons qui incluent le battage médiatique excessif dont elle fait l'objet, son utilisation à mauvais escient, et l'inondation de banalité qu'elle produit.
Je réécris souvent la majorité du logiciel à l'exception d'un ou deux morceaux parce que ce code ne me convient pas.
Je ne veux pas l'utiliser pour les correctifs que je veux soumettre à d'autres projets parce que le code ne me semble pas correct et je ne veux pas gaspiller le temps d'autres bénévoles.
J'aime pouvoir lui donner trois dépôts git et des instructions pour générer un logiciel à partir d'un dépôt pour un autre via le troisième dépôt.
Mais je ne veux pas le publier avant de réécrire et tout comprendre.
Je veux profiter davantage, apprendre davantage avec l'aide de vraies personnes, complétée par l'aide de l'IA.
J'adore les sous-titres simultanés, mais je n'ai pas toujours trouvé une méthode ou un système qui me convienne.
I can then go into the WhisperX transcription JSON file and replay those parts for closer review.
I can also tweak the context function to give me less information. For example, to limit it to the containing phrase, I can do this:
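(require 'ert)
(require 'seq)
(require 'subr-x)
(require 'thingatpt)

(defun my-split-string-keep-delimiters (string delimiter)
  "Split STRING at matches of regexp DELIMITER, keeping each delimiter.
Each match stays at the start of the piece that follows it."
  (when string
    (let (results pos)
      (with-temp-buffer
        (insert string)
        (goto-char (point-min))
        (setq pos (point-min))
        (while (re-search-forward delimiter nil t)
          ;; Save the text before this delimiter; the next piece
          ;; starts at the delimiter itself.
          (push (buffer-substring pos (match-beginning 0)) results)
          (setq pos (match-beginning 0)))
        (push (buffer-substring pos (point-max)) results)
        (nreverse results)))))

(ert-deftest my-split-string-keep-delimiters ()
  (should
   (equal (my-split-string-keep-delimiters
           "Beaucoup de gens ont une réaction forte contre l'IA pour plusieurs raisons qui *incluent* le battage médiatique excessif dont elle fait l'objet, son utilisation à mauvais escient, et *l'inondation de banalité* qu'elle produit."
           ", \\| que \\| qui \\| qu'ils? \\| qu'elles? \\| qu'on ")
          '("Beaucoup de gens ont une réaction forte contre l'IA pour plusieurs raisons"
            " qui *incluent* le battage médiatique excessif dont elle fait l'objet"
            ", son utilisation à mauvais escient"
            ", et *l'inondation de banalité*"
            " qu'elle produit."))))

(defun my-lang-words-for-review-phrase-context (&optional s)
  "Return only the phrases of S that contain *bolded* text.
S defaults to the sentence at point."
  ;; Assumption: the first pattern was a non-breaking space, which is
  ;; common in French text copied from Google Docs.
  (setq s (replace-regexp-in-string "\u00A0" " " (or s (sentence-at-point))))
  (string-join
   (seq-filter (lambda (s) (string-match "\\*" s))
               (my-split-string-keep-delimiters
                s
                ", \\| parce que \\| que \\| qui \\| qu'ils? \\| qu'elles? \\| qu'on \\| pour "))
   " ... "))

(ert-deftest my-lang-words-for-review-phrase-context ()
  (should
   (equal (my-lang-words-for-review-phrase-context
           "Je peux consacrer une petite partie de mon *budget* à des essais, mais je ne veux pas travailler davantage pour rentabiliser une dépense plus importante.")
          "Je peux consacrer une petite partie de mon *budget* à des essais")))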
Je peux consacrer une petite partie de mon budget à des essais
, et sinon
J'ai une conscience aiguë des limites cognitives ou physiques à cause des difficultés de santé de ma mère et de ma sœur
pour les longs contenus vidéo ou audio.
Je n'aime pas les textes qui contiennent beaucoup de remplissage.
qui incluent le battage médiatique excessif dont elle fait l'objet … , et l'inondation de banalité
Je réécris souvent la majorité du logiciel à l'exception d'un ou deux morceaux
pour les correctifs … parce que le code ne me semble pas correct et je ne veux pas gaspiller le temps d'autres bénévoles.
pour un autre via le troisième dépôt.
Mais je ne veux pas le publier avant de réécrire et tout comprendre.
, je pourrais peut-être apprendre plus lentement avec l'aide d'Internet
, apprendre davantage avec l'aide de vraies personnes, complétée par l'aide de l'IA.
qui me convienne.
Now that I have a function for retrieving the HTML or Org Mode for a section, I can use that to wdiff against my current text to more easily spot wording changes.
If you use kubernetes-el, don't update for now, and you might want to check your installation if you updated it recently. The repo was compromised a few days ago.
I've occasionally wanted to tangle a single Org Mode source block to multiple places, so I'm glad to hear that ob-tangle has just added support for multiple targets. Niche, but could be handy. I'm also curious about using clime to write command-line tools in Emacs Lisp that handle argument parsing and all the usual stuff.
If you're looking for something to write about, why not try this month's Emacs Carnival theme of mistakes and misconceptions?
Yasnippet is a template system for Emacs. I want to use it by voice. I'd like to be able to say things like "Okay, define interactive function" and have that expand to a matching snippet in Emacs or other applications. Here's a quick demonstration of expanding simple snippets:
Screencast of expanding snippets by voice in Emacs and in other applications
Transcript
00:00 So I've defined some yasnippets with names that I can say. Here, for example, in this menu, you can see I've got "define interactive function" and "with a buffer that I'll display." And in fundamental mode, I have some other things too. Let's give it a try.
00:19 I press my shortcut. "Okay, define an interactive function." You can see that this is a yasnippet. Tab navigation still works.
00:33 I can say, "OK, with a buffer that I'll display," and it expands that also.
00:45 I can expand snippets in other applications as well, thanks to a global keyboard shortcut.
00:50 Here, for example, I can say, "OK, my email." It inserts my email address.
01:02 Yasnippet definitions can also execute Emacs Lisp. So I can say, "OK, date today," and have that evaluated to the actual date.
01:21 So that's an example of using voice to expand snippets.
This code relies on my fork of whisper.el, which lets me specify a list of functions for whisper-insert-text-at-point. (I haven't asked for upstream review yet because I'm still testing things, and I don't know if it actually works for anyone else yet.) It does approximate matching on the snippet name using a function from subed-word-data.el which just uses string-distance. I could probably duplicate the function in my config, but then I'd have to update it in two places if I come up with more ideas.
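To illustrate the approximate matching, here is a minimal sketch built directly on `string-distance`. `my-closest-snippet-name` is a hypothetical helper written for this post, not the function from subed-word-data.el, and the optional distance cutoff is my own assumption about how you might reject poor matches.

```emacs-lisp
(require 'subr-x)

;; Sketch only: a hypothetical helper, not the subed-word-data.el function.
(defun my-closest-snippet-name (heard candidates &optional max-distance)
  "Return the element of CANDIDATES closest to HEARD, or nil.
Closeness is the Levenshtein distance from `string-distance',
compared case-insensitively.  If MAX-DISTANCE is non-nil, ignore
candidates that are farther away than that."
  (let ((heard (downcase (string-trim heard)))
        best best-distance)
    (dolist (candidate candidates)
      (let ((distance (string-distance heard (downcase candidate))))
        (when (and (or (null max-distance) (<= distance max-distance))
                   (or (null best-distance) (< distance best-distance)))
          (setq best candidate
                best-distance distance))))
    best))
```

For example, with the candidates "define interactive function", "with a buffer that I'll display", and "my email", the recognized text "define an interactive function" is only three edits away from the first name, so that snippet wins even though the wording isn't exact.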
The code for inserting into other applications is defined in my-whisper-maybe-type, which is very simple:
(defun my-whisper-maybe-type (text)
  "If Emacs is not the focused app, simulate typing TEXT.
Add this function to `whisper-insert-text-at-point'."
  (when text
    (if (frame-focus-state)
        ;; Emacs has focus: pass TEXT along unchanged.
        text
      ;; Another application has focus: type TEXT there with xdotool.
      (make-process :name "xdotool" :command
                    (list "xdotool" "type"
                          text))
      nil)))
Someday I'd like to provide alternative names for snippets. I also want to make it easy to fill in snippet fields by voice. I'd love to be able to answer minibuffer questions from yas-choose-value, yas-completing-read, and other functions by voice too. Could be fun!
Emacs has far too many keyboard shortcuts for me to remember, so I use which-key to show me a menu if I pause for too long and which-key-posframe to put it somewhere close to my cursor.
I've used which-key-replacement-alist to rewrite the function names and re-sort the order to make them a little easier to scan, but that doesn't cover the case where you've defined an anonymous function ((lambda ...)) for those quick one-off commands. It just displays "function".
Pedro A. Aranda Gutiérrez wanted to share this tip about defining hints by using cons. Here's his example:
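Pedro's snippet isn't reproduced in this text, so as a stand-in, here is a minimal sketch of the general technique. It assumes a hypothetical keymap called `my-quick-map` and a made-up binding. The idea is to bind the key to a cons of a description string and the command, so that which-key has a label to display instead of "function".

```emacs-lisp
;; Sketch only: `my-quick-map' and the binding are hypothetical.
(defvar my-quick-map (make-sparse-keymap)
  "Keymap for quick one-off commands.")

;; Binding a key to (STRING . COMMAND) gives the binding a description
;; that which-key can show for an otherwise anonymous lambda.
(define-key my-quick-map (kbd "d")
  (cons "insert date"
        (lambda ()
          (interactive)
          (insert (format-time-string "%Y-%m-%d")))))

(global-set-key (kbd "C-c q") my-quick-map)
```

With this in place, pressing the prefix and pausing should list the `d` binding as "insert date".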
Hello folks! Last month's Emacs Carnival about completion had 17 posts (nice!), and Philip Kaludercic is hosting this month's Emacs Carnival: Mistakes and Misconceptions. Looking forward to reading your thoughts!
Sometimes I miss things, so if you wrote something and you don't see it here, please let me know! Please e-mail me at sacha@sachachua.com or DM me via Mastodon with a link to your post(s).
If you like the idea but didn't get something together in time for February, it's never too late. Even if you come across this years later, feel free to write about the topic if it inspires you. I'd love to include a link to your notes in Emacs News.
I added a ton of links from the Emacs News archives to the Resources and Ideas section, so check those out too.
I had a lot of fun learning together with everyone. I already have a couple of ideas for March's Emacs Carnival theme of Mistakes and Misconceptions (thanks to Philip Kaludercic for hosting!), and I can't wait to see what people will come up with next!