
Emacs Carnival February 2026: Completion

| stream, emacs

For the Emacs Carnival theme for February, let's learn more about completion together. There are all sorts of cheesy puns one can make about completion and Emacs and Valentine's Day, like "You complete me," but beyond the jokes, it's actually a really good topic to help us work with Emacs more efficiently.

First, what's the Emacs Carnival?

From Christian Tietze:

A blog carnival is a fun way to tie together a community with shared writing prompts, and marvel at all the creative interpretations of the topic of the month.

You can get a sense of the Emacs Carnival by checking out previous editions:

Month          Host              Topic
June 2025      ctietze           "Take Two"
July           gnewman           "Writing Experience"
August         takeonrules       "Your Elevator Pitch for Emacs"
September      rodiongoritskov   "Obscure packages"
October        AndyDrop          "Maintenance, server or home or garden"
November       donaldh           "An ode to org-babel"
December       GeorgeJones       "The People of Emacs"
January 2026   ctietze           "This year, I'll…"

You don't have to be an expert in order to post. In fact, this is a great way for all of us (beginners and otherwise) to focus on a topic together. Let's treat it like a kind of book club where we can share our notes as we learn.

What do we mean by completion in Emacs?

Completion can speed up text entry and reduce errors. You can use it to find Emacs commands even if you don't know their full names or keyboard shortcuts. You can use it to expand abbreviations or even fix the typos you usually make. You can use it when you code and when you write. I've heard some people define common abbreviations across different programming languages so they don't have to remember the differences between syntaxes, and minibuffer-completion-based interfaces like consult-ripgrep let you flip through search results astoundingly quickly.

Let's start by talking about two types of completion:

  • minibuffer completion, which happens in the small window at the bottom of the screen whenever you use M-x, find a file, etc. This is where you can type a little and then find matching options so that you don't have to remember the full names of commands or files. For lots of tips, check out Understanding Minibuffer Completion - Mastering Emacs.

    For example, here's my minibuffer for M-x using vertico for the display and marginalia for annotations on the side:

    Figure 1: Screenshot of minibuffer completion
  • in-buffer completion, like when you expand an abbreviation, insert a snippet, or fill in the rest of a variable name.

    Figure 2: Screenshot of in-buffer completion
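For example, the in-buffer kind can be as simple as a couple of abbrevs plus dabbrev-expand. Here's a minimal sketch; the abbreviations themselves are made-up examples, not part of my config:

```elisp
;; Made-up example abbrevs: fix a common typo and expand a shorthand.
(define-abbrev global-abbrev-table "teh" "the")
(define-abbrev global-abbrev-table "emx" "Emacs")
;; Turn on automatic abbrev expansion in all buffers.
(setq-default abbrev-mode t)
;; M-/ runs dabbrev-expand, which completes the word at point
;; from words that already appear in your buffers.
```

With abbrev-mode on, typing one of those abbrevs followed by a space or punctuation expands it in place.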

Here are some ideas for things to explore. Pick an idea or come up with your own and write a post sharing what you're figuring out!

  • Minibuffer completion
    • Do you know about S-M-x (execute-extended-command-for-buffer - available with Emacs 28.1 or higher), which suggests commands relevant to the current mode?
    • Have you gotten the hang of using M-p to go back through your history? (Did you know you can interactively search through that history with C-s and C-r?)
    • Do you know about using M-n to go into the future history?
    • Have you tried saving your minibuffer history with savehist?
    • Do you want to experiment with recursive minibuffers so that you can do something else in the middle of a completion?
    • Do you have nicer completion set up, like icomplete-vertical-mode, fido-mode or fido-vertical-mode, ido-mode or ido-vertical-mode, ivy, or vertico? This makes things like M-x (execute-extended-command) and M-y (yank-pop) soo much nicer!
    • Have you experimented with other completion styles like orderless so that you can type parts of the completion name in any order?
    • Have you checked out the convenient search and navigation commands in more complex completion frameworks like consult, counsel, or helm?
    • Have you experimented with other sort orders like the built-in historical option or more complex sorts with prescient.el?
    • Do you want to see additional information when you're choosing completions? Try out marginalia.
    • Have you checked out embark for doing other things with your completion like inserting a file name instead of opening it, or changing the command that you wanted to do, or acting on multiple items?
    • If you use Org Mode, do you want to make your own custom Org link type with completion? (I really like being able to quickly link to blog posts, parts of my config, or project files with completion)
    • Do you want to define your own completion commands, maybe even with previews, dynamic collections or asynchronous data?
  • In-buffer completion
    • Have you set up your own abbreviations to fix common typos or expand text quickly?
    • Have you tried using dabbrev-expand to expand words based on what you have in the current buffer or in other buffers?
    • Do you want to try hippie-expand to try different functions for expansion?
    • Have you defined your own snippets for prose or code? (Yasnippet is popular.)
    • Have you tried icomplete-in-buffer, corfu, company, or some other in-buffer completion framework?
      • If you use Yasnippet and you've just added completion at point, have you added your snippets to the completions with something like yasnippet-capf?
    • Do you want context-sensitive completions for your shell commands in Emacs? Try pcomplete - you can even define your own.
    • If you code, do you have LSP, Eglot, or something similar set up to offer you completions in your programming languages?
  • Meta: What else can you bring into Emacs so that you can take advantage of all the completions that you've set up, like note-taking or e-mail? (Ex: mastodon.el + org-contacts + a little code to insert a Mastodon handle with completion = I can think of people by name instead of by handle!)
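If you want a starting point for several of the minibuffer ideas above, here is a minimal sketch combining some of the packages mentioned (vertico, marginalia, orderless, and the built-in savehist). It assumes you've already installed those packages:

```elisp
(vertico-mode 1)     ; vertical minibuffer completion UI
(marginalia-mode 1)  ; annotations beside each candidate
(savehist-mode 1)    ; persist minibuffer history across sessions
;; orderless: match space-separated parts of your input in any order,
;; with basic completion kept as a fallback.
(setq completion-styles '(orderless basic)
      completion-category-overrides '((file (styles basic partial-completion))))
```

This is just one combination; the same ideas work with fido-vertical-mode or ivy instead of vertico.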

Things I want to learn about

For example, this month, I want to…

  • Minibuffer:
    • Figure out some kind of approximate speech-based minibuffer completion for commands
    • Create a custom Org Mode link type for emacswiki and other things I refer to frequently
    • Write about the completion functions I'm using to help me learn French
  • In-buffer completion:
    • Notice where I keep typing the same kinds of things and define more snippets for them
    • Figure out some kind of speech interface for expanding snippets
    • Sort out completion in programming buffers so that I can finally take advantage of LSP
    • Complete French words in-buffer ignoring accented characters
  • Organize tons of completion-related links from Emacs News onto EmacsWiki: Category Completion and other pages
  • Revisit the completion-related code in my config to dust off things that I can update, remember to use, or document with gif-screencast
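As a sketch of what the custom Org link type mentioned above might look like: org-link-set-parameters takes :follow and :complete functions. The page list here is just a made-up example; a real version would probably build the candidates dynamically:

```elisp
;; A hypothetical "emacswiki" link type with completion for the page name.
(org-link-set-parameters
 "emacswiki"
 :follow (lambda (page &optional _arg)
           (browse-url (concat "https://www.emacswiki.org/emacs/" page)))
 :complete (lambda (&optional _arg)
             ;; Return the full link string, e.g. "emacswiki:CategoryCompletion"
             (concat "emacswiki:"
                     (completing-read "EmacsWiki page: "
                                      '("CategoryCompletion" "AbbrevMode")))))
```

After that, C-c C-l (org-insert-link) offers emacswiki as a link type and prompts with completion.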

I'll publish my notes on my blog and I'll add them to this post as well. I'd love to check out your notes too!

How to submit your entry/entries

Please e-mail me at sacha@sachachua.com or DM me via Mastodon with a link to your post(s) by February 28 so that I can add them to this post. I'm happy to link to multiple posts. For example, here are some things you might like to write about:

  • what you're thinking of figuring out (in case other people have suggestions)
  • your notes along the way
  • your current setup
  • things you're particularly proud of

Looking forward to hearing from you!

View org source for this post

Using Silero voice activity detection to automatically queue multiple transcriptions with natrys/whisper.el

| audio, speech-recognition, emacs

I can queue multiple transcriptions with whisper.el so that they get processed sequentially with backup audio. It catches up when I pause to think. Now I want to use Silero voice activity detection to do that kind of segmentation for me automatically.

First, I need a Python server that can print out events when it notices the start or stop of a speech segment. If I print out the timestamps, I might be able to cross-reference it someday with interestingthings. For now, even just paying attention to the end of a segment is enough for what I want to do.

Python script for printing out events
import sounddevice as sd
import numpy as np
import torch
import sys
from datetime import datetime, timedelta

SILENCE_DURATION = 500  # minimum silence (ms) before a speech segment ends
SAMPLING_RATE = 16000
CHUNK_SIZE = 512  # samples per chunk; Silero VAD expects 512-sample chunks at 16 kHz
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad',
                              force_reload=False)

(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils
vad_iterator = VADIterator(model, threshold=0.5, min_silence_duration_ms=SILENCE_DURATION)

stream_start_time = None

def format_iso_with_offset(offset_seconds):
    if stream_start_time is None:
        return "PENDING"
    event_time = stream_start_time + timedelta(seconds=offset_seconds)
    return event_time.astimezone().isoformat(timespec='milliseconds')

def audio_callback(indata, frames, time, status):
    global stream_start_time
    if status:
        print(status, file=sys.stderr)
    if stream_start_time is None:
        stream_start_time = datetime.now()
    tensor_input = torch.from_numpy(indata.copy()).flatten()
    speech_dict = vad_iterator(tensor_input, return_seconds=True)
    if speech_dict:
        if "start" in speech_dict:
            print(f"START {format_iso_with_offset(speech_dict['start'])}", flush=True)
        if "end" in speech_dict:
            print(f"END {format_iso_with_offset(speech_dict['end'])}", flush=True)
try:
    with sd.InputStream(samplerate=SAMPLING_RATE,
                        channels=1,
                        callback=audio_callback,
                        blocksize=CHUNK_SIZE):
        while True:
            sd.sleep(1000)  # idle without busy-waiting; audio arrives via the callback
except KeyboardInterrupt:
    print("\nStopping...")

Then I can start this process from Emacs:

(defvar my-vad-events-process nil)
(defvar my-vad-events-dir "~/proj/speech/")
(defvar my-vad-events-command `(,(expand-file-name ".venv/bin/python" my-vad-events-dir)
                                "vad-events.py"))

(defun my-vad-events-filter (proc string)
  (when (and (string-match "^END" string)
             (process-live-p whisper--recording-process))
    (message "Noticed speech turn: %s" string)
    (my-whisper-continue)))

(defun my-vad-events-ensure ()
  "Start the process if it's not already running."
  (interactive)
  (unless (process-live-p my-vad-events-process)
    (let ((default-directory my-vad-events-dir)
          (process-environment
           (cons
            (format
             "PULSE_PROP=node.description='%s' media.name='%s' node.name='%s'"
             "vad" "vad" "vad")
            process-environment)))
      (setq my-vad-events-process
            (make-process
             :name "vad-events"
             :command my-vad-events-command
             :buffer (get-buffer-create "*vad-events*")
             :stderr (get-buffer-create "*vad-events-err*")
             :filter #'my-vad-events-filter)))))

Because I added Pulse properties to the process environment, I can easily use epwgraph to rewire the input so that it gets the input from my VirtualMicSink instead of the default system audio device. (Someday I'll figure out how to specify that as the input automatically.)

Now I can press my shortcut for my-whisper-continue to start the process. As I keep talking, it will continue to record. When I pause for more than a second between sentences, then it will send that chunk to the server for transcription without me having to press another button, while still listening for more speech.

How is this different from the streaming approach that many real-time speech recognition services offer? I think this gives me a bit more visibility into and control of the process. For my personal use, I don't need to have everything processed as quickly as possible, and I'm not trying to replicate live captions. I just want to be able to look back over the last five minutes to try to remember what I was talking about. I usually have a lot of quiet time as I think through my next steps, and it's fine to have it catch up then. I also like that I can save time-stamped audio files for later processing, divided according to the speech segments. Those might be a little bit easier to work with when I get around to compositing them into a video.

This is part of my Emacs configuration.
View org source for this post

Emacs: This year, I will... / Cette année, je vais...

| emacs

In English

Inspired by the Emacs Carnival theme for January, this year, I will:

  • take more notes, perhaps with the help of speech recognition
    • Now I can manage my time better since the kiddo can study more independently, but I still have to manage interruptions from my life and from my own brain. If I write or dictate notes while I think or work, I can get back into things more easily. Speech recognition allows me to capture more thoughts, even if the results need a fair bit of reviewing and revising. I'm looking forward to checking out the ideas others have shared, such as configuring more transient.el interfaces and adding more tweaks to my Org Mode publishing.
  • stream to think out loud
    • My daily routines are gradually becoming more predictable, so I might be able to schedule streams or jump onto one to share ideas. If I do a live broadcast, I can learn from the questions and comments I get. Fridays at 10:30 AM seems like a good time for that, and when possible, I can also do it spontaneously.
  • record videos to share what I learn, and package my functions to make them reusable when possible
    • Like the previous points, sharing my ideas and my work can help others and start conversations. If I improve subed-record.el to combine video and audio recordings with screenshots and subtitles, I might be able to create videos more easily.
  • write more in French and improve my environment for this purpose
    • I like the mental stimulation of writing in another language. Lots of other people are learning languages too, so we might be able to help each other. For example, my functions for highlighting new words and grammatical errors could be useful.
  • practice speaking by making audio recordings with the help of subed-record and speech synthesis to improve my pronunciation
    • The advantage of implementing this in Emacs is that I can customize my workflow. For example, I want to write my draft and then record the audio sentence by sentence after listening to an example. I also want to see if I can more easily edit a recording of my session with my tutor so that I can keep only my last pronunciation attempts.
  • improve processes for EmacsConf and Emacs News
    • I might be able to organize much of it by myself, but other volunteers might also be able to help, which would be even better. I want to create the infrastructure to manage several virtual meetings simultaneously, probably with speech recognition, audio spatialization, and lots of keyboard shortcuts. I also want to improve my subtitling process.

The more I do in Emacs, the more I think of ideas for improvement…

En français

Sur le thème du Carnaval d'Emacs pour janvier - Cette année, je vais :

  • prendre plus de notes, peut-être avec l'aide de la reconnaissance vocale
    • Même si je peux mieux gérer mon temps maintenant que ma fille peut étudier de manière plus indépendante, je dois toujours gérer les interruptions de ma vie et de mon propre cerveau. Si j'écris ou dicte des notes pendant que je pense ou que je travaille, je reprendrai le fil de mes pensées plus facilement. La reconnaissance vocale me permet de capter plus de pensées, même si les résultats nécessitent de revoir et de réviser. Je suis enthousiasmée par les idées que d'autres partagent, comme la configuration de l'interface transient.el et l'amélioration de la publication d'Org Mode.
  • streamer pour réfléchir à voix haute
    • Mes routines quotidiennes deviennent plus prévisibles petit à petit, je peux donc planifier des événements ou me lancer dans le streaming pour partager des idées. Si je fais une diffusion en direct, je peux profiter des questions et des commentaires que je reçois. Le vendredi à 10h30 semble être un bon moment pour ça, et quand c'est possible, je peux aussi le faire spontanément.
  • enregistrer des vidéos pour partager ce que j'apprends, et packager mes fonctions pour les rendre réutilisables quand c'est possible
    • Comme les éléments précédents, partager mes idées et mon travail peut aider d'autres personnes et lancer des conversations. Si j'améliore subed-record.el pour combiner des enregistrements vidéo et audio avec des captures d'écran et des sous-titres, je peux créer des vidéos plus facilement.
  • écrire davantage en français et améliorer mon environnement à cet effet
    • J'apprécie la stimulation mentale d'écrire dans une autre langue. Beaucoup de gens apprennent d'autres langues, donc nous pouvons nous aider. Par exemple, mes fonctions pour surligner les nouveaux mots et les erreurs grammaticales pourraient être utiles.
  • pratiquer l'expression orale en faisant des enregistrements audio avec l'aide de subed-record et la synthèse vocale pour améliorer ma prononciation
    • L'avantage de l'implémentation dans Emacs est que je peux personnaliser mon flux de travail. Par exemple, je veux écrire mon brouillon, puis enregistrer l'audio phrase par phrase après l'avoir écouté. Je veux aussi modifier un enregistrement de mon rendez-vous avec ma tutrice pour que je garde seulement mes derniers essais de prononciation.
  • améliorer les processus pour EmacsConf et Emacs News
    • Il se peut que je puisse l'organiser en grande partie par moi-même, mais il se peut aussi que d'autres bénévoles m'aident, ce qui serait encore mieux. Je veux créer l'infrastructure pour gérer plusieurs réunions virtuelles simultanément, probablement grâce à la reconnaissance vocale, la spatialisation audio, et de nombreux raccourcis clavier. Je veux aussi améliorer mon processus de sous-titrage.

Plus j'en fais dans Emacs, plus je pense à des idées d'amélioration…

Thanks to Christian Tietze for hosting!

View org source for this post

2026-01-26 Emacs news

| emacs, emacs-news
View org source for this post

Queuing multiple transcriptions with whisper.el speech recognition

| audio, speech-recognition, emacs

I want to be able to talk out loud and have the ideas go into Emacs. I can do this in a number of different ways:

  1. I briefly demonstrated a step-by-step approach with natrys/whisper.el with a single file. I press a keyboard shortcut to start the recording, another shortcut to stop the recording, and it transcribes in the background. But whisper.el is set up so that if I press the shortcut to start recording again, it offers to interrupt the transcription process, which is not what I want. I want to just keep talking and have it process results as they come in.
  2. I'm also experimenting with Google Chrome's web speech API to do continuous speech recognition, which I can get into Emacs using a web socket.
  3. What I've just figured out is how to layer a semi-continuous interface for speech recognition on top of whisper.el: while it's processing in the background, I can press a keyboard shortcut (I'm using numpad 9 to call my-whisper-continue) to stop the previous recording, queue it for processing, and start the next recording. If I use this shortcut to separate my thoughts, Whisper has a much easier time making sense of the whole sentence or paragraph, instead of relying on the sliding 30-second context window that many streaming approaches to speech recognition use.

Question: Did you fix the keyboard delay you get while speech recognition catches up with what you're saying?

Sometimes, when the speech recognition kicks in, my computer gets busy. When it gets really busy, it doesn't process my keystrokes in the right order, which is very annoying because then I have to delete the previous word and retype it. I haven't sorted that out yet, but it seems like I probably have to lower the priority of the transcription processes. On the plus side, as I mentioned, if I dictate things instead of typing them, I don't run into that problem at all.

Also, other notes on delays: the continuous speech recognition via Google Chrome shows up fairly quickly, but it's not very precise and it doesn't have punctuation. Even if there's a little bit of a delay, as long as I press the my-whisper-continue shortcut after each thought, I can get that text into my Emacs buffer using the nicer transcription from my selected model. There is going to be a bit of a delay for that one because it gets processed at the end of the thought. Also, I need to start thinking in complete sentences instead of just adding one clause after another as my brain goes off on tangents. I think it's pretty promising: there's the continuous speech recognition via Google Chrome if I don't mind the lower accuracy and lack of punctuation, and I can still get the pretty version on the other side.

Why talk out loud? I liked the Bookclub Tapas presentation that Maddie Sullivan did at EmacsConf 2025. Talking out loud helps me be a lot more verbose, compared to typing things out or having to switch to my notes or interrupt my screen with an Org capture buffer. Of course I want to clean the text up before putting it into a blog post, but given that my life still sometimes has random interruptions from a kiddo who must have my attention at that very minute, having that kind of record that I can at least try to reread afterwards to reconstruct what I was thinking about sounds like it might be helpful.

Still, making sense out loud is hard. I'm not actually used to talking to people that much now. This is probably a good reason for me to experiment with streaming more. Then I get the practice in talking out loud, there are backup recordings, and people can ask questions when things are unclear.

Of course, sometimes the text doesn't quite make sense because of the speech recognition errors. I can usually figure it out from the context. I save the audio as well so that I can go back and listen to it again if I really need to.

Anyway, here's the code for sending the current recording to whisper in the background and starting another recording. It assumes a lot about how things are set up. For example, I'm only testing this with a local speaches server instead of whisper.cpp. You might need to look at my other speech-related configuration blog posts and sections in order to make sense of it.

Code for queuing whisper.el requests to a local server
(defvar my-whisper--queue nil)
(defun my-whisper-continue (&optional arg)
  "Send what we've got so far for transcription and then continue recording.
Call with \\[universal-argument] to signal that we can stop."
  (interactive "P")
  (require 'whisper)
  (if arg
      (my-whisper-done)
    (setq whisper--marker (point-marker) whisper--point-buffer (current-buffer))
    (when (process-live-p whisper--recording-process)
      ;; queue only if the last one is not asking for the same file
      (unless
          (string=
           (plist-get
            (car
             (last my-whisper--queue))
            :file)
           whisper--temp-file)
        (add-to-list
         'my-whisper--queue
         (list :file whisper--temp-file
               :buffer
               (format "*result: %s*" (file-name-base whisper--temp-file)))
         t))
      ;; Replace whisper.el's sentinel so we handle results ourselves
      (set-process-sentinel whisper--recording-process
                            (lambda (process event)
                              (my-whisper-process-queue)))
      (interrupt-process whisper--recording-process))
    (run-hooks 'whisper-before-transcription-hook)
    (whisper--setup-mode-line :show 'recording)
    (whisper--record-audio)))

(defun my-whisper-discard ()
  "Ignore the previous recording."
  (interactive)
  (when (process-live-p whisper--recording-process)
    ;; Replace whisper.el's sentinel so we handle results ourselves
    (set-process-sentinel whisper--recording-process
                          (lambda (process event)
                            (when (file-exists-p whisper--temp-file)
                              (delete-file whisper--temp-file))
                            (my-whisper-process-queue)))
    (interrupt-process whisper--recording-process)))

(defun my-whisper-discard-and-continue ()
  "Ignore the previous recording and continue."
  (interactive)
  (if (process-live-p whisper--recording-process)
      (progn
        ;; Replace whisper.el's sentinel so we handle results ourselves
        (set-process-sentinel whisper--recording-process
                              (lambda (process event)
                                (my-whisper-process-queue)
                                (my-whisper-continue)))
        (interrupt-process whisper--recording-process))
    (my-whisper-continue)))

(defun my-whisper-done ()
  (interactive)
  (when (process-live-p whisper--recording-process)
    (add-to-list
     'my-whisper--queue
     (list :file whisper--temp-file
           :buffer
           (format "*result: %s*" (file-name-base whisper--temp-file)))
     t)
    ;; Replace whisper.el's sentinel so we handle results ourselves
    (set-process-sentinel whisper--recording-process
                          (lambda (process event)
                            (my-whisper-process-queue)))
    (whisper--setup-mode-line :hide 'recording)
    (interrupt-process whisper--recording-process)))

(defun my-whisper-process-queue-result ()
  "Process the first part of the queue that already has results."
  (while (plist-get (car my-whisper--queue) :results)
    (let ((o (pop my-whisper--queue)))
      (unless my-whisper-target-markers
        (setq whisper--marker (point-marker)
              whisper--point-buffer (current-buffer)))
      (with-current-buffer (plist-get o :buffer)
        (erase-buffer)
        (insert (plist-get o :results)))
      ;; Only works with my fork: https://github.com/sachac/whisper.el/tree/whisper-insert-text-at-point-function
      (whisper--handle-transcription-output nil (plist-get o :buffer)))))

(defun my-whisper-process-queue ()
  (let (o)
    (while (setq o (seq-find (lambda (o) (and (plist-get o :file)
                                              (not (plist-get o :process))
                                              (not (plist-get o :results))))
                             my-whisper--queue))
      (let* ((headers (list "Content-Type: multipart/form-data"))
             (params (list (concat "file=@"
                                   (plist-get o :file))
                           "temperature=0.0"
                           "temperature_inc=0.2"
                           "response_format=json"
                           (concat "model=" whisper-model)
                           (concat "language=" whisper-language)))
             (url (format my-whisper-url-format whisper-server-host whisper-server-port))
             (command `("curl" "-s"
                        ,url
                        ,@(mapcan (lambda (h) (list "-H" h)) headers)
                        ,@(mapcan (lambda (p) (list "-F" p)) params))))
        (with-current-buffer (get-buffer-create (plist-get o :buffer))
          (erase-buffer))
        (plist-put
         o :process
         (make-process
          :name "whisper-curl"
          :command command
          :buffer (plist-get o :buffer)
          :coding 'utf-8
          :sentinel
          (lambda (process event)
            (with-current-buffer (process-buffer process)
              (let ((current my-whisper--queue-item))
                (when (and (get-buffer (plist-get current :buffer))
                           (string-equal "finished\n" event))
                  (with-current-buffer (plist-get current :buffer)
                    (goto-char (point-min))
                    (plist-put current :results
                               (or
                                (condition-case nil
                                    (gethash "text" (json-parse-buffer))
                                  (error ""))
                                "(error)"))))))
            (my-whisper-process-queue-result))))
        (plist-put o :command (string-join command " "))
        (with-current-buffer (process-buffer (plist-get o :process))
          (setq-local my-whisper--queue-item o))))))
(defvar-local my-whisper--queue-item nil)

(defun my-whisper-reprocess-queue ()
  (interactive)
  (setq whisper--marker (point-marker) whisper--point-buffer (current-buffer))
  (mapc (lambda (o)
          (when (process-live-p (plist-get o :process))
            (kill-process (plist-get o :process)))
          (when (get-buffer (plist-get o :buffer))
            (kill-buffer (plist-get o :buffer)))
          (plist-put o :process nil)
          (plist-put o :results nil))
        my-whisper--queue)
  (my-whisper-process-queue))

(defun my-whisper-clear-queue ()
  (interactive)
  (mapc (lambda (o)
          (when (process-live-p (plist-get o :process))
            (kill-process (plist-get o :process)))
          (when (get-buffer (plist-get o :buffer))
            (kill-buffer (plist-get o :buffer)))
          (plist-put o :process nil)
          (plist-put o :results nil))
        my-whisper--queue)
  (setq my-whisper--queue nil))

(keymap-global-set "<kp-9>" #'my-whisper-continue)
(keymap-global-set "<kp-8>" #'my-whisper-discard-and-continue)
(keymap-global-set "C-<kp-9>" #'my-whisper-done)

This is part of my Emacs configuration.
View org source for this post

Emacs and whisper.el: Trying out different speech-to-text backends and models

| audio, emacs, speech-recognition

I was curious about parakeet because I heard that it was faster than Whisper on the HuggingFace leaderboard. When I installed it and got it running on my laptop (CPU only, no GPU), it seemed like my results were a little faster than whisper.cpp with the large model, but much slower than whisper.cpp with the base model. The base model is decent for quick dictation, so I got curious about other backends and other models.

In order to try natrys/whisper.el with other backends, I needed to work around how whisper.el validates the model names and sends requests to the servers. Here's the quick and dirty code for doing so, in case you want to try it out for yourself.

(defvar my-whisper-url-format "http://%s:%d/transcribe")
(defun my-whisper--transcribe-via-local-server ()
  "Transcribe audio using the local whisper server."
  (message "[-] Transcribing via local server")
  (whisper--setup-mode-line :show 'transcribing)
  (whisper--ensure-server)
  (setq whisper--transcribing-process
        (whisper--process-curl-request
         (format my-whisper-url-format whisper-server-host whisper-server-port)
         (list "Content-Type: multipart/form-data")
         (list (concat "file=@" whisper--temp-file)
               "temperature=0.0"
               "temperature_inc=0.2"
               "response_format=json"
               (concat "model=" whisper-model)
               (concat "language=" whisper-language)))))
(defun my-whisper--check-model-consistency () t)
(defun my-whisper--ensure-server () t)

(with-eval-after-load 'whisper
  (advice-add 'whisper--transcribe-via-local-server :override #'my-whisper--transcribe-via-local-server)
  (advice-add 'whisper--check-model-consistency :override #'my-whisper--check-model-consistency)
  (advice-add 'whisper--ensure-server :override #'my-whisper--ensure-server))

Then I have this function for trying things out.

(defun my-test-whisper-api (url &optional args)
  "Send the current temp file to the transcription endpoint at URL via curl.
ARGS are extra curl arguments, such as a model parameter."
  (with-temp-buffer
    (apply #'call-process "curl" nil t nil "-s"
           url
           (append (mapcan
                    (lambda (h) (list "-H" h))
                    (list "Content-Type: multipart/form-data"))
                   (mapcan
                    (lambda (h) (list "-F" h))
                    (list (concat "file=@" whisper--temp-file)
                          "temperature=0.0"
                          "temperature_inc=0.2"
                          "response_format=verbose_json"
                          (concat "language=" whisper-language)))
                   args))
    (message "%s %s" (buffer-string) url)))

Here's the audio file. It is around 10 seconds long. I run the benchmark 3 times and report the average time.

Download

Code for running the benchmarks
(let ((times 3))
(mapcar
 (lambda (group)
   (let ((whisper--temp-file "/home/sacha/recordings/whisper/2026-01-19-14-17-53.wav"))
     ;; warm up the model
     (eval (cadr group))
     (list
      (format "%.3f"
              (/ (car
                  (benchmark-call (lambda () (eval (cadr group))) times))
                 times))
      (car group))))
 '(
   ("parakeet"
    (my-test-whisper-api
     (format "http://%s:%d/v1/audio/transcriptions" whisper-server-host 5092)))
   ("whisper.cpp base-q4_0"
    (my-test-whisper-api
     (format "http://%s:%d/inference" whisper-server-host 8642)))
   ("speaches whisper-base"
    (my-test-whisper-api
     (format "http://%s:%d/v1/audio/transcriptions" whisper-server-host 8001)
     (list "-F" "model=Systran/faster-whisper-base")))
   ("speaches whisper-base.en"
    (my-test-whisper-api
     (format "http://%s:%d/v1/audio/transcriptions" whisper-server-host 8001)
     (list "-F" "model=Systran/faster-whisper-base.en")))
   ("speaches whisper-small"
    (my-test-whisper-api
     (format "http://%s:%d/v1/audio/transcriptions" whisper-server-host 8001)
     (list "-F" "model=Systran/faster-whisper-small")))
   ("speaches whisper-small.en"
    (my-test-whisper-api
     (format "http://%s:%d/v1/audio/transcriptions" whisper-server-host 8001)
     (list "-F" "model=Systran/faster-whisper-small.en")))
   ("speaches lorneluo/whisper-small-ct2-int8"
    (my-test-whisper-api
     (format "http://%s:%d/v1/audio/transcriptions" whisper-server-host 8001)
     (list "-F" "model=lorneluo/whisper-small-ct2-int8")))
   ;; needed export TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1
   ("whisperx-server Systran/faster-whisper-small"
    (my-test-whisper-api
     (format "http://%s:%d/transcribe" whisper-server-host 8002)))))
)

I tried it with:

Looks like speaches + faster-whisper-base is the winner for now. I like how speaches lets me switch models on the fly, so maybe I can use base.en generally and switch to base when I want to dictate in French. Here's how I point whisper.el at the server I just set up.

(setq whisper-server-port 8001 whisper-model "Systran/faster-whisper-base.en"
      my-whisper-url-format "http://%s:%d/v1/audio/transcriptions")

At some point, I'll override whisper--ensure-server so that starting it up is smoother.

Benchmark notes: I have a Lenovo P52 laptop (released 2018) with an Intel Core i7-8850H (6 cores, 12 threads; 2.6 GHz base / 4.3 GHz turbo) with 64GB RAM and an SSD. I haven't figured out how to get the GPU working under Ubuntu yet.

This is part of my Emacs configuration.
View org source for this post

2026-01-19 Emacs news

| emacs, emacs-news

Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, Mastodon #emacs, Bluesky #emacs, Hacker News, lobste.rs, programming.dev, lemmy.world, lemmy.ml, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

View org source for this post