Topic - Speech recognition
I'm currently using the Speaches server with the Systran/faster-whisper-base model for speech recognition in Emacs. I use my fork of natrys/whisper.el (upstream) so that I can continuously queue transcription, run the output through various functions, and save the audio.
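To make the "queue transcription, then run the output through various functions" idea concrete, here is a minimal Python sketch. It is not the whisper.el fork, which is Emacs Lisp. It assumes a hypothetical `fake_transcribe` function standing in for a real request to the Speaches server, and the `TranscriptionQueue` and `run_filters` names are also hypothetical. Recordings are processed one at a time on a background thread, so new clips can be queued while older ones are still being transcribed.

```python
from __future__ import annotations

import queue
import threading
from typing import Callable, Iterable

Filter = Callable[[str], str]


def fake_transcribe(audio_path: str) -> str:
    """Hypothetical stand-in for a real speech recognition call
    (for example, a request to a local Speaches server)."""
    return f" note about emacs from {audio_path} "


def run_filters(text: str, filters: Iterable[Filter]) -> str:
    """Pass the transcript through each filter function in order."""
    for f in filters:
        text = f(text)
    return text


class TranscriptionQueue:
    """Transcribe queued recordings sequentially on a worker thread."""

    def __init__(self, transcribe: Callable[[str], str], filters: list[Filter]):
        self._jobs: queue.Queue[str | None] = queue.Queue()
        self._transcribe = transcribe
        self._filters = filters
        self.results: list[str] = []
        # Start the worker only after all attributes exist.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add(self, audio_path: str) -> None:
        """Queue a recording; returns immediately."""
        self._jobs.put(audio_path)

    def close(self) -> None:
        """Finish all queued jobs, then stop the worker and wait for it."""
        self._jobs.put(None)  # sentinel: no more recordings
        self._worker.join()

    def _run(self) -> None:
        while True:
            path = self._jobs.get()
            if path is None:
                break
            text = self._transcribe(path)
            self.results.append(run_filters(text, self._filters))
```

Usage: `q = TranscriptionQueue(fake_transcribe, [str.strip, lambda s: s.replace("emacs", "Emacs")])`, then `q.add("clip-001.wav")` for each clip and `q.close()` when done. Results stay in recording order because a single worker handles the queue.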
Why I'm interested in speech recognition:
- Sometimes I have a hard time remembering my train of thought. (Limitations of working memory!) It's useful to capture my thoughts faster than I can type, even if the transcription has errors, because I can replay the recording if needed.
- Speech would be a great interface for mobile computing.
- Many things can make it difficult to type, either temporarily or permanently, such as accidents, disease, or physical decline. My mom has Parkinson's, and her tremors make it hard for her to type most of the time. I know other people who've broken bones or gotten RSI. I enjoy writing and programming, so it makes sense to figure out alternative ways of input before I really need them.
- I'd like to be able to use my computer while my hands are busy with sewing or knitting.
- A voice interface could open up interesting possibilities. I'm not quite brave enough to give AI agents full access to my Emacs, but maybe someday.
Here's what I've figured out so far:
- ☑ Save the timestamped recordings and transcripts to allow review or assembly into other clips.
- ☑ Automatically capture screenshots as well.
- ☑ Save notes to an Org Mode task that isn't displayed on screen, such as my currently clocked in task or a new task in my inbox.
- ☑ Queue multiple speech recognition tasks to simulate continuous speech recognition with higher accuracy even on my older machine (Lenovo P52).
- ☑ Use the Google Web Speech API or other speech recognition engines for real-time results if needed, despite lower accuracy. Route audio from other applications into it as input, and send the output to Emacs buffers, IRC channels, Etherpad shared documents, and other targets.
- ☑ Dictate short text into other applications outside Emacs using a global keyboard shortcut, while still being able to take advantage of the text substitution that I do in Emacs.
- ☑ Expand simple yasnippets by saying "Okay, expand (name of snippet)".
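Recognizing a command like "Okay, expand (name of snippet)" comes down to matching a pattern at the start of the transcript. The Python sketch below is only an illustration; the real setup does this inside Emacs. The regular expression, the `snippet_to_expand` name, and the choice to turn spoken words into a lowercase hyphenated snippet key are all assumptions. The pattern is deliberately loose because Whisper output varies in case and punctuation ("Okay," vs. "OK.").

```python
from __future__ import annotations

import re

# Hypothetical pattern: "okay" or "ok", optional comma or period,
# then "expand" and the snippet name, ignoring trailing punctuation.
EXPAND_RE = re.compile(
    r"^\s*(?:okay|ok)\b[,.]?\s+expand\s+(?P<name>.+?)[.!?]*\s*$",
    re.IGNORECASE,
)


def snippet_to_expand(transcript: str) -> str | None:
    """Return a snippet key if the transcript is an expand command, else None."""
    m = EXPAND_RE.match(transcript)
    if m is None:
        return None
    # Assumption: snippet keys are lowercase words joined by hyphens.
    return "-".join(m.group("name").lower().split())
```

With this sketch, " Okay, expand use package." yields `"use-package"`, while ordinary dictation such as "Please expand this" yields `None` and can be inserted as text instead.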
Some things I'd like to be able to do by voice:
- Support multiple wake words in addition to "Okay, …".
- Scroll up and down in Emacs and other applications.
- Cut, copy, and paste.
- Insert or navigate to links based on my bookmarks.
- Select and act on different logical elements such as sentences or paragraphs.
- Select a word or a range of text.
- Press keyboard shortcuts. I might need to use xdotool to simulate keypresses.
- Run M-x commands.
- Insert text into the minibuffer and potentially press Enter.
- Insert names of symbols, such as with-current-buffer.
- Answer yes or no prompts. This is a little tricky because y-or-n-p is a blocking function and Emacs is single-threaded, so I might need to have an external process that uses xdotool to simulate keystrokes.
- Select an option by number or letter. This is similar to the y-or-n-p problem, just with a wider range of choices.
- Monitor simultaneous EmacsConf BigBlueButton web conferences for keywords so I can tell when a speaker needs my help.
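Several of these ideas (multiple wake words, scrolling, pressing keys, answering y-or-n-p from outside Emacs) could share one small dispatcher that turns a transcript into an xdotool command. Below is a minimal Python sketch under stated assumptions: the `WAKE_WORDS` tuple, the `COMMANDS` table, and the `parse_command`/`run_command` names are hypothetical. The keysyms (`Page_Down`, `Page_Up`, `Return`, `y`, `n`) are standard X keysym names that `xdotool key` accepts. Because xdotool sends keystrokes to the focused window from a separate process, it can answer a prompt even while Emacs is blocked in `y-or-n-p`.

```python
from __future__ import annotations

import re
import subprocess

# Hypothetical wake words; the transcript must start with one of them.
WAKE_WORDS = ("okay", "ok", "computer")

# Hypothetical phrase table mapping spoken commands to xdotool arguments.
COMMANDS = {
    "scroll down": ["key", "Page_Down"],
    "scroll up": ["key", "Page_Up"],
    "enter": ["key", "Return"],
    "yes": ["key", "y"],  # answer a y-or-n-p prompt from outside Emacs
    "no": ["key", "n"],
}


def parse_command(transcript: str) -> list[str] | None:
    """Return the xdotool argv for a recognized command, else None."""
    # Lowercase and replace punctuation (which Whisper often adds) with spaces.
    words = re.sub(r"[^\w\s]", " ", transcript.lower()).split()
    if not words or words[0] not in WAKE_WORDS:
        return None
    args = COMMANDS.get(" ".join(words[1:]))
    return ["xdotool", *args] if args else None


def run_command(transcript: str, dry_run: bool = True) -> list[str] | None:
    """Parse the transcript and, unless dry_run is set, run xdotool."""
    cmd = parse_command(transcript)
    if cmd is not None and not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

`dry_run` defaults to `True` so the mapping can be tested without sending real keystrokes. Transcripts that don't start with a wake word return `None`, which leaves them free to be treated as ordinary dictation.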