Topic - Speech synthesis

I first started playing around with speech synthesis in Emacs in 2002 or so. My laptop screen had started working only intermittently, and I still wanted to be able to use the laptop to take notes in university or write programs. The Emacspeak audio desktop was already very useful back then. Even though eSpeak sounded robotic, it was understandable even at high speeds, and Emacspeak let me confirm that I had correctly typed each character, word, or sentence, so I could keep taking notes even when the screen was blank. I also enjoyed using speech synthesis for different kinds of notifications, such as having IRC mentions read out loud so that I could hear them even when I was a short distance away from my computer.

When I started experimenting with wearable computing, I found that an audio interface was a lot more discreet than the head-mounted display I also tried. Earphones looked totally normal: nobody asked me about them, and I didn't have to talk to or worry about strangers.

Speech synthesis has come a long way since then. It's possible to get very natural-sounding voices or even to mimic specific voices. Latency (time to first sound) for these services is still not quite as low as with those older robotic systems, but it's fine for use cases like generating reference audio when I'm learning French.
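One rough way to compare engines on that latency metric is to time how long it takes from launching the engine until the first byte of audio appears. Here's a minimal sketch; the `espeak-ng` invocation in the comment is just an assumption for illustration, and any engine that streams audio to stdout would work the same way:

```python
import subprocess
import time

def time_to_first_output(cmd):
    """Measure seconds from launching cmd until its first byte of
    stdout arrives -- a rough proxy for time-to-first-sound when cmd
    is a TTS engine writing audio to stdout."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    first = proc.stdout.read(1)   # blocks until the first byte shows up
    latency = time.monotonic() - start
    proc.stdout.read()            # drain the rest of the output
    proc.wait()
    return latency if first else None

# Hypothetical usage, assuming espeak-ng is installed
# (--stdout makes it write WAV data to standard output):
# latency = time_to_first_output(
#     ["espeak-ng", "--stdout", "Bonjour tout le monde"])
# print(f"time to first audio byte: {latency:.3f}s")
```

This only measures when audio data starts arriving, not when you actually hear it (playback buffering adds a bit more), but it's usually enough to compare a local engine against a cloud service.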

See also: https://sachachua.com/topic/speech-recognition/

You can e-mail me at sacha@sachachua.com.