Thinking about outsourcing transcription or doing it myself

I like reading much more than I like listening to someone talk, and much, much more than listening to myself talk. Text can be quickly read and shared. Audio isn’t very searchable. Besides, I still need to work on breathing between sentences and avoiding the temptation to let a sentence run on and on because another cool idea has occurred to me. Perhaps that’s what I’d focus on next, if I ever resume Toastmasters; my prepared speeches can be nice and tight, but my ad-libbed ones wander. More pausing needed.

So. Transcription. I could do it myself. I type quickly. Unfortunately, I speak quite a bit faster than I type, so I usually need to slow the recording down to 50% and rewind occasionally. ExpressScribe keyboard shortcuts are handy. I’ve remapped rewind to Ctrl-H so that I don’t need to take my fingers off the home row. But there’s still the argh factor of listening to myself. This is useful for reminding me to breathe, yes, but it only takes five minutes for me to get that point. ;) The other night, it took me an hour to get through fifteen minutes, which is slower than I expected. An hour-long podcast interview should take about four hours of work, then.

I could use transcription as an excuse to train Dragon NaturallySpeaking 11, the dictation software I’d bought for this very purpose but haven’t used as much as I thought I would. It recognizes many words, but I have a lot of training to do before I get it up to speed, and I still need to edit. This would be a time investment for uncertain rewards. I still need to time how long it takes me to dictate and edit a segment.

Foot pedals would be neat, particularly if I could reprogram them for other convenient shortcuts. Three-button pedals cost from $50-$130, not including shipping. In addition to using it to stop, play, and rewind recordings, I’d love to use it for scrolling webpages or pressing modifier keys. I often work with two laptops, so it’s tempting. (And then there’s the idea of learning how to build my own human interface device using the Arduino… ) – UPDATE: I’ve built one using the Arduino! I can’t wait to try it out.

In terms of trading money for time, I’ve been thinking about trying Casting Words, which is an Amazon Mechanical Turk-based business that slices up submitted files into short chunks. Freelancers work on transcribing these chunks, which are then reassembled and edited. The budget option costs USD 0.75 per audio minute, which means an hour-long interview will cost about USD 45 to transcribe. That option doesn’t have a guaranteed turnaround, though, so I could be waiting for weeks. In addition, I tend to talk quickly, so that might trigger a “Difficult Audio” surcharge of another USD 0.75 per minute, or about USD 90 per audio hour.

For better quality at a higher price, I could work with other transcription companies. For example, Transcript Divas will transcribe audio for CAD 1.39/minute, and they guarantee a 3-day turnaround (total for 1 hour: CAD 83.40). Production Transcripts charges USD 2.05/minute for phone interviews.

I could hire a contractor through oDesk or similar services. One of the benefits of hiring someone is that he or she can become familiar with my voice and way of speaking. Pricing is based on effort instead of a flat rate per audio minute, and it can vary quite a bit. One of my virtual assistants took 14 hours to transcribe three recordings that came to 162 minutes total. At $5.56 per work hour, that came to $0.48 per audio minute, or about $29 per audio hour. oDesk contractors are usually okay with an as-needed basis, which is good because I’ve scaled down my talks a lot. (I enjoy writing more!)
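To compare hourly billing with the per-minute services, it helps to convert the contractor's hours into an effective per-audio-minute rate. Here is a minimal sketch of that arithmetic in Python. The function name `cost_per_audio_minute` is just an illustrative choice, and the inputs are the figures from the one job described above, so a different contractor or recording could come out quite differently.

```python
def cost_per_audio_minute(work_hours, hourly_rate, audio_minutes):
    """Effective price per audio minute when billing is by hours worked."""
    return work_hours * hourly_rate / audio_minutes


# Figures from the one contractor job mentioned in the post.
per_minute = cost_per_audio_minute(work_hours=14, hourly_rate=5.56, audio_minutes=162)
per_hour = per_minute * 60  # scale up to one hour of audio

print(f"${per_minute:.2f} per audio minute, ${per_hour:.2f} per audio hour")
```

The same function works for any effort-based quote: plug in the hours billed, the hourly rate, and the total length of the recordings.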

So here are the options:

  • Type it myself: 4 hours of discretionary time
  • Dictation: Unknown hours of discretionary time, possible training improvements for Dragon NaturallySpeaking
  • Foot pedals: Probably down to 3.5 hours / audio hour, but requires a little money; hackability
  • Casting Words: USD 90 per audio hour, unknown timeframe
  • Transcript Divas: CAD 84 per audio hour, 3-day turnaround
  • Contractor: Can be around USD 30 per audio hour, depending on contractor
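For anyone who wants to redo these numbers for a recording of a different length, here is a rough Python sketch. It assumes the per-minute rates quoted above still apply, keeps each price in its own currency (no exchange rate is assumed), and treats my one-hour-per-fifteen-minutes typing pace as constant. The names `paid_options`, `cost_for`, and `diy_hours` are made up for this example.

```python
AUDIO_MINUTES = 60  # length of the recording; one audio hour here

# Per-audio-minute rates quoted in the post, each in its own currency.
paid_options = {
    "Casting Words (budget + difficult-audio surcharge)": ("USD", 0.75 + 0.75),
    "Transcript Divas": ("CAD", 1.39),
    "Production Transcripts (phone interviews)": ("USD", 2.05),
}


def cost_for(rate_per_minute, audio_minutes=AUDIO_MINUTES):
    """Flat-rate cost for a recording, rounded to cents."""
    return round(rate_per_minute * audio_minutes, 2)


def diy_hours(audio_minutes, work_minutes_per_audio_minute=60 / 15):
    """Hours of my own time, assuming 1 hour of typing per 15 audio minutes."""
    return audio_minutes * work_minutes_per_audio_minute / 60


for name, (currency, rate) in paid_options.items():
    print(f"{name}: {currency} {cost_for(rate):.2f}")

print(f"Typing it myself: {diy_hours(AUDIO_MINUTES):.1f} hours")
```

Changing `AUDIO_MINUTES` is enough to re-run the comparison for a shorter talk or a longer interview.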

I’m going to go with dictating into Dragon NaturallySpeaking because I need to train it before I can get a sense of how good it is. It takes advantage of something I already own and am underusing. Who knows, if I can get the hang of this, I might use it to control more functionality. We’ll see!

One Pingback/Trackback

  • Hello Sacha,
    Before you start training Dragon NaturallySpeaking, please download and install the upgrade to Dragon NaturallySpeaking 11.5. This is a free upgrade to your existing software and it has some very nice features and enhancements.

    You will need to start training after installing this upgrade in order to take advantage of the new capabilities and enhancements.

    Good luck and keep writing!
    This comment was created using Dragon naturally speaking.

  • Upgraded! =) I’ll keep giving it a try. I feel a little self-conscious dictating to my computer around W-, but I’ll train it other times!

  • I used to do some transcribing for a court reporting business on a freelance basis, and found that transcription takes about three times as long as the audio (20 minutes of audio per hour of typing) when using a foot pedal. And now that you’ve got one of those, maybe you can use it with this free app:

  • Brock: Yup, I’ve downloaded that, and I’m looking forward to trying it out. =) I tried ExpressScribe with function keys and that was neat, but using my foot pedal will be even better.

  • Pingback: Notes on transcription with and without a foot pedal | sacha chua :: living an awesome life

  • Nice sum up Sacha. I part own one of Australia’s largest transcription companies as well as a business selling digital voice recorders and transcription kits. What you have written about we get asked about every day. We are seeing a big trend towards Dragon both on Windows (Dragon NaturallySpeaking) and on the Mac (Dragon Dictate) – Nuance leading the way now on both platforms.

    The voice training for Dragon on Windows and Mac takes around 4 minutes for basic training so you can have it up and running really quickly. The latest release (11.5 for Windows, 2.5 for Mac) lets you use your iPhone as a remote mic, so if you don’t like the on-the-head mic you can just use your iPhone.

    With Windows, the Premium and Pro editions of NaturallySpeaking can also take your recorded voice. So you don’t always have to be in front of your computer. You can have a bright idea anywhere and simply record your thoughts into a good quality voice recorder. Then this audio file can be later loaded into DNS and it will transcribe for you.

    One key thing to remember with voice recognition is you must speak clearly and have minimal background noise. This also applies to recordings if you plan to run those into DNS. So although the concept of recording away from your computer is a good one it may not always be practical, so bear that in mind.

    Finally, in my loooong rambling comment: voice recognition technology is only good for one person speaking. So you can’t record an interview or meeting and run that into Dragon. This is the key reason transcription businesses exist. That and also the high quality of human-typed transcription. Although this will change in the coming years, already sites like are testing with multi-speaker audio transcribed using voice recognition.

    By the way, stumbled across your blog thanks to the SkyGrid iPhone/iPad app.


  • Dave: Thanks for weighing in! I shared more results in Notes on transcription with and without a foot pedal. I’ve just done a few quick experiments with Dragon NaturallySpeaking, reading from my blog posts to test transcription speed and not listening/thinking speed. It looks like dictating at a medium accuracy setting gets me about 23wpm and dictating at the highest accuracy setting gets me 35wpm, after editing time is considered. I’ll keep trying segments to see how much the performance will improve after training.

  • I’m a professional transcriptionist and always use Express Scribe, a footpedal and a decent pair of headphones. It would just take way too long without the footpedals!

  • rrwriter

    Avoid Transcript Divas. I am currently quite disappointed in the quality of work they did for me in transcribing three focus groups. I have stated some (not nearly all) of my concerns, and documented them, but the company owner refuses a refund.