Notes on transcription with and without a foot pedal

Tags: analysis, decision, kaizen, review

I finally sat down and transcribed the interview on discovering yourself through blogging, where Holly Tse puts up with my firehose braindump of things I’ve learned. It’s an hour of audio, more than 53,500 characters, and about 9,500 actual words. Words-per-minute measurements use a standard of five characters per “word”, so those 53,500 characters count as roughly 10,700 standard words. This means I was speaking at roughly 180 wpm.
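For anyone who wants to check that arithmetic, here is a minimal sketch in Python. The function name `standard_wpm` is made up for illustration; the inputs are just the rounded figures above (about 53,500 characters over 60 minutes of audio) and the five-characters-per-word convention.

```python
def standard_wpm(characters: int, minutes: float, chars_per_word: int = 5) -> float:
    """Words per minute, counting every `chars_per_word` characters as one word."""
    standard_words = characters / chars_per_word
    return standard_words / minutes


# Rounded figures from this post: ~53,500 characters, 60 minutes of audio.
speaking_rate = standard_wpm(53_500, 60)
print(f"Speaking rate: about {speaking_rate:.0f} wpm")
```

The same function works for typing speed: pass the characters typed and the minutes spent typing instead of the audio length.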

I like reading much more than I like listening, and a transcript makes it much easier for me to search and review what I said. After considering the options, I ended up transcribing the interview myself. I even built my own foot pedal. ;) So, here’s what I’ve learned.

I started off by trying to use ExpressScribe and Dragon NaturallySpeaking for automatic transcription. The fully-automated transcript was useless, and it looks like I’ll need to do a lot of training before Dragon is ready for this kind of work. I then tried slowing down the recording and speaking it into Dragon NaturallySpeaking (somewhat like simultaneous translation?). This was marginally better, but still required a lot of editing.

I gave up on dictation (temporarily) and typed the text into Emacs, using keyboard shortcuts to control rewind/stop/play in ExpressScribe.

Type: Typing without a foot pedal, 50% speed
Length: 15 audio minutes
Duration: 60 minutes of work
Factor: audio minutes x 4
Characters: 14137 (~2800 words @ 5 characters/word)
Typing WPM: ~50 wpm (90 wpm input, 56% efficiency)

I took a second look at the outsourced transcription options. CastingWords had raised prices since I last checked it. Now there wasn’t much of a gap between CastingWords and TranscriptDivas, another transcription company I’d considered. With TranscriptDivas, transcribing an hour of audio would have cost around CAD 83 + tax, but I’d get it in three days.

Type: Transcription company
Cost: CAD 83 + tax = ~CAD 95 / audio hour

Before I signed up for the service, though, I thought I’d give transcription another try – particularly as I was curious about my DIY foot pedal.

I told myself I’d do another 15 audio minutes so that I could see what it’s like to transcribe with my foot pedal. I ended up doing the whole thing. I used ExpressScribe to play back the audio at 50% speed, and I set the following global shortcuts for my foot pedal: center-press was rewind, left was stop, and right was play. I ended up using rewind more than anything else, so it worked out wonderfully.

Type: Typing with DIY foot pedal, 50% speed
Length: 45 audio minutes
Duration: 120 minutes of work
Factor: audio minutes x 2.7
Characters: 39400 (~7880 words @ 5 characters/word)
Typing WPM: ~65 wpm (90 wpm input, 72% efficiency)
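The numbers in both tables can be recomputed from the raw figures. This is a rough sketch, not the spreadsheet I actually used: the `Session` class and its property names are invented for illustration, and "efficiency" is taken to mean typing speed divided by the 90 wpm playback rate. The tables round their figures, so the exact results differ from them by a point or two.

```python
from dataclasses import dataclass

INPUT_WPM = 90        # ~180 wpm speech played back at 50% speed
CHARS_PER_WORD = 5    # standard "word" length for wpm measurements


@dataclass
class Session:
    label: str
    audio_minutes: float
    work_minutes: float
    characters: int

    @property
    def factor(self) -> float:
        """Minutes of work needed per minute of audio."""
        return self.work_minutes / self.audio_minutes

    @property
    def typing_wpm(self) -> float:
        """Standard words typed per minute of work."""
        return self.characters / CHARS_PER_WORD / self.work_minutes

    @property
    def efficiency(self) -> float:
        """Typing speed as a fraction of the playback speed (assumed definition)."""
        return self.typing_wpm / INPUT_WPM


sessions = [
    Session("without foot pedal", 15, 60, 14137),
    Session("with DIY foot pedal", 45, 120, 39400),
]

for s in sessions:
    print(f"{s.label}: x{s.factor:.1f}, "
          f"{s.typing_wpm:.0f} wpm, {s.efficiency:.0%} efficiency")
```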

Discovery: Listening to myself at 50% speed makes my voice unfamiliar enough to not make me twitchy, although it can’t do anything about me being sing-song and too “like, really”. That might be improved through practice.

90 wpm input was pretty okay. Any faster, and I found myself pressing rewind more often so that I could re-hear speech while catching up.

Assuming sending it out to a transcription company would have cost CAD 95/audio hour and transcribing the entire thing myself would have taken 3 hours (including breaks), doing it myself results in a decent CAD 30/work hour of after-tax savings. Not bad, even though doing it myself meant I procrastinated on it for two weeks. It might be cheaper if I hire a transcriptionist through oDesk or similar services. With infrequent transcription needs, though, I’d probably spend more than two hours on screening, hiring, and delegating.
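Here is the savings estimate as a small Python sketch, using only the assumptions stated above (CAD 95 per audio hour outsourced, one hour of audio, three hours of my own time). The function name and parameters are hypothetical, just for illustration.

```python
def savings_per_work_hour(cost_per_audio_hour: float,
                          audio_hours: float,
                          work_hours: float) -> float:
    """Money not spent on outsourcing, divided by the hours of DIY work."""
    outsourced_cost = cost_per_audio_hour * audio_hours
    return outsourced_cost / work_hours


# Assumed figures from this post: CAD 95/audio hour, 1 audio hour, 3 work hours.
rate = savings_per_work_hour(cost_per_audio_hour=95, audio_hours=1, work_hours=3)
print(f"About CAD {rate:.0f} saved per hour of my own time")
```

Changing `work_hours` is a quick way to see how much a faster setup (or a slower day) shifts the comparison.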

Hacking together an Arduino foot pedal was definitely a win. Transcribing with it was okay, but not my favourite activity. I might send work to a transcription company if there’s enough value in a shorter turnaround, because it took me two weeks to get around to doing this one. Good to know!

2011-08-31 Wed 21:45
