Converting our VTT files to TTML

Nov 17, 2022 | subed, emacsconf, geek, ffmpeg

I wanted to convert our VTT files to TTML so that we might be able to use them for training lachesis for transcript segmentation. I downloaded the VTT files from EmacsConf 2021 to a directory and copied the edited captions from the EmacsConf 2022 backstage area (using head -1 ${FILE} | grep -q "captioned" to distinguish them from the automatic ones). I installed the ttconv Python package. Then I used the following commands to convert the VTT files to TTML, going through SRT as an intermediate format:

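for FILE in *.vtt; do
    BASE=$(basename -s .vtt "$FILE")
    # Convert VTT to SRT with ffmpeg, then SRT to TTML with ttconv's tt command.
    # Quoting keeps filenames with spaces intact; && skips tt if ffmpeg fails.
    ffmpeg -y -i "$FILE" "$BASE.srt" && tt convert -i "$BASE.srt" -o "$BASE.ttml"
done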
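
For the earlier step of picking out the edited captions, the head -1 check could be wrapped in a small loop. This is only a rough sketch: the function name copy_edited_captions and the two directory arguments are made up for illustration, and it assumes that an edited file has the word "captioned" somewhere on its first line.

```shell
# Hypothetical helper (not from the original workflow): copy only the VTT
# files whose first line contains "captioned" from one directory to another.
copy_edited_captions() {
    src_dir=$1    # assumed: directory holding a mix of edited and automatic captions
    dest_dir=$2   # assumed: directory where the edited captions should end up
    mkdir -p "$dest_dir"
    for FILE in "$src_dir"/*.vtt; do
        # Skip the unexpanded glob pattern if there are no .vtt files at all.
        [ -e "$FILE" ] || continue
        if head -1 "$FILE" | grep -q "captioned"; then
            cp "$FILE" "$dest_dir"/
        fi
    done
}
```

Something like copy_edited_captions backstage edited would then leave only the hand-edited files in edited, ready for the conversion loop.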

I haven't gotten around to installing whatever I need in order to get lachesis to work under Python 2.7, since it hasn't been updated for Python 3. It'll probably be a low-priority project, as EmacsConf is fast approaching. Still, I thought I'd stash this in my blog somewhere in case I need to make TTML files again!

You can e-mail me at sacha@sachachua.com.