Converting our VTT files to TTML
| emacsconf, geek, ffmpeg
I wanted to convert our VTT files to TTML files so that we might be
able to use them for training lachesis for transcript segmentation. I
downloaded the VTT files from EmacsConf 2021 to a directory and copied
the edited captions from the EmacsConf 2022 backstage area (using
head -1 ${FILE} | grep -q "captioned"
to distinguish them from the
automatic ones). I installed the ttconv Python package. Then I used
the following command to convert the VTT files to TTML, going through SRT along the way:
for FILE in *.vtt; do BASE=$(basename -s .vtt "$FILE"); ffmpeg -y -i "$FILE" "$BASE.srt"; tt convert -i "$BASE.srt" -o "$BASE.ttml"; done
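For the filtering step mentioned earlier, here is a minimal sketch of how the edited captions could be picked out of a directory. It assumes, as above, that edited files have "captioned" somewhere on their first line. The function name copy_edited_captions and the source and destination directories are hypothetical, not what I actually ran.

```shell
# Hypothetical helper: copy only the edited captions from one directory to another.
# Assumption: edited caption files mention "captioned" on their first line.
copy_edited_captions() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  for file in "$src"/*.vtt; do
    # If nothing matches, the glob stays literal, so skip it.
    [ -e "$file" ] || continue
    # head -n 1 is the portable spelling of head -1.
    if head -n 1 "$file" | grep -q "captioned"; then
      cp "$file" "$dest"/
    fi
  done
}
```

Something like copy_edited_captions backstage captions would then leave only the edited VTT files in captions, ready for the conversion loop.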
I haven't gotten around to installing whatever I need in order to get lachesis to work under Python 2.7, since it hasn't been updated for Python 3. It'll probably be a low-priority project anyway, as EmacsConf is fast approaching. Anyway, I thought I'd stash this in my blog somewhere in case I need to make TTML files again!