Linking to Org Babel source in a comment, and making that always use file links

| emacs, org

I've been experimenting with these default header args for Org Babel source blocks.

(setq org-babel-default-header-args
      '((:session . "none")
        (:results . "drawer replace")
        (:comments . "link")  ;; add a link to the original source
        (:exports . "both")
        (:cache . "no")
        (:eval . "never-export") ;; explicitly evaluate blocks instead of evaluating them during export
        (:hlines . "no")
        (:tangle . "no"))) ;; I have to explicitly set up blocks for tangling

In particular, :comments link adds a comment before each source block with a link to the file it came from. This allows me to quickly jump to the actual definition. It also lets me use org-babel-detangle to copy changes back to my Org file.

I also have a custom link type to make it easier to link to sections of my configuration file (Links to my config). Org Mode prompts for the link type to use when more than one function returns a link for storing, so that was interrupting my tangling with lots of interactive prompts. The following piece of advice ignores all the custom link types when tangling the link reference. That way, the link reference always uses the file: link instead of offering my custom link types.

(advice-add #'org-babel-tangle--unbracketed-link
            :around (lambda (old-fun &rest args)
                      (let (org-link-parameters)
                        (apply old-fun args))))

This is part of my Emacs configuration.
View org source for this post

EmacsConf 2024 notes

| emacs, emacsconf

The videos have been uploaded, thank-you notes have been sent, and the kiddo has decided to play a little Minecraft on her own, so now I get to write some quick notes on EmacsConf 2024.

Stats

Talks 31
Hours 10.7
Q&A web conferences 21
Hours 7.8
  • Saturday:
    • gen: 177 peak + 14 peak lowres
    • dev: 226 peak + 79 peak lowres
  • Sunday:
    • gen: 89 peak + 10 peak lowres

Server configuration:

meet 16GB 8core dedicated peak 409% CPU (100% is 1 CPU), average 69.4%
front 32GB 8core shared peak 70.66% CPU (100% is 1 CPU)
live 64GB 16core shared peak 552% CPU (100% is 1 CPU) average 144%
res 46GB 12core peak 81.54% total CPU (100% is 12 CPUs; each OBS ~250%), mem 7GB used
media 3GB 1core  

YouTube livestream stats:

Shift        Peak  Avg
Gen Sat AM     46   28
Gen Sat PM     24   16
Dev Sat AM     15    7
Dev Sat PM     20   12
Gen Sun AM     28   17
Gen Sun PM     26   18

Timeline

Call for proposals [2024-06-30 Sun]
CFP deadline [2024-09-20 Fri]
Speaker notifications [2024-09-27 Fri]
Publish schedule [2024-10-25 Fri]
Video target date [2024-11-08 Fri]
EmacsConf [2024-12-07 Sat]-[2024-12-08 Sun]

We did early acceptances again this year. That was nice. I wasn't sure about committing longer periods of time early in the scheduling process, so I usually tried to nudge people to plan a 20-minute video with the option of possibly doing more, and I okayed longer talks once we figured out what the schedule looked like.

There were 82 days between the call for proposals and the CFP deadline, another 49 days from that to the video target date, and 29 days between the video target date and EmacsConf. It felt like there was a good amount of time for proposals and videos. Six videos came in before or on the target date. The rest trickled in afterwards, which was fine because we wanted to keep things low-pressure for the speakers. We had enough capacity to process and caption the videos as they came in.
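
The day counts above can be double-checked from Emacs. This is a minimal sketch that uses calendar.el's absolute day numbers; the `my-` function names are made up for this example and are not part of the EmacsConf code.

```elisp
(require 'calendar)

(defun my-date-to-absolute (date)
  "Convert DATE, a list (YEAR MONTH DAY), to an absolute day number."
  (pcase-let ((`(,year ,month ,day) date))
    ;; calendar.el expects Gregorian dates as (MONTH DAY YEAR).
    (calendar-absolute-from-gregorian (list month day year))))

(defun my-days-between (start end)
  "Return the number of days from START to END.
Both arguments are lists of the form (YEAR MONTH DAY)."
  (- (my-date-to-absolute end) (my-date-to-absolute start)))

;; Call for proposals to CFP deadline
(my-days-between '(2024 6 30) '(2024 9 20))
;; CFP deadline to video target date
(my-days-between '(2024 9 20) '(2024 11 8))
;; Video target date to the first day of EmacsConf
(my-days-between '(2024 11 8) '(2024 12 7))
```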

Data

We continued to use an Org file to store the talk information. It would be great to add some validation functions:

  • Check permissions and ownership for files
  • Check case sensitivity for Q&A type detection
  • Check BBB redirect pages to make sure they exist
  • Check transcripts for ` because that messes up formatting; consider escaping for the wiki
  • Check files are public and readable
  • Check captioned by comment vs caption status vs captioner

Speakers uploaded their files via PsiTransfer again. I didn't get around to setting up the FTP server. I should probably rename ftp-upload.emacsconf.org to upload.emacsconf.org so that people don't get confused.

Communication

As usual, we announced the EmacsConf call for proposals on emacs-tangents, Emacs News, emacsconf-discuss, emacsconf-org, and https://reddit.com/r/emacs. System Crafters, Irreal, and Emacs APAC mentioned it, and people also posted about EmacsConf on Mastodon, X, BlueSky, and Facebook. @len@toot.si suggested submitting EmacsConf to https://foss.events, so I did. There were some other EmacsConf-related discussions in r/emacs. 200ok and Ardeo organized an in-person meetup in Switzerland, and emacs.si got together in Ljubljana.

For communicating with speakers and volunteers, I used lots of mail merge (emacsconf-mail.el). Most of the templates only needed a little tweaking from last year's code. I added a function to help me double-check delivery, since the batches that I tried to send via async sometimes ran into errors.

Next time, I think it could be interesting to add more blog posts and Mastodon toots.

Also, maybe it would be good to get in touch with podcasts to give them a heads-up on EmacsConf before it happens and also let them know when videos are available.

We continued to use Mumble for backstage coordination. It worked out well.

Schedule

The schedule worked out to two days of talks, with two tracks on the first day, and about 15-20 minutes between each talk. We were able to adapt to late submissions, last-minute cancellations, and last-minute switches from Q&A to live.

We added an open mic session on Sunday to fill in the time from a last-minute cancellation. That worked out nicely and it might be a good idea to schedule in that time next year. It was also good to move some of the usual closing remarks earlier. We were able to wrap up in a timely manner, which was great for some hosts and participants because they didn't have to stay up so late.

Sunday was single-track, so it was nice and relaxed. I was a little worried that people might get bored if the current talk wasn't relevant to their interests, but everyone managed just fine. I probably should have remembered that Emacs people are good at turning extra time into more configuration tweaks.

Most of the scheduling was determined by people's time constraints, so I didn't worry too much about making the talks flow logically. I accidentally forgot to note down one speaker's time constraints, but he caught it when we e-mailed the draft schedule and I was able to move things around for a better time for him.

There was a tiny bit of technical confusion because the automated schedule publishing on res had case-sensitive matching (case-fold-search was set to nil), so if a talk was set to "Live" Q&A, it didn't announce it as a live talk because it was looking for live. Whoops. I've added that configuration setting to my emacsconf-stream-config.el, so the ansible scripts should get it next time.
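
Here's a small sketch of that kind of case-sensitivity problem. `my-qa-live-p` is a hypothetical helper, not the actual function from the EmacsConf code. It binds case-fold-search locally so that the match doesn't depend on whatever the global value happens to be on a particular machine.

```elisp
(defun my-qa-live-p (qa-type)
  "Return non-nil if QA-TYPE mentions a live Q&A, ignoring case."
  ;; Bind `case-fold-search' locally instead of relying on the global value.
  (let ((case-fold-search t))
    (string-match-p "live" qa-type)))

;; With case-sensitive matching, "Live" does not match "live":
(let ((case-fold-search nil))
  (string-match-p "live" "Live"))   ; nil

;; The helper matches either way:
(my-qa-live-p "Live")               ; non-nil
(my-qa-live-p "IRC")                ; nil
```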

I asked Leo and Corwin if they wanted to manually control the talks this year. They opted to leave it automatically managed by crontab so that they wouldn't have to worry as much about timekeeping. It worked reliably. Hooray for automation! The only scheduling hiccup was because I turned off the crontab so that we could do Saturday closing remarks when we wanted to and I forgot to reenable autopilot the next day. We noticed when the opening remarks didn't start right on the dot, and I got everything back on track.

Like last year, I scheduled the dev track to start a little later than the gen track. That made for a less frantic morning. Also, this year we scheduled Sunday morning to start with more IRC Q&A instead of live Q&A. We didn't notice any bandwidth issues on Sunday morning this time.

It would be nice to have JavaScript countdowns in some kind of web interface to make it easier for hosts, especially if we can update them with the actual time the current video will end in MPV.

I can also update the emacsconf-stream.el code to make it easier to automatically count down to the next talk or to a specific talk.

We have JavaScript showing local time on the individual talk pages, but it would be nice to localize the times on all the schedule/watch pages too.

Most of my stuff (scheduling, publishing, etc.) is handled by automation with just a little bit of manual nudging every so often, so it might be possible to organize an event that's more friendly to Europe/APAC timezones.

Recorded videos

As usual, we strongly encouraged speakers to record videos to lower everyone's stress levels and allow for captioning by volunteers, so that's what most speakers did. We were able to handle a few last-minute submissions as well as a live talk. Getting videos also meant we could publish them as each talk went live, including automatically putting the videos and transcripts on the wiki.

We didn't have obvious video encoding cut-offs, so re-encoding in a screen was a reliable way to avoid interruptions this year. Also, no one complained about tiny text or low resolution, so the talk preparation instructions seem to be working out.

Automatically normalizing the audio with ffmpeg-normalize didn't work out, so Leo Vivier did a last-minute scramble to normalize the audio the day before the conference. Maybe that's something that volunteers can help with during the lead-up to the conference, or maybe I can finally figure out how to fit that into my process. I don't have much time or patience to listen to things, but it would be nice to get that sorted out early.

Next year we can try remixing the audio to mono. One of the talks had some audio moving around, which was a little distracting. Also, some people listen to the talks in one ear, so it would be good to drop things down to mono for them.

We think 60fps videos stressed the res server a bit, resulting in dropped frames. Next year, we can downsample those to 30fps and add a note to the talk preparation instructions. The hosts also suggested looking into setting up streaming from each host's computer instead of using our shared VNC sessions.
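
As a rough sketch of what the mono and 30fps conversion could look like, here's a function that builds the ffmpeg arguments. The function name and file names are made up for this example. `-ac 1` asks ffmpeg for one audio channel and `-r 30` sets the output frame rate. I haven't worked out how this would fit with the rest of our re-encoding settings, so treat it as a starting point.

```elisp
(defun my-ffmpeg-mono-30fps-args (input output)
  "Return a list of ffmpeg arguments for converting INPUT to OUTPUT.
The result asks for mono audio and a 30fps output frame rate."
  (list "-i" input
        "-ac" "1"    ; downmix the audio to a single channel
        "-r" "30"    ; set the output frame rate to 30fps
        output))

;; Hypothetical usage; this runs ffmpeg synchronously and logs to *ffmpeg*:
;; (apply #'call-process "ffmpeg" nil "*ffmpeg*" nil
;;        (my-ffmpeg-mono-30fps-args "talk--original.webm" "talk--mono30.webm"))
```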

There was some colour smearing and weirdness when we played some videos with mpv on res. Upgrading MPV to v0.38 fixed it.

Some people requested dark mode (light text on dark background), so maybe we can experiment with recommending that next year.

I did a last-minute change to the shell scripts to load resources from the cache directory instead of the assets/stream directory, but I didn't get all of the file references, so sometimes the test videos played or the introductions didn't have captions. On the plus side, I learned how to use j in MPV to reload a subtitle file.

Sometimes we needed to play the videos manually. If we get the hang of starting MPV in a screen or tmux session, it might be easier for hosts to check how much time is left, or to restart a video at a specific point if needed. Leo said he'll work on figuring out the configuration and the Lua scripts.

I uploaded all the videos to YouTube and scheduled them. That was nice because then I didn't have to keep updating things during the conference. It turns out that Toobnix also has a way to schedule uploads. I just need to upload a video as unlisted first, and then choose Scheduled from the visibility options. I wonder if peertube-cli can be extended to schedule things. Anyway, since I didn't know about that during the conference, I just used the emacsconf-publish-upload-talk function to upload videos.

It was fun playing Interview with an Emacs Enthusiast in 2023 [Colorized] - YouTube at lunch. I put together some captions for it after the conference, so maybe we can play it with captions next year.

Recorded introductions

We record introductions so that hosts don't have to worry about how to say things on air. I should probably send the intro check e-mail earlier, maybe on the original video target date, even if speakers haven't submitted their videos yet. This will reduce the last-minute scramble to correct intros.

When I switched the shell scripts to use the cache directory, I forgot to get it to do the intros from that directory as well, so some of the uncorrected intros were played.

I forgot to copy the intro VTTs to the cache directory. This should be handled by the subed-record process for creating intros, so it'll be all sorted out next year.

Captioning

We used WhisperX for speech-to-text this year. It did a great job at preparing the first drafts of captions that our wonderful army of volunteer captioners could then edit. WhisperX's built-in voice activity detection cut down a lot on the hallucinations that OpenAI Whisper had during periods of silence in last year's captions, and there was only one instance of WhisperX missing a chunk of text from a speaker that I needed to manually fill in. I upgraded to a Lenovo P52 with 64GB RAM, so I was able to handle last-minute caption processing on my computer. It might be handy to have a smaller model ready for those last-minute requests, or have something ready to go for the commercial APIs.

The timestamps were a little bit off. It was really helpful that speakers and volunteers used the backstage area to check video quality. I used Aeneas to re-align the text, but Aeneas was also confused by silences. I've added some code to subed so that I can realign regions of subtitles using Aeneas or WhisperX timestamps, and I also wrote some code to skim timestamps for easy verification.

Anush V experimented with using machine learning for subtitle segmentation, so that might be something to explore going forward.

BigBlueButton web conference

This year we set up a new BigBlueButton web conferencing server. The server with our previous BigBlueButton instance had been donated by a defunct nonprofit, so it finally got removed on October 27. After investigating whether Jitsi or Galene might be a good fit for EmacsConf, we decided to continue with BigBlueButton. There were some concerns about non-free Mongo for BBB versions >= 2.3 and < 3, so I installed BBB 3.0. This was hard to get working in Docker on the existing res server, so we decided it was worth spinning up an additional Linode virtual private server. It turned out that BBB refused to run on anything smaller than 8GB/4core, so I scaled up to that during testing, scaled back down to 1GB/1core in between, and scaled up to 16GB/8core dedicated during the conference.

I'm still not 100% sure I set everything up correctly or that everything was stable. Maybe next year BBB 3.0 will be better-tested, someone more sysad-y can doublecheck the setup, or we can try Galene.

One of the benefits of upgrading to BBB 3.0 was that we could use the smart layout feature to drag the webcam thumbnails to the side of the shared screen. This made shared screens much easier to read. I haven't automated this yet, but it was easy enough for us to do via the shared VNC session.

On the plus side, it was pretty straightforward to use the Rails console to create all the rooms. We used moderator access codes to give all the speakers moderator access. Mysteriously, superadmins didn't automatically have moderator access to all the rooms even if they were logged in, so we needed to add host access by hand so that they could start the recordings.

Since we self-hosted and were budgeting more for the full-scale node, I didn't feel comfortable scaling it up to production size until a few days before the conference. I sent the access codes with the check-in e-mails to give speakers time to try things out.

Compared to last year's stats:

                                       2023  2024
Max number of simultaneous users         62   107
Max number of simultaneous meetings       6     7
Max number of people in one meeting      27    25
Total unique people                      84   102
Total unique talking                     36    40

(Max number of simultaneous users wasn't deduplicated, since we need that number for server load planning.)

Tech checks and hosting

FlowyCoder did a great job getting everyone checked in, especially once I figured out the right checklist to use. We used people's emergency contact information a couple of times.

Corwin and Leo were able to jump in and out of the different streams for hosting. Sometimes they were both in the same Q&A session, which made it more conversational especially when they were covering for technical issues. We had a couple of crashes even though the tech checks went fine, so that was weird. Maybe something's up with BBB 3.0 or how I set it up.

Next time, we can consider asking speakers what kind of facilitation style they like. A chatty host? Someone who focuses on reading the questions and then gets out of the way? Speakers reading their own questions and the host focusing on timekeeping/troubleshooting?

Streaming

I experimented with setting up the live0 streaming node as a 64GB 32core dedicated CPU server, but that was overkill, so we went back down to 64GB 16core and it still didn't approach the CPU limits.

The 480p stream seemed stable, hooray! I had set it up last year to automatically kick in as soon as I started streaming to Icecast, and that worked out. I think I changed a loop to be while true instead of making it try 5 times, so that probably helped.

I couldn't get Toobnix livestreaming to work this year. On the plus side, that meant that I could use OBS to directly stream to YouTube instead of trying to set up multicasting. I set up one YouTube livestreaming event for each shift and added the RTMP keys to our shift checklists so that I could update the settings before starting the stream. That was pretty straightforward.

This year, I wrote a little randomizer function to display things on the countdown screen. At first I just dumped in https://www.gnu.org/fun/jokes/gnuemacs.acro.exp.en.html, but some of those were not quite what I was looking for. (… Probably should've read them all first!) Then I added random packages from GNU ELPA and NonGNU ELPA, and that was more fun. I might add MELPA next time too. The code for dumping random packages is probably worth putting into a different blog post, since it's the sort of thing people might like to add to their dashboards or screensavers.
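
In the meantime, here's a minimal sketch of the general idea, not the code I actually used for the countdown screen. It assumes package-archive-contents has already been populated (for example, by M-x package-refresh-contents), and `my-random-package-summary` is a name made up for this example.

```elisp
(require 'package)
(require 'seq)

(defun my-random-package-summary (&optional contents)
  "Return a \"NAME: SUMMARY\" string for a randomly chosen package.
CONTENTS defaults to `package-archive-contents', an alist whose
entries look like (NAME PACKAGE-DESC ...).  Signals an error if
CONTENTS is empty."
  (let* ((entry (seq-random-elt (or contents package-archive-contents)))
         ;; Use the first descriptor listed for the package.
         (desc (cadr entry)))
    (format "%s: %s"
            (package-desc-name desc)
            (package-desc-summary desc))))

;; (my-random-package-summary)
```

Filtering by archive (GNU ELPA, NonGNU ELPA, MELPA) could probably be done with package-desc-archive, but I've left that out to keep the sketch short.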

I ran into some C-s annoyances in screen even with flow control turned off, so it might be a good idea to switch to tmux instead of screen.

Next year, I think it might be a good idea to make intro images for each talk. Then we can use that as the opening slide in BigBlueButton (unless they're already sharing something else) as well as a video thumbnail.

Publishing

The automated process for publishing talks and transcripts to the wiki occasionally needed nudging when someone else had committed a change to the wiki. I thought I had a git pull in there somewhere, but maybe I need to look at it some more.

I forgot to switch the conference publishing phase and enable the inclusion of Etherpads, but fortunately Ihor noticed. I did some last-minute hacking to add them in, and then I remembered the variables I needed to set. Just need to add it to our process documentation.

Etherpad

We used Etherpad 1.9.7 to collect Q&A again this year. I didn't upgrade to Etherpad v2.x because I couldn't figure out how to get it running within the time I set aside for it, but maybe that's something for next year.

I wrote some Elisp to copy the current ERC line (unwrapped) for easier pasting into Etherpad. That worked out really well, and it let me keep up with copying questions from IRC to the pad in between other bits of running around. (emacsconf-erc-copy in emacsconf-erc.el)
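
The unwrapping part is the main trick. Here's a simplified sketch of the idea rather than the actual emacsconf-erc-copy code: it assumes that wrapped lines are just separated by newlines plus indentation, and the `my-` names are made up for this example.

```elisp
(require 'subr-x)

(defun my-unwrap-text (text)
  "Return TEXT with line breaks and nearby whitespace collapsed to one space."
  (string-trim
   (replace-regexp-in-string "[ \t]*\n[ \t]*" " " text)))

(defun my-copy-unwrapped-region (beg end)
  "Copy the text between BEG and END to the kill ring as a single line."
  (interactive "r")
  (kill-new (my-unwrap-text (buffer-substring-no-properties beg end))))

(my-unwrap-text "<someone> Does this work with\n          Org Mode too?\n")
;; => "<someone> Does this work with Org Mode too?"
```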

Next year, I'll add pronouns and pronunciations to the Etherpad template so that hosts can refer to them easily.

If I rejig the template to move the next/previous links so that notes can be added to the end, I might be able to use the Etherpad API to add text from IRC.

IRC

We remembered to give the libera.chat people a heads-up before the conference, so we didn't run into usage limits for https://chat.emacsconf.org. Yay!

Aside from writing emacsconf-erc-copy (emacsconf-erc.el) to make it easier to add text from IRC to the Etherpad, I didn't tinker much with the IRC setup for this year. It continued to be a solid platform for discussion.

I think a keyboard shortcut for inserting a talk's URL could be handy and should be pretty easy to add to my Embark keymap.

Extracting the Q&A

We sometimes forgot to start the recording for the Q&A until a few minutes into the talk. I considered extracting the Q&A recordings from the Icecast dump or YouTube stream recordings in order to get those first few minutes, but decided it wasn't worth it since people could generally figure out the answers.

Getting the recordings off BigBlueButton was easier this year because I configured it with video as an additional processing format, so we could grab one file per session instead of combining the different streams with ffmpeg.

I did a quick pass of the Q&A transcripts and chat logs to see if people mentioned anything that they might want to take out. I also copied IRC messages and the pads, and I copied over the answers from the transcripts using the new emacsconf-extract-subed-copy-section-text function.

Audio mixing was uneven. It might be nice to figure out separate audio recordings just in case (#12302, bigbluebutton-dev). We ended up not tinkering with the audio for the Q&A, so next time, I can probably upload them without waiting to see if anyone wants to fiddle with the audio.

Trimming the Q&A was pretty straightforward. I added a subed-crop-media-file function to subed so that I can trim files easily.

Thanks to my completion functions for adding section headings based on comments, it was easy to index the Q&A this year. I didn't even put it up backstage for people to work on.

Nudged by @ctietze, I'm experimenting with adding sticky videos if JavaScript is enabled so that it's easier to navigate using the transcript. There's still a bit of tinkering to do, but it's a start.

I added some conference-related variables to a .dir-locals.el file so that I can more easily update things even for past conferences. This is mostly related to publishing the captions on the wiki pages, which I do with Emacs Lisp.

Budget and donations

Costs (USD, not including 13% tax):

52.54 Extra costs for hosting in December
3.11 Extra costs for BBB testing in November
120 Hosting costs year-round (two Linode nanodes)

Total of USD 175.65 + tax, or USD 198.48 for 2024.

The Free Software Foundation also provided media.emacsconf.org for serving media files. Ry P provided res.emacsconf.org for OBS streaming over VNC sessions.

Amin Bandali was away during the conference weekend and no one else knew how to get the list of donors and current donation stats from the FSF Working Together program on short notice. Next time, we can get that sorted out beforehand so that we can thank donors properly.

Documentation and time

I think my biggest challenge was having less time to prepare for EmacsConf this year because the kiddo wanted more of my attention. In many ways, the automation that I'd been gradually building up paid off. We were able to pull together EmacsConf even though I had limited focus time.

Here's my Emacs-related time data (including Emacs News and tweaking my config):

Year   Jan   Feb  March  April  May  June  July   Aug  Sept   Oct   Nov    Dec  Total
2023  23.4  15.9   16.2   11.2  4.4  11.5   6.5  13.3  36.6  86.6  93.2  113.0    432
2024  71.2  12.0    5.6    6.6  3.3   9.6  11.0   4.7  36.0  40.3  52.3   67.7    320

(and here's a longer-term analysis going back to 2012.)

I spent 92.6 hours total in October and November 2024 doing Emacs-related things, compared to 179.8 hours the previous year – so, around half the time. Part of the 2023 total was related to preparing my presentation for EmacsConf, so I was much more familiar with my scripts then. Apparently, there was still a lot more that I needed to document. As I scrambled to get EmacsConf sorted out, I captured quick tasks/notes for the things I need to add to our organizers notebook. Now I get to go through all those notes in my inbox. Maybe next year will be even smoother.

On the plus side, all the process-related improvements meant that the other volunteers could jump in pretty much whenever they wanted, including during the conference itself. I didn't want to impose firm commitments on people or bug them too much by e-mail, so we kept things very chill in terms of scheduling and planning. If people were available, we had stuff people could help with. If people were busy, that was fine, we could manage. This was nice, especially when I applied the same sort of chill approach to myself.

I'd like to eventually get to the point of being able to mostly follow my checklists and notes from the start of the conference planning process to the end. I've been moving notes from year-specific organizer notebooks to the main organizers' notebook. I plan to keep that one as the main file for notes and processes, and then to have specific dates and notes in the yearly ones.

Thanks

  • Thank you to all the speakers, volunteers, and participants, and to all those other people in our lives who make it possible through time and support.
  • Thanks to Leo Vivier and Corwin Brust for hosting the sessions, and to FlowyCoder for checking people in.
  • Thanks to our proposal review volunteers James Howell, JC Helary, and others for helping with the early acceptance process.
  • Thanks to our captioning volunteers: Mark Lewin, Rodrigo Morales, Anush, annona, and James Howell, and some speakers who captioned their own talks.
  • Thanks to Leo Vivier for fiddling with the audio to get things nicely synced.
  • Thanks to volunteers who kept the mailing lists free from spam.
  • Thanks to Bhavin Gandhi, Christopher Howard, Joseph Turner, and screwlisp for quality-checking.
  • Thanks to shoshin for the music.
  • Thanks to Amin Bandali for help with infrastructure and communication.
  • Thanks to Ry P for the server that we're using for OBS streaming and for processing videos.
  • Thanks to the Free Software Foundation for Emacs itself, the mailing lists, the media.emacsconf.org server, and handling donations on our behalf through the FSF Working Together program. https://www.fsf.org/working-together/fund
  • Thanks to the many users and contributors and project teams that create all the awesome free software we use, especially: BigBlueButton, Etherpad, Icecast, OBS, TheLounge, libera.chat, ffmpeg, OpenAI Whisper, WhisperX, the aeneas forced alignment tool, PsiTransfer, subed, and many, many other tools and services we used to prepare and host this year's conference.
  • Thanks to everyone!

Overall

Good experience. Lots of fun. I'd love to do it again next year. EmacsConf feels like a nice, cozy get-together where people share the cool things they've been working on and thinking about. People had fun! They said:

  • "emacsconf is absolutely knocking it out of the park when it comes to conference logistics"
  • "I think this conference has defined the terms for a successful online conference."
  • "EmacsConf is one of the big highlights of my year every year. Thank you a ton for running this 😊"

It's one of the highlights of my year too. =) Looking forward to the next one!

In the meantime, y'all can stay connected via Emacs News, meetups (online and in person), Planet Emacslife, and now emacs.tv. Enjoy!

p.s. I'd love to learn from other people's conference blog posts, EmacsConf or otherwise. I'm particularly interested in virtual conferences and how we can tinker with them to make them even better. I'm having a hard time finding posts; please feel free to send me links to ones you've liked or written!

emacs.tv

| emacs

[2024-12-28 Sat]: I got emacstv-queue-random to fill the playlist with shuffled URLs, so it's all good now! =)

I came across Ruby Video on Hacker News and thought it was a good idea, particularly the topic view. I mentioned it in a toot and that seemed to strike a chord in the #emacs community there, so I exported some of the metadata for EmacsConf videos into an Org Mode file. @xenodium whipped up a quick web prototype at emacs.tv. I added a bunch of videos from Emacs News and wrote some code for playing the videos from Emacs, and then grabbed more videos from YouTube playlists and Vimeo search results. (Gotta find a good way to monitor PeerTube…) As of this writing, there are 2785 videos with a combined playtime of more than 1000 hours.

I am, in fact, listening to emacstv-background-mode as I write this. I was listening to it earlier while I played Minecraft with the kiddo. I'll probably shift some of my doomscrolling to shuffling through the emacs.tv web interface on my phone. I love hearing people's enthusiasm, and I occasionally pick up interesting tips along the way. (Gotta steal prot/window-single-toggle…)

It's easy to use little crumbs of time to add more tags to the videos.org file. Sometimes I use org-agenda with buffer restriction (<) and search (s) to mark/unmark (m, u) so that I can bulk-tag (B +). To make this even more convenient, I've added emacstv-agenda-search, emacstv-org-ql-search, and emacstv-org-ql-search-untagged so that I can do that bulk tagging from anywhere.

It would be nice to have mpv reuse the window. I wonder if I can queue up a number of videos instead of doing it one at a time, and if that would do the trick…

Anyway, the web interface is at https://emacs.tv and the Elisp code and data are at https://github.com/emacstv/emacstv.github.io . Enjoy!

2024-12-23 Emacs news

| emacs, emacs-news

Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, r/planetemacs, Mastodon #emacs, Hacker News, lobste.rs, programming.dev, lemmy.world, lemmy.ml, communick.news, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

subed.el: Tweaking subtitle times

| emacs, subed

When subtitle times are too far off from the video or audio, people start worrying if their video has frozen or jumped ahead. It's good to keep subtitles roughly in time with the audio.

For EmacsConf, we can get timing information from two places. WhisperX produces a JSON file with word data in the process of doing the speech recognition, and the aeneas forced alignment tool can use synthesized text-to-speech to figure out the timestamps for each line of text compared to a media file.

Aeneas timestamps are more helpful once we start editing, but Aeneas can be confused by long silences, extraneous noises, multiple speakers, and inaccurate transcripts (words added or removed).

When I combine the WhisperX word data with subtitles, I can see where the times might need a closer look because matching words weren't found.

Figure 1: Screenshot with word data loaded

Loading word data requires a pretty close match at the moment, but since we change only about 4% of the subtitle text when editing, those cues are still helpful. (I measured this by the Levenshtein distance between the combined cue texts of edited subtitles versus the original WhisperX transcripts, using string-distance to approximate the editing percentage.)

Calculating how much we edited
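;; Build a table with one row per talk: original length, edited length,
;; and edit distance, followed by a totals row.
(let ((sum-original 0)
      (sum-dist 0))
  (append
   (seq-keep
    (lambda (talk)
      ;; Only consider talks that have both edited subtitles and WhisperX JSON.
      (when (and (emacsconf-talk-file talk "--main.vtt")
                 (emacsconf-talk-file talk "--reencoded.json"))
        (let* ((json-object-type 'alist)
               (json-array-type 'list)
               ;; All the edited cue texts, joined into one string.
               (edited-text
                (mapconcat (lambda (sub) (elt sub 3))
                           (subed-parse-file (emacsconf-talk-file talk "--main.vtt"))
                           " "))
               ;; All the words WhisperX recognized, joined into one string.
               (original-text
                (mapconcat
                 (lambda (word)
                   (assoc-default 'word word))
                 (assoc-default
                  'word_segments
                  (json-read-file (emacsconf-talk-file talk "--reencoded.json")))
                 " "))
               ;; Levenshtein distance between the two versions.
               (dist (string-distance original-text edited-text)))
          (setq sum-original (+ sum-original (length original-text)))
          (setq sum-dist (+ sum-dist dist))
          (list
           (length original-text)
           (length edited-text)
           dist))))
    (emacsconf-get-talk-info))
   '(hline)
   ;; Totals row: total original length, percentage edited, total distance.
   (list
    (list
     sum-original
     (format "%d%%" (/ (* 100.0 sum-dist) sum-original))
     sum-dist))))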

To make it easier to correct subtitle timing, I added a few ways to tweak subtitle timing for a region of subtitles.

WhisperX: subed-word-data-fix-subtitle-timing in subed-word-data.el tries to match the word data from WhisperX against the text of the current subtitle, using string-distance for approximate matches. I start at about two words shorter than what's in the subtitle, and then increase the number of words taken from the data while the string distance decreases. I skip the data for words before the beginning of the first subtitle in the region.

Screencast of subed-word-data-fix-subtitle-timing
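The matching idea can be sketched roughly like this. It is a simplified illustration under my own assumptions, not the actual code in subed-word-data.el: the function name my/sketch-count-matching-words is hypothetical, the words are plain strings instead of WhisperX entries with timestamps, and lowercasing before comparing is an assumption.

```elisp
(require 'seq)
(require 'subr-x)

(defun my/sketch-count-matching-words (subtitle-text words)
  "Return how many leading WORDS best approximate SUBTITLE-TEXT.
Hypothetical helper for illustration; WORDS is a list of strings."
  (let* ((target (downcase subtitle-text))
         ;; Start about two words shorter than the subtitle itself.
         (n (min (length words)
                 (max 1 (- (length (split-string subtitle-text)) 2))))
         (join (lambda (count)
                 (downcase (string-join (seq-take words count) " "))))
         (best (string-distance target (funcall join n))))
    ;; Keep taking one more word while that lowers the string distance.
    (while (and (< n (length words))
                (< (string-distance target (funcall join (1+ n))) best))
      (setq n (1+ n))
      (setq best (string-distance target (funcall join n))))
    n))

;; Example: the first five words line up with the subtitle text.
(my/sketch-count-matching-words
 "hello world this is emacs"
 '("hello" "world" "this" "is" "emacs" "and" "more"))
```

With real word data, the start time of the first matched word and the end time of the last matched word would then become the subtitle's new start and stop times.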

Aeneas: subed-align-region uses Aeneas to realign the subtitles in the region, using the section of the media file between the start of the first subtitle and the end of the last subtitle in the region. When I notice that the times are off, I play through the subtitles (or just skim them visually) to find the last well-timed subtitle. Then I pick a subtitle that's in the incorrectly-timed section. I use subed-mpv-jump-to-current-subtitle (M-j) to jump to that position, and I play back that subtitle. The audio usually belongs to some text further down, so I reset to that position with M-j, set my mark before the previous correctly-timed subtitle with C-SPC, go to the subtitle whose text matches that audio, and use subed-copy-player-pos-to-start-time (C-c [) to set the proper timestamp. Then I can go to the previous incorrectly-timed subtitle and use M-x subed-align-region. This runs the Aeneas forced alignment tool using just the subtitle text in the region, the starting timestamp of the first subtitle, and the ending timestamp of the last subtitle, making it easy to adjust that section. subed-align-region is in subed-align.el.

Retiming by pressing SPC after each subtitle: As an experiment, I've also added a subed-retime-subtitles command that plays through the subtitles so that I can press SPC when the next subtitle starts. It begins with the current subtitle and stops when I press a key that's not in its keymap.

Screencast with audio: subed-retime-subtitles

Manual adjustments: For fine-tuning timestamps, I usually turn on subed-waveform-show-all and shift-left-click (subed-waveform-set-start-and-copy-to-previous) or shift-right-click (subed-waveform-set-stop-and-copy-to-next) on the waveforms because it's easy to see where the words and pauses are. When I'm not sure, I can use middle-click (subed-waveform-play-sample) to play part of the file without changing the subtitle start/stop or the MPV playback position.

Screencast with audio of using the waveforms

I'm experimenting with adding repeating keybindings. There was a subed-mpv-frame-step-map that was bound to C-c C-f, so I've renamed it to subed-mpv-control, added a whole bunch of keybindings to the subed-mpv-control-map based on MPV and Aegisub shortcuts, and made it a repeating transient map.

Screencast with audio, experimenting with the mpv control map
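For anyone curious what a repeating transient map looks like in general, here is a minimal sketch using set-transient-map. Everything prefixed with my/sketch- is hypothetical and only stands in for real player commands; it is not how subed defines subed-mpv-control-map. The "." and "," keys follow MPV's default frame-step bindings.

```elisp
(defvar my/sketch-frame 0
  "Hypothetical frame counter standing in for a player position.")

(defun my/sketch-frame-step ()
  "Pretend to step forward one frame."
  (interactive)
  (setq my/sketch-frame (1+ my/sketch-frame))
  (message "Frame %d" my/sketch-frame))

(defun my/sketch-frame-back-step ()
  "Pretend to step back one frame."
  (interactive)
  (setq my/sketch-frame (1- my/sketch-frame))
  (message "Frame %d" my/sketch-frame))

(defvar my/sketch-control-map
  (let ((map (make-sparse-keymap)))
    (define-key map (kbd ".") #'my/sketch-frame-step)
    (define-key map (kbd ",") #'my/sketch-frame-back-step)
    map)
  "Hypothetical keymap of single-key player controls.")

(defun my/sketch-control ()
  "Activate `my/sketch-control-map' until a key outside it is pressed."
  (interactive)
  ;; A KEEP-PRED of t keeps the map active as long as keys are found in it.
  (set-transient-map my/sketch-control-map t))
```

After binding my/sketch-control to a prefix key, pressing that key once and then tapping "." or "," repeatedly keeps stepping without retyping the prefix.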

Ideas for next steps:

Gotta get the hang of all these new capabilities through practice! =)

To make my subed-align-region workflow even more convenient, I could use completing-read to let me select a future subtitle with completion, and then Emacs could automatically fix the subtitle start time, go to the previous subtitle, and realign the region.
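The selection step of that idea might start out something like the sketch below. The function name my/sketch-choose-subtitle is hypothetical, and I'm assuming each subtitle is a list of (ID START STOP TEXT) with times in milliseconds, the same shape the edit-distance code above reads with (elt sub 3). Fixing the start time and realigning the region are left out.

```elisp
(defun my/sketch-choose-subtitle (subtitles)
  "Prompt with completion for one of SUBTITLES and return it.
Each subtitle is assumed to be a list (ID START STOP TEXT)."
  (let* ((candidates
          ;; Prefix each cue with its start time so that labels stay unique.
          (mapcar (lambda (sub)
                    (cons (format "%d: %s" (elt sub 1) (elt sub 3)) sub))
                  subtitles))
         (choice (completing-read "Subtitle: " candidates nil t)))
    (cdr (assoc choice candidates))))

;; Example data for trying it out interactively:
;; (my/sketch-choose-subtitle
;;  '((1 0 1500 "first line")
;;    (2 2000 3000 "second line")))
```

Passing only the subtitles after point would keep the candidate list limited to future subtitles.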

Also, I think switching the waveforms from overlays to text properties could be a good idea. When I cut text, the overlays get left behind, but I want the waveforms to go away too.
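The difference is easy to see in a scratch experiment. This sketch decorates the same stretch of text with both an overlay and a text property (the property name my/sketch-waveform is made up for the demo), deletes the text, and reports what is left.

```elisp
(defun my/sketch-leftovers-after-cut ()
  "Return (OVERLAY-COUNT . PROPERTY-POSITION) after deleting decorated text.
PROPERTY-POSITION is nil when no text property survives."
  (with-temp-buffer
    (insert "before WAVEFORM after")
    (let ((start 8)
          (end 16))
      ;; Decorate the word WAVEFORM both ways.
      (overlay-put (make-overlay start end) 'my/sketch-waveform t)
      (put-text-property start end 'my/sketch-waveform t)
      ;; Cut the decorated text.
      (delete-region start end)
      (cons
       ;; The now-empty overlay is still in the buffer.
       (length (overlays-in (point-min) (point-max)))
       ;; The text property went away along with the text.
       (text-property-not-all (point-min) (point-max)
                              'my/sketch-waveform nil)))))

(my/sketch-leftovers-after-cut)
```

The overlay count stays at 1 even though its text is gone, while the text property search finds nothing.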

While writing this post and fiddling with subed, I ended up adding a bunch of keybindings and a menu. I figured this was as good a time as any to stop tweaking it and finally publish. (But it's fun! Just one more idea…)

2024-12-16 Emacs news

| emacs, emacs-news

2024-12-09 Emacs news

| emacs, emacs-news
