#EmacsConf backstage: looking at EmacsConf's growth over 5 years, and how to do pivot tables and graphs with Org Mode and the Python pandas library

| emacsconf, emacs, org, python

Having helped organize EmacsConf for a number of years now, I know that I usually panic about whether we have submissions partway through the call for participation. This causes us to extend the CFP deadline and ask people to ask more people to submit things, and then we end up with a wonderful deluge of talks that I then have to somehow squeeze into a reasonable-looking schedule.

This year, I managed to not panic and I also resisted the urge to extend the CFP deadline, trusting that there will actually be tons of cool stuff. It helped that my schedule SVG code let me visualize what the conference could feel like with the submissions so far, so we started with a reasonably nice one-track conference and built up from there. It also helped that I'd gone back to the submissions for 2022 and plotted them by the number of weeks before the CFP deadline, and I knew that there'd be a big spike from all those people whose Org DEADLINE: properties would nudge them into finalizing their proposals.

Out of curiosity, I wanted to see how the stats for this year compared with previous years. I wrote a small function to collect the data that I wanted to summarize:

emacsconf-count-submissions-by-week: Count submissions in INFO by distance to CFP-DEADLINE.
(defun emacsconf-count-submissions-by-week (&optional info cfp-deadline)
  "Count submissions in INFO by distance to CFP-DEADLINE."
  (setq cfp-deadline (or cfp-deadline emacsconf-cfp-deadline))
  (setq info (or info (emacsconf-get-talk-info)))
  (cons '("Weeks to CFP end date" "Count" "Hours")
        (mapcar (lambda (entry)
                  (list (car entry)
                        (length (cdr entry))
                        (apply '+ (mapcar 'cdr (cdr entry)))))
                (seq-group-by
                 'car
                 (sort
                  (seq-keep
                   (lambda (o)
                     (and (emacsconf-publish-talk-p o)
                          (plist-get o :date-submitted)
                          (cons (floor (/ (days-between (plist-get o :date-submitted) cfp-deadline)
                                          7.0))
                                (string-to-number
                                 (or (plist-get o :video-duration)
                                     (plist-get o :time)
                                     "0")))))
                   info)
                  (lambda (a b) (< (car a) (car b))))))))

and then I ran it against the different files for each year, filling in the previous years' data as needed. The resulting table is pretty long, so I've put that in a collapsible section.

(let ((years `((2023 "~/proj/emacsconf/2023/private/conf.org" "2023-09-15")
               (2022 "~/proj/emacsconf/2022/private/conf.org" "2022-09-18")
               (2021 "~/proj/emacsconf/2021/private/conf.org" "2021-09-30")
               (2020 "~/proj/emacsconf/wiki/2020/submissions.org" "2020-09-30")
               (2019 "~/proj/emacsconf/2019/private/conf.org" "2019-08-31"))))
  (append
   '(("Weeks to CFP" "Year" "Count" "Minutes"))
   (seq-mapcat
    (lambda (year-info)
      (let ((emacsconf-org-file (elt year-info 1))
            (emacsconf-cfp-deadline (elt year-info 2))
            (year (car year-info)))
        (mapcar (lambda (o) (list (car o) year (cadr o) (elt o 2)))
                (cdr (emacsconf-count-submissions-by-week (emacsconf-get-talk-info) emacsconf-cfp-deadline)))))
    years)))
Table
Weeks to CFP Year Count Minutes
-12 2023 4 70
-9 2023 2 30
-7 2023 2 30
-5 2023 2 30
-4 2023 2 60
-3 2023 3 40
-2 2023 5 130
-1 2023 10 180
0 2023 8 140
1 2023 1 20
-8 2022 2 25
-5 2022 2 31
-3 2022 2 31
-2 2022 2 17
-1 2022 8 191
0 2022 8 110
1 2022 5 107
-8 2021 4 50
-7 2021 2 17
-6 2021 1 7
-5 2021 2 22
-4 2021 2 19
-3 2021 5 73
-2 2021 1 10
-1 2021 12 163
0 2021 13 197
1 2021 1 10
2 2021 1 10
-5 2020 1 10
-4 2020 1 15
-2 2020 1 30
-1 2020 4 68
0 2020 21 424
1 2020 7 152
-5 2019 2 45
-4 2019 1 21
-2 2019 6 126
-1 2019 9 82
0 2019 9 148
2 2019 1 7

Some talks were proposed off-list and are not captured here, and cancelled or withdrawn talks weren't included either. The times for previous years use the actual video time, and the times for this year use proposed times.

Off the top of my head, I didn't know of an easy way to make a pivot table or cross-tab using just Org Mode or Emacs Lisp. I tried using datamash, but I was having a hard time getting my output just the way I wanted it. Fortunately, it was super-easy to get my data from an Org table into Python so I could use pandas.pivot_table. Because I had used #+NAME: submissions-by-week to label the table, I could use :var data=submissions-by-week to refer to the data in my Python program. Then I could summarize them by week.

Here's the number of submissions by the number of weeks to the original CFP deadline, so we can see people generally like to target the CFP date.

import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data[1:], columns=data[0])
df = pd.pivot_table(df, columns=['Year'], index=['Weeks to CFP'], values='Count', aggfunc='sum', fill_value=0).iloc[::-1].sort_index(ascending=True)
fig, ax = plt.subplots()
figure = df.plot(title='Number of submissions by number of weeks to the CFP end date', ax=ax)
for line in ax.get_lines():
    if line.get_label() == '2023':
        line.set_linewidth(5)
for line in plt.legend().get_lines():
    if line.get_label() == '2023':
        line.set_linewidth(5)        
figure.get_figure().savefig('number-of-submissions.png')
return df
Weeks to CFP 2019 2020 2021 2022 2023
-12 0 0 0 0 4
-9 0 0 0 0 2
-8 0 0 4 2 0
-7 0 0 2 0 2
-6 0 0 1 0 0
-5 2 1 2 2 2
-4 1 1 2 0 2
-3 0 0 5 2 3
-2 6 1 1 2 5
-1 9 4 12 8 10
0 9 21 13 8 8
1 0 7 1 5 1
2 1 0 1 0 0
number-of-submissions.png

Calculating the cumulative number of submissions might be more useful. Here, each row shows the number received so far.

import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data[1:], columns=data[0])
df = pd.pivot_table(df, columns=['Year'], index=['Weeks to CFP'], values='Count', aggfunc='sum', fill_value=0).iloc[::-1].sort_index(ascending=True).cumsum()
fig, ax = plt.subplots()
figure = df.plot(title='Cumulative submissions by number of weeks to the CFP end date', ax=ax)
for line in ax.get_lines():
    if line.get_label() == '2023':
        line.set_linewidth(5)
for line in plt.legend().get_lines():
    if line.get_label() == '2023':
        line.set_linewidth(5)        
figure.get_figure().savefig('cumulative-submissions.png')
return df
Weeks to CFP 2019 2020 2021 2022 2023
-12 0 0 0 0 4
-9 0 0 0 0 6
-8 0 0 4 2 6
-7 0 0 6 2 8
-6 0 0 7 2 8
-5 2 1 9 4 10
-4 3 2 11 4 12
-3 3 2 16 6 15
-2 9 3 17 8 20
-1 18 7 29 16 30
0 27 28 42 24 38
1 27 35 43 29 39
2 28 35 44 29 39
cumulative-submissions.png
Figure 1: Cumulative submissions by number of weeks to CFP end date

And here's the cumulative number of minutes based on the proposals.

import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data[1:], columns=data[0])
df = pd.pivot_table(df, columns=['Year'], index=['Weeks to CFP'], values='Minutes', aggfunc='sum', fill_value=0).iloc[::-1].sort_index(ascending=True).cumsum()
fig, ax = plt.subplots()
figure = df.plot(title='Cumulative minutes by number of weeks to the CFP end date', ax=ax)
for line in ax.get_lines():
    if line.get_label() == '2023':
        line.set_linewidth(5)
for line in plt.legend().get_lines():
    if line.get_label() == '2023':
        line.set_linewidth(5)        
figure.get_figure().savefig('cumulative-minutes.png')
return df
Weeks to CFP 2019 2020 2021 2022 2023
-12 0 0 0 0 70
-9 0 0 0 0 100
-8 0 0 50 25 100
-7 0 0 67 25 130
-6 0 0 74 25 130
-5 45 10 96 56 160
-4 66 25 115 56 220
-3 66 25 188 87 260
-2 192 55 198 104 390
-1 274 123 361 295 570
0 422 547 558 405 710
1 422 699 568 512 730
2 429 699 578 512 730
cumulative-minutes.png
Figure 2: Cumulative minutes by number of weeks to the CFP end date

So… yeah… 730 minutes of talks for this year… I might've gotten a little carried away. But I like all the talks! And I want them to be captured in videos and maybe even transcribed by people who will take the time to change misrecognized words like Emax into Emacs! And I want people to be able to connect with other people who are interested in the sorts of stuff they're doing! So we're going to make it happen. The draft schedule's looking pretty full, but I think it'll work out, especially if the speakers send in their videos on time. Let's see how it all works out!

(…and look, I even got to learn how to do pivot tables and graphs with Python!)

You can comment with Disqus or you can e-mail me at sacha@sachachua.com.