Moving 18 years of comments out of Disqus and into my 11ty static site

| 11ty, blogging

Assumed audience: Technical bloggers who like:

  • static site generators: this post is about moving more things into my SSG
  • XML: check out the mention of xq, which offers a jq-like interface
  • or Org Mode: some notes here about Org Babel source blocks and graphing

I've been thinking of getting rid of the Disqus blog commenting system for a while. I originally chose it in the hope that it would handle spam filtering and the "someone has replied to your comment" notification for me. Getting rid of Disqus means one less thing that needs JavaScript, one less thing that tracks people in ways we don't want, one less thing that shows ads and wants to sell our attention. Comments are rare enough these days that I think I can handle e-mailing people when there are replies.

There are plenty of alternative commenting systems to choose from. Comentario and Isso are self-hosted, while Commento (USD 10/month) and Hyvor Talk (12 euro/month) are services. Utterances uses GitHub issues, which I probably won't try, since quite a few people in the Emacs community are philosophically opposed to GitHub. Along those lines, if I can find something that works without JavaScript, that would be even better.

I could spend a few years trying to figure out which system I might like in terms of user interface, integration, and spam-filtering, but for now, I want to get my existing comments out of Disqus and into my static site.

Fortunately, there's 11ty/eleventy-import-disqus (see zachleat's blog post: Import your Disqus Comments to Eleventy).

Exploring my disqus.xml with xq, Org Babel, and seaborn

One challenge: there are a lot of comments. How many? I got curious about analyzing the XML, and then of course I wanted to do that from Emacs. I used pipx install yq to install yq, which includes an xq tool for querying XML the way jq queries JSON.

My uncompressed Disqus XML export was 28MB. I spent some time deleting spam comments through the web interface, which helped with the filtering. I also deleted some more comments from the XML file as I noticed them. I needed to change /wp/ to /blog/ in the URLs, too.
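Here is a minimal Python sketch of the /wp/ to /blog/ change, assuming the URLs that need fixing are full thread URLs and that only a leading /wp/ path segment should change (a plain search-and-replace over the whole file could also hit "/wp/" inside comment text). rewrite_prefix is a hypothetical helper, not part of any of the tools mentioned here.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_prefix(url: str, old: str = "/wp/", new: str = "/blog/") -> str:
    """Swap a leading path prefix, leaving scheme, host, query, and fragment alone."""
    parts = urlsplit(url)
    if parts.path.startswith(old):
        # Only the start of the path changes; "/wpx/" or a "/wp/" later in the path is untouched.
        parts = parts._replace(path=new + parts.path[len(old):])
    return urlunsplit(parts)


print(rewrite_prefix("https://sachachua.com/wp/2008/05/some-post/"))
# https://sachachua.com/blog/2008/05/some-post/
```

Working on the parsed URL rather than the raw string keeps the change scoped to the path.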

This is how I counted the non-deleted comments (the export calls each comment a post), deduplicated by message text. I'll include the full Org source of that block, header lines and all, so that you can see how I call it later.

#+NAME: analyze-disqus
#+begin_src shell :var rest="| length | \"\\(.) unique comments\"" :exports results
~/.local/bin/xq -r "[.disqus.post[] |
   select(.isDeleted != \"true\" and .message)] |
  unique_by(.message) ${rest}" < disqus.xml
#+end_src

When I evaluate that with C-c C-c, I get:

8265 unique comments
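For comparison, here's a rough standard-library Python version of the same filter, in case xq isn't handy. It's a sketch that assumes, as the xq query does, that the top-level post elements have message and isDeleted children; it strips XML namespaces instead of hard-coding one. unique_comments is a hypothetical name. Like unique_by, it keeps one post per message, although jq's unique_by also sorts the result by message, which doesn't change the count.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def unique_comments(xml_text: str) -> list[dict]:
    """Non-deleted <post> entries that have a message, one per distinct message."""
    root = ET.fromstring(xml_text)
    by_message: dict[str, dict] = {}
    for post in root:
        if local_name(post.tag) != "post":
            continue  # skip <category>, <thread>, etc.
        # Flatten the children into {name: text}, roughly what xq shows for each post.
        fields = {local_name(child.tag): (child.text or "") for child in post}
        if fields.get("isDeleted") == "true" or not fields.get("message"):
            continue
        by_message.setdefault(fields["message"], fields)  # keep the first copy
    return list(by_message.values())


sample = """<disqus xmlns="http://disqus.com">
  <post><message><![CDATA[Hello]]></message><isDeleted>false</isDeleted></post>
  <post><message><![CDATA[Hello]]></message><isDeleted>false</isDeleted></post>
  <post><message><![CDATA[Gone]]></message><isDeleted>true</isDeleted></post>
</disqus>"""
print(f"{len(unique_comments(sample))} unique comments")  # 1 unique comments
```

For the real export, pass in the text of disqus.xml; at 28MB it fits comfortably in memory.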

I was curious about how it broke down by year. Because I named the source block and used the rest variable for the post-processing step, I can call the block again with a different value for rest.

Here's the call in my Org Mode source:

#+CALL: analyze-disqus(rest="| map(.createdAt[0:4]) | group_by(.) | map([(.[0]), length]) | reverse | [\"Year\", \"Count\"], .[] | @csv") :results table output :wrap my_details Table of comment count by year
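Table of comment count by year

| Year | Count |
|------+-------|
| 2025 |    26 |
| 2024 |    43 |
| 2023 |    34 |
| 2022 |    40 |
| 2021 |    55 |
| 2020 |   131 |
| 2019 |   107 |
| 2018 |   139 |
| 2017 |   186 |
| 2016 |   196 |
| 2015 |   593 |
| 2014 |   740 |
| 2013 |   960 |
| 2012 |   784 |
| 2011 |   924 |
| 2010 |   966 |
| 2009 |  1173 |
| 2008 |  1070 |
| 2007 |    98 |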
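The rest filter in that #+CALL line groups comments by the first four characters of createdAt, counts each group, and reverses the sorted groups so the newest year comes first. The same steps in plain Python look roughly like this, assuming the createdAt values are ISO 8601 strings such as 2008-05-01T12:00:00Z; counts_by_year is a hypothetical name and the sample timestamps are made up.

```python
import csv
import io
from collections import Counter


def counts_by_year(created_at: list[str]) -> str:
    """Mirror group_by(createdAt[0:4]) | reverse | @csv: header row, then newest year first."""
    counts = Counter(ts[:4] for ts in created_at)  # "2008-05-01T..." -> "2008"
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Year", "Count"])
    for year in sorted(counts, reverse=True):
        writer.writerow([year, counts[year]])
    return out.getvalue()


print(counts_by_year(["2008-01-02T00:00:00Z", "2009-03-04T00:00:00Z", "2008-07-08T00:00:00Z"]), end="")
# Year,Count
# 2009,1
# 2008,2
```

One small difference: csv.writer only quotes fields when it needs to, while jq's @csv quotes every string. Org reads either form as a table.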

I tried fiddling around with Org's #+PLOT keyword, but I couldn't figure out how to get the bar graph to look the way I wanted. Someday, if I ever figure that out, I'll definitely save the Gnuplot setup as a snippet. For now, I visualized it using seaborn instead.

Code for graphing comments by year
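# data is the Year/Count table from the #+CALL above, passed in with :var;
# its first row is the header.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(data[1:], columns=data[0])
df['Count'] = df['Count'].astype(int)
df['Year'] = df['Year'].astype(int)
df = df.sort_values('Year')  # oldest year on the left
plt.figure(figsize=(12, 6))
ax = sns.barplot(x='Year', y='Count', data=df)
plt.title('Comments by Year (2007-2025)', fontsize=16, fontweight='bold')
plt.xlabel('Year')
plt.ylabel('Comments')
plt.xticks(rotation=45)
plt.grid(axis='y')
# Label each bar with its count; seaborn places categorical bars at x = 0, 1, 2, ...
for i, v in enumerate(df['Count']):
    ax.text(i, v + 20, str(v), ha='center', fontsize=9)
plt.tight_layout()
plt.savefig('year_count_plot.svg')
# With Org Babel's :results value, the block body runs inside a function,
# so return hands the filename back to Org as the result.
return 'year_count_plot.svg'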
year_count_plot.svg

Ooooooh, I can probably cross-reference this with the number of posts from my /blog/all/index.json file. I asked Claude AI to help me come up with the code below, since merging data and plotting it nicely is still challenging for me. Now that I have the example, though, maybe I can do other graphs more easily. (This looks like a related tutorial on combining barplots and lineplots.)

Code for graphing posts, comments, and comments per post by year
import json

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

with open('/home/sacha/proj/static-blog/_site/blog/all/index.json', 'r') as f:
    posts_data = json.load(f)

# Process post data
posts_df = pd.DataFrame(posts_data)
posts_df['Year'] = pd.to_datetime(posts_df['date']).dt.year
post_counts = posts_df.groupby('Year').size().reset_index(name='post_count')

# comment_data is the Year/Count table passed in from Org with :var; its first row is the header
comments_df = pd.DataFrame(comment_data[1:], columns=comment_data[0])
comments_df['Count'] = comments_df['Count'].astype(int)
comments_df['Year'] = comments_df['Year'].astype(int)

# Merge the two dataframes
merged_df = pd.merge(post_counts, comments_df, on='Year', how='outer').fillna(0)
merged_df = merged_df.sort_values('Year')

# Calculate comments per post ratio
merged_df['comments_per_post'] = merged_df['Count'] / merged_df['post_count']
merged_df['comments_per_post'] = merged_df['comments_per_post'].replace([np.inf, -np.inf], np.nan).fillna(0)

# One figure: the bars use the left y-axis, and the ratio line gets its own right y-axis (twinx below)
fig, ax1 = plt.subplots(figsize=(15, 8))

# Custom colors
post_color = "#1f77b4"    # blue
comment_color = "#ff7f0e" # orange
ratio_color = "#2ca02c"   # green

# Setting up x-axis positions
x = np.arange(len(merged_df))
width = 0.35

# Bar charts on first y-axis
bars1 = ax1.bar(x - width/2, merged_df['post_count'], width, color=post_color, label='Posts')
bars2 = ax1.bar(x + width/2, merged_df['Count'], width, color=comment_color, label='Comments')
ax1.set_ylabel('Count (Posts & Comments)', fontsize=12)

# Add post count values above bars
for bar in bars1:
    height = bar.get_height()
    if height > 0:
        ax1.text(bar.get_x() + bar.get_width()/2., height + 5,
                f'{int(height)}', ha='center', va='bottom', color=post_color, fontsize=9)

# Add comment count values above bars
for bar in bars2:
    height = bar.get_height()
    if height > 20:  # Only show if there's enough space
        ax1.text(bar.get_x() + bar.get_width()/2., height + 5,
                f'{int(height)}', ha='center', va='bottom', color=comment_color, fontsize=9)

# Line graph on second y-axis
ax2 = ax1.twinx()
line = ax2.plot(x, merged_df['comments_per_post'], marker='o', color=ratio_color,
              linewidth=2, label='Comments per Post')
ax2.set_ylabel('Comments per Post', color=ratio_color, fontsize=12)
ax2.tick_params(axis='y', labelcolor=ratio_color)
ax2.set_ylim(bottom=0)

# Add ratio values near line points
for i, ratio in enumerate(merged_df['comments_per_post']):
    if ratio > 0:
        ax2.text(i, ratio + 0.2, f'{ratio:.1f}', ha='center', color=ratio_color, fontsize=9)

# Set x-axis labels
ax1.set_xticks(x)
ax1.set_xticklabels(merged_df['Year'], rotation=45)
ax1.set_title('Blog Posts, Comments, and Comments per Post by Year', fontsize=16, fontweight='bold')
ax1.grid(axis='y')

# Add combined legend
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')

# Layout and save
plt.tight_layout()
plt.savefig('posts_comments_analysis.svg')
return 'posts_comments_analysis.svg'
posts_comments_analysis.svg
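The easiest part of that script to trip over is the outer merge: a year can have posts but no comments, or comments but no posts, and the second case makes the ratio divide by zero. The script handles that with fillna(0) and by replacing inf/NaN with 0. Here is the same logic as a small standard-library sketch; comments_per_post is a hypothetical name and the counts in the example are made up.

```python
def comments_per_post(posts: dict[int, int], comments: dict[int, int]) -> dict[int, float]:
    """Outer-join two {year: count} maps and divide comments by posts for each year."""
    ratios: dict[int, float] = {}
    for year in sorted(set(posts) | set(comments)):
        post_count = posts.get(year, 0)  # a missing year counts as 0, like fillna(0)
        comment_count = comments.get(year, 0)
        # Guard the division up front instead of cleaning up inf/NaN afterwards.
        ratios[year] = comment_count / post_count if post_count else 0.0
    return ratios


print(comments_per_post({2020: 4, 2021: 2}, {2020: 10, 2022: 3}))
# {2020: 2.5, 2021: 0.0, 2022: 0.0}
```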

Timeline notes:

  • In this graph, comments are reported by the timestamp of the comment, not the date of the post.
  • In 2007 or so, I moved to WordPress from planner-rss.el. I think I eventually imported those WordPress comments into Disqus when I got annoyed with WordPress comments (Akismet? notifications?).
  • In 2008 and 2009, I was working on enterprise social computing at IBM. I made a few presentations that were popular. Also, mentors and colleagues posted lots of comments.
  • In 2012, I started my 5-year experiment with semi-retirement.
  • In 2016, A+ was born, so I wrote far fewer posts.
  • In 2019/2020, I wrote a lot of blog posts documenting how I was running EmacsConf with Emacs, and other Emacs tweaks along the way. The code is probably very idiosyncratic (… unless you happen to know other conference organizers who like to do as much as possible within Emacs? Even then, there are lots of assumptions in the code), but maybe people picked up useful ideas anyway. =)

What were my top 20 most-commented posts?

Emacs Lisp code for most-commented posts
(require 'json)
(require 'mm-url)  ; for mm-url-decode-entities
(let* ((json-object-type 'alist)
       (json-array-type 'list)
       (comments-json (json-read-file "~/proj/static-blog/_data/commentsCounts.json"))
       (posts-json (json-read-file "~/proj/static-blog/_site/blog/all/index.json"))
       (post-map (make-hash-table :test 'equal)))
  ;; map permalink to title
  (dolist (post posts-json)
    (let ((permalink (cdr (assoc 'permalink post)))
          (title (cdr (assoc 'title post))))
      (puthash permalink title post-map)))
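  ;; Sort by comment count (descending), keep the top N entries (n is expected
  ;; to come from the block's :var header), and make a (COUNT LINK) row for each
  (mapcar
   (lambda (row)
     (list
      (cdr row)
      (org-link-make-string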
       (concat "https://sachachua.com" (symbol-name (car row)))
       (with-temp-buffer
         (insert (or (gethash (symbol-name (car row)) post-map) (symbol-name (car row))))
         (mm-url-decode-entities)
         (buffer-string)))))
   (seq-take
    (sort comments-json
          (lambda (a b) (> (cdr a) (cdr b))))
    n)))
| Count | Post |
|-------+------|
| 97 | blog/contact |
| 88 | Even more awesome LotusScript mail merge for Lotus Notes + Microsoft Excel |
| 75 | blog/about |
| 45 | How to Learn Emacs: A Hand-drawn One-pager for Beginners / A visual tutorial |
| 42 | Planning an Emacs-based personal wiki – Org? Muse? Hmm… |
| 38 | Married! |
| 37 | Moving from testing to development |
| 36 | What can I help you learn? Looking for mentees |
| 33 | Lotus Notes mail merge from a Microsoft Excel spreadsheet |
| 30 | Nothing quite like Org for Emacs |
| 30 | Org-mode and habits |
| 29 | zomg, Evernote and Emacs |
| 25 | Literate programming and my Emacs configuration file |
| 25 | Reinvesting time and money into Emacs |
| 23 | The Gen Y Guide to Web 2.0 at Work |
| 22 | Drupal: Overriding Drupal autocompletion to pass more parameters |
| 21 | Rhetoric and the Manila Zoo; reflections on conversations and a request for insight |
| 20 | This is a test post from org2blog |
| 19 | Agendas |
| 19 | Paper, Tablet, and Tablet PC: Comparing tools for sketchnoting |
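The Emacs Lisp above boils down to a join and a sort. Here's a minimal Python sketch of the same idea, assuming (as the Elisp does) that commentsCounts.json maps permalinks to counts and that index.json is a list of posts with permalink and title fields. html.unescape stands in for mm-url-decode-entities; top_commented is a hypothetical name and the sample data is made up.

```python
import html


def top_commented(counts: dict[str, int], posts: list[dict], n: int = 20) -> list[tuple[int, str]]:
    """Return (count, title) pairs for the N most-commented permalinks."""
    titles = {post["permalink"]: post["title"] for post in posts}
    # Highest count first; sorted() is stable, so ties keep their original order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Fall back to the permalink when no post matches (e.g. pages that aren't blog posts),
    # and decode HTML entities such as &amp; in titles.
    return [(count, html.unescape(titles.get(link, link))) for link, count in ranked[:n]]


counts = {"/blog/contact/": 3, "/blog/2008/01/example/": 5, "/blog/2009/02/other/": 1}
posts = [
    {"permalink": "/blog/2008/01/example/", "title": "Example post &amp; notes"},
    {"permalink": "/blog/2009/02/other/", "title": "Another post"},
]
print(top_commented(counts, posts, n=2))
# [(5, 'Example post & notes'), (3, '/blog/contact/')]
```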

Top 3 by year. Note that this groups by the date of the post, not the date of the comment, so comments on older posts count toward the year the post was published.

Emacs Lisp code for most-commented posts by year
(require 'json)
(require 'mm-url)  ; for mm-url-decode-entities
(let* ((json-object-type 'alist)
       (json-array-type 'list)
       (comments-json (json-read-file "~/proj/static-blog/_data/commentsCounts.json"))
       (posts-json (json-read-file "~/proj/static-blog/_site/blog/all/index.json"))
       by-year)
  (setq posts-json
        (mapcar
         (lambda (post)
           (let ((comments (alist-get (intern (alist-get 'permalink post)) comments-json)))
             ;; Attach the count as a comments entry when the post has one
             (if comments
                 (cons (cons 'comments comments) post)
               post)))
         posts-json))
  (setq by-year
        (seq-group-by
         (lambda (o)
           (format-time-string "%Y"
                               (date-to-time
                                (alist-get 'date o))
                               "America/Toronto"))
         (seq-filter (lambda (o) (alist-get 'comments o)) posts-json)))
  (org-list-to-org
   (cons 'unordered
         (seq-keep
          (lambda (year)
            (list
             (org-link-make-string (concat "https://sachachua.com/blog/" (car year))
                                   (car year))
             (cons 'unordered
                   (mapcar
                    (lambda (entry)
                      (list (format "%s (%d)"
                                    (org-link-make-string
                                     (concat "https://sachachua.com" (alist-get 'permalink entry))
                                     (with-temp-buffer
                                       (insert (alist-get 'title entry))
                                       (mm-url-decode-entities)
                                       (buffer-string)))
                                    (alist-get 'comments entry))))
                    (seq-take
                     (sort
                      (cdr year)
                      (lambda (a b) (> (alist-get 'comments a)
                                       (alist-get 'comments b))))
                     n)))))
          (nreverse by-year)))))
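A similar Python sketch for the top-N-per-year grouping. One assumption to flag: the Emacs Lisp converts each date to America/Toronto time before taking the year, while this sketch just takes the first four characters of the date string, which only matches if the dates are stored as ISO 8601 local times. top_by_year and the sample posts are made up for illustration.

```python
from collections import defaultdict


def top_by_year(counts: dict[str, int], posts: list[dict], n: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Group commented posts by publication year; keep the N most-commented per year."""
    by_year: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for post in posts:
        count = counts.get(post["permalink"])
        if count is None:
            continue  # like the seq-filter step: only posts that have a comment count
        # Assumption: ISO 8601 dates in local time, so the year is the first four characters.
        by_year[post["date"][:4]].append((post["title"], count))
    return {
        year: sorted(entries, key=lambda entry: entry[1], reverse=True)[:n]
        for year, entries in sorted(by_year.items())
    }


posts = [
    {"permalink": "/a/", "title": "A", "date": "2008-03-01T10:00:00-05:00"},
    {"permalink": "/b/", "title": "B", "date": "2008-09-01T10:00:00-04:00"},
    {"permalink": "/c/", "title": "C", "date": "2009-01-01T10:00:00-05:00"},
    {"permalink": "/d/", "title": "D", "date": "2009-02-01T10:00:00-05:00"},
]
print(top_by_year({"/a/": 2, "/b/": 7, "/c/": 1}, posts, n=1))
# {'2008': [('B', 7)], '2009': [('C', 1)]}
```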

As you can probably tell, I love writing about Emacs, especially when people drop by in the comments to:

  • share that they'd just learned about some small thing I mentioned in passing and that it was really useful for this other part of their workflow that I totally wouldn't have guessed
  • point out a simpler package or built-in Emacs function that also does whatever clever hack I wrote about, just in a more polished way
  • link to a blog post or code snippet where they've borrowed the idea and added their own spin

I want to keep having those sorts of conversations.

Deleting spam comments via the Disqus web interface and Spookfox

8000+ comments are a lot to read, but it should be pretty straightforward to review the comments at least until 2016 or so, and then just clean out spam as I come across it after that. I used the Disqus web interface to delete spam comments since the isSpam attribute didn't seem to be reliable. The web interface pages through comments 25 items at a time and doesn't seem to let you select all of them, so I started tinkering around with using Spookfox to automate this. Spookfox lets me control Mozilla Firefox from Emacs Lisp.

(progn
  ;; select all comments on the current page
  (spookfox-eval-js-in-active-tab "document.querySelector('.mod-bar__check input').click()")
  (sleep-for 1)
  ;; delete
  (spookfox-eval-js-in-active-tab "document.querySelectorAll('.mod-bar__button')[2].click()")
  (sleep-for 2)
  ;; click OK, which should make the list refresh
  (spookfox-eval-js-in-active-tab "{ const btn = document.querySelectorAll('.mod-bar__button')[1]; if (btn && btn.textContent.match('OK')) btn.click(); }")
  (sleep-for 4)
  ;; backup: (spookfox-eval-js-in-active-tab "window.location.href = 'https://sachac.disqus.com/admin/moderate/spam'")
  )

I got to the end of the spam comments after maybe 10 or 20 pages, though, so maybe Disqus had auto-deleted most of the spam comments.

It's almost amusing, paging through all these spammy attempts at link-building and product promotion. I didn't want to click on any of the links since there might be malware, so sometimes I used curl to check the site. Most of the old spam links I checked don't even have working domains any more. Anything that needed spam didn't really have lasting power. It was all very "My name is Ozymandias, king of kings: / Look on my works, ye Mighty, and despair!"… and then gone.

Modifying eleventy-import-disqus for my site

Back to eleventy-import-disqus. I followed the directions to make a contentMap.json and removed the trailing comma after the last entry so that the JSON could be parsed.
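Removing the trailing comma by hand works fine for a one-off. If the map ever needs regenerating, a small cleanup step like this hypothetical Python helper (not part of eleventy-import-disqus) could do it. It's a blunt instrument: it would also rewrite a comma followed by } or ] inside a string value, so it's only meant for simple generated files. The sample keys and values are made up.

```python
import json
import re


def load_lenient_json(text: str):
    """Parse JSON after dropping commas that directly precede a closing } or ]."""
    cleaned = re.sub(r",(\s*[}\]])", r"\1", text)
    return json.loads(cleaned)


print(load_lenient_json('{\n  "a": "/blog/a/",\n  "b": "/blog/b/",\n}'))
# {'a': '/blog/a/', 'b': '/blog/b/'}
```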

Modifications to eleventy-import-disqus:

  • The original code created all the files in the same directory, so I changed it to create the same kind of nested structure I use (generally ./blog/yyyy/mm/post-slug/index.html and ./blog/yyyy/mm/post-slug/index.11tydata.json). I decided to store the Disqus comments in index.json, which is lower-priority than .11tydata.json. fs-extra made this easier by creating all the parent directories.
  • Ignored deleted messages
  • Discarded avatars
  • Did some reporting to help me review potential spam
  • Reparented replies whose parent comments I had deleted
  • Indented the thread JSON nicely in case I want to add or remove comments by hand
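Two of those changes are easy to sketch outside of Node. Below, data_file_for and write_thread show the nested path layout and the create-the-parent-directories step (pathlib's mkdir(parents=True) playing the role of fs-extra), and reparent shows one plausible way to handle deleted parents: walk up to the nearest surviving ancestor, or make the reply top-level. This is Python rather than the JavaScript that eleventy-import-disqus is written in, and all three function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def data_file_for(root: Path, permalink: str) -> Path:
    """Map /blog/2008/05/post-slug/ to <root>/blog/2008/05/post-slug/index.json."""
    parts = [part for part in permalink.split("/") if part]
    return root.joinpath(*parts, "index.json")


def write_thread(root: Path, permalink: str, thread: dict) -> Path:
    """Write indented JSON, creating missing parent directories first."""
    path = data_file_for(root, permalink)
    path.parent.mkdir(parents=True, exist_ok=True)  # like fs-extra's outputFile
    path.write_text(json.dumps(thread, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return path


def reparent(parents: dict[str, Optional[str]], deleted: set[str]) -> dict[str, Optional[str]]:
    """Drop deleted comments and point each survivor at its nearest surviving ancestor.

    parents maps comment ID -> parent comment ID (None for top-level comments)."""
    result: dict[str, Optional[str]] = {}
    for comment_id, parent in parents.items():
        if comment_id in deleted:
            continue
        # Climb past deleted ancestors; reaching None makes the reply top-level.
        while parent is not None and parent in deleted:
            parent = parents.get(parent)
        result[comment_id] = parent
    return result


print(data_file_for(Path("."), "/blog/2008/05/post-slug/"))
print(reparent({"c1": None, "c2": "c1", "c3": "c2"}, deleted={"c2"}))
# {'c1': None, 'c3': 'c1'}
```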

With the thread JSON files, my blog takes 143 seconds to generate, versus 133 seconds without the comments. An extra 10 seconds isn't too bad. I was worried that it would take longer, since I added 2,088 data JSON files to the build process, but I guess 11ty is pretty efficient.

Next steps

It was nice to have a comment form that people could fill in from anywhere and that shared their comments without needing my (often delayed) intervention. I learned lots of things from what people shared. Sometimes people even had discussions with each other, which was extra cool. Still, I think it might be a good time to experiment with alternatives. Plain e-mail for now, I guess, maybe with a nudge asking people if I can share their comments. Mastodon, too: it could be fun to make it easy to add a toot to the static comments from mastodon.el or from my Org Mode inbox. (Update 2025-03-30: Adding Mastodon toots as comments in my 11ty static blog) It might be good to figure out Webmentions, too. (But then other people have been dealing with spam Webmentions, of course.)

Comment counts can be useful social signals for interesting posts. I haven't added comment counts to the lists of blog posts yet. eleventy-import-disqus created a commentsCounts.json, which I could use in my templates. However, I might change the comments in the per-post .json file if I figure out how to include Mastodon comments, so I may need to update that file or recalculate it from the posts.
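Recalculating commentsCounts.json from the per-post files could look something like this. The file layout is a guess: the sketch assumes each post's index.json holds a list under a comments key and that replies, if nested, sit under a replies key. Both key names and the function names are hypothetical, so check them against what eleventy-import-disqus actually writes.

```python
import json
import tempfile
from pathlib import Path


def count_comments(items: list) -> int:
    """Count comments, including any nested under a hypothetical "replies" key."""
    return sum(1 + count_comments(item.get("replies", [])) for item in items)


def recount(source_root: Path, key: str = "comments") -> dict[str, int]:
    """Rebuild a {permalink: count} map from per-post index.json files."""
    counts = {}
    for path in sorted(source_root.rglob("index.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            continue  # skip JSON files that aren't per-post data, e.g. lists of posts
        n = count_comments(data.get(key, []))
        if n:
            # blog/2008/05/slug/index.json -> /blog/2008/05/slug/
            parts = path.parent.relative_to(source_root).parts
            counts["/" + "".join(part + "/" for part in parts)] = n
    return counts


# Example with a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    post_dir = root / "blog" / "2008" / "05" / "slug"
    post_dir.mkdir(parents=True)
    (post_dir / "index.json").write_text(
        json.dumps({"comments": [{"replies": [{}]}, {}]}), encoding="utf-8")
    example_counts = recount(root)

print(example_counts)  # {'/blog/2008/05/slug/': 3}
```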

Many of the blogs I read have shifted away from commenting systems, and the ones that still have comments enabled seem to be bracing for AI-generated comment spam. I'm not sure I like the way the Internet is moving, but maybe in this little corner, we can still have conversations across time. Comments are such a wonderful part of learning out loud. I wonder how we can keep learning together.

You can comment on Mastodon, view 8 comments, or e-mail me at sacha@sachachua.com.

8 comments

@sacha certainly, go ahead 🙂

I'm not completely happy with my current ox-hugo blogging toolchain -- it works, but it is a bit kludgy and I think I can do better in terms of automation and workflow. Now that I've got a bit more experience under my belt, I might start looking again into further org mode publishing options.

@MattEducator I'm on the way to figuring out how to add Mastodon toots to my new setup, and maybe the code might be something you can use as a building block when you have time. :) https://sachachua.com/dotemacs/index.html#mastodon-adding-mastodon-toots-as-comments-in-my-11ty-static-blog

May I run it on your comment to include it on that post?

@sacha the reason I use utteranc.es is that my org>hugo blog is already hosted on Github Pages for performance and reliability reasons. The biggest downside is that you need a github account to comment -- if people aren't developers or if they have philosophical objections (as you noted) then this is a barrier. I was wondering how I could integrate Mastodon with my blog posts, and see you have also been exploring this idea, so I'm keen to look more closely what you have been doing when I get a breather...

@sacha @mrg That’s neat!

This makes me realize that I didn’t really think about using mastodon.el to fetch replies. Maybe I should: It would be more emacsy than the approach I took (request to the Mastodon API and parse the JSON myself). And it would take care of authentication, which is needed to fetch more than 60 replies.

I did use mastodon.el to send auto-replies telling people their comments had been published.

@mrg no no no, it's totally useful, I'm glad you mentioned it, don't feel like holding back useful links just in case I might have already seen it. =)

I like comments because of that 90-9-1 thing (participation inequality); ~90% of people lurk, ~9% of people might comment/edit, and maybe 1% might have their own blogs. I wish there were a better way to make it easier for people to occasionally drop by and share their thoughts. I think some of it will go into e-mail, which is still nice, but public conversations are extra nice because other people get a sense of people's questions and ideas.

@sacha Well, of course you did, sorry for reply-guying. :-)

I’m kinda sad to see the “native” comments disappear from a lot of sites. I realise they don’t really make sense anymore and that the OG blogs didn’t have them, but still…

@mrg @noctuaminervae I remember reading Comments via Mastodon briefly when I put it in Emacs News. There's a new Mastodon feature for fetching all replies which might be useful, although I'm not sure when my GoToSocial instance will support something similar.

Since I'm using JSON to store the comments so that I can import them even for posts I don't have the Org Mode source for, I'm probably going to see if there's a way I can add mastodon.el toots to my blog/yyyy/mm/post-title/index.json (creating it as needed).

Tangentially: mapping over handles to make a thank you list was why I wrote mastodon.el: Collect handles in clipboard (Emacs kill ring) for myself, which could probably be easily modified to save the toot or author URLs as text properties and then make those available for inserting into Org Mode. Although that's more useful if you're already in mastodon.el; @noctuaminervae, you probably already have something similar that takes advantage of the data that's already in Org Mode. =)