Moving 18 years of comments out of Disqus and into my 11ty static site
| 11ty, blogging

Assumed audience: Technical bloggers who like:
- static site generators: this post is about moving more things into my SSG
- XML: check out the mention of xq, which offers a jq-like interface
- or Org Mode: some notes here about Org Babel source blocks and graphing
I've been thinking of getting rid of the Disqus blog commenting system for a while. I used to use it in the hopes that it would handle spam filtering and the "someone has replied to your comment" notifications for me. Getting rid of Disqus means one less thing that needs JavaScript, one less thing that tracks people in ways we don't want, one less thing that shows ads and wants to sell our attention. Comments are rare enough these days that I think I can handle e-mailing people when there are replies.
There are plenty of alternative commenting systems to choose from. Comentario and Isso are self-hosted, while Commento (USD 10/month) and Hyvor Talk (EUR 12/month) are services. Utterances uses GitHub issues, which is probably not something I'll try, as quite a few people in the Emacs community are philosophically opposed to GitHub. Along those lines, if I can find something that works without JavaScript, that would be even better.
I could spend a few years trying to figure out which system I might like in terms of user interface, integration, and spam-filtering, but for now, I want to:
- remove Disqus
- keep the comments, since they add a lot to the page (ex: the conversation on A list of sharks that are obligate ram ventilators)
Fortunately, there's 11ty/eleventy-import-disqus (see zachleat's blog post: Import your Disqus Comments to Eleventy).
Exploring my disqus.xml with xq, Org Babel, and seaborn
One challenge: there are a lot of comments. How many? I got curious about analyzing the XML, and then of course I wanted to do that from Emacs. I used pipx install yq to install yq so that I could use the xq tool, which queries XML much like jq queries JSON.
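For example, counting every post element in the export (deleted ones and all) is a quick one-liner, assuming the same .disqus.post structure as in the query below:

~/.local/bin/xq '.disqus.post | length' < disqus.xml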
My uncompressed Disqus XML export was 28MB. I spent some time deleting spam comments through the web interface, which helped with the filtering. I also deleted some more comments from the XML file as I noticed them. I needed to change /wp/ to /blog/, too.
This is how I analyzed the archive for non-deleted posts, uniquified based on message. I'll include the full Org source of that block (including the header lines) in my blog post so that you can see how I call it later.
#+begin_src shell :var rest="| length | \"\\(.) unique comments\"" :exports results
~/.local/bin/xq -r "[.disqus.post[] | select(.isDeleted != \"true\" and .message) | {key: .message, value: .}] | map(.value) | unique_by(.message) ${rest}" < disqus.xml
#+end_src
When I evaluate that with C-c C-c, I get:
8265 unique comments
I was curious about how it broke down by year. Because I named the source code block earlier and used a variable to specify how to process the filtered results, I can call it with a different value.
Here's the call in my Org Mode source:
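Roughly, it's along these lines. The block name here is hypothetical (the #+NAME: line isn't shown above), and the jq filter is a sketch that buckets the unique comments by the year prefix of their createdAt timestamps:

#+CALL: disqus-comments(rest="| group_by(.createdAt[0:4]) | map([.[0].createdAt[0:4], length]) | reverse | .[] | @tsv")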
Table of comment count by year
| Year | Count |
|------|-------|
| 2025 |   26 |
| 2024 |   43 |
| 2023 |   34 |
| 2022 |   40 |
| 2021 |   55 |
| 2020 |  131 |
| 2019 |  107 |
| 2018 |  139 |
| 2017 |  186 |
| 2016 |  196 |
| 2015 |  593 |
| 2014 |  740 |
| 2013 |  960 |
| 2012 |  784 |
| 2011 |  924 |
| 2010 |  966 |
| 2009 | 1173 |
| 2008 | 1070 |
| 2007 |   98 |
I tried fiddling around with Org's #+PLOT keyword, but I couldn't figure out how to get the bar graph the way I wanted. Someday, if I ever figure that out, I'll definitely save the Gnuplot setup as a snippet. For now, I visualized it using seaborn instead.
Code for graphing comments by year
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# data comes from the Org table via a :var header argument;
# the returned filename becomes the block's result
df = pd.DataFrame(data[1:], columns=data[0])
df['Count'] = df['Count'].astype(int)
df['Year'] = df['Year'].astype(int)
df = df.sort_values('Year')

plt.figure(figsize=(12, 6))
ax = sns.barplot(x='Year', y='Count', data=df)
plt.title('Comments by Year (2007-2025)', fontsize=16, fontweight='bold')
plt.xlabel('Year')
plt.ylabel('Comments')
plt.xticks(rotation=45)
plt.grid(axis='y')

# label each bar with its count
for i, v in enumerate(df['Count']):
    ax.text(i, v + 20, str(v), ha='center', fontsize=9)

plt.tight_layout()
plt.savefig('year_count_plot.svg')
return 'year_count_plot.svg'
Ooooooh, I can probably cross-reference this with the number of posts from my /blog/all/index.json file. I used Claude AI's help to come up with the code below, since merging data and plotting them nicely is still challenging for me. Now that I have the example, though, maybe I can do other graphs more easily. (This looks like a related tutorial on combining barplots and lineplots.)
Code for graphing posts, comments, and comments per post
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import json
from matplotlib.ticker import FuncFormatter
from datetime import datetime

with open('/home/sacha/proj/static-blog/_site/blog/all/index.json', 'r') as f:
    posts_data = json.load(f)

# Process post data
posts_df = pd.DataFrame(posts_data)
posts_df['Year'] = pd.to_datetime(posts_df['date']).dt.year
post_counts = posts_df.groupby('Year').size().reset_index(name='post_count')

# Convert to DataFrame (comment_data comes from the Org table via :var)
comments_df = pd.DataFrame(comment_data[1:], columns=comment_data[0])
comments_df['Count'] = comments_df['Count'].astype(int)
comments_df['Year'] = comments_df['Year'].astype(int)

# Merge the two dataframes
merged_df = pd.merge(post_counts, comments_df, on='Year', how='outer').fillna(0)
merged_df = merged_df.sort_values('Year')

# Calculate comments per post ratio
merged_df['comments_per_post'] = merged_df['Count'] / merged_df['post_count']
merged_df['comments_per_post'] = merged_df['comments_per_post'].replace([np.inf, -np.inf], np.nan).fillna(0)

# Create a single figure instead of two subplots
fig, ax1 = plt.subplots(figsize=(15, 8))

# Custom colors
post_color = "#1f77b4"     # blue
comment_color = "#ff7f0e"  # orange
ratio_color = "#2ca02c"    # green

# Setting up x-axis positions
x = np.arange(len(merged_df))
width = 0.35

# Bar charts on first y-axis
bars1 = ax1.bar(x - width/2, merged_df['post_count'], width, color=post_color, label='Posts')
bars2 = ax1.bar(x + width/2, merged_df['Count'], width, color=comment_color, label='Comments')
ax1.set_ylabel('Count (Posts & Comments)', fontsize=12)

# Add post count values above bars
for i, bar in enumerate(bars1):
    height = bar.get_height()
    if height > 0:
        ax1.text(bar.get_x() + bar.get_width()/2., height + 5, f'{int(height)}',
                 ha='center', va='bottom', color=post_color, fontsize=9)

# Add comment count values above bars
for i, bar in enumerate(bars2):
    height = bar.get_height()
    if height > 20:  # Only show if there's enough space
        ax1.text(bar.get_x() + bar.get_width()/2., height + 5, f'{int(height)}',
                 ha='center', va='bottom', color=comment_color, fontsize=9)

# Line graph on second y-axis
ax2 = ax1.twinx()
line = ax2.plot(x, merged_df['comments_per_post'], marker='o', color=ratio_color,
                linewidth=2, label='Comments per Post')
ax2.set_ylabel('Comments per Post', color=ratio_color, fontsize=12)
ax2.tick_params(axis='y', labelcolor=ratio_color)
ax2.set_ylim(bottom=0)

# Add ratio values near line points
for i, ratio in enumerate(merged_df['comments_per_post']):
    if ratio > 0:
        ax2.text(i, ratio + 0.2, f'{ratio:.1f}', ha='center', color=ratio_color, fontsize=9)

# Set x-axis labels
ax1.set_xticks(x)
ax1.set_xticklabels(merged_df['Year'], rotation=45)
ax1.set_title('Blog Posts, Comments, and Comments per Post by Year', fontsize=16, fontweight='bold')
ax1.grid(axis='y')

# Add combined legend
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')

# Layout and save
plt.tight_layout()
plt.savefig('posts_comments_analysis.svg')
return 'posts_comments_analysis.svg'
Timeline notes:
- In this graph, comments are reported by the timestamp of the comment, not the date of the post.
- In 2007 or so, I moved to WordPress from planner-rss.el. I think I eventually imported those WordPress comments into Disqus when I got annoyed with WordPress comments (Akismet? notifications?).
- In 2008 and 2009, I was working on enterprise social computing at IBM. I made a few presentations that were popular. Also, mentors and colleagues posted lots of comments.
- In 2012, I started my 5-year experiment with semi-retirement.
- In 2016, A+ was born, so I wrote far fewer posts.
- In 2019/2020, I wrote a lot of blog posts documenting how I was running EmacsConf with Emacs, and other Emacs tweaks along the way. The code is probably very idiosyncratic (… unless you happen to know other conference organizers who like to do as much as possible within Emacs? Even then, there are lots of assumptions in the code), but maybe people picked up useful ideas anyway. =)
What were my top 20 most-commented posts?
Emacs Lisp code for most-commented posts
(let* ((json-object-type 'alist)
       (json-array-type 'list)
       (comments-json (json-read-file "~/proj/static-blog/_data/commentsCounts.json"))
       (posts-json (json-read-file "~/proj/static-blog/_site/blog/all/index.json"))
       (post-map (make-hash-table :test 'equal)))
  ;; map permalink to title
  (dolist (post posts-json)
    (let ((permalink (cdr (assoc 'permalink post)))
          (title (cdr (assoc 'title post))))
      (puthash permalink title post-map)))
  ;; Sort comments by count (descending) and take the top n
  ;; (n comes from a :var header argument on the Org source block)
  (mapcar (lambda (row)
            (list (cdr row)
                  (org-link-make-string
                   (concat "https://sachachua.com" (symbol-name (car row)))
                   (with-temp-buffer
                     (insert (or (gethash (symbol-name (car row)) post-map)
                                 (symbol-name (car row))))
                     (mm-url-decode-entities)
                     (buffer-string)))))
          (seq-take (sort comments-json (lambda (a b) (> (cdr a) (cdr b)))) n)))
Top 3 by year. Note that this goes by the timestamp of the post, not the comment, so even old posts are in here.
Emacs Lisp code for most-commented posts by year
(let* ((json-object-type 'alist)
       (json-array-type 'list)
       (comments-json (json-read-file "~/proj/static-blog/_data/commentsCounts.json"))
       (posts-json (json-read-file "~/proj/static-blog/_site/blog/all/index.json"))
       by-year)
  ;; annotate each post with its comment count, if any
  (setq posts-json
        (mapcar (lambda (post)
                  (let ((comments (alist-get (intern (alist-get 'permalink post)) comments-json)))
                    (if comments
                        (cons (cons 'comments (alist-get (intern (alist-get 'permalink post)) comments-json 0)) post)
                      post)))
                posts-json))
  ;; group the posts that have comments by year
  (setq by-year
        (seq-group-by (lambda (o)
                        (format-time-string "%Y" (date-to-time (alist-get 'date o)) "America/Toronto"))
                      (seq-filter (lambda (o) (alist-get 'comments o)) posts-json)))
  ;; build a nested Org list: each year links to its top n posts by comment count
  ;; (n comes from a :var header argument on the Org source block)
  (org-list-to-org
   (cons 'unordered
         (seq-keep
          (lambda (year)
            (list (org-link-make-string (concat "https://sachachua.com/blog/" (car year)) (car year))
                  (cons 'unordered
                        (mapcar (lambda (entry)
                                  (list (format "%s (%d)"
                                                (org-link-make-string
                                                 (concat "https://sachachua.com" (alist-get 'permalink entry))
                                                 (with-temp-buffer
                                                   (insert (alist-get 'title entry))
                                                   (mm-url-decode-entities)
                                                   (buffer-string)))
                                                (alist-get 'comments entry))))
                                (seq-take (sort (cdr year)
                                                (lambda (a b) (> (alist-get 'comments a) (alist-get 'comments b))))
                                          n)))))
          (nreverse by-year)))))
- 2025
- 2024
  - Using an Emacs Lisp macro to define quick custom Org Mode links to project files; plus URLs and search (6)
  - Excerpts from a conversation with John Wiegley (johnw) and Adam Porter (alphapapa) about personal information management (5)
  - Yay Emacs 1: EmacsConf 2023 report, SVG animation, Embark, Org Mode links (4)
- 2023
- 2022
- 2021
- 2020
- 2019
- 2018
- 2017
- 2016
- 2015
- 2014
- 2013
- 2012
- 2011
- 2010
- 2009
- 2008
- 2007
- 2006
- 2005
  - Emacs: It’s all about people (4)
  - Learning Bisaya (4)
  - Networking (3)
- 2004
  - nethack-el (4)
  - RMAIL labels (4)
  - Sketch website design (3)
- 2003
- 2002
As you can probably tell, I love writing about Emacs, especially when people drop by in the comments to:
- share that they'd just learned about some small thing I mentioned in passing and that it was really useful for this other part of their workflow that I totally wouldn't have guessed
- point out a simpler package or built-in Emacs function that also does whatever clever hack I wrote about, just in a more polished way
- link to a blog post or code snippet where they've borrowed the idea and added their own spin
I want to keep having those sorts of conversations.
Deleting spam comments via the Disqus web interface and Spookfox
8000+ comments are a lot to read, but it should be pretty straightforward to review the comments at least until 2016 or so, and then just clean out spam as I come across it after that. I used the Disqus web interface to delete spam comments, since the isSpam attribute didn't seem to be reliable.
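For example, I could count how many comments the export actually flags as spam (same xq setup as before, assuming isSpam is stored as the string "true" like isDeleted):

~/.local/bin/xq '[.disqus.post[] | select(.isSpam == "true")] | length' < disqus.xml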
The web interface pages through comments 25 items at a time and doesn't seem to let you select all of them, so I started tinkering around with using Spookfox to automate this. Spookfox lets me control Mozilla Firefox from Emacs Lisp.
(progn
  ;; select all
  (spookfox-eval-js-in-active-tab
   "document.querySelector('.mod-bar__check input').click()")
  (wait-for 1)
  ;; delete
  (spookfox-eval-js-in-active-tab
   "document.querySelectorAll('.mod-bar__button')[2].click()")
  (wait-for 2)
  ;; click OK, which should make the list refresh
  (spookfox-eval-js-in-active-tab
   "btn = document.querySelectorAll('.mod-bar__button')[1]; if (btn.textContent.match('OK')) btn.click();")
  (wait-for 4)
  ;; backup:
  (spookfox-eval-js-in-active-tab
   "window.location.href = 'https://sachac.disqus.com/admin/moderate/spam'"))
I got to the end of the spam comments after maybe 10 or 20 pages, though, so maybe Disqus had auto-deleted most of them.
It's almost amusing, paging through all these spammy attempts at link-building and product promotion. I didn't want to click on any of the links since there might be malware, so sometimes I used curl to check the site. Most of the old spam links I checked don't even have working domains any more. Anything that needed spam to promote it didn't really have staying power. It was all very "My name is Ozymandias, king of kings: / Look on my works, ye Mighty, and despair!"… and then gone.
Modifying eleventy-import-disqus for my site
Back to eleventy-import-disqus. I followed the directions to make a contentMap.json and removed the trailing , from the last entry so that the JSON could be parsed.
Modifications to eleventy-import-disqus:
- The original code created all the files in the same directory, so I changed it to create the same kind of nested structure I use (generally ./blog/yyyy/mm/post-slug/index.html and ./blog/yyyy/mm/post-slug/index.11tydata.json). I decided to store the Disqus comments in index.json, which is lower-priority than .11tydata.json. fs-extra made this easier by creating all the parent directories.
- Ignored deleted messages
- Discarded avatars
- Did some reporting to help me review potential spam
- Reparented messages if I deleted their parent posts
- Indented the thread JSON nicely in case I want to add or remove comments by hand
With the thread JSON files, my blog takes 143 seconds to generate, versus 133 seconds without the comments. +10 seconds isn't too bad. I was worried that it would be longer, since I added 2,088 data JSON files to the build process, but I guess 11ty is pretty efficient.
Next steps
It had been nice to have a comment form that people could fill in from anywhere and which shared their comments without needing my (often delayed) intervention. I learned lots of things from what people shared. Sometimes people even had discussions with each other, which was extra cool. Still, I think it might be a good time to experiment with alternatives. Plain e-mail for now, I guess, maybe with a nudge asking people if I could share their comments. Mastodon, too - could be fun to make it easy to add a toot to the static comments from mastodon.el or from my Org Mode inbox. (Update 2025-03-30: Adding Mastodon toots as comments in my 11ty static blog) Might be good to figure out Webmentions, too. (But then other people have been dealing with spam Webmentions, of course.)
Comment counts can be useful social signals for interesting posts. I haven't added comment counts to the lists of blog posts yet. eleventy-import-disqus created a commentsCounts.json, which I could use in my templates. However, I might change the comments in the per-post .json file if I figure out how to include Mastodon comments, so I may need to update that file or recalculate it from the posts.
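If I do end up hand-editing the per-post files, recalculating the counts could be a small shell loop. This is just a sketch: it assumes the thread files live at blog/yyyy/mm/post-slug/index.json and keep the comments in a top-level "comments" array, which may not match eleventy-import-disqus's actual field name.

# Print "permalink <tab> comment count" for every post with a thread file
for f in blog/*/*/*/index.json; do
  permalink="/${f%index.json}"
  printf '%s\t%s\n' "$permalink" "$(jq '(.comments // []) | length' "$f")"
done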
Many of the blogs I read have shifted away from commenting systems, and the ones that still have comments enabled seem to be bracing for AI-generated comment spam. I'm not sure I like the way the Internet is moving, but maybe in this little corner, we can still have conversations across time. Comments are such a wonderful part of learning out loud. I wonder how we can keep learning together.