Using Emacs Lisp to batch-demote HTML headings for my static site
| blogging, 11ty, emacs
Assumed audience: People who have lots of HTML files used as input for a static site generator, might need to do a batch operation on them, and are open to doing that with Emacs Lisp. Which might just be me, but who knows? =)
HTML defines a hierarchy of headings going from
<h1> to <h6>, which comes in especially handy
when people are navigating with a screenreader or
converting web pages to Org Mode. I think search
engines might use them to get a sense of the
page's structure, too. On my blog, the hierarchy
usually goes like this:
- <h1>: site title
- <h2>: blog post titles, since I put multiple blog posts on the main page and category pages (ex: blogging)
- <h3>: blog post's subheadings, if any
- <h4>: I rarely need subsubheadings in my main blog posts, but they're there just in case
While fiddling with my blog's CSS so that I could
try this fluid type scale, I realized that the
subheadings in my exported blog entries started at
<h2> instead of <h3>. This meant that the outline
was this:
- Site title
  - Blog post 1
  - Subheading 1
  - Subheading 2
  - Blog post 2
  - Subheading 1
  - Subheading 2
  - Blog post 3
I wanted the outline to be this:
- Site title
  - Blog post 1
    - Subheading 1
    - Subheading 2
  - Blog post 2
    - Subheading 1
    - Subheading 2
  - Blog post 3
This was because I hadn't changed
org-html-toplevel-hlevel during my 11ty export
process. To solve this for new posts, I added a
new option org-11ty-toplevel-hlevel that
defaults to 3 in ox-11ty.el, re-exported
one of my long blog posts to test it, and
confirmed that my headings now started at <h3>.
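To illustrate the idea, here is a minimal sketch of how an option like this could feed into Org's HTML export. It is not the actual code from ox-11ty.el: the names my-11ty-toplevel-hlevel and my-11ty-call-with-hlevel are hypothetical, and the sketch assumes the exporter reads org-html-toplevel-hlevel when the export runs.

```emacs-lisp
;; Hypothetical sketch; the real ox-11ty.el may wire this up differently.
(require 'ox-html) ; defines `org-html-toplevel-hlevel' (stock default: 2)

(defvar my-11ty-toplevel-hlevel 3
  "Heading level for top-level Org headings in blog post exports.
Hypothetical stand-in for the option described in the text.")

(defun my-11ty-call-with-hlevel (export-fn &rest args)
  "Call EXPORT-FN with ARGS while top-level headings start at the blog level.
The binding is temporary, so other HTML exports keep their usual level."
  (let ((org-html-toplevel-hlevel my-11ty-toplevel-hlevel))
    (apply export-fn args)))
```

Binding the variable only around the export call leaves regular HTML exports at their usual level.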
I still had all my old HTML files with the wrong
levels of headings. I wrote some Emacs Lisp to
shift the headings downwards (h5 to h6, h4 to h5,
h3 to h4, h2 to h3) in a file if it had an <h2>
in it. Regular expressions are usually not a good
idea when it comes to HTML because there might be
exceptions, but I figured it was a pretty small
and low-risk change, so I decided not to use the
full XML/DOM parsing functions. I saved all the
blog posts under version control just in case I
messed things up. Here's my function:
(defun my-html-shift-headings (filename)
  "Shift heading tags in FILENAME."
  (interactive "FFile: ")
  (let ((case-fold-search t)) ; make the search case-insensitive
    (with-temp-buffer
      (insert-file-contents filename)
      (goto-char (point-min))
      ;; Only modify the files where we have an h2
      (when (or (search-forward "<h2" nil t)
                (search-forward "</h2>" nil t))
        (goto-char (point-min))
        ;; Handle both opening and closing tags
        (while (re-search-forward "<\\(/\\)?h\\([2-5]\\)\\>" nil t)
          (let* ((closing-tag (match-string 1))
                 (heading-level (string-to-number (match-string 2)))
                 (new-level (1+ heading-level)))
            (replace-match
             (concat "<" closing-tag "h" (number-to-string new-level)))))
        (write-file filename)
        filename))))
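To try the replacement logic without touching any files, the same loop can be run on a string. This is a sketch for experimenting: the helper my-html-shift-headings-in-string is a hypothetical name that is not part of the original code, and it only checks for an opening <h2 tag.

```emacs-lisp
;; Hypothetical helper for experimenting; it never reads or writes files.
(defun my-html-shift-headings-in-string (html)
  "Return HTML with h2 through h5 tags demoted by one level.
HTML is returned unchanged if it contains no opening h2 tag."
  (let ((case-fold-search t))
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      (when (search-forward "<h2" nil t)
        (goto-char (point-min))
        ;; Group 1 is the optional slash, group 2 is the heading level.
        (while (re-search-forward "<\\(/\\)?h\\([2-5]\\)\\>" nil t)
          (let ((closing-tag (match-string 1))
                (new-level (1+ (string-to-number (match-string 2)))))
            (replace-match
             (concat "<" closing-tag "h" (number-to-string new-level))))))
      (buffer-string))))

(my-html-shift-headings-in-string "<h2 id=\"a\">Title</h2><h3>Sub</h3>")
;; => "<h3 id=\"a\">Title</h3><h4>Sub</h4>"
```

Because point ends up after each replacement, a tag that has just been shifted is not matched again in the same pass. Once a string has been shifted it no longer contains an <h2>, so passing it through again returns it unchanged.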
Running it on all the source HTML files in
specific subdirectories was easy with
directory-files-recursively.
(dolist (dir '("~/proj/static-blog/blog"
               "~/proj/static-blog/content"))
  (mapc #'my-html-shift-headings
        (directory-files-recursively dir "\\.html\\'")))
Then I could just rebuild my blog and get all the
right heading levels. Spot-checks with Inspect
Element show that the headings now have the right
tags, and org-web-tools-read-url-as-org now
picks up the right hierarchy for the page.
Correcting the input files was easier and more
efficient than modifying my 11ty template engine
to shift the heading levels whenever I build my
site (probably by defining a preprocessor). I
could've written a NodeJS script to do that kind
of file manipulation, but writing it in Emacs Lisp
matched how I might think of doing it
interactively. With Emacs Lisp, it was also easy to
test on one or two files, check the list of files
matched by directory-files-recursively, and then
run it on everything.
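A preview step along those lines might look like the following sketch, which lists the files that contain an <h2> without modifying anything. The function name my-html-files-with-h2 is hypothetical and not part of the original code.

```emacs-lisp
(require 'seq)

;; Hypothetical helper: a read-only preview of which files would be changed.
(defun my-html-files-with-h2 (dir)
  "Return the .html files under DIR that contain an opening h2 tag."
  (seq-filter
   (lambda (file)
     (with-temp-buffer
       (insert-file-contents file)
       (let ((case-fold-search t))
         (search-forward "<h2" nil t))))
   (directory-files-recursively dir "\\.html\\'")))

;; Example: inspect the result before running the real shift.
;; (my-html-files-with-h2 "~/proj/static-blog/blog")
```

Checking this list first, together with having the files under version control, keeps the batch run easy to review and undo.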
Going forward, the new org-11ty-toplevel-hlevel
variable should properly modify the behaviour of
Org's HTML export to get the headings at the right
level. We'll see!