Automating tedious wiki editing tasks with Emacs and w3m

I needed to update many of the links in our wiki because a team member left, so I had to re-upload all of her files to a shared service and change all the URLs to point to the new files. Unfortunately, the file service didn't tell me the former URLs of the files, so matching old links to new ones was going to be a manual process. Our wiki had 149 pages in it. Not fun.

After a few pages of editing (and correcting the occasional typo that crept in as I changed URLs), I decided to partially automate the process. Using a smidgen of Emacs Lisp, I wrote a function that pasted the copied text into a temporary buffer, made whatever fixes it could automatically, prompted me for any URLs it didn't recognize, remembered each old-URL-to-new-URL mapping I entered, and copied the text back to the clipboard.

The function looked somewhat like this:

(defvar sacha/wiki-links nil "Association list of (OLD-URL . NEW-URL).")
(defun sacha/wiki-fix ()
  "Fix the [label|url] links in the clipboard text and copy the result back."
  (interactive)
  (with-temp-buffer
    ;; Insert text from clipboard
    (yank)
    (goto-char (point-min))
    ;; Look for all the [label|url] links
    (while (re-search-forward
            "\\[\\([^]|]+\\)|\\([^]]+\\)\\]" nil t)
      ;; Grab the label and URL before anything can clobber the match data
      (let ((label (match-string-no-properties 1))
            (old-url (match-string-no-properties 2)))
        ;; Check if it's one of the links I want to replace
        (when (or (string-match-p "viewpage" old-url)
                  (string-match-p "lsoohoo" old-url))
          ;; Prompt and add the entry to the map if it does not yet exist
          (unless (assoc old-url sacha/wiki-links)
            (save-match-data
              (add-to-list 'sacha/wiki-links
                           (cons old-url (read-string (concat label "? "))))))
          ;; Replace the URL (subexpression 2) with the corresponding new URL
          (replace-match (cdr (assoc old-url sacha/wiki-links)) t t nil 2))))
    ;; Copy the text into the clipboard
    (kill-new (buffer-string))))

I used M-x global-set-key to bind a convenient function key to it (F12, I think), and then it was just a matter of clicking on each page, clicking on Edit, typing Ctrl-C to copy the text, switching to Emacs, pressing F12, switching back to my browser, typing Ctrl-V, and saving the wiki page. I also added some lines (not shown here) to convert the previous wiki gardener’s full links to intrawiki links, change server URLs, and do other fun things.
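To keep the binding across sessions instead of setting it with M-x global-set-key each time, a line like the following could go in an init file such as ~/.emacs. This is only a sketch; F12 is simply the key recalled above, and any free key would do.

```elisp
;; Bind F12 to the clipboard-fixing command defined above.
;; (The choice of key is an assumption; any unused key works.)
(global-set-key (kbd "<f12>") 'sacha/wiki-fix)
```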

I thought about fully automating it (hooking into w3, perhaps?), but that seemed like more trouble than it was worth. Besides, it was good to review all the pages.

As a result of this Emacs wizardry, processing all 149 wiki pages took me a few hours instead of a few days. Yay!

Of course, as soon as I finished the last wiki page, I found out that I needed to change the servers in the URLs. I decided to go ahead and fully automate the darn thing.

I extracted a list of URLs for the wiki by viewing the tree version of the wiki index. It used JavaScript, so I couldn't just pull the URLs out of the page source. Fortunately, the Firebug plugin for Firefox let me copy the rendered HTML, so I used that instead. After some judicious text-editing (replace-regexp rocks), I had a list of URLs to the different pages. I knew I needed some kind of delay between page loads, and sleep-for let me spread out my requests so I didn't hammer the server too badly. Reading the w3m.el source code turned up w3m-async-exec. Once I set that to nil, w3m loaded pages synchronously, so requesting web pages and running code on the results turned out to be straightforward. Selecting the right widgets was a bit of a hack (re-search-forward here, w3m-previous-anchor there), but hey, it worked. After confirming that it worked by running it manually on a few pages, I left it merrily running in the background.

Here it is (some tweaking required):

(require 'w3m) ; load w3m first so that let-binding w3m-async-exec is dynamic
(defun sacha/edit-wiki-page ()
  "Edit the wiki page on each line from point to the end of the buffer."
  (interactive)
  (let ((buffer (current-buffer))
        (w3m-async-exec nil) ;; load pages synchronously
        (delay 5)) ;; number of seconds to wait between pages
    ;; While not at the end of the buffer
    (while (not (eobp))
      ;; Load the URL on the current line
      (w3m-browse-url
       (buffer-substring-no-properties
        (line-beginning-position)
        (line-end-position)))
      ;; Look for the edit button
      (goto-char (point-min))
      (when (search-forward "Edit" nil t)
        ;; Click it
        (w3m-view-this-url)
        ;; Look for the Minor change checkbox
        (goto-char (point-min))
        (when (search-forward "Minor change" nil t)
          ;; The text area is the second widget back
          (w3m-previous-anchor 2)
          ;; Open the text area in a temporary buffer for editing
          (w3m-view-this-url)
          ;; Do the changes (dots escaped so they match only literal dots)
          (goto-char (point-min))
          (while (re-search-forward "https?://example\\.com/path" nil t)
            (replace-match "http://path.example.com" t t))
          ;; Save the value back into the form
          (w3m-form-input-textarea-set)
          ;; Find and press the Save button
          (when (search-backward "Save" nil t)
            (w3m-view-this-url))))
      ;; Return to the URL list and move on to the next line
      (switch-to-buffer buffer)
      (forward-line)
      (sleep-for delay))))
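sacha/edit-wiki-page expects a buffer with one page URL per line, starting at point. The post built that list with replace-regexp; as an alternative, a small hypothetical helper (my/collect-hrefs, not part of the original) could pull every href value out of the rendered HTML copied from Firebug. This is a sketch that assumes the links use double-quoted href="..." attributes.

```elisp
;; Hypothetical helper, not from the original post.
;; Assumes the current buffer holds rendered HTML whose page links
;; look like href="...", with double quotes.
(defun my/collect-hrefs ()
  "Return the href values in the current buffer, in order, without duplicates.
When called interactively, also list them one per line in *wiki-urls*."
  (interactive)
  (let (urls)
    (save-excursion
      (goto-char (point-min))
      ;; Collect the contents of every href="..." attribute
      (while (re-search-forward "href=\"\\([^\"]+\\)\"" nil t)
        (push (match-string-no-properties 1) urls)))
    ;; push builds the list backwards, so restore document order,
    ;; then drop repeated links (delete-dups keeps the first copy)
    (setq urls (delete-dups (nreverse urls)))
    (when (called-interactively-p 'interactive)
      (with-current-buffer (get-buffer-create "*wiki-urls*")
        (erase-buffer)
        (insert (mapconcat #'identity urls "\n"))
        ;; Leave point at the top, where sacha/edit-wiki-page starts
        (goto-char (point-min))
        (display-buffer (current-buffer))))
    urls))
```

Running M-x my/collect-hrefs in the buffer holding the copied HTML would fill *wiki-urls*; relative links would still need the wiki's base URL prepended (a quick replace-regexp) before running sacha/edit-wiki-page from the top of that buffer.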

I’m sure this kind of automation might be possible with lots of hacking in Mozilla Firefox, and I’ve seen great scripts for the Mac, too. But I know Emacs, I’m comfortable digging into source code, and I can make things work.

Awesome. =D

Short URL: http://sachachua.com/blog/p/6849
  • Justin Wiley

    Tangential to the discussion, but I wanted to say nice syntax highlighting on the code samples

  • http://sachachua.com Sacha Chua

    I used the excellent Htmlize for Emacs, with a little bit of HTML tweaking afterwards. Wouldn’t you know it – the source code formatter plugin I have for Windows Live Writer doesn’t support LISP… =)

  • Giovanni Ridolfi

    > I’m sure this kind of automation might be possible
    > with lots of hacking [...] and I’ve seen great scripts [...]
    > too. But I know Emacs, I’m comfortable digging
    > into source code, and I can make things work.
    > Sacha Chua

    I hope you don’t mind if I use your sentence in my fortune, do you? ;-)

    Yesterday in the mailing list of my LUG (http://erlug.linux.it) we were
    asked how to order some paragraphs in a file.
    I proposed Emacs (sort-paragraphs), they told me that
    by hand was boring, I proposed to automate the task with
    emacsclient :-)

  • Rick Innis

    Emacs and I had a long and intense relationship many years ago, but over time we drifted apart. I was working on different platforms and after a while I started seeing other editors. We’ve had a couple of flings since then but the spark just wasn’t there. But posts like this remind me of what brought us together in the first place, and some of the good times we had.

    :-)

  • http://tychoish.com tychoish

    I use ikiwiki so I can do things like this with dired, and git, no (extra) lisp needed. But I’ve saved this stuff because I’m sure that there’ll be points where I’ll find myself doing something like this…

    Thanks for sharing.

    Cheers!
