Emacs: url-retrieve-synchronously and set-buffer-multibyte

As part of preparing Emacs News, I have some code that makes a list of headlines from RSS feeds like the one from Planet Emacslife and the news posts from the Org Mode mailing list (posts with [BLOG] in the subject line). The problem was that non-ASCII characters came out garbled (like é instead of é), so I usually ended up deleting them, looking up the correct characters, and replacing them.

Thanks to this Reddit thread, I found out that all I needed to get UTF-8 properly interpreted was to add (set-buffer-multibyte t) once I was in the buffer.
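
Here is a minimal sketch of what seems to be going on, assuming the response buffer is unibyte and the feed is UTF-8. In a unibyte buffer, every byte counts as one character, so the two bytes of "é" show up separately. Switching the buffer to multibyte keeps the bytes the same but reads valid UTF-8 sequences as single characters. The helper name `my-demo-utf8-bytes-to-string` is hypothetical and only for illustration; decoding explicitly with `decode-coding-string` is a different technique that should give the same result for UTF-8 input.

```emacs-lisp
(defun my-demo-utf8-bytes-to-string (bytes)
  "Return BYTES, a unibyte string of UTF-8 data, reread as multibyte text.
Hypothetical helper for illustration only."
  (with-temp-buffer
    ;; Start with a unibyte buffer holding raw bytes.
    (set-buffer-multibyte nil)
    (insert bytes)
    ;; Same bytes, now interpreted as multibyte characters.
    (set-buffer-multibyte t)
    (buffer-string)))

;; "\303\251" is the two-byte UTF-8 encoding of é.
(length "caf\303\251")                                ; 5 (one per byte)
(length (my-demo-utf8-bytes-to-string "caf\303\251")) ; 4 (é is one character)

;; Alternative: decode explicitly instead of switching the buffer.
(decode-coding-string "caf\303\251" 'utf-8)           ; "café"
```

The explicit decoding route may be the safer choice when a feed is not UTF-8, since `set-buffer-multibyte` does not look at the declared encoding.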

Here's the source code now:

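(require 'url)
(require 'xml)   ; xml-parse-region, xml-get-children, xml-get-attribute
(require 'seq)   ; seq-filter
(require 'dash)  ; -filter (third-party package)
(require 'org)   ; org-read-date, org-link-make-string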
(defun my-org-list-from-rss (url from-date &optional to-date)
  "Convert URL to an Org list. Return entries between FROM-DATE and TO-DATE.
FROM-DATE and TO-DATE should be strings of the form YYYY-MM-DD."
  (with-current-buffer (url-retrieve-synchronously url)
    (set-buffer-multibyte t)   ;; This fixes accented characters
    (goto-char (point-min))
    (re-search-forward "<\\?xml")
    (goto-char (match-beginning 0))
    (let* ((feed (xml-parse-region (point) (point-max)))
           (from-time (org-read-date nil t from-date))
           (to-time (if to-date (org-read-date nil t to-date)))
           ;; Atom feeds use <entry> elements; RSS feeds use <channel>/<item>.
           (is-atom (> (length (xml-get-children (car feed) 'entry)) 0)))
      (mapconcat (lambda (link)
                   (format "- %s\n"
                           (org-link-make-string (car link) (cdr link))))
                 (if is-atom
                     (mapcar
                      (lambda (entry)
                        (cons
                         (xml-get-attribute (car
                                             (or
                                              (seq-filter (lambda (x) (string= (xml-get-attribute x 'rel) "alternate"))
                                                          (xml-get-children entry 'link))
                                              (xml-get-children entry 'link))) 'href)
                         (elt (car (xml-get-children entry 'title)) 2)))
                      (-filter (lambda (entry)
                                 (let ((entry-date (elt (car (xml-get-children entry 'updated)) 2)))
                                   (and
                                    (org-string<= from-date entry-date)
                                    (or (null to-date) (string< entry-date to-date)))))
                               (xml-get-children (car feed) 'entry)))
                   (mapcar (lambda (entry)
                             (cons
                              (caddr (car (xml-get-children entry 'link)))
                              (caddr (car (xml-get-children entry 'title)))))
                           (-filter (lambda (entry)
                                      (let ((entry-time (date-to-time (elt (car (xml-get-children entry 'pubDate)) 2))))
                                        (and
                                         (not (time-less-p entry-time from-time))
                                         (or (null to-time) (time-less-p entry-time to-time)))))
                                    (xml-get-children (car (xml-get-children (car feed) 'channel)) 'item))))
                 ""))))
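
The `(elt ... 2)` and `caddr` calls above rely on the shape of the parse tree that xml.el returns: each node is a list of the form (NAME ATTRIBUTES CHILDREN...), so element 2 is the first child. Here is a small offline sketch of that idea. The helper name `my-demo-atom-titles` and the sample feed string are made up for illustration, and the sample has no whitespace between tags so that the first child is always the text.

```emacs-lisp
(require 'xml)

(defun my-demo-atom-titles (xml-string)
  "Return the list of entry titles in XML-STRING, an Atom-style feed.
Hypothetical helper for illustration only."
  (with-temp-buffer
    (insert xml-string)
    (let ((feed (xml-parse-region (point-min) (point-max))))
      (mapcar (lambda (entry)
                ;; A title node looks like (title nil "Hello"),
                ;; so element 2 is the title text.
                (elt (car (xml-get-children entry 'title)) 2))
              (xml-get-children (car feed) 'entry)))))

(my-demo-atom-titles
 "<feed><entry><title>Hello</title></entry><entry><title>World</title></entry></feed>")
;; ("Hello" "World")
```

Calling the real function looks something like (my-org-list-from-rss "https://example.com/feed.xml" "2019-01-01" "2019-01-08"), where the URL and dates are placeholders. It returns a string of Org list items, one per feed entry in the date range.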