Emacs: url-retrieve-synchronously and set-buffer-multibyte
As part of preparing Emacs News, I have some code that makes a list of headlines from RSS feeds like the one from Planet Emacslife and the news posts from the Org Mode mailing list (posts with [BLOG] in the subject line). The problem was that Unicode characters ended up garbled (like é where é should be), so I usually ended up deleting them, looking up the correct characters, and replacing them by hand.
Thanks to this Reddit thread, I found out that all I needed to get UTF-8 properly interpreted was to add (set-buffer-multibyte t) once I was in the buffer returned by url-retrieve-synchronously.
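To see the effect in isolation, here is a minimal sketch that needs no network access. It assumes the retrieved buffer is unibyte (raw bytes), which is what the fix suggests. The names my-demo-utf8-bytes and my-demo-multibyte-roundtrip are made up for this illustration and are not part of my actual code.

```elisp
;; "\303\251" is the two-byte UTF-8 encoding of U+00E9 (e with acute accent),
;; written as a unibyte string.  Hypothetical name, for illustration only.
(defvar my-demo-utf8-bytes "\303\251"
  "Raw UTF-8 bytes for U+00E9.")

;; Decoding those bytes as Latin-1 yields two characters, U+00C3 and U+00A9,
;; which is the garbled text described above.
(decode-coding-string my-demo-utf8-bytes 'latin-1)

;; Decoding the same bytes as UTF-8 yields the single intended character.
(decode-coding-string my-demo-utf8-bytes 'utf-8)

(defun my-demo-multibyte-roundtrip (bytes)
  "Insert BYTES into a unibyte buffer, make it multibyte, and return its text.
Hypothetical helper that mimics a raw response buffer."
  (with-temp-buffer
    (set-buffer-multibyte nil)  ; start from raw bytes, one byte per position
    (insert bytes)
    (set-buffer-multibyte t)    ; reinterpret the same bytes as characters
    (buffer-string)))
```

Calling (my-demo-multibyte-roundtrip my-demo-utf8-bytes) returns a one-character string containing é. If a feed uses an encoding other than UTF-8, explicitly decoding with decode-coding-string or decode-coding-region is probably the safer route.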
Here's the source code now:
(require 'url)
(require 'xml-rpc)
(require 'xml)   ; xml-parse-region, xml-get-children, xml-get-attribute
(require 'org)   ; org-read-date, org-link-make-string, org-string<=
(require 'seq)   ; seq-filter
(require 'dash)  ; -filter

(defun my-org-list-from-rss (url from-date &optional to-date)
  "Convert URL to an Org list.
Return entries between FROM-DATE and TO-DATE.
FROM-DATE and TO-DATE should be strings of the form YYYY-MM-DD."
  (with-current-buffer (url-retrieve-synchronously url)
    (set-buffer-multibyte t) ;; This fixes accented characters
    ;; Skip the HTTP headers by jumping to the XML declaration.
    (goto-char (point-min))
    (re-search-forward "<\\?xml")
    (goto-char (match-beginning 0))
    (let* ((feed (xml-parse-region (point) (point-max)))
           (from-time (org-read-date nil t from-date))
           (to-time (if to-date (org-read-date nil t to-date)))
           ;; Atom feeds keep <entry> elements directly under the root;
           ;; RSS feeds keep <item> elements under <channel>.
           (is-atom (> (length (xml-get-children (car feed) 'entry)) 0)))
      (mapconcat
       (lambda (link)
         (format "- %s\n" (org-link-make-string (car link) (cdr link))))
       (if is-atom
           ;; Atom: prefer the rel="alternate" link and compare the
           ;; <updated> timestamps as strings.
           (mapcar
            (lambda (entry)
              (cons
               (xml-get-attribute
                (car
                 (or
                  (seq-filter (lambda (x) (string= (xml-get-attribute x 'rel) "alternate"))
                              (xml-get-children entry 'link))
                  (xml-get-children entry 'link)))
                'href)
               (elt (car (xml-get-children entry 'title)) 2)))
            (-filter
             (lambda (entry)
               (let ((entry-date (elt (car (xml-get-children entry 'updated)) 2)))
                 (and
                  (org-string<= from-date entry-date)
                  (or (null to-date) (string< entry-date to-date)))))
             (xml-get-children (car feed) 'entry)))
         ;; RSS: the link is the text of <link>, and <pubDate> has to be
         ;; parsed into a time value before comparing.
         (mapcar
          (lambda (entry)
            (cons
             (caddr (car (xml-get-children entry 'link)))
             (caddr (car (xml-get-children entry 'title)))))
          (-filter
           (lambda (entry)
             (let ((entry-time (date-to-time (elt (car (xml-get-children entry 'pubDate)) 2))))
               (and
                (not (time-less-p entry-time from-time))
                (or (null to-time) (time-less-p entry-time to-time)))))
           (xml-get-children (car (xml-get-children (car feed) 'channel)) 'item))))
       ""))))
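The link-picking part for Atom entries can be tried on its own without fetching anything. This is a rough sketch rather than a piece of the function above: my-demo-atom-feed and my-demo-atom-entry-links are hypothetical names, the example.com URLs are placeholders, and the date filtering is left out.

```elisp
(require 'xml)
(require 'seq)

;; A tiny hand-written Atom feed (hypothetical data, for illustration only).
(defvar my-demo-atom-feed
  (concat
   "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
   "<entry>"
   "<title>First post</title>"
   "<link rel=\"self\" href=\"https://example.com/feed/1\"/>"
   "<link rel=\"alternate\" href=\"https://example.com/1\"/>"
   "</entry>"
   "<entry>"
   "<title>Second post</title>"
   "<link href=\"https://example.com/2\"/>"
   "</entry>"
   "</feed>")
  "Sample Atom feed used by `my-demo-atom-entry-links'.")

(defun my-demo-atom-entry-links (atom-xml)
  "Return a list of (HREF . TITLE) pairs for the entries in ATOM-XML.
ATOM-XML is a string containing an Atom feed."
  (with-temp-buffer
    (insert atom-xml)
    (let ((feed (xml-parse-region (point-min) (point-max))))
      (mapcar
       (lambda (entry)
         (let* ((links (xml-get-children entry 'link))
                ;; Keep only the links marked rel="alternate", if any.
                (alternate (seq-filter
                            (lambda (link)
                              (string= (xml-get-attribute link 'rel) "alternate"))
                            links)))
           ;; Fall back to the first link when no alternate link exists.
           (cons (xml-get-attribute (car (or alternate links)) 'href)
                 ;; The title text is the first child of the <title> node.
                 (car (xml-node-children
                       (car (xml-get-children entry 'title)))))))
       (xml-get-children (car feed) 'entry)))))
```

Evaluating (my-demo-atom-entry-links my-demo-atom-feed) returns (("https://example.com/1" . "First post") ("https://example.com/2" . "Second post")): the first entry uses its rel="alternate" link, and the second falls back to its only link.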