Getting an Org link URL from a string; debugging regex groups
| elisp, org
Sometimes I want to get the URL from a string whether the string contains a bare URL (https://example.com) or an Org bracketed link ([[https://example.com]] or [[https://example.com][Example]]), ignoring any extra non-link text (blah https://example.com blah blah). org-link-any-re seemed like the right regular expression to use, but I started to get a little dizzy looking at all the parentheses and I couldn't figure out which matching group to use. I tried using re-builder. That highlighted the groups in different colours, but I didn't know what the colours meant. All the matching information is in (match-data), but integer pairs can be a little hard to translate back to substrings. So I wrote an Emacs Lisp function that gives me the matching groups:
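(defun my-match-groups (&optional object)
  "Return the matching groups, good for debugging regexps.
If OBJECT is a string, take the substrings from it.
Otherwise, take them from the current buffer."
  (seq-map-indexed
   (lambda (entry i)
     ;; Each result is (GROUP-NUMBER (START END) MATCHED-TEXT).
     (list i entry
           (and (car entry)
                (if object
                    (substring object (car entry) (cadr entry))
                  (buffer-substring (car entry) (cadr entry))))))
   ;; After a buffer search, (match-data t) appends the buffer as a
   ;; final element; drop it before pairing up the positions.
   (seq-partition (seq-remove #'bufferp (match-data t)) 2)))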
There's probably a standard way to do this, but I couldn't figure out how to find it.
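If only the substrings matter and not the positions, one shorter variation is to loop over the group numbers and call match-string on each. This is a minimal sketch, not a built-in: my-match-strings and group-count are names I made up for illustration, and it assumes Org is loaded so that org-link-any-re is defined.

```elisp
(require 'org)  ; assumption: loading Org defines org-link-any-re

(defun my-match-strings (&optional object)
  "Return the text of each group from the last match, nil if unmatched.
OBJECT is the string that was matched; if nil, use the current buffer.
Hypothetical helper, not part of Emacs or Org."
  ;; (match-data) holds a start/end pair per group, so half its
  ;; length is the number of groups, counting group 0.
  (let ((group-count (/ (length (match-data)) 2)))
    (mapcar (lambda (i) (match-string i object))
            (number-sequence 0 (1- group-count)))))

;; The position of each element in the result is its group number.
(let ((text "blah https://example.com blah blah"))
  (when (string-match org-link-any-re text)
    (my-match-strings text)))
```

The trade-off is that you lose the start and end positions, which are sometimes the thing you need when two groups contain the same text.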
Anyway, if I match a string with a bracketed link and then call my-match-groups, I can tell that the URL ends up in group 2:
(let ((text "blah [[https://example.com][example]] blah blah"))
  (when (string-match org-link-any-re text)
    (pp-to-string (my-match-groups text))))

((0 (5 37) "[[https://example.com][example]]")
 (1 (5 37) "[[https://example.com][example]]")
 (2 (7 26) "https://example.com")
 (3 (28 35) "example"))
When I use a string with a bare link, I can see that the URL ends up in group 7:
(let ((text "blah https://example.com blah blah"))
  (when (string-match org-link-any-re text)
    (pp-to-string (my-match-groups text))))

((0 (5 24) "https://example.com")
 (1 (nil nil) nil)
 (2 (nil nil) nil)
 (3 (nil nil) nil)
 (4 (nil nil) nil)
 (5 (nil nil) nil)
 (6 (nil nil) nil)
 (7 (5 24) "https://example.com")
 (8 (5 10) "https")
 (9 (11 24) "//example.com"))
This makes it so much easier to refer to the right capture group. So now I can use those groups to extract the URL from a string:
(defun my-org-link-url-from-string (s)
  "Return the link URL from S."
  (when (string-match org-link-any-re s)
    (or
     (match-string 7 s)    ; bare URL
     (match-string 2 s)))) ; URL inside a bracketed link
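As a quick check, here is a sketch that runs the function over the kinds of strings mentioned at the start of this post, plus one with no link at all. The function is repeated so that the snippet can be evaluated on its own, and the sample strings are just made-up test input. It assumes the same group numbering as the output shown above (group 7 for bare URLs, group 2 for bracketed links), which may differ in other Org versions.

```elisp
(require 'org)  ; assumption: loading Org defines org-link-any-re

;; Repeated from above so that this snippet is self-contained.
(defun my-org-link-url-from-string (s)
  "Return the link URL from S."
  (when (string-match org-link-any-re s)
    (or (match-string 7 s)
        (match-string 2 s))))

;; Bare URL, bracketed link without and with a description, and no link.
(mapcar #'my-org-link-url-from-string
        '("blah https://example.com blah blah"
          "[[https://example.com]]"
          "blah [[https://example.com][Example]] blah"
          "no link here"))
;; With the group numbers above, this should return:
;; ("https://example.com" "https://example.com" "https://example.com" nil)
```

A string without a link returns nil because the when form never runs, so callers can use the result directly as a test.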
This is handy when I summarize Emacs News links from Mastodon or from my inbox. Sometimes I add extra text after a link that I've captured from my phone, and I don't want that included in the URL. Sometimes I have a bracketed link that I've copied from an org-capture note. Now I don't have to worry about the format. I can just grab the link I want.