Category Archives: R

Getting data from Org Mode tables

Org Mode is an amazingly powerful package for Emacs. I’ve been learning a lot about how to use its support for plain-text tables and spreadsheet calculations.

Using table data in Emacs Lisp with the :var argument

For example, I wanted to be able to define my abbreviations in an Org Mode table in my config. I remembered coming across this technique a few weeks ago, but I couldn’t find the webpage with the code. It turned out to be simple to write from scratch. Here’s the plain text I added to my config.

#+NAME: abbrev
| Base  | Expansion                             |
|-------+---------------------------------------|
| bc    | because                               |
| wo    | without                               |
| wi    | with                                  |
| ex    | For example,                          |
| qm    | [email protected]                   |
| qe    | http://sachachua.com/dotemacs         |
| qw    | http://sachachua.com/                 |
| qb    | http://sachachua.com/blog/            |
| qc    | http://sachachua.com/blog/emacs-chat/ |

#+begin_src emacs-lisp :exports code :var data=abbrev
(mapc (lambda (x) (define-global-abbrev (car x) (cadr x))) data)
#+end_src

The :var data=abbrev argument to the Emacs Lisp source block is where all the magic happens. Here, it takes the data from the table named “abbrev” (which I set using #+NAME: before the table) and makes it available to the code. Emacs evaluates that data when the code is tangled (or exported) to my configuration. The code that’s in my Sacha.el looks like this:

(let ((data (quote (("bc" "because")
                    ("wo" "without")
                    ("wi" "with")
                    ("ex" "For example,")
                    ("email" "[email protected]")
                    ("dote" "http://sachachua.com/dotemacs")
                    ("web" "http://sachachua.com/")
                    ("blog" "http://sachachua.com/blog/")
                    ("ec" "http://sachachua.com/blog/emacs-chat/")))))
  (mapc (lambda (x) (define-global-abbrev (car x) (cadr x))) data) )

Looking up data with org-lookup-first, org-lookup-last, and org-lookup-all

You can do more complex things with Org tables, too. Inspired by Eric Boyd’s talk on his Epic Quest of Awesome (which he based on Steve Kamb‘s), I started putting together my own. I made a list of little achievements, guessed at the years, and assigned arbitrary experience points.

The achievements table had rows like this:

Approximate date Category XP Description ID
2014 Life 50 Became a Canadian citizen – link L_CAN
2014 Programming 20 Used NodeJS and AngularJS for a client project – link P_NOD
2014 Programming 5 Pulled information out of Evernote

I wanted to summarize the points by year: points gained, total points, level (according to a lookup table based on D&D experience points), and description. The lookup table was structured like this:

#+TBLNAME: levels
| Total XP | Level | Adjective             |
|----------+-------+-----------------------|
|        0 |     1 | trained-initiate      |
|     1000 |     2 | experienced           |
|     2250 |     3 | savvy                 |
|     3750 |     4 | veteran               |
|     5500 |     5 | unusually experienced |

Now for the summary table. I created rows for different years, and then I used Org Mode to fill in the rest. (Org Mode! Wow.)

| Year | Points gained | Cumulative points | Level | Adjective        |
|------+---------------+-------------------+-------+------------------|
| 1997 |             0 |                 0 |     1 | trained-initiate |
| 1998 |            10 |                10 |     1 | trained-initiate |
| 1999 |            50 |                60 |     1 | trained-initiate |
| 2000 |            50 |               110 |     1 | trained-initiate |
| 2001 |           100 |               210 |     1 | trained-initiate |
| 2002 |            60 |               270 |     1 | trained-initiate |
| 2003 |           245 |               515 |     1 | trained-initiate |
| 2004 |           115 |               630 |     1 | trained-initiate |
| 2005 |           140 |               770 |     1 | trained-initiate |
| 2006 |            60 |               830 |     1 | trained-initiate |
| 2007 |           270 |              1100 |     2 | experienced      |
| 2008 |           290 |              1390 |     2 | experienced      |
| 2009 |           205 |              1595 |     2 | experienced      |
| 2010 |           215 |              1810 |     2 | experienced      |
| 2011 |           115 |              1925 |     2 | experienced      |
| 2012 |           355 |              2280 |     3 | savvy            |
| 2013 |           290 |              2570 |     3 | savvy            |
| 2014 |           350 |              2920 |     3 | savvy            |
| 2015 |            45 |              2965 |     3 | savvy            |
#+TBLFM: $2='(calc-eval (format "vsum(%s)" (vconcat (org-lookup-all $1 '(remote(accomplishments,@2$1..@>$1)) '(remote(accomplishments,@2$3..@>$3))))))::$3=vsum(@2$2..@+0$2)::$4='(org-lookup-last $3 '(remote(levels,@2$1..@>$1)) '(remote(levels,@2$2..@>$2)) '>=);N::$5='(org-lookup-last $3 '(remote(levels,@2$1..@>$1)) '(remote(levels,@2$3..@>$3)) '>=);L

The TBLFM (table formula) line is very long, so let me break it down.

Points gained:

(calc-eval
 (format "vsum(%s)"
         (vconcat
          (org-lookup-all
           $1
           '(remote(accomplishments,@2$1..@>$1))
           '(remote(accomplishments,@2$3..@>$3))))))

This uses org-lookup-all to look up the value of the first column ($1) in the accomplishments table, from the second row to the last row @2..@>, looking in the first column ($1). It returns the values from the third column of the matching rows ($3). This is then passed through calc’s vsum function to calculate the sum.

Cumulative points: vsum(@2$2..@+0$2) is the sum of the second column $2 from the second row @2 to the current row @+0.

Level: This uses org-lookup-last to find the last value where the operator function returns true. In this case, testing the level from column $3 against each of the values in the levels table’s column $1 while the given level is greater than or equal to the value from levels. When it finds the last matching row, it returns the $2 second column from it. ;N means treat everything as a number.

org-lookup-first is like org-lookup-last, but it returns the first matching row.

Adjective: This one works like Level does, but it returns the value from column $3 instead. I found that it converted the return values to 0 if I used ;N, so I used ;L instead.

Passing data to R or other external programs

Of course, you’re not limited to things that Emacs can do. I wanted to summarize the data in graphs, so here’s what I did.

#+RESULTS: category_analysis

#+name: category_analysis
#+begin_src R :var data=accomplishments :exports both :results graphics :file quest_category.png :height 300
library(plyr)
library(ggplot2)
categories <- ddply(data, c("Category"), summarize, Points=sum(XP))
cat_sorted <- transform(categories, Category=reorder(Category, Points))
plot <- ggplot(data = cat_sorted, aes(x = Category, y = Points))
plot <- plot + geom_bar(stat="identity")
plot <- plot + geom_text(aes(label = Points, x = Category, y = Points + 10, hjust=0))
plot <- plot + scale_y_continuous(expand=c(0,70))
plot <- plot + coord_flip()
print(plot)
#+end_src

I like including source code in published pages for fellow geeks, but having the results come first gives people more context for the source block. So I named the source block using the #+name: directive and defined a #+RESULTS: directive before it. The source block used the :var argument to bring the data in from the accomplishments table. With R blocks, the data becomes available as a data frame that you can then do interesting things with. I used the :file argument to save the output to quest_category.png.

Those are a few ways that you can get data out of Org Mode tables and into Emacs Lisp, other Org Mode tables, or external programs. As I learn more about Org Mode, I find myself using it for more of the things that I used to use Microsoft Excel for – tracking, analyzing, and even graphing. I found it a little difficult to piece together what I needed to do from the manuals and examples on the Web, so I hope this explanation will help you (and that it’ll help me when I forget the syntax, as I’m sure I will). If you come up with something really neat that uses Org Mode tables, tell me what you’ve figured out!

Getting R and ggplot2 to work in Emacs Org Mode Babel blocks; also, tracking the number of TODOs

I started tracking the number of tasks I had in Org Mode so that I could find out if my TODO list tended to shrink or grow. It was easy to write a function in Emacs Lisp to count the number of tasks in different states and summarize them in a table.

(defun sacha/org-count-tasks-by-status ()
  (interactive)
  (let ((counts (make-hash-table :test 'equal))
        (today (format-time-string "%Y-%m-%d" (current-time)))
        values output)
    (org-map-entries
     (lambda ()
       (let* ((status (elt (org-heading-components) 2)))
         (when status
           (puthash status (1+ (or (gethash status counts) 0)) counts))))
     nil
     'agenda)
    (setq values (mapcar (lambda (x)
                           (or (gethash x counts) 0))
                         '("DONE" "STARTED" "TODO" "WAITING" "DELEGATED" "CANCELLED" "SOMEDAY")))
    (setq output
          (concat "| " today " | "
                  (mapconcat 'number-to-string values " | ")
                  " | "
                  (number-to-string (apply '+ values))
                  " | "
                  (number-to-string
                   (round (/ (* 100.0 (car values)) (apply '+ values))))
                  "% |"))
    (if (called-interactively-p 'any)
        (insert output)
      output)))
(sacha/org-count-tasks-by-status)

I ran this code over several days. Here are my results as of 2014-05-01:

Date DONE START. TODO WAIT. DELEG. CANC. SOMEDAY Total % done + done +canc. + total + t – d – c Note
2014-04-16 1104 1 403 3 1 104 35 1651 67%
2014-04-17 1257 0 114 4 1 171 107 1654 76% 153 67 3 -217 Lots of trimming
2014-04-18 1292 0 74 4 5 183 100 1658 78% 35 12 4 -43 A little bit more trimming
2014-04-20 1305 0 80 4 5 183 100 1677 78% 13 0 19 6
2014-04-21 1311 1 78 4 4 184 99 1681 78% 6 1 4 -3
2014-04-22 1313 2 75 4 4 184 99 1681 78% 2 0 0 -2
2014-04-23 1369 4 66 4 5 186 101 1735 79% 56 2 54 -4 Added sharing/index.org
2014-04-24 1371 3 69 4 5 186 101 1739 79% 2 0 4 2
2014-04-25 1379 3 60 3 5 189 103 1742 79% 8 3 3 -8
2014-04-26 1384 3 65 3 5 192 103 1755 79% 5 3 13 5
2014-04-27 1389 2 66 3 5 192 103 1760 79% 5 0 5 0
2014-04-28 1396 3 67 3 5 192 103 1769 79% 7 0 9 2
2014-04-29 1396 3 67 3 5 192 103 1769 79% 0 0 0 0
2014-04-30 1404 4 70 4 5 192 103 1782 79% 8 0 13 5
2014-05-01 1413 4 80 3 4 193 103 1800 79% 9 1 18 8

Here’s the source for that table:

#+NAME: burndown
#+RESULTS:
|       Date | DONE | START. | TODO | WAIT. | DELEG. | CANC. | SOMEDAY | Total | % done | + done | +canc. | + total | + t - d - c | Note                       |
|------------+------+--------+------+-------+--------+-------+---------+-------+--------+--------+--------+---------+-------------+----------------------------|
| 2014-04-16 | 1104 |      1 |  403 |     3 |      1 |   104 |      35 |  1651 |    67% |        |        |         |             |                            |
| 2014-04-17 | 1257 |      0 |  114 |     4 |      1 |   171 |     107 |  1654 |    76% |    153 |     67 |       3 |        -217 | Lots of trimming           |
| 2014-04-18 | 1292 |      0 |   74 |     4 |      5 |   183 |     100 |  1658 |    78% |     35 |     12 |       4 |         -43 | A little bit more trimming |
| 2014-04-20 | 1305 |      0 |   80 |     4 |      5 |   183 |     100 |  1677 |    78% |     13 |      0 |      19 |           6 |                            |
| 2014-04-21 | 1311 |      1 |   78 |     4 |      4 |   184 |      99 |  1681 |    78% |      6 |      1 |       4 |          -3 |                            |
| 2014-04-22 | 1313 |      2 |   75 |     4 |      4 |   184 |      99 |  1681 |    78% |      2 |      0 |       0 |          -2 |                            |
| 2014-04-23 | 1369 |      4 |   66 |     4 |      5 |   186 |     101 |  1735 |    79% |     56 |      2 |      54 |          -4 | Added sharing/index.org    |
| 2014-04-24 | 1371 |      3 |   69 |     4 |      5 |   186 |     101 |  1739 |    79% |      2 |      0 |       4 |           2 |                            |
| 2014-04-25 | 1379 |      3 |   60 |     3 |      5 |   189 |     103 |  1742 |    79% |      8 |      3 |       3 |          -8 |                            |
| 2014-04-26 | 1384 |      3 |   65 |     3 |      5 |   192 |     103 |  1755 |    79% |      5 |      3 |      13 |           5 |                            |
| 2014-04-27 | 1389 |      2 |   66 |     3 |      5 |   192 |     103 |  1760 |    79% |      5 |      0 |       5 |           0 |                            |
| 2014-04-28 | 1396 |      3 |   67 |     3 |      5 |   192 |     103 |  1769 |    79% |      7 |      0 |       9 |           2 |                            |
| 2014-04-29 | 1396 |      3 |   67 |     3 |      5 |   192 |     103 |  1769 |    79% |      0 |      0 |       0 |           0 |                            |
| 2014-04-30 | 1404 |      4 |   70 |     4 |      5 |   192 |     103 |  1782 |    79% |      8 |      0 |      13 |           5 |                            |
| 2014-05-01 | 1413 |      4 |   80 |     3 |      4 |   193 |     103 |  1800 |    79% |      9 |      1 |      18 |           8 |                            |
#+TBLFM: @3$11..@>$11=$2-@-1$2::@3$13..@>$13=$9-@-1$9::@3$14..@>$14=$13-$11-($7-@-1$7)::@3$12..@>$12=$7-@-1$7

I wanted to graph this with Gnuplot, but it turns out that Gnuplot is difficult to integrate with Emacs on Microsoft Windows. I gave up after a half an hour of poking at it, since search results indicated there were long-standing problems with how Gnuplot got input from Emacs. Besides, I’d been meaning to learn more R anyway, and R is more powerful when it comes to statistics and data visualization.

Getting R to work with Org Mode babel blocks in Emacs on Windows was a challenge. Here are some of the things I ran into.

The first step was easy: Add R to the list of languages I could evaluate in a source block (I already had dot and ditaa from previous experiments).

(org-babel-do-load-languages
 'org-babel-load-languages
 '((dot . t)
   (ditaa . t) 
   (R . t)))

But my code didn’t execute at all, even when I was trying something that printed out results instead of drawing images. I got a little lost trying to dig into org-babel-execute:R with edebug, eventually ending up in comint.el. The real solution was even easier. I had incorrectly set inferior-R-program-name to the path of R in my configuration, which made M-x R work but which meant that Emacs was looking in the wrong place for the options to pass to R (which Org Babel relied on). The correct way to do this is to leave inferior-R-program-name with the default value (Rterm) and make sure that my system path included both the bin directory and the bin\x64 directory.

Then I had to pick up the basics of R again. It took me a little time to figure out that I needed to parse the columns I pulled in from Org, using strptime to convert the date column and as.numeric to convert the numbers. Eventually, I got it to plot some results with the regular plot command.

dates <- strptime(as.character(data$Date), "%Y-%m-%d")
tasks_done <- as.numeric(data$DONE)
tasks_uncancelled <- as.numeric(data$Total) - as.numeric(data$CANC.)
df <- data.frame(dates, tasks_done, tasks_uncancelled)
plot(x=dates, y=tasks_uncancelled, ylim=c(0,max(tasks_uncancelled)))
lines(x=dates, y=tasks_uncancelled, col="blue", type="o")
lines(x=dates, y=tasks_done, col="green", type="o")

r-plot

I wanted prettier graphs, though. I installed the ggplot2 package and started figuring it out. No matter what I did, though, I ended up with a blank white image instead of my graph. If I used M-x R instead of evaluating the src block, the code worked. Weird! Eventually I found out that adding print(...) around my ggplot made it display the image correctly. Yay! Now I had what I wanted.

library(ggplot2)
dates <- strptime(as.character(data$Date), "%Y-%m-%d")
tasks_done <- as.numeric(data$DONE)
tasks_uncancelled <- as.numeric(data$Total) - as.numeric(data$CANC.)
df <- data.frame(dates, tasks_done, tasks_uncancelled)
plot = ggplot(data=df, aes(x=dates, y=tasks_done, ymin=0)) + geom_line(color="#009900") + geom_point() + geom_line(aes(y=tasks_uncancelled), color="blue") + geom_point(aes(y=tasks_uncancelled))
print(plot)

 r-graph

The blue line represents the total number of tasks (except for the cancelled ones), and the green line represents tasks that are done.

Here’s something that looks a little more like a burn down chart, since it shows just the number of things to be done:

library(ggplot2)
dates <- strptime(as.character(data$Date), "%Y-%m-%d")
tasks_remaining <- as.numeric(data$Total) - as.numeric(data$CANC.) - as.numeric(data$DONE)
df <- data.frame(dates, tasks_remaining)
plot = ggplot(data=df, aes(x=dates, y=tasks_remaining, ymin=0)) + geom_line(color="#009900") + geom_point()
print(plot)

r-graph-2

The drastic decline there is me realizing that I had lots of tasks that were no longer relevant, not me being super-productive. =)

As it turns out, I tend to add new tasks at about the rate that I finish them (or slightly more). I think this is okay. It means I’m working on things that have next steps, and next steps, and steps beyond that. If I add more tasks, that gives me more variety to choose from. Besides, I have a lot of repetitive tasks, so those never get marked as DONE over here.

Anyway, cool! Now that I’ve gotten R to work on my system, you’ll probably see it in even more of these blog posts. =D Hooray for Org Babel and R!

Update 2014-05-09: Stephen suggested http://blogs.neuwirth.priv.at/software/2012/03/28/r-and-emacs-with-org-mode/ for more tips on setting up Org Mode with R and Emacs Speaks Statistics (ESS).