Tags: babel

RSS - Atom - Subscribe via email

Python, Org Mode, and writing Org tables to CSVs so that I can read them back

| emacs, org, writing

I’ve been getting deeper into Python so that I can model our personal finances. I really like using the pandas library to manipulate data. All those years I spent trying to juggle increasing complex spreadsheets… Working with Python code in Org Babel blocks is just so much more fun. I like being able to keep my assumptions in tables without having to fuss around with naming cells for easy-to-read formulas, slice and summarize parts of my data frames, organize my notes in outlines and add commentary, and define more complicated functions that I don’t have to squeeze into a single line.

I haven’t quite been able to tempt W- into the world of Org Babel Python blocks. Still, I don’t want to give up the awesomeness of having pretty tables that I can easily edit and use. So I have a bunch of named tables (using #+NAME:), and some code that exports my tables to CSVs:

#+NAME: tables
| Table         | Key                 |
|---------------+---------------------|
| assets_w      | Description         |
| assets_s      | Description         |
| tax_rates     |                     |
| disposition   | Asset               |
| probate_rates | Asset               |
| basic         | Client information  |
| base_expenses | Category            |
| general       | General assumptions |

#+begin_src emacs-lisp :results silent :var tables=tables :tangle no
  (defun my-tbl-export (row)
    "Search for table named `NAME` and export."
    (interactive "s")
    (save-excursion
      (goto-char (point-min))
      (let ((case-fold-search t))
        (when (search-forward-regexp (concat "#\\+NAME: +" (car row)) nil t)
          (next-line)
          (org-table-export (format "%s.csv" (car row)) "orgtbl-to-csv")))))
  (mapc 'my-tbl-export tables)
#+end_src

and some code that imports them back in, and formats tables nicely if I’m displaying them in Org. The in_org block doesn’t get tangled into index.py, so I don’t clutter command-line use with Org table markup.

#+begin_src python :results silent :tangle no
  in_org=1
#+end_src

#+begin_src python :results silent :exports code
  import pandas as pd
  import numpy as np
  import orgbabelhelper as ob
  def out(df, **kwargs):
    if 'in_org' in globals():
      print(ob.dataframe_to_orgtable(df, **kwargs))
    else:
      print(df)
    return df
#+end_src

#+begin_src python :results silent :var tables=tables :colnames yes
  for row in tables:
    table = row[0]
    index = row[1] 
    if row[1] == '':
      index = None
    globals()[table] = pd.read_csv(table + '.csv', index_col=index).apply(pd.to_numeric, errors='ignore')
    # print(globals()[table])
#+end_src

Then I can use C-c C-v C-b (org-babel-execute-buffer) to update everything if I change the table in my Org file, and I can use C-c C-v C-t (org-babel-tangle) to create an index.py that W- can read through or run without needing Org.

Getting data from Org Mode tables

Posted: - Modified: | emacs, org

Org Mode is an amazingly powerful package for Emacs. I’ve been learning a lot about how to use its support for plain-text tables and spreadsheet calculations.

Using table data in Emacs Lisp with the :var argument

For example, I wanted to be able to define my abbreviations in an Org Mode table in my config. I remembered coming across this technique a few weeks ago, but I couldn’t find the webpage with the code. It turned out to be simple to write from scratch. Here’s the plain text I added to my config.

#+NAME: abbrev
| Base  | Expansion                             |
|-------+---------------------------------------|
| bc    | because                               |
| wo    | without                               |
| wi    | with                                  |
| ex    | For example,                          |
| qm    | sacha@sachachua.com                   |
| qe    | http://sachachua.com/dotemacs         |
| qw    | http://sachachua.com/                 |
| qb    | http://sachachua.com/blog/            |
| qc    | http://sachachua.com/blog/emacs-chat/ |

#+begin_src emacs-lisp :exports code :var data=abbrev
(mapc (lambda (x) (define-global-abbrev (car x) (cadr x))) data)
#+end_src

The :var data=abbrev argument to the Emacs Lisp source block is where all the magic happens. Here, it takes the data from the table named “abbrev” (which I set using #+NAME: before the table) and makes it available to the code. Emacs evaluates that data when the code is tangled (or exported) to my configuration. The code that’s in my Sacha.el looks like this:

(let ((data (quote (("bc" "because")
                    ("wo" "without")
                    ("wi" "with")
                    ("ex" "For example,")
                    ("email" "sacha@sachachua.com")
                    ("dote" "http://sachachua.com/dotemacs")
                    ("web" "http://sachachua.com/")
                    ("blog" "http://sachachua.com/blog/")
                    ("ec" "http://sachachua.com/blog/emacs-chat/")))))
  (mapc (lambda (x) (define-global-abbrev (car x) (cadr x))) data) )

Looking up data with org-lookup-first, org-lookup-last, and org-lookup-all

You can do more complex things with Org tables, too. Inspired by Eric Boyd’s talk on his Epic Quest of Awesome (which he based on Steve Kamb‘s), I started putting together my own. I made a list of little achievements, guessed at the years, and assigned arbitrary experience points.

The achievements table had rows like this:

Approximate date Category XP Description ID
2014 Life 50 Became a Canadian citizen – link L_CAN
2014 Programming 20 Used NodeJS and AngularJS for a client project – link P_NOD
2014 Programming 5 Pulled information out of Evernote

I wanted to summarize the points by year: points gained, total points, level (according to a lookup table based on D&D experience points), and description. The lookup table was structured like this:

#+TBLNAME: levels
| Total XP | Level | Adjective             |
|----------+-------+-----------------------|
|        0 |     1 | trained-initiate      |
|     1000 |     2 | experienced           |
|     2250 |     3 | savvy                 |
|     3750 |     4 | veteran               |
|     5500 |     5 | unusually experienced |

Now for the summary table. I created rows for different years, and then I used Org Mode to fill in the rest. (Org Mode! Wow.)

| Year | Points gained | Cumulative points | Level | Adjective        |
|------+---------------+-------------------+-------+------------------|
| 1997 |             0 |                 0 |     1 | trained-initiate |
| 1998 |            10 |                10 |     1 | trained-initiate |
| 1999 |            50 |                60 |     1 | trained-initiate |
| 2000 |            50 |               110 |     1 | trained-initiate |
| 2001 |           100 |               210 |     1 | trained-initiate |
| 2002 |            60 |               270 |     1 | trained-initiate |
| 2003 |           245 |               515 |     1 | trained-initiate |
| 2004 |           115 |               630 |     1 | trained-initiate |
| 2005 |           140 |               770 |     1 | trained-initiate |
| 2006 |            60 |               830 |     1 | trained-initiate |
| 2007 |           270 |              1100 |     2 | experienced      |
| 2008 |           290 |              1390 |     2 | experienced      |
| 2009 |           205 |              1595 |     2 | experienced      |
| 2010 |           215 |              1810 |     2 | experienced      |
| 2011 |           115 |              1925 |     2 | experienced      |
| 2012 |           355 |              2280 |     3 | savvy            |
| 2013 |           290 |              2570 |     3 | savvy            |
| 2014 |           350 |              2920 |     3 | savvy            |
| 2015 |            45 |              2965 |     3 | savvy            |
#+TBLFM: $2='(calc-eval (format "vsum(%s)" (vconcat (org-lookup-all $1 '(remote(accomplishments,@2$1..@>$1)) '(remote(accomplishments,@2$3..@>$3))))))::$3=vsum(@2$2..@+0$2)::$4='(org-lookup-last $3 '(remote(levels,@2$1..@>$1)) '(remote(levels,@2$2..@>$2)) '>=);N::$5='(org-lookup-last $3 '(remote(levels,@2$1..@>$1)) '(remote(levels,@2$3..@>$3)) '>=);L

The TBLFM (table formula) line is very long, so let me break it down.

Points gained:

(calc-eval
 (format "vsum(%s)"
         (vconcat
          (org-lookup-all
           $1
           '(remote(accomplishments,@2$1..@>$1))
           '(remote(accomplishments,@2$3..@>$3))))))

This uses org-lookup-all to look up the value of the first column ($1) in the accomplishments table, from the second row to the last row @2..@>, looking in the first column ($1). It returns the values from the third column of the matching rows ($3). This is then passed through calc’s vsum function to calculate the sum.

Cumulative points: vsum(@2$2..@+0$2) is the sum of the second column $2 from the second row @2 to the current row @+0.

Level: This uses org-lookup-last to find the last value where the operator function returns true. In this case, testing the level from column $3 against each of the values in the levels table’s column $1 while the given level is greater than or equal to the value from levels. When it finds the last matching row, it returns the $2 second column from it. ;N means treat everything as a number.

org-lookup-first is like org-lookup-last, but it returns the first matching row.

Adjective: This one works like Level does, but it returns the value from column $3 instead. I found that it converted the return values to 0 if I used ;N, so I used ;L instead.

Passing data to R or other external programs

Of course, you’re not limited to things that Emacs can do. I wanted to summarize the data in graphs, so here’s what I did.

#+RESULTS: category_analysis

#+name: category_analysis
#+begin_src R :var data=accomplishments :exports both :results graphics :file quest_category.png :height 300
library(plyr)
library(ggplot2)
categories <- ddply(data, c("Category"), summarize, Points=sum(XP))
cat_sorted <- transform(categories, Category=reorder(Category, Points))
plot <- ggplot(data = cat_sorted, aes(x = Category, y = Points))
plot <- plot + geom_bar(stat="identity")
plot <- plot + geom_text(aes(label = Points, x = Category, y = Points + 10, hjust=0))
plot <- plot + scale_y_continuous(expand=c(0,70))
plot <- plot + coord_flip()
print(plot)
#+end_src

I like including source code in published pages for fellow geeks, but having the results come first gives people more context for the source block. So I named the source block using the #+name: directive and defined a #+RESULTS: directive before it. The source block used the :var argument to bring the data in from the accomplishments table. With R blocks, the data becomes available as a data frame that you can then do interesting things with. I used the :file argument to save the output to quest_category.png.

Those are a few ways that you can get data out of Org Mode tables and into Emacs Lisp, other Org Mode tables, or external programs. As I learn more about Org Mode, I find myself using it for more of the things that I used to use Microsoft Excel for – tracking, analyzing, and even graphing. I found it a little difficult to piece together what I needed to do from the manuals and examples on the Web, so I hope this explanation will help you (and that it’ll help me when I forget the syntax, as I’m sure I will). If you come up with something really neat that uses Org Mode tables, tell me what you’ve figured out!