tag - visualization :: Sacha Chua

Posted: May 2, 2014 - Modified: May 9, 2014| emacs, org, quantified

I started tracking the number of tasks I had in Org Mode so that I could find out if my TODO list tended to shrink or grow. It was easy to write a function in Emacs Lisp to count the number of tasks in different states and summarize them in a table.

(defun sacha/org-count-tasks-by-status ()
  "Count agenda tasks by TODO state and return the counts as an Org table row.
When called interactively, insert the row at point instead."
  (interactive)
  (let ((counts (make-hash-table :test 'equal))
        (today (format-time-string "%Y-%m-%d"))
        values total output)
    ;; Tally the TODO keyword of every heading in the agenda files.
    (org-map-entries
     (lambda ()
       (let ((status (elt (org-heading-components) 2)))
         (when status
           (puthash status (1+ (gethash status counts 0)) counts))))
     nil
     'agenda)
    ;; Counts in table-column order; states that never occur count as 0.
    (setq values (mapcar (lambda (x) (gethash x counts 0))
                         '("DONE" "STARTED" "TODO" "WAITING" "DELEGATED" "CANCELLED" "SOMEDAY")))
    (setq total (apply #'+ values))
    (setq output
          (concat "| " today " | "
                  (mapconcat #'number-to-string values " | ")
                  " | "
                  (number-to-string total)
                  " | "
                  ;; Percentage done (DONE is the first value); guard against an empty agenda.
                  (number-to-string
                   (if (zerop total) 0 (round (/ (* 100.0 (car values)) total))))
                  "% |"))
    (if (called-interactively-p 'any)
        (insert output)
      output)))
(sacha/org-count-tasks-by-status)

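I ran this code over several days. Here are my results as of 2014-05-01. The columns starting with + show the change since the previous entry, and + t – d – c (the change in Total minus the changes in DONE and CANCELLED) is the net change in tasks still to be done: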

Date	DONE	START.	TODO	WAIT.	DELEG.	CANC.	SOMEDAY	Total	% done	+ done	+canc.	+ total	+ t – d – c	Note
2014-04-16	1104	1	403	3	1	104	35	1651	67%
2014-04-17	1257	0	114	4	1	171	107	1654	76%	153	67	3	-217	Lots of trimming
2014-04-18	1292	0	74	4	5	183	100	1658	78%	35	12	4	-43	A little bit more trimming
2014-04-20	1305	0	80	4	5	183	100	1677	78%	13	0	19	6
2014-04-21	1311	1	78	4	4	184	99	1681	78%	6	1	4	-3
2014-04-22	1313	2	75	4	4	184	99	1681	78%	2	0	0	-2
2014-04-23	1369	4	66	4	5	186	101	1735	79%	56	2	54	-4	Added sharing/index.org
2014-04-24	1371	3	69	4	5	186	101	1739	79%	2	0	4	2
2014-04-25	1379	3	60	3	5	189	103	1742	79%	8	3	3	-8
2014-04-26	1384	3	65	3	5	192	103	1755	79%	5	3	13	5
2014-04-27	1389	2	66	3	5	192	103	1760	79%	5	0	5	0
2014-04-28	1396	3	67	3	5	192	103	1769	79%	7	0	9	2
2014-04-29	1396	3	67	3	5	192	103	1769	79%	0	0	0	0
2014-04-30	1404	4	70	4	5	192	103	1782	79%	8	0	13	5
2014-05-01	1413	4	80	3	4	193	103	1800	79%	9	1	18	8

Here's the source for that table:

#+NAME: burndown
#+RESULTS:
|       Date | DONE | START. | TODO | WAIT. | DELEG. | CANC. | SOMEDAY | Total | % done | + done | +canc. | + total | + t - d - c | Note                       |
|------------+------+--------+------+-------+--------+-------+---------+-------+--------+--------+--------+---------+-------------+----------------------------|
| 2014-04-16 | 1104 |      1 |  403 |     3 |      1 |   104 |      35 |  1651 |    67% |        |        |         |             |                            |
| 2014-04-17 | 1257 |      0 |  114 |     4 |      1 |   171 |     107 |  1654 |    76% |    153 |     67 |       3 |        -217 | Lots of trimming           |
| 2014-04-18 | 1292 |      0 |   74 |     4 |      5 |   183 |     100 |  1658 |    78% |     35 |     12 |       4 |         -43 | A little bit more trimming |
| 2014-04-20 | 1305 |      0 |   80 |     4 |      5 |   183 |     100 |  1677 |    78% |     13 |      0 |      19 |           6 |                            |
| 2014-04-21 | 1311 |      1 |   78 |     4 |      4 |   184 |      99 |  1681 |    78% |      6 |      1 |       4 |          -3 |                            |
| 2014-04-22 | 1313 |      2 |   75 |     4 |      4 |   184 |      99 |  1681 |    78% |      2 |      0 |       0 |          -2 |                            |
| 2014-04-23 | 1369 |      4 |   66 |     4 |      5 |   186 |     101 |  1735 |    79% |     56 |      2 |      54 |          -4 | Added sharing/index.org    |
| 2014-04-24 | 1371 |      3 |   69 |     4 |      5 |   186 |     101 |  1739 |    79% |      2 |      0 |       4 |           2 |                            |
| 2014-04-25 | 1379 |      3 |   60 |     3 |      5 |   189 |     103 |  1742 |    79% |      8 |      3 |       3 |          -8 |                            |
| 2014-04-26 | 1384 |      3 |   65 |     3 |      5 |   192 |     103 |  1755 |    79% |      5 |      3 |      13 |           5 |                            |
| 2014-04-27 | 1389 |      2 |   66 |     3 |      5 |   192 |     103 |  1760 |    79% |      5 |      0 |       5 |           0 |                            |
| 2014-04-28 | 1396 |      3 |   67 |     3 |      5 |   192 |     103 |  1769 |    79% |      7 |      0 |       9 |           2 |                            |
| 2014-04-29 | 1396 |      3 |   67 |     3 |      5 |   192 |     103 |  1769 |    79% |      0 |      0 |       0 |           0 |                            |
| 2014-04-30 | 1404 |      4 |   70 |     4 |      5 |   192 |     103 |  1782 |    79% |      8 |      0 |      13 |           5 |                            |
| 2014-05-01 | 1413 |      4 |   80 |     3 |      4 |   193 |     103 |  1800 |    79% |      9 |      1 |      18 |           8 |                            |
#+TBLFM: @3$11..@>$11=$2-@-1$2::@3$13..@>$13=$9-@-1$9::@3$14..@>$14=$13-$11-($7-@-1$7)::@3$12..@>$12=$7-@-1$7
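The #+TBLFM line fills in the four change columns by comparing each row with the one before it (@-1 in Org's table notation): column 11 is the change in DONE, column 12 the change in CANC., column 13 the change in Total, and column 14 subtracts the first two from the third, which gives the net change in open tasks. Here's the same arithmetic as a small Python sketch, checked against the first three rows of the table. The Row type and the deltas function are just illustrative names, not part of the original setup.

```python
from typing import NamedTuple


class Row(NamedTuple):
    date: str
    done: int
    cancelled: int
    total: int


def deltas(rows: list[Row]) -> list[tuple[str, int, int, int, int]]:
    """Mirror the #+TBLFM formulas: each row is compared with the previous one (@-1)."""
    out = []
    for prev, cur in zip(rows, rows[1:]):
        d_done = cur.done - prev.done              # $11 = $2 - @-1$2
        d_canc = cur.cancelled - prev.cancelled    # $12 = $7 - @-1$7
        d_total = cur.total - prev.total           # $13 = $9 - @-1$9
        net_open = d_total - d_done - d_canc       # $14 = $13 - $11 - ($7 - @-1$7)
        out.append((cur.date, d_done, d_canc, d_total, net_open))
    return out


# (date, DONE, CANC., Total) from the first three rows of the table.
rows = [
    Row("2014-04-16", 1104, 104, 1651),
    Row("2014-04-17", 1257, 171, 1654),
    Row("2014-04-18", 1292, 183, 1658),
]
for r in deltas(rows):
    print(r)
```

Running it prints ('2014-04-17', 153, 67, 3, -217) and ('2014-04-18', 35, 12, 4, -43), matching the table.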

I wanted to graph this with Gnuplot, but it turns out that Gnuplot is difficult to integrate with Emacs on Microsoft Windows. I gave up after half an hour of poking at it, since search results indicated long-standing problems with how Gnuplot got input from Emacs. Besides, I'd been meaning to learn more R anyway, and R is more powerful when it comes to statistics and data visualization.

Getting R to work with Org Mode babel blocks in Emacs on Windows was a challenge. Here are some of the things I ran into.

The first step was easy: Add R to the list of languages I could evaluate in a source block (I already had dot and ditaa from previous experiments).

(org-babel-do-load-languages
 'org-babel-load-languages
 '((dot . t)
   (ditaa . t) 
   (R . t)))

But my code didn't execute at all, even when I tried something that printed results instead of drawing images. I got a little lost trying to step through org-babel-execute:R with edebug, eventually ending up in comint.el. The real solution was even easier. I had set inferior-R-program-name to the full path of R in my configuration. That made M-x R work, but it meant that Emacs looked in the wrong place for the options to pass to R, which Org Babel relied on. The correct approach is to leave inferior-R-program-name at its default value (Rterm) and make sure that the system PATH includes both R's bin directory and its bin\x64 directory.
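A quick way to double-check that last requirement is to look for both directories in the PATH value. Here's a small Python sketch; the install location is a hypothetical example, and the function only inspects a PATH string rather than changing anything. It uses the ntpath module so Windows-style paths are compared case-insensitively on any operating system.

```python
import ntpath


def missing_r_dirs(path_value: str, r_home: str) -> list[str]:
    """Return the R directories (bin and bin\\x64) that are absent from a Windows PATH string."""
    # Windows PATH entries are separated by ';'; normalize trailing slashes and case.
    entries = {ntpath.normcase(ntpath.normpath(p)) for p in path_value.split(";") if p}
    wanted = [ntpath.join(r_home, "bin"), ntpath.join(r_home, "bin", "x64")]
    return [d for d in wanted if ntpath.normcase(ntpath.normpath(d)) not in entries]


r_home = r"C:\Program Files\R\R-3.1.0"  # hypothetical install location
path = r"C:\Windows\system32;C:\Program Files\R\R-3.1.0\bin"
print(missing_r_dirs(path, r_home))
```

With that PATH, the sketch reports that the bin\x64 directory is still missing.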

Then I had to pick up the basics of R again. It took me a little time to figure out that I needed to parse the columns I pulled in from Org, using strptime to convert the date column and as.numeric to convert the numbers. Eventually, I got it to plot some results with the regular plot command.

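# `data` is the burndown table, presumably passed in with a :var data=burndown header argument.
# Org passes dates as text, so parse them; counts are coerced to numbers.
dates <- strptime(as.character(data$Date), "%Y-%m-%d")
tasks_done <- as.numeric(data$DONE)
# Everything except cancelled tasks.
tasks_uncancelled <- as.numeric(data$Total) - as.numeric(data$CANC.)
# Base graphics: draw points first (y axis starting at 0), then overlay connected lines.
plot(x = dates, y = tasks_uncancelled, ylim = c(0, max(tasks_uncancelled)))
lines(x = dates, y = tasks_uncancelled, col = "blue", type = "o")
lines(x = dates, y = tasks_done, col = "green", type = "o")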

I wanted prettier graphs, so I installed the ggplot2 package and started figuring it out. No matter what I did, though, I ended up with a blank white image instead of my graph, even though the same code worked when I ran it with M-x R. Weird! Eventually I found out that wrapping my ggplot in print(...) made the image display correctly. (ggplot2 objects are drawn only when they're printed; the interactive R console prints them automatically, but code evaluated from an Org source block apparently doesn't.) Yay! Now I had what I wanted.

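library(ggplot2)
# `data` is the burndown table, presumably passed in with a :var data=burndown header argument.
dates <- strptime(as.character(data$Date), "%Y-%m-%d")
tasks_done <- as.numeric(data$DONE)
tasks_uncancelled <- as.numeric(data$Total) - as.numeric(data$CANC.)
df <- data.frame(dates, tasks_done, tasks_uncancelled)
p <- ggplot(df, aes(x = dates)) +
  geom_line(aes(y = tasks_done), color = "#009900") + geom_point(aes(y = tasks_done)) +
  geom_line(aes(y = tasks_uncancelled), color = "blue") + geom_point(aes(y = tasks_uncancelled)) +
  expand_limits(y = 0)  # keep 0 on the y axis
# ggplot objects are drawn only when printed; Babel needs an explicit print().
print(p)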

[Graph: tasks done vs. uncancelled tasks]

The blue line represents the total number of tasks (except for the cancelled ones), and the green line represents tasks that are done.

Here's something that looks a little more like a burndown chart, since it shows just the number of things left to be done:

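library(ggplot2)
# `data` is the burndown table, presumably passed in with a :var data=burndown header argument.
dates <- strptime(as.character(data$Date), "%Y-%m-%d")
# Tasks neither done nor cancelled (this includes WAITING and SOMEDAY).
tasks_remaining <- as.numeric(data$Total) - as.numeric(data$CANC.) - as.numeric(data$DONE)
df <- data.frame(dates, tasks_remaining)
p <- ggplot(df, aes(x = dates, y = tasks_remaining)) +
  geom_line(color = "#009900") + geom_point() +
  expand_limits(y = 0)  # keep 0 on the y axis, as a burndown chart should
print(p)  # explicit print so the graph shows up when run from Org Babel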

The drastic decline there is me realizing that I had lots of tasks that were no longer relevant, not me being super-productive. =)

As it turns out, I tend to add new tasks at about the rate that I finish them (or slightly faster). I think this is okay. It means I'm working on things that have next steps, and next steps, and steps beyond that. If I add more tasks, that gives me more variety to choose from. Besides, I have a lot of recurring tasks, and those never show up as DONE in these counts, since Org resets a repeating task to its TODO state when I complete it.
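To put a rough number on "about the rate that I finish them", you can compare two rows of the table after the big trimming. Here's a minimal Python sketch using the April 18 and May 1 rows. It assumes tasks are never deleted, so growth in Total stands in for tasks added, and like the table it can't see recurring tasks. The rates function is a hypothetical helper, not part of the original setup.

```python
from datetime import date


def rates(start, end):
    """Compare tasks added with tasks finished between two table rows.

    Each row is (date, DONE, CANC., Total). Assumes tasks are never deleted,
    so growth in Total approximates the number of tasks added."""
    days = (end[0] - start[0]).days
    added = end[3] - start[3]
    done = end[1] - start[1]
    cancelled = end[2] - start[2]
    return {
        "days": days,
        "added_per_day": round(added / days, 1),
        "done_per_day": round(done / days, 1),
        "net_open_change": added - done - cancelled,
    }


# Rows from the table, starting after the April 17-18 trimming.
start = (date(2014, 4, 18), 1292, 183, 1658)
end = (date(2014, 5, 1), 1413, 193, 1800)
print(rates(start, end))
```

With those rows it reports 13 days, about 10.9 tasks added and 9.3 done per day, and a net gain of 11 open tasks.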

Anyway, cool! Now that I've gotten R to work on my system, you'll probably see it in even more of these blog posts. =D Hooray for Org Babel and R!

Update 2014-05-09: Stephen suggested http://blogs.neuwirth.priv.at/software/2012/03/28/r-and-emacs-with-org-mode/ for more tips on setting up Org Mode with R and Emacs Speaks Statistics (ESS).

You can view 9 comments or e-mail me at sacha@sachachua.com.

Quantified Awesome: Adding calendar heatmaps to categories

Posted: Jul 25, 2013 - Modified: Jul 20, 2013| quantified

It’s amazing how little tweaks give you a whole new sense of the data. I’ve been using Cal-HeatMap to look at my blogging history, so I figured I’d build it into Quantified Awesome to make it even easier to analyze how I spend my time. 1.9 hours later, here’s what I have. All totals are reported for the past 12-month period by default (as of this writing, July 19, 2012 to July 19, 2013, including the day’s activities), but the range adjusts to the filter settings.
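As far as I know, Cal-HeatMap (version 3) reads data as a JSON object mapping Unix timestamps in seconds to values, so each category's log has to be rolled up into per-day totals over the trailing 12 months. Here's a minimal Python sketch of that roll-up. It is not the actual Quantified Awesome code; the entry format (start time plus hours), the daily_totals name, and the use of UTC midnight as each day's key are all assumptions.

```python
import json
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone


def daily_totals(entries, today, days=365):
    """Sum hours per day over the `days` days ending on `today` (inclusive).

    `entries` is a list of (start: datetime, hours: float) pairs. Each key is the
    Unix timestamp (seconds) of that day's midnight UTC."""
    first_day = today - timedelta(days=days)
    totals = defaultdict(float)
    for start, hours in entries:
        day = start.date()
        if first_day < day <= today:
            midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
            totals[int(midnight.timestamp())] += hours
    return dict(totals)


# Hypothetical log entries for one category.
entries = [
    (datetime(2013, 7, 18, 9, 0), 1.5),
    (datetime(2013, 7, 18, 14, 0), 0.5),
    (datetime(2012, 1, 1, 10, 0), 2.0),  # older than 12 months, so ignored
]
print(json.dumps(daily_totals(entries, date(2013, 7, 19))))
```

Changing the filter settings would then just mean passing a different `today` or `days`.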

Here’s me working on the Quantified Awesome system:

Instead of just a table of log entries or a summary of numbers, I can see the gaps and sprints in my activity.

Here’s the one for Discretionary – Productive:

Pretty consistent, actually.

and Discretionary – Play:

February must’ve been when I had a new video game to tinker around with. Plenty of opportunities to relax.

Here’s my Business – Earn graph:

and Business – Build:

I’ve been biking pretty regularly, mostly on Tuesdays and Thursdays…

In contrast, I take the subway only if it’s winter or really rainy, if I’m going somewhere far or steeply uphill, or if my bike is flat (as it was yesterday).

Neato. I should definitely do this for groceries too, now that I’ve loaded my grocery receipts into Quantified Awesome! (No public link yet for that data, sorry. =) ) I also want to figure out how to speed things up enough so that I can do quartile analysis and then use that to colour the scale…
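The quartile idea could look something like this: compute the three quartile cut points of the daily totals and use them as the heatmap's colour thresholds, so each colour covers roughly a quarter of the active days. This minimal sketch uses Python's statistics.quantiles; ignoring days with no activity is my assumption, and how the thresholds get passed to the heatmap's legend depends on the library version.

```python
import statistics


def quartile_legend(daily_totals):
    """Return the three quartile cut points of the non-zero daily totals.

    These could serve as colour thresholds for a heatmap legend, so each colour
    bucket holds roughly a quarter of the active days."""
    active = sorted(v for v in daily_totals if v > 0)
    if len(active) < 2:
        return []  # statistics.quantiles needs at least two data points
    return statistics.quantiles(active, n=4)


print(quartile_legend([0, 1, 2, 3, 4, 5, 6, 7, 0]))  # [2.0, 4.0, 6.0]
```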

Calendar heatmaps for the win!

Mohiomap: A visual way to browse your Evernote notebook

Posted: Jun 25, 2013 - Modified: Jun 24, 2013| geek

Evernote is a great tool for taking notes, but sometimes searching and browsing those notes can get unwieldy if you have thousands of items. For example, searching my notebooks for “evernote” gets me >130 results, which look a little like this in Evernote’s desktop application:

This is great if I can narrow things down with notebooks, keywords, and tags, but wouldn’t it be nice to be able to explore better?

Christian Hirsch (who has been working on quite a few visual interfaces to wikis and knowledgebases) reached out to me about Mohiomap, which links up with your Evernote notebook and lets you see it as an interactive map.

You can click on notes to navigate further and to see a preview in the left sidebar.

You can expand items without closing the previous ones, so it’s a handy tool for exploration. I like the way they indicate the number of other entries with both a thicker line and a larger circle – the thicker lines are easier to follow when you’re starting from a node.

The trick with new tools is to figure out how you want to fit them into your workflow. Right now, Mohiomap is a visualization and search tool. What new questions can I ask with this interface? How can I use it to learn more?

Use Mohiomap to find related notes: I like the way it displays links to related notes. The notes are determined using the Evernote API, which seems to take the note source and tags into account. Related notes are difficult to find using the desktop application, so this might be a good way to explore when I’m writing blog posts.
Use Mohiomap when searching for something that will have hits in multiple notebooks, if I want to group by notebook: Mohio’s search interface organizes the first layer of results by notebook. If I used notebooks more, then this might be a good way to browse through my search results. I tend to use tags, though. Oh well!
Use Mohiomap to encourage myself to tag more, and to fix my tags. Mohiomap shows tags that are connected with each other, so that might be a way to identify overlapping tags. This is slightly less useful with a small result set (30 notes don’t have much overlap), but maybe it will become more useful later. It also lets you draw lines from notes to tags in order to add a tag to a note, and maybe this will evolve into more tagging features.
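Spotting overlapping tags comes down to counting how often pairs of tags land on the same note. Here's a minimal Python sketch of that idea. It is not how Mohiomap itself works, and the notes dictionary is made-up sample data standing in for an export of your tags.

```python
from collections import Counter
from itertools import combinations


def tag_pairs(notes):
    """Count how often each pair of tags appears on the same note.

    `notes` maps a note title to its list of tags. Pairs that co-occur on many
    notes may be candidates for merging."""
    pairs = Counter()
    for tags in notes.values():
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


notes = {  # hypothetical notes and tags
    "Mohiomap review": ["evernote", "visualization", "tools"],
    "Evernote search tips": ["evernote", "search", "tools"],
    "Tag cleanup": ["evernote", "tools"],
}
print(tag_pairs(notes).most_common(1))  # [(('evernote', 'tools'), 3)]
```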

It looks like the first use (browsing through related notes) might be the most relevant for me. Let’s see how well Evernote’s recommendation algorithm works!

Other thoughts: Plus points for making the back button work and keeping graphs individually bookmarkable. =) I’d love to be able to add more search results, like viewing 50 or 100 at a time – or viewing a graph of the tags in my entire Evernote knowledgebase, which would be nifty. Dynamic force-directed networks can be disconcerting because of the motion. It might be great to have different views of it in addition to the current interface – maybe something more constrained like the way FreeMind or thebrain.com work?

UserVoice appears to be the place for suggestions related to Mohiomap. Looking forward to seeing this grow, and any other apps that visualize your data!

Visualization resources

Nov 8, 2011

One of my coworkers asked me if I knew interesting examples of visualizations. I mentioned quite a few sites and she found them super-helpful (like, give-Sacha-a-hug helpful! =) ). Just in case you find these handy: (no hugs required)

Flowing Data is one of my favourite blogs for data graphics inspiration. Data Visualization is cool, too.

IBM Many Eyes
This collaborative visualization project makes coming up with charts and graphs so much easier. Lots of data sets and lots of examples to explore, too. Note: don’t upload private data.

Protovis has a graphing library and a gallery of pretty examples. I’d love to play around with graphs like this. RaphaelJS has a few examples, too. Graphing libraries generally do.

Hans Rosling shows you can do play-by-play commentary for statistics and have people on the edge of their seats.

OKCupid visualizations are fascinating. It turns out that one can get all sorts of insights out of a massive online dating database. The blog posts are cleverly written and often include practical tips, like this one on profile picture attractiveness, camera types, flash, depth of field, and time of day. They have mind-boggling data. You may not want to open the blog posts in a school or work context, though.

What are your favourite sources for visualization inspiration?

Learning from my mood data

Apr 18, 2011| geek, quantified

One of the unexpected benefits of switching my phone plan to something that includes unlimited international texting is that I can participate in nifty things like Experimonth, which is a month-long study about moods. I get regular text messages prompting me to rate my happiness on a scale of 1-10, and it graphs it for me. I can probably come up with similar graphs using KeepTrack and a bit of spreadsheet magic, but the convenience and the social data make this fun and interesting.

Here’s how my mood data stacks up so far:

I stay on a fairly even keel, with awesome happy experiences possibly any day of the week. Hmm, maybe I should track text notes too, so I can get a better handle on what causes the 10s or the 6s. It might also be interesting to combine the happiness ratings with my time analyses to see if there are any correlations.

Here are the results they’ve collected so far:

Visualization of my blog categories

Posted: Feb 13, 2010 - Modified: Feb 9, 2010| visualization, blogging, visual

This visualizes how often I blogged something with a tag in a given year, sorted by all-time popularity. There are more categories, but I skipped them. The height of each block represents how many blog posts I wrote in that category, while the different blocks represent the years, ending with 2010 at the far right. The graph reflects changing interests and recurring themes.

This visualizes some of the things I’ve been writing about in 2010. We’re only a month in, so the last line is pretty small, and in some cases (n < 4) not even visible.

Sparkline bar graphs created with Sparklines for Excel. Initial categories table created with the following SQL incantation:

select p.post_date, p.post_title, terms.name
from wp_posts p
inner join wp_term_relationships tr on p.id = tr.object_id
inner join wp_term_taxonomy tt on tr.term_taxonomy_id = tt.term_taxonomy_id
inner join wp_terms terms on tt.term_id = terms.term_id
where p.post_type = 'post' and p.post_status = 'publish'
into outfile '/tmp/categories.csv' fields terminated by ',' optionally enclosed by '"';

then imported and tweaked in Microsoft Excel.
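The Excel step (counting posts per tag per year and sorting by all-time popularity) can also be scripted. Here's a minimal Python sketch, assuming the export has been saved as comma-separated rows of post date, post title, and term name; the sample rows and the counts_by_year name are made up.

```python
import csv
import io
from collections import Counter, defaultdict


def counts_by_year(csv_text):
    """Tally posts per term per year from CSV rows of (post_date, post_title, term).

    Returns (term, Counter of year -> posts) pairs, most-used term first."""
    per_term = defaultdict(Counter)
    for post_date, _title, term in csv.reader(io.StringIO(csv_text)):
        per_term[term][post_date[:4]] += 1  # post_date starts with YYYY
    return sorted(per_term.items(), key=lambda kv: sum(kv[1].values()), reverse=True)


sample = (  # made-up rows in the assumed export format
    '2009-03-01 10:00:00,Post A,emacs\n'
    '2010-01-05 09:00:00,Post B,emacs\n'
    '2010-01-20 08:00:00,"Post, C",sketches\n'
)
for term, years in counts_by_year(sample):
    print(term, dict(sorted(years.items())))
```

Each term's year counts can then feed one row of sparkline bars.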

Harvesting the backchannel bazaar of insights

Feb 10, 2010| ibm, presentation, speaking

One of the things I love about virtual presentations is the richness of the backchannel conversation — the chat that accompanies a presentation. When people don’t have to worry about interrupting others and they’re free to discuss things in parallel, the conversation explodes.

It can be overwhelming for speakers and participants alike, but it’s a great way to capture a lot of insights, answer many, many questions, and start an ongoing conversation.

A few weeks ago, I gave a presentation on microblogging. There were 150+ participants. 51 people actively used the chat to share their thoughts during the presentation, typing in 461 messages in total. Topics ranged from beginner questions about getting started to advanced questions involving multiple tools.

I saved the chat transcript and uploaded it along with my session materials. Another participant converted the text transcript into a spreadsheet that also summarized messages by author. The spreadsheet also tagged replies with the ID of the person being replied to.
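Summarizing a transcript by author, like that spreadsheet did, takes only a few lines. Here's a minimal Python sketch; the "[HH:MM] Name: message" line format is an assumption, since chat tools export transcripts differently, and the sample lines are made up.

```python
import re
from collections import Counter

# Assumed transcript line format: "[10:02] Alice Example: message text"
LINE = re.compile(r"^\[(\d{2}:\d{2})\] ([^:]+): (.*)$")


def messages_by_author(transcript):
    """Count chat messages per author, skipping lines that don't match LINE."""
    counts = Counter()
    for line in transcript.splitlines():
        match = LINE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


sample = ("[10:02] Alice: Where do I start?\n"
          "[10:03] Bob: Try following a few coworkers.\n"
          "[10:04] Alice: Thanks: will do\n"
          "(audio dropped)")
print(messages_by_author(sample).most_common())  # [('Alice', 2), ('Bob', 1)]
```

Because the author pattern stops at the first colon, colons inside a message don't confuse it.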

I reviewed the chat spreadsheet and categorized useful messages, assigning the following keywords:

Value: related to the value of microblogging (13 messages)
Process: incorporating it into your day (15 messages)
Network: growing your network (12 messages)
Tools: discussion of specific tools to make things easier (26 messages)
Challenges: what’s difficult and how to deal with it (15 messages)
Adoption: meta-conversation about microblogging (10 messages)
Personas: managing multiple personas (10 messages)
Takeaways: short summary (14 messages)
Next: things to explore next (12 messages)

There were many messages I didn’t categorize because they repeated information, were related to the teleconference itself, or were part of the general back-and-forth.

As usual, IBMers like talking about tools and sharing tool-related tips. You should’ve seen us during Dan Roam’s presentation on The Back of the Napkin – we were fascinated by the drawing tools he used! ;)

It’s interesting to see how people cluster around topics, too. When I look at the spreadsheet, I can see who cares a lot about adoption, who’s interested in personas, etc.
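Seeing who clusters around which topic is a matter of grouping the categorized messages by keyword and counting authors. Here's a minimal Python sketch, assuming the spreadsheet can be exported as (author, keyword) pairs; the names below are hypothetical.

```python
from collections import Counter, defaultdict


def topic_clusters(tagged):
    """Group categorized messages by keyword, listing authors by message count."""
    by_topic = defaultdict(Counter)
    for author, keyword in tagged:
        by_topic[keyword][author] += 1
    return {topic: authors.most_common() for topic, authors in by_topic.items()}


tagged = [  # hypothetical (author, keyword) pairs
    ("Ann", "Tools"), ("Ben", "Tools"), ("Ann", "Tools"), ("Ben", "Personas"),
]
print(topic_clusters(tagged))
```

The most frequent authors under each keyword are the people who care most about that topic.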

I’m sure there’s been research on the analysis of conversations. The backchannel is like Internet Relay Chat (IRC), after all, and IRC has been around for decades. I wonder how the spoken presentation influences the flow of the backchannel, and vice versa. I wonder how we can get better at picking up ideas and following up on them. I wonder how we can get better at strengthening the newly-formed connections.

In a real-life presentation, it would be difficult to have all these conversations and to get this kind of insight into what people care about. A presentation backchannel where people can chat is an incredibly powerful tool, and I’m looking forward to helping learn more about making the most of it!