What it’s like to work with data

How did I learn to work with data?

I learned the basics of SQL in high school, I think. In university, I got most of my kicks from the extracurricular projects I worked on, because they let me hang out with interesting people. As those people graduated, I ended up handling those systems on my own. Blogging gave me another reason to explore data analysis, since I was curious about my stats. Eventually, with Quantified Self, I started collecting and scraping my own data.

I do a lot of data analysis and report creation as part of my social business consulting. It has deepened my appreciation of database indexes, subqueries, common table expressions, recursive queries, caching tables, arrays, partitioned queries, string manipulation with regular expressions, and visualization tools. I’d love to get together with other social business data geeks so that we could swap analysis questions and techniques, but we’d need to get approval for sharing data or set up a sanitization protocol that my clients would be comfortable with. We’re doing some pretty cool stuff.

What is it like when my clients ask me data questions, or when I think of a question I’d like to explore?

I start by thinking about whether we have the data to answer that question, or how I can collect the data we need. I also consider whether there are similar questions that are easier to answer. Then I work out how to bring everything together: which tables, which joins, which conditions. Sometimes I have to use subqueries to combine the data, and I’m getting into the habit of using common table expressions to make those queries easier to read. I feel satisfied when I can connect everything in a way that makes sense to me. I also like seeing the common threads among different questions, and turning those insights into parameterized reports.
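Here’s a minimal sketch of the subquery-versus-CTE idea and of a parameterized report, using Python’s built-in sqlite3 module. Everything specific here is hypothetical and made up for illustration, not taken from any client project: the posts/comments schema, the `min_posts` threshold, and the function names. Both queries answer the same question. The CTE version gives the intermediate step a name, `active_communities`, so the main query reads top to bottom.

```python
import sqlite3

# Hypothetical schema, invented for illustration: community posts and their comments.
SCHEMA = """
CREATE TABLE posts (id INTEGER PRIMARY KEY, community TEXT, author TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, author TEXT);
"""

# Question: how many comments did each "active" community get,
# where "active" means it has at least :min_posts posts?

# Version 1: the filter is buried in a nested subquery.
SUBQUERY_SQL = """
SELECT p.community, COUNT(c.id) AS comment_count
FROM posts AS p
LEFT JOIN comments AS c ON c.post_id = p.id
WHERE p.community IN (
    SELECT community FROM posts
    GROUP BY community HAVING COUNT(*) >= :min_posts
)
GROUP BY p.community
ORDER BY p.community
"""

# Version 2: the same logic, with the intermediate step named by a CTE.
CTE_SQL = """
WITH active_communities AS (
    SELECT community FROM posts
    GROUP BY community HAVING COUNT(*) >= :min_posts
)
SELECT p.community, COUNT(c.id) AS comment_count
FROM posts AS p
JOIN active_communities AS a ON a.community = p.community
LEFT JOIN comments AS c ON c.post_id = p.id
GROUP BY p.community
ORDER BY p.community
"""


def comment_report(conn, min_posts, sql=CTE_SQL):
    """Parameterized report: the same query reused with different thresholds."""
    return conn.execute(sql, {"min_posts": min_posts}).fetchall()


def build_demo_db():
    """In-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", [
        (1, "design", "ann"), (2, "design", "bob"),
        (3, "support", "ann"), (4, "design", "cat"),
    ])
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", [
        (1, 1, "bob"), (2, 1, "cat"), (3, 3, "bob"),
    ])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(comment_report(conn, min_posts=2))  # [('design', 2)]
    print(comment_report(conn, min_posts=1))  # [('design', 2), ('support', 1)]
```

The threshold is passed as a bound parameter instead of being pasted into the SQL string. That way one report definition can be rerun for different questions without editing the query.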
Sometimes the first report I make fits the situation perfectly. Other times, we go back and forth a little to figure out what the real question is. I really appreciate it when other people help me sanity-check the numbers, because I occasionally overlook things. I’d like to get better at catching those errors.
Once the report settles down, I can think about performance. Sometimes it’s as simple as adding an index or creating a table that caches complex calculations. Other times, I might need to modify the presentation or the question a little.
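A rough sketch of those two performance tweaks follows, again with Python’s sqlite3 module and a hypothetical schema. The index name `idx_comments_post_id`, the `community_stats` cache table, and the helper functions are all invented for illustration. `EXPLAIN QUERY PLAN` is SQLite’s way of showing whether a query will use an index. Its exact wording differs between SQLite versions, so the sketch only looks for the index name.

```python
import sqlite3

# Hypothetical schema, invented for illustration.
SCHEMA = """
CREATE TABLE posts (id INTEGER PRIMARY KEY, community TEXT, author TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, author TEXT);
"""


def build_demo_db():
    """In-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", [
        (1, "design", "ann"), (2, "design", "bob"),
        (3, "support", "ann"), (4, "design", "cat"),
    ])
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", [
        (1, 1, "bob"), (2, 1, "cat"), (3, 3, "bob"),
    ])
    return conn


def add_index(conn):
    """Speed up lookups and joins on comments.post_id."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_comments_post_id ON comments(post_id)")


def query_plan(conn, sql, params=()):
    """Return the detail column (index 3) of SQLite's EXPLAIN QUERY PLAN rows."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def refresh_cache(conn):
    """Rebuild a table that caches an expensive aggregate.

    Reports can read community_stats instead of recomputing the join each time;
    the trade-off is that the numbers are only as fresh as the last refresh.
    """
    conn.execute("""CREATE TABLE IF NOT EXISTS community_stats (
        community TEXT PRIMARY KEY, post_count INTEGER, comment_count INTEGER)""")
    with conn:  # DELETE and INSERT commit together, or roll back together on error
        conn.execute("DELETE FROM community_stats")
        conn.execute("""
            INSERT INTO community_stats (community, post_count, comment_count)
            SELECT p.community, COUNT(DISTINCT p.id), COUNT(c.id)
            FROM posts AS p
            LEFT JOIN comments AS c ON c.post_id = p.id
            GROUP BY p.community""")


if __name__ == "__main__":
    conn = build_demo_db()
    add_index(conn)
    print(query_plan(conn, "SELECT COUNT(*) FROM comments WHERE post_id = ?", (1,)))
    refresh_cache(conn)
    print(conn.execute("SELECT * FROM community_stats ORDER BY community").fetchall())
    # [('design', 3, 2), ('support', 1, 1)]
```

The cache table trades freshness for speed. Whether that’s acceptable depends on how often the underlying data changes and how current the report needs to be.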
In addition to making my reports more reliable, I’d like to get better at visualizing the data so that people can get an intuitive feel for what’s going on.
I also want to get better at making inferences based on the data, especially when it comes to teasing out time-delayed or multivariate factors. I think my data sets are usually too small for things like that, though.
Anyway, that’s what it’s like to enjoy crunching the numbers. I love being able to do it, and I like exploring the kinds of questions that people imagine. =)