More posts about: geek Tags: analytics, postgresql, sql // 2 Comments »
My analytics numbers were way off from what I expected them to be. When I did a day-by-day comparison of my numbers and the reference set of numbers, I realized that a few weeks of data were missing from the year of data I was analyzing – a couple of days here, two weeks there, and so on. I manually identified the missing dates so that I could backfill the data. Since this was the second time I ran into that problem, though, I realized I needed a better way to catch this error and identify gaps.
Initially, I verified the number of days in my PostgreSQL database table with a SQL statement along the lines of:
SELECT year, month, COUNT(*) AS num_days FROM (SELECT date_part('year', day_ts) AS year, date_part('month', day_ts) AS month, day_ts FROM (SELECT DISTINCT day_ts FROM table_with_data) AS temp) AS temp2 ORDER BY year, month
I checked each row to see if it matched the number of days in the month.
It turns out there’s an even better way to look for missing dates. PostgreSQL has a
generate_sequence command, so you can do something like this:
SELECT missing_date FROM generate_series('2015-01-01'::date, CURRENT_DATE - INTERVAL '1 day') missing_date WHERE missing_date NOT IN (SELECT DISTINCT day_ts FROM table_with_data) ORDER BY missing_date