Tags: sql

RSS - Atom - Subscribe via email

Finding missing dates in PostgreSQL

Posted: - Modified: | geek

My analytics numbers were way off from what I expected them to be. When I did a day-by-day comparison of my numbers and the reference set of numbers, I realized that a few weeks of data were missing from the year of data I was analyzing – a couple of days here, two weeks there, and so on. I manually identified the missing dates so that I could backfill the data. Since this was the second time I ran into that problem, though, I realized I needed a better way to catch this error and identify gaps.

Initially, I verified the number of days in my PostgreSQL database table with a SQL statement along the lines of:

SELECT year, month, COUNT(*) AS num_days FROM
(SELECT date_part('year', day_ts) AS year,
 date_part('month', day_ts) AS month,
 day_ts FROM (SELECT DISTINCT day_ts FROM table_with_data) AS temp) AS temp2
ORDER BY year, month

I checked each row to see if it matched the number of days in the month.

It turns out there’s an even better way to look for missing dates. PostgreSQL has a generate_sequence command, so you can do something like this:

SELECT missing_date
FROM generate_series('2015-01-01'::date, CURRENT_DATE - INTERVAL '1 day') missing_date
WHERE missing_date NOT IN (SELECT DISTINCT day_ts FROM table_with_data)
ORDER BY missing_date

Neat, huh?