Finding missing dates in PostgreSQL

Posted: - Modified: | geek

My analytics numbers were way off from what I expected them to be. When I did a day-by-day comparison of my numbers and the reference set of numbers, I realized that a few weeks of data were missing from the year of data I was analyzing – a couple of days here, two weeks there, and so on. I manually identified the missing dates so that I could backfill the data. Since this was the second time I ran into that problem, though, I realized I needed a better way to catch this error and identify gaps.

Initially, I verified the number of days in my PostgreSQL database table with a SQL statement along the lines of:

SELECT year, month, COUNT(*) AS num_days FROM
(SELECT date_part('year', day_ts) AS year,
 date_part('month', day_ts) AS month,
 day_ts FROM (SELECT DISTINCT day_ts FROM table_with_data) AS temp) AS temp2
ORDER BY year, month

I checked each row to see if it matched the number of days in the month.

It turns out there’s an even better way to look for missing dates. PostgreSQL has a generate_sequence command, so you can do something like this:

SELECT missing_date
FROM generate_series('2015-01-01'::date, CURRENT_DATE - INTERVAL '1 day') missing_date
WHERE missing_date NOT IN (SELECT DISTINCT day_ts FROM table_with_data)
ORDER BY missing_date

Neat, huh?

You can view 2 comments or e-mail me at sacha@sachachua.com.

2 comments

This was the trick I needed, and your post saved me some time. Thanks!

I do believe you have a typo in the second code block, though:

CURRENT_DATE - INTERVAL '1 day'. The hyphen should be a comma.

Hope life remains awesome.

Oh! That's because my data extract script always copied over data excluding the current day (the analytics data I was copying from was batch-updated every night), so I didn't want to see today listed as a missing date. =)