img

Using timeseries to simplify calculations

Say you want to sum up some figures per something, say per day, but there are holes in the events, which will lead to holes in the results.
Timeseries to the rescue!

(Note, in this example we will use MySql/MariaDB, but the general idea is applicable to all databases.)

Lets use the following as a starting point:
INSERT INTO totals(datum, number_of_events, minutes_used) 
	SELECT datum, count(1), sum(minutes_used) 
	FROM events 
	WHERE datum BETWEEN date('2020-12-04') AND date('2020-12-10') 
	GROUP BY datum;
This will work, but whenever there are no events on a particular date there will be nothing inserted!
You could work around it with a UNION on the SELECT, and especially if this insert is done for example daily on one single date (e.g. it runs at 00:00 and sums up yesterday) a UNION is a common solution.
But there is a more elegant way that works especially well if the date-range is larger.

We start by making a table that will hold all the dates that we will work with.
DROP TABLE IF EXISTS system_dates;
CREATE TABLE system_dates (
  id int(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  datum date NOT NULL, UNIQUE (datum)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Then we fill it
INSERT INTO system_dates (datum)
	SELECT a.datum AS d FROM (
		SELECT curdate() - INTERVAL (-5000 + a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a)) DAY as datum
		FROM 
			(SELECT 9 as a UNION ALL SELECT 8 UNION ALL SELECT 7 UNION ALL SELECT 6 UNION ALL SELECT 5 UNION ALL SELECT 4 UNION ALL SELECT 3 UNION ALL SELECT 2 UNION ALL SELECT 1 UNION ALL SELECT 0) AS a
			CROSS JOIN (SELECT 9 as a UNION ALL SELECT 8 UNION ALL SELECT 7 UNION ALL SELECT 6 UNION ALL SELECT 5 UNION ALL SELECT 4 UNION ALL SELECT 3 UNION ALL SELECT 2 UNION ALL SELECT 1 UNION ALL SELECT 0) AS b
			CROSS JOIN (SELECT 9 as a UNION ALL SELECT 8 UNION ALL SELECT 7 UNION ALL SELECT 6 UNION ALL SELECT 5 UNION ALL SELECT 4 UNION ALL SELECT 3 UNION ALL SELECT 2 UNION ALL SELECT 1 UNION ALL SELECT 0) AS c
			CROSS JOIN (SELECT 9 as a UNION ALL SELECT 8 UNION ALL SELECT 7 UNION ALL SELECT 6 UNION ALL SELECT 5 UNION ALL SELECT 4 UNION ALL SELECT 3 UNION ALL SELECT 2 UNION ALL SELECT 1 UNION ALL SELECT 0) AS d
	) a
	LEFT OUTER JOIN system_dates d ON d.datum = a.datum
	WHERE d.datum is null;
I can imagine that this looks a bit weird and overcomplicated, but just bear with me, it is actually quite simple.
The four cross-joins create all combinations from 0 to 9999, then by taking the curdate() as a starting point and going back 5000 days, we create a window from now - 5000 days to now + 4999 days. That is over 13 years in the past and 13 years in the future. If you need a larger window, add another cross join.
The left outer join and where d.date is null are so that this statement can be run regularly and it will just add the missing dates where needed.
Remember, this table is created once, and the addition of dates is done just now and then (e.g. in a weekly cron job)

So, now the good bit. Remember our original totals query? It looked like this:
INSERT INTO totals(datum, number_of_events, minutes_used) 
	SELECT datum, count(1), sum(minutes_used) 
	FROM events 
	WHERE datum BETWEEN date('2020-12-04') AND date('2020-12-10') 
	GROUP BY datum;

We can use the system_dates table as a starting point and simply left outer join onto it!
INSERT INTO totals(datum, number_of_events, minutes_used) 
	SELECT system_dates.datum, count(events.id), sum(ifnull(minutes_used, 0)) 
	FROM system_dates 
	LEFT OUTER JOIN events ON events.datum = system_dates.datum 
	WHERE system_dates.datum BETWEEN date('2020-12-04') AND date('2020-12-10') 
	GROUP BY system_dates.datum 
Fixed and done.
Note: The ifnull(something, 0) converts the NULL's of missing events (on days there are none) into 0's.
Note2: Make sure there is an index on the datum field, or a composite index with the datum field as its first part.

It is also very easy to expand this to group on other elements that are listed in another table, by simply joining to one (or several!) extra tables:
INSERT INTO totals(user_id, datum, number_of_events, minutes_used) 
	SELECT users.user_id, system_dates.datum, count(events.id), sum(ifnull(minutes_used, 0)) 
	FROM system_dates 
	CROSS JOIN users
	LEFT OUTER JOIN events ON events.datum = system_dates.datum  AND  events.user_id = users.user_id
	WHERE system_dates.datum BETWEEN date('2020-12-04') AND date('2020-12-10') 
	GROUP BY system_dates.datum, users.user_id 
The power of a cross join!
Note the extra condition in the left outer join to events, we now also group on user_id, so we are only interested to count events for that user_id.


I have used this method to simplify and dramatically speed up many complicated calculations for reporting engines and the like. Perhaps you can use this to your advantage too.

Ellert van Koperen, December 2020.