Using timeseries to simplify calculations
Say you want to sum up some figures per something, say per day, but there are holes in the events, which will lead to holes in the results.Timeseries to the rescue!
(Note, in this example we will use MySql/MariaDB, but the general idea is applicable to all databases.)
Lets use the following as a starting point:
INSERT INTO totals(datum, number_of_events, minutes_used)
SELECT datum, count(1), sum(minutes_used)
FROM events
WHERE datum BETWEEN date('2020-12-04') AND date('2020-12-10')
GROUP BY datum;
This will work, but whenever there are no events on a particular date there will be nothing inserted!You could work around it with a UNION on the SELECT, and especially if this insert is done for example daily on one single date (e.g. it runs at 00:00 and sums up yesterday) a UNION is a common solution.
But there is a more elegant way that works especially well if the date-range is larger.
We start by making a table that will hold all the dates that we will work with.
DROP TABLE IF EXISTS system_dates;
CREATE TABLE system_dates (
id int(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
datum date NOT NULL, UNIQUE (datum)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Then we fill it
INSERT INTO system_dates (datum)
SELECT a.datum AS d FROM (
SELECT curdate() - INTERVAL (-5000 + a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a)) DAY as datum
FROM
(SELECT 9 as a UNION ALL SELECT 8 UNION ALL SELECT 7 UNION ALL SELECT 6 UNION ALL SELECT 5 UNION ALL SELECT 4 UNION ALL SELECT 3 UNION ALL SELECT 2 UNION ALL SELECT 1 UNION ALL SELECT 0) AS a
CROSS JOIN (SELECT 9 as a UNION ALL SELECT 8 UNION ALL SELECT 7 UNION ALL SELECT 6 UNION ALL SELECT 5 UNION ALL SELECT 4 UNION ALL SELECT 3 UNION ALL SELECT 2 UNION ALL SELECT 1 UNION ALL SELECT 0) AS b
CROSS JOIN (SELECT 9 as a UNION ALL SELECT 8 UNION ALL SELECT 7 UNION ALL SELECT 6 UNION ALL SELECT 5 UNION ALL SELECT 4 UNION ALL SELECT 3 UNION ALL SELECT 2 UNION ALL SELECT 1 UNION ALL SELECT 0) AS c
CROSS JOIN (SELECT 9 as a UNION ALL SELECT 8 UNION ALL SELECT 7 UNION ALL SELECT 6 UNION ALL SELECT 5 UNION ALL SELECT 4 UNION ALL SELECT 3 UNION ALL SELECT 2 UNION ALL SELECT 1 UNION ALL SELECT 0) AS d
) a
LEFT OUTER JOIN system_dates d ON d.datum = a.datum
WHERE d.datum is null;
I can imagine that this looks a bit weird and overcomplicated, but just bear with me, it is actually quite simple.The four cross-joins create all combinations from 0 to 9999, then by taking the curdate() as a starting point and going back 5000 days, we create a window from now - 5000 days to now + 4999 days. That is over 13 years in the past and 13 years in the future. If you need a larger window, add another cross join.
The left outer join and where d.date is null are so that this statement can be run regularly and it will just add the missing dates where needed.
Remember, this table is created once, and the addition of dates is done just now and then (e.g. in a weekly cron job)
So, now the good bit. Remember our original totals query? It looked like this:
INSERT INTO totals(datum, number_of_events, minutes_used)
SELECT datum, count(1), sum(minutes_used)
FROM events
WHERE datum BETWEEN date('2020-12-04') AND date('2020-12-10')
GROUP BY datum;
We can use the system_dates table as a starting point and simply left outer join onto it!
INSERT INTO totals(datum, number_of_events, minutes_used)
SELECT system_dates.datum, count(events.id), sum(ifnull(minutes_used, 0))
FROM system_dates
LEFT OUTER JOIN events ON events.datum = system_dates.datum
WHERE system_dates.datum BETWEEN date('2020-12-04') AND date('2020-12-10')
GROUP BY system_dates.datum
Fixed and done.Note: The ifnull(something, 0) converts the NULL's of missing events (on days there are none) into 0's.
Note2: Make sure there is an index on the datum field, or a composite index with the datum field as its first part.
It is also very easy to expand this to group on other elements that are listed in another table, by simply joining to one (or several!) extra tables:
INSERT INTO totals(user_id, datum, number_of_events, minutes_used)
SELECT users.user_id, system_dates.datum, count(events.id), sum(ifnull(minutes_used, 0))
FROM system_dates
CROSS JOIN users
LEFT OUTER JOIN events ON events.datum = system_dates.datum AND events.user_id = users.user_id
WHERE system_dates.datum BETWEEN date('2020-12-04') AND date('2020-12-10')
GROUP BY system_dates.datum, users.user_id
The power of a cross join! Note the extra condition in the left outer join to events, we now also group on user_id, so we are only interested to count events for that user_id.
I have used this method to simplify and dramatically speed up many complicated calculations for reporting engines and the like. Perhaps you can use this to your advantage too.
Ellert van Koperen, December 2020.