Embedding a date into a Serial
part 2: updating the sequence correctly
This article uses PostgreSQL as the primary database. It is probably possible to use the same technique in other database systems too.Update: it came to my attention that the universally used 'now()' statements may or may not return the actual current time at the instant the function is called. To avoid any confusion I therefore changed all uses of 'now()' into 'CURRENT_DATE', which is guaranteed to return the start time of the current transaction.
After the decision has been made to employ the trick of embedding a date into a sequence (or bigserial, which is more or less the same) as explained here, some thought must go into ensuring the correct values will be coming out of this sequence.
Note: as these methods work with direct access to the sequence, the more modern bigint GENERATED ALWAYS AS IDENTITY need a bespoke trick that i address in part 3.
Let me get something out of the way first:
Q: why not simply update the sequence from a cron job at midnight?
A: this can not be accurate. If there is an insert done just after 00:00 and the cronjob has not run yet, then this row will have the wrong date in its sequence, and may even land in an incorrect partition. That is just sloppy and not acceptable.
Lets first set up a test-bed for the (speed-)testing of the different methods.
I created a PostgreSQL Docker container using https://hub.docker.com/_/postgres and spun it up using
docker run --name postgrescontainer -p 5432:5432 -e POSTGRES_PASSWORD=*************** -d postgres
This runs on my late 2014 Macbook (2.2 GHz Core i7), so not the fastest hardware, nor an exact replication of a professional hosting environment, but that is quite irrelevant when comparing differences, as you will see soon enough.
Setting up some prerequisites:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE OR REPLACE FUNCTION sha1(text) returns text as $$
SELECT encode(digest($1, 'sha1'), 'hex')
$$ LANGUAGE SQL STRICT IMMUTABLE;
This creates an easy to use sha1 function, that helps nicely when generating complex kind-of random data.Baseline
Lets first define the baselines. We create a simple table and a more complex table (with more columns and more indexes) and in both insert one million rows of random data.This is the part that must be re-run every test. To save space I will not copy-paste it in every code example below, but dont forget to re-run this when doing the tests yourself.
DROP TABLE if exists test;
CREATE TABLE test
( id bigserial primary key,
something text,
tstamp timestamp default now() );
DROP TABLE if exists testw;
CREATE TABLE testw
( id bigserial primary key,
something text,
somenumber int,
somethingelse text,
tstamp timestamp default now() );
CREATE INDEX ON testw (something, somenumber);
CREATE INDEX ON testw (somenumber, somethingelse);
CREATE INDEX ON testw (somethingelse, tstamp);
CREATE INDEX ON testw (tstamp, id);
Perform the baseline test:
INSERT INTO test(something, tstamp)
SELECT concat('nummer ', i::text), '2000-01-01'::timestamp + interval '2' day * (i::int)
FROM generate_series(1,1000000) as t(i);
-- 1,000,000 rows affected in 5 s 454 ms
INSERT INTO testw (something, somenumber, somethingelse, tstamp)
SELECT concat('nummer ', i::text), i::int, sha1(i::text),
'2000-01-01'::timestamp + interval '2' day * (i::int)
FROM generate_series(1,1000000) as t(i);
-- 1,000,000 rows affected in 31 s 468 ms
Results | Simple table | More complex table |
---|---|---|
Baseline | 5 s 454 ms | 31 s 468 ms |
Method one: always update the sequence
In this bruteforce method we simply update the sequence to have the current date embedded, whether it is needed or not.-- recreate test table, and then
CREATE OR REPLACE FUNCTION test__BI() RETURNS trigger AS $$
BEGIN
PERFORM setval('public.test_id_seq'::regclass,
(currval('public.test_id_seq'::regclass)::bigint
% 10000000000000::bigint
+ ( (EXTRACT(year from CURRENT_DATE)::bigint-2000)*10000::bigint
+ (EXTRACT(month from CURRENT_DATE)::bigint)*100::bigint
+ (EXTRACT(day from CURRENT_DATE)::bigint)
) * 10000000000000::bigint
)::bigint,
true);
NEW.id = currval('public.test_id_seq'::regclass)::bigint;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER test__BI BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE test__BI();
INSERT INTO test(something, tstamp)
SELECT concat('nummer ', i::text), '2000-01-01'::timestamp + interval '2' day * (i::int)
FROM generate_series(1,1000000) as t(i);
-- 1,000,000 rows affected in 20 s 635 ms
-- recreate testw table, and then
CREATE OR REPLACE FUNCTION testw__BI() RETURNS trigger AS $$
BEGIN
PERFORM setval('public.testw_id_seq'::regclass,
(currval('public.testw_id_seq'::regclass)::bigint
% 10000000000000::bigint
+ ( (EXTRACT(year from CURRENT_DATE)::bigint-2000)*10000::bigint
+ (EXTRACT(month from CURRENT_DATE)::bigint)*100::bigint
+ (EXTRACT(day from CURRENT_DATE)::bigint)
) * 10000000000000::bigint
)::bigint,
true);
NEW.id = currval('public.testw_id_seq'::regclass)::bigint;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER testw__BI BEFORE INSERT ON testw FOR EACH ROW EXECUTE PROCEDURE testw__BI();
INSERT INTO testw (something, somenumber, somethingelse, tstamp)
SELECT concat('nummer ', i::text), i::int, sha1(i::text),
'2000-01-01'::timestamp + interval '2' day * (i::int)
FROM generate_series(1,1000000) as t(i);
-- 1,000,000 rows affected in 47 s 809 ms
Results | Simple table | More complex table |
---|---|---|
Baseline | 5 s 454 ms | 31 s 468 ms |
Always update sequence | 20 s 635 ms | 47 s 809 ms |
On the positive side, the trigger will ensure that the sequence is always set correctly.
Method two: conditionally update the sequence
In this somewhat more subtle method we update the sequence if it is deemed needed.-- recreate test table, and then
CREATE OR REPLACE FUNCTION test__BI() RETURNS trigger AS $$
BEGIN
IF currval('public.test_id_seq'::regclass)::bigint <
( (EXTRACT(year from CURRENT_DATE)-2000)::bigint*10000::bigint
+ EXTRACT(month from CURRENT_DATE)::bigint*100::bigint
+ EXTRACT(day from CURRENT_DATE)::bigint
)::bigint * 10000000000000::bigint
THEN
PERFORM setval('public.test_id_seq'::regclass,
(currval('public.test_id_seq'::regclass)::bigint % 10000000000000::bigint
+ ( (EXTRACT(year from CURRENT_DATE)-2000)::bigint*10000::bigint
+ EXTRACT(month from CURRENT_DATE)::bigint*100::bigint
+ EXTRACT(day from CURRENT_DATE)::bigint
)::bigint * 10000000000000::bigint
)::bigint,
true);
NEW.id = currval('public.test_id_seq'::regclass)::bigint;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER test__BI BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE test__BI();
INSERT INTO test(something, tstamp)
SELECT concat('nummer ', i::text), '2000-01-01'::timestamp + interval '2' day * (i::int)
FROM generate_series(1,1000000) as t(i);
-- 1,000,000 rows affected in 9 s 745 ms
-- recreate testw table, and then
CREATE OR REPLACE FUNCTION testw__BI() RETURNS trigger AS $$
BEGIN
IF currval('public.testw_id_seq'::regclass)::bigint <
(( (EXTRACT(year from CURRENT_DATE)-2000)*10000
+ EXTRACT(month from CURRENT_DATE)*100
+ EXTRACT(day from CURRENT_DATE)
)::bigint * 10000000000000::bigint)
THEN
PERFORM setval('public.testw_id_seq'::regclass,
(currval('public.testw_id_seq'::regclass)::bigint
% 10000000000000::bigint
+ ( (EXTRACT(year from CURRENT_DATE)-2000)::bigint * 10000::bigint
+ EXTRACT(month from CURRENT_DATE)::bigint * 100::bigint
+ EXTRACT(day from CURRENT_DATE)::bigint
)::bigint * 10000000000000::bigint
)::bigint,
true);
NEW.id = currval('public.testw_id_seq'::regclass)::bigint;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER testw__BI BEFORE INSERT ON testw FOR EACH ROW EXECUTE PROCEDURE testw__BI();
INSERT INTO testw (something, somenumber, somethingelse, tstamp)
SELECT concat('nummer ', i::text), i::int, sha1(i::text),
'2000-01-01'::timestamp + interval '2' day * (i::int)
FROM generate_series(1,1000000) as t(i);
-- 1,000,000 rows affected in 36 s 407 ms
Results | Simple table | More complex table |
---|---|---|
Baseline | 5 s 454 ms | 31 s 468 ms |
Always update sequence | 20 s 635 ms | 47 s 809 ms |
Conditionally update sequence | 9 s 745 ms | 36 s 407 ms |
On the positive side, the trigger will ensure that the sequence is always set correctly and the check shaves off 2/3rd of the overhead. Not bad.
Method three: only at certain times conditionally update the sequence
This is the most evolved method. Using this trick shaves off a little extra overhead, but does rely on one additional step, as explained below.-- recreate test table, and then
CREATE OR REPLACE FUNCTION test__BI() RETURNS trigger AS $$
BEGIN
IF ((EXTRACT(hour from CURRENT_DATE) = 0) AND (EXTRACT(minute from CURRENT_DATE) < 10)) OR (currval('public.test_id_seq'::regclass)::bigint < 10000000000000::bigint) THEN
IF currval('public.test_id_seq'::regclass)::bigint <
( (EXTRACT(year from CURRENT_DATE)-2000)::bigint*10000::bigint
+ EXTRACT(month from CURRENT_DATE)::bigint*100::bigint
+ EXTRACT(day from CURRENT_DATE)::bigint
)::bigint * 10000000000000::bigint
THEN
PERFORM setval('public.test_id_seq'::regclass,
(currval('public.test_id_seq'::regclass)::bigint % 10000000000000::bigint
+ ( (EXTRACT(year from CURRENT_DATE)-2000)::bigint*10000::bigint
+ EXTRACT(month from CURRENT_DATE)::bigint*100::bigint
+ EXTRACT(day from CURRENT_DATE)::bigint
)::bigint * 10000000000000::bigint
)::bigint,
true);
NEW.id = currval('public.test_id_seq'::regclass)::bigint;
END IF;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER test__BI BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE test__BI();
truncate test;
INSERT INTO test(something, tstamp)
SELECT concat('nummer ', i::text), '2000-01-01'::timestamp + interval '2' day * (i::int)
FROM generate_series(1,1000000) as t(i);
-- 1,000,000 rows affected in 7 s 680 ms
Have a good look at the code. The first bit of the first IF statement restricts any further code from running outside of the allowed timewindow between 00:00 and 00:10. This is intentional.-- recreate test table, and then
CREATE OR REPLACE FUNCTION testw__BI() RETURNS trigger AS $$
BEGIN
IF ((EXTRACT(hour from CURRENT_DATE) = 0) AND (EXTRACT(minute from CURRENT_DATE) < 10)) OR (currval('public.testw_id_seq'::regclass)::bigint < 10000000000000::bigint) THEN
IF currval('public.testw_id_seq'::regclass)::bigint <
(( (EXTRACT(year from CURRENT_DATE)-2000)*10000
+ EXTRACT(month from CURRENT_DATE)*100
+ EXTRACT(day from CURRENT_DATE)
)::bigint * 10000000000000::bigint)
THEN
PERFORM setval('public.testw_id_seq'::regclass,
(currval('public.testw_id_seq'::regclass)::bigint
% 10000000000000::bigint
+ ( (EXTRACT(year from CURRENT_DATE)-2000)::bigint * 10000::bigint
+ EXTRACT(month from CURRENT_DATE)::bigint * 100::bigint
+ EXTRACT(day from CURRENT_DATE)::bigint
)::bigint * 10000000000000::bigint
)::bigint,
true);
NEW.id = currval('public.testw_id_seq'::regclass)::bigint;
END IF;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER testw__BI BEFORE INSERT ON testw FOR EACH ROW EXECUTE PROCEDURE testw__BI();
INSERT INTO testw (something, somenumber, somethingelse, tstamp)
SELECT concat('nummer ', i::text), i::int, sha1(i::text),
'2000-01-01'::timestamp + interval '2' day * (i::int)
FROM generate_series(1,1000000) as t(i);
-- 1,000,000 rows affected in 34 s 072 ms
Also here the first bit of the first IF statement restricts any further code from running outside of the allowed timewindow between 00:00 and 00:10. This is intentional.This gives an either external or internal cronjob time to run and update the sequences where needed for sequences that did not see an update already, and if an update is done in the critical timewindow between midnight and 10 minutes after midnight these will always be checked and modified where needed before being inserted.
This cronjob can update the sequence with a function similar to what was used in Method one, where a PERFORM setval simply sets the sequence correctly and that is that.
It is important that this cronjob does run however, if there is an issue there and it does NOT run for whatever reason, this may lead to incorrect values in the sequence, which would be mean inconsistent data !
Results | Simple table | More complex table |
---|---|---|
Baseline | 5 s 454 ms | 31 s 468 ms |
Always update sequence | 20 s 635 ms | 47 s 809 ms |
Conditionally update sequence | 9 s 745 ms | 36 s 407 ms |
Only at certain times conditionally update the sequence | 7 s 680 ms | 34 s 072 ms |
Method four: change the sequence-value on the fly
In this even more subtle method we change the sequence-value on the fly, i.e. we directly pad the data into it.-- for this method we must change something to the creation of the table
DROP TABLE IF EXISTS test;
DROP SEQUENCE IF EXISTS test_serial;
CREATE SEQUENCE test_serial START 1;
CREATE OR REPLACE FUNCTION gen_test_key() RETURNS BIGINT AS $$
DECLARE
new_id bigint;
BEGIN
new_id = (nextval('public.test_serial'::regclass)::bigint % 10000000000000::bigint
+ ( (EXTRACT(year from CURRENT_DATE)-2000)::bigint * 10000::bigint
+ EXTRACT(month from CURRENT_DATE)::bigint * 100::bigint
+ EXTRACT(day from CURRENT_DATE)::bigint
)::bigint * 10000000000000::bigint
)::bigint;
RETURN new_id;
END;
$$ LANGUAGE plpgsql;
CREATE TABLE test
( id bigint primary key default gen_test_key(),
something text,
tstamp timestamp default now());
INSERT INTO test(something, tstamp)
SELECT concat('nummer ', i::text), '2000-01-01'::timestamp + interval '2' day * (i::int)
FROM generate_series(1,1000000) as t(i);
-- 1,000,000 rows affected in 7 s 609 ms
-- for this method we must change something to the creation of the table
DROP TABLE IF EXISTS testw;
DROP SEQUENCE IF EXISTS testw_serial;
CREATE SEQUENCE testw_serial START 1;
CREATE OR REPLACE FUNCTION gen_testw_key() RETURNS BIGINT AS $$
DECLARE
new_id bigint;
BEGIN
new_id = (nextval('public.testw_serial'::regclass)::bigint % 10000000000000::bigint
+ ( (EXTRACT(year from CURRENT_DATE)-2000)::bigint * 10000::bigint
+ EXTRACT(month from CURRENT_DATE)::bigint * 100::bigint
+ EXTRACT(day from CURRENT_DATE)::bigint
)::bigint * 10000000000000::bigint
)::bigint;
RETURN new_id;
END;
$$ LANGUAGE plpgsql;
CREATE TABLE testw
( id bigint primary key default gen_testw_key(),
something text,
somenumber int,
somethingelse text,
tstamp timestamp default now());
CREATE INDEX ON testw (something, somenumber);
CREATE INDEX ON testw (somenumber, somethingelse);
CREATE INDEX ON testw (somethingelse, tstamp);
CREATE INDEX ON testw (tstamp, id);
INSERT INTO testw (something, somenumber, somethingelse, tstamp)
SELECT concat('nummer ', i::text), i::int, sha1(i::text), '2000-01-01'::timestamp + interval '2' day * (i::int)
FROM generate_series(1,1000000) as t(i);
-- 1,000,000 rows affected in 33 s 667 ms
In a previous iteration of this method a trigger was used to change the id before inserting it, but by simply using the function directly as the default value it became twice as quick.The only downside is that with this method it is possible to insert incorrect id values. That would be a stupid thing to do, but can not be prevented without an additional trigger.
Results | Simple table | More complex table |
---|---|---|
Baseline | 5 s 454 ms | 31 s 468 ms |
Always update sequence | 20 s 635 ms | 47 s 809 ms |
Conditionally update sequence | 9 s 745 ms | 36 s 407 ms |
Only at certain times conditionally update the sequence | 7 s 680 ms | 34 s 072 ms |
Directly pad the sequence | 7 s 609 ms | 33 s 667 ms |
Note: adding-in the date on the fly leaves the sequence as-is, which you may like or dislike, depending on your preferences and standards.
Conclusion
Of all the proposed techniques, method 1 (always update sequence) is too simplistic, and off the table.Method 2 (conditionally update sequence) is a robust method that I would be happy to use without ever having to worry about it anymore.
Method 3 (only at certain times conditionally update the sequence) has the advantage that it performs better then method 2, but it does add complexity, which can break.
Method 4 (directly pad the sequence) is an elegant method which surprisingly is also the fastest method. The only downside is that the sequence itself is not updated, but if that is something that you can live with, this method is definitely a very good choice.
Ellert van Koperen, original article created on September 2022, updated on February 2023.
There is a part 3 to this article, solving the same issue for generated always as identity fields.