img

Properly setting up PostgreSQL replication between docker containers on MacOS.

Setting up PostgreSQL replication in docker containers on MacOS, is less straightforward then one would assume.
Check out this post for a short list of the most troublesome issues that must be overcome.
I used the Postgres Docker Official Image in this example, but it will work equally well with the official Postgis image.

Start by opening a terminal and creating and entering a working folder.
cd ~/Documents
mkdir cluster_setup.nosync 
cd cluster_setup.nosync
Allready some points of attention here:
1: There is some software that has issues with spaces in paths. Except if you have a space in your username, that is easy to avoid.
2: icloud sync does not play very nice with docker, especially if you delete and recreate files quickly. The .nosync suffix warns icloud sync to mind its own business and stay out of this folder. Ugly, but it works.

Clean up docker in a quite un-subtle way: trash everything that is not currently running.
docker system prune -f
docker volume prune -f
tip: to throw away internal volumes you can look them up with info comands, but as long as the yml file is similar as to when it was started you may also use:
docker compose -f specific_yml_file.yml down -v
this also works if the container is not running. The -v flag is the one killing the internal volumes. Do this if a container failed to properly start and you have trouble with stuff seeming to 'stick around' even after a prune.

Create a few more terminal windows in the same place by executing this 3 times:
osascript -e 'tell app "Terminal" to do script "cd ~/Documents/Database.one/cluster_setup.nosync"'
I usually organize these windows in a 2x2 grid to keep things easy for myself, and they are labelled 1a and 1b on one side, and 2a and 2b on the other side.


Grab the zipfile with the configuration files from here and unzip it into the working folder. There should now be a subdirectory with all the configuration files in it.
If you are curious what is in these files without downloading them, head over to this page.


Right, now that the workarea has been prepared, lets start for real.

on terminal 1a


Clean out the old working directories and re-create them.
rm -rf postgres1 postgres2

mkdir -p postgres1/config
mkdir -p postgres1/init
mkdir -p postgres1/dump/data

mkdir -p postgres2/config
mkdir -p postgres2/init
mkdir -p postgres2/dump/data

copy the files from the template folder to their proper places
cp templates/init.* postgres1/init/ 
cp templates/*.conf postgres1/config/ 
cp templates/pgpass postgres1/config/ 
cp postgres1/config/postgresql_m.conf postgres1/config/postgresql.conf 

cp templates/init.* postgres2/init/ 
cp templates/*.conf postgres2/config/ 
cp templates/pgpass postgres2/config/ 
cp postgres2/config/postgresql_s.conf postgres2/config/postgresql.conf 

cp templates/*.yml .
note that the replica is first built in standalone config.

Create the network that both containers will use, and start the Master node
docker network create postgresnet
docker compose -f docker-compose-1.yml up

on terminal 1b


Enter the running container and do some updates
docker exec -it postgres1 bash
apt update -y
apt install apt-utils vim -y

Check the replication slots
psql -h localhost -U postgresadmin postgresdb -c 'SELECT slot_name, slot_type, temporary, conflicting, active FROM pg_replication_slots;'
password is g_97Tgs2fjru857_dfh
This should give one non-active slot
     slot_name      | slot_type | temporary | conflicting | active 
--------------------+-----------+-----------+-------------+--------
 replication_slot_1 | physical  | f         |             | f
(1 row)


on terminal 2a


Start the Replica-node
docker compose -f docker-compose-2.yml up


on terminal 2b


enter the running Replica container, do some updates and exit again
docker exec -it postgres2 bash
apt update -y
apt install apt-utils vim -y

exit


At this point normally the replica would be filled from the master using pg_basebackup run from the replica, however, for that to work the replica database may NOT be running, and if the database is not running then docker shuts the container down (!!@) (watchdog mechanism) So only because we are running on docker we will have to do things differently.

Second issue: since on MacOS there is a settings-issue with mapping the postgres data directory onto a local folder, we can not dump the Master's data *directly* into the data directory of the Replica. The data has to move across machines with an extra step, and we have to use the 'docker cp' command to get it into the data directory of the Replica. Since pg_basebackup is (probably) not installed on the docker host, the backup has to be created from the Master and then pushed over the network to the Replica, and there placed on top of its data directory.

on terminal 1b


Do the initial dump/backup
pg_basebackup -h localhost -p 5432 -U replicator -c fast -v -X stream -F plain -D /dump/data
touch /dump/data/standby.signal
rm /dump/data/postgresql.auto.conf


on terminal 2b


We copy over the entire dump into a new data-directory with the tar-trick, correct the ownership, and flip the configuration file.
cd postgres1/dump/data
tar -czvf tmp.tar.gz --gname postgres --uname postgres ./* 
docker exec -i postgres2 sh -c 'rm -rf /data/pgdataR && mkdir /data/pgdataR && cd /data/pgdataR && tar -xzvf -' < ./tmp.tar.gz
docker exec -i postgres2 sh -c ' chown postgres /data/pgdataR && chmod 700 /data/pgdataR'
cd ../../..
cp postgres2/config/postgresql_r.conf postgres2/config/postgresql.conf 
docker compose -f docker-compose-2.yml down
container restart is needed, hence the last command.

on terminal 2a


Renaming the datadirectory while the database is running is not possible. The alternative is that we point it to the new directory instead; for that we use the extra yml file.
Start the Replica again:
docker compose -f docker-compose-2R.yml up


That is basically it! Lets check if everything is working.

on terminal 2b


docker exec -it postgres2 bash
psql -h localhost -U postgresadmin postgresdb -c 'SELECT slot_name, slot_type, temporary, conflicting, active FROM pg_replication_slots;'
password is g_97Tgs2fjru857_dfh
This should should give NO replication slot, as this is the Replica.
 slot_name | slot_type | temporary | conflicting | active 
-----------+-----------+-----------+-------------+--------
(0 rows)


on terminal 1b


psql -h localhost -U postgresadmin postgresdb -c 'SELECT slot_name, slot_type, temporary, conflicting, active FROM pg_replication_slots;'
password is g_97Tgs2fjru857_dfh
This should should give one active replication slot.
     slot_name      | slot_type | temporary | conflicting | active 
--------------------+-----------+-----------+-------------+--------
 replication_slot_1 | physical  | f         |             | t
(1 row)

And some more info about the status of the replication:
psql -h localhost -U postgresadmin postgresdb -c '\x' -c 'SELECT * FROM pg_stat_replication;'
password is g_97Tgs2fjru857_dfh
Expanded display is on.
-[ RECORD 1 ]----+------------------------------
pid              | 390
usesysid         | 16385
usename          | replicator
application_name | standby1
client_addr      | 172.18.0.3
client_hostname  | 
client_port      | 39408
backend_start    | 2024-04-15 08:38:22.931351+00
backend_xmin     | 
state            | streaming
sent_lsn         | 0/3000148
write_lsn        | 0/3000148
flush_lsn        | 0/3000148
replay_lsn       | 0/3000148
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async
reply_time       | 2024-04-15 08:43:54.022361+00

It works!

Note that the databases are running in interactive mode containers, where you have to keep the terminal open for the container to keep running. This is mainly in order to easily see all the logs, which enables you to see at a glance what happens where.
That will become rather important in the second part of this series, that will be coming soon.

Ellert van Koperen, April 2024.