UPDATE: My coworker Richard liked this write up, and Skytools, so much he threw together a demo script. You can get it here.
I recently had to do a near-zero downtime upgrade from PostgreSQL 8.4.x to PostgreSQL 9.4.x for a custmer. I couldn’t use streaming replication because of the change in major version (and because it’s simply not present in 8.x), so that left me looking at logical replication options. Usually, everyone else would be thinking Slony right here. I’ve only messed with Slony a few times, but each time was a pita, and the whole thing just seemed overly complicated to me. So I decided to give Londiste a look.
Londiste is part of the Skytools suite, originally developed by Skype back when they were a ’no central node’ setup. As such, the thing was literally born to be “master-master” and assumes nodes come and go at will, so it’s got all the tools to handle bringing nodes up/down, marking them active/inactive, catching them up, etc. It’s written in Python, and uses plain text ini files for configuration.
There’s really only two hurdles that I found with using Londiste. First is that if you can’t get the rpms from the PGDG Yum Repo you’re looking at compiling from Git. And second, the online documentation for it is hard to find, hard to follow, and practically no one has used it so you can’t ask RandomPostgresPerson for help.
Which is exactly why I’m writing this blog post. Here’s what I needed to get me through the migration in question. I hope it helps you, should you consider using Londiste for your own replication needs. To whit:
> SELECT
n.nspname as schema,
c.relname as table
FROM
pg_class c
JOIN
pg_namespace n
ON
n.oid = c.relnamespace
WHERE
c.relkind = 'r'
AND NOT EXISTS (
SELECT
1
FROM
pg_constraint con
WHERE
con.conrelid = c.oid
AND
con.contype = 'p'
)
AND n.nspname <> ALL (
ARRAY [
'pg_catalog',
'sys',
'dbo',
'information_schema'
]
);
On the PostgreSQL 9.4.x server that will be receiving the replicated data, we need to ensure that all roles are pre-created. We want all ownerships and grants to be identical when we’re done, right? You can use pg_dumpall -g
on the PostgreSQL 8.4.x to get a listing of roles.
Again, like Slony, we should pre-build the schema on the PostgreSQL 9.4.x server. I think you can actually get Londiste to do this for you as part of the replication, but I couldn’t find anything online for sure, and I didn’t have time to add more experimentation here (we’re on the customer’s dime here, remember). So, use pg_dump
over the network and pipe it to pg_restore
to transfer the schema thusly:
$ pg_dump \
-Fc \
-C \
-S \
-h IPofOldServer \
-U postgres \
myapp | \
pg_restore \
-d template1 \
-v \
-C \
-e \
-s
$ yum -y install \
skytools-94 \
skytools-94-modules
$ yum -y install \
python-devel \
asciidoc \
xmlto \
postgresql-devel
$ git clone git://github.com/markokr/skytools.git
$ cd skytools
$ git submodule init
$ git submodule update
$ ./autogen.sh
$ ./configure
$ make
$ make install
Restart the PostgreSQL 8.4.x cluster to load the new libs and modules
Now we configure the Londiste ticker. Note, we have trust setup for the postgres user in pg_hba.conf
so there is no password= in the connection strings. Adjust to meet your setup:
$ mkdir -pv ~postgres/londiste-config/{log,pid}
$ cd ~postgres/londiste-config
$ cat << EOF > ticker.ini
[pgqd]
base_connstr = user=postgres host=IPofOldServer
database_list = myapp
logfile = log/ticker.log
pidfile = pid/ticker.pid
EOF
Start up the ticker, to provide the replication “heartbeat” by running pgqd -d ticker.ini
Check the ticker.log
to ensure there are no warnings or errors! You can stop the ticker with pgqd -s ticker.ini
while you fix things.
Now, we tell Londiste about the master node (same note applies about the lack of password in the connection string):
$ cd ~postgres/londiste-config
$ cat << EOF > master.ini
[londiste3]
db = user=postgres host=IPofOldServer dbname=myapp
queue_name = myappq
loop_delay = 0.5
logfile = log/master.log
pidfile = pid/master.pid
EOF
$ londiste3 \
master.ini \
create-root \
master 'user=postgres host=IPofOldServer dbname=myapp'
Check the master.log to see if you have a line like INFO Node "master" initialized for queue "myappq" with type "root"
Now, spin up the master’s replication worker process by running londiste3 -d master.ini worker
Next, we configure our slave node (same note applies about the lack of password in the connection string):
$ cd ~postgres/londiste-config
$ cat << EOF > slave.ini
[londiste3]
db = user=postgres host=127.0.0.1 dbname=myapp
queue_name = myappq
loop_delay = 0.5
logfile = log/slave.log
pidfile = pid/slave.pid
EOF
leaf
but I could have created it as a branch
if we we’re going to cascade replication:$ londiste3 \
slave.ini \
create-leaf slave 'user=postgres host=127.0.0.1 dbname=myapp' \
--provider='user=postgres host=IPofOldServer dbname=myapp'
Check the slave.log to see if you have the line INFO Node "slave" initialized for queue "myappq" with type "branch"
Spin up the slave’s replication worker process by running londiste3 -d slave.ini worker
Tell the master node that we want to replicate all the tables in the db (londiste3 master.ini add-table --all
) as well as all the sequences (londiste3 master.ini add-seq --all
). Note that this only adds the tables that currently exist. If you add new tables to the master db, you need to londiste3 master.ini add-table tablename
to add them to replication. Ditto for new sequences.
For the slave node, also replicate all the tables (londiste3 slave.ini add-table --all
) and all the sequences (londiste3 slave.ini add-seq --all
). Note that this only adds the tables that currently exist. If you add new tables to the master db, you need to londiste3 slave.ini add-table tablename
to add them to replication. Ditto for new sequences.
At this point, replication is actually up and running. Any changes occurring on the master node are being replicated to the slave node. That’s all you need to do.
But what about the data that was already in the master db? You don’t need to do anything. It’s already replicating. You can forcibly tell Londiste to ‘catch things up’ by doing londiste3 slave.ini resync --all
if you like though.
If you want to check on the replication at any point, simply issue londiste3 slave.ini status
or to be more pedantic londiste3 slave.ini compare
which will examine row counts and md5sums between master and slave.
Enjoy your new cross-version logical replication!