Up next on the list of changes coming soon to Crunchy Postgres via Automation (CPA), Patroni 3.x! If you want to know what else is ‘coming soon’, you can catch up here.
We’ve been stuck on Patroni 2.x for a long time, for a lot of reasons that I’m not going to get into, but we’ve finally gotten to a point where we can make the leap to 3.x, and I have a draft pull request against development to get things merged!
As one would expect, Patroni is at the core of our highly-available setup and we don’t make changes to it lightly. Thankfully we’re able to take the plunge now and move things to the latest 3.x version; this will allow us to take advantage of some of the new features and rethink some of the ways we do things.
The first new feature we’re taking advantage of is the new DCS failsafe mode. We’ve had a few clients ask for this feature specifically, and we’re eager to finally deliver it. If you’re not familiar, when this feature is disabled, if Patroni loses contact with the DCS, it will demote the current Postgres leader to become a follower. Patroni doesn’t know if there’s been a network split or some other isolation event. To prevent split-brain, every member demotes their nodes to follower status and awaits the return of the DCS. With this feature enabled, Patroni will notice the DCS absence, but if it can still communicate with the other Patroni nodes in the cluster, then it leaves the leader status alone and effectively ‘carries on’ as normal. Thus, your Postgres remains available during a DCS outage. It’s a nice little win for customers that have less than rock solid networking.
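As a rough mental model of that decision, here is a simplified sketch. It is not Patroni’s actual code: the function and parameter names are made up for illustration, and the rule that every known member must answer is an assumption based on my reading of the upstream documentation.

```python
def leader_keeps_running(dcs_reachable, failsafe_mode, member_acks):
    """Toy model of the leader's choice during one heartbeat cycle.

    member_acks maps each other known member name to True if it answered
    the leader over the Patroni REST API, and False otherwise.
    (All names here are hypothetical.)
    """
    if dcs_reachable:
        # Normal operation: the leader lock can be refreshed in the DCS.
        return True
    if not failsafe_mode:
        # Classic behaviour: no DCS means the leader demotes itself.
        return False
    # Failsafe (assumed rule): stay primary only if every known member answered.
    return all(member_acks.values())
```

With failsafe off, a DCS outage always ends in a demotion; with it on, one unreachable member is enough to fall back to the old, cautious behaviour.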
The next new feature that we’re going to make use of is failover priority. In a perfect world, all nodes in a given Patroni cluster would be equal. Same specs, same performance, same network characteristics, etc. In the real world, it’s often true that some nodes are, shall we say, more equal than others and thus if a failover were to occur, you’d prefer Patroni choose one of these nodes instead of the others. This feature allows for this very behavior; you can tag each node with a priority that Patroni considers when deciding the ‘best’ failover candidate.
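To make the idea concrete, here is a minimal sketch of how such a choice could work. It assumes, based on my understanding of the upstream docs, that WAL position is compared first, that priority only breaks ties, and that a priority of zero or less excludes a node. The `Replica` class, the `best_candidate` function, and the default priority of 1 are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    wal_position: int           # received/replayed LSN, as a plain integer
    failover_priority: int = 1  # assumed default for this sketch


def best_candidate(replicas):
    """Pick the preferred failover target, or None if nobody is eligible."""
    # Assumption: a priority of 0 or less means "never promote this node".
    eligible = [r for r in replicas if r.failover_priority > 0]
    if not eligible:
        return None
    # Most WAL wins; priority only decides between equally caught-up nodes.
    return max(eligible, key=lambda r: (r.wal_position, r.failover_priority))
```

In this model a high-priority node that has fallen behind still loses to a lower-priority node with more WAL, so priority expresses a preference rather than an override.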
Another new feature is that Patroni learned to validate its configuration, so we now create a systemd drop-in that calls --validate-config at service start (via ExecStartPre). This ensures that our configuration is valid and that we’re not accidentally running a configuration that ‘appears to work’ but is incorrect. It’s a minor thing, to be sure, but it adds to the QoL for our end users.
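For a sense of what such a drop-in might contain, here is a small sketch that renders one. The binary and configuration paths are placeholders I picked for illustration and are not necessarily what CPA uses; only the --validate-config flag and the ExecStartPre directive come from the description above.

```python
def render_dropin(patroni_bin="/usr/bin/patroni",
                  config_path="/etc/patroni/patroni.yml"):
    """Return the text of a systemd drop-in that validates the config first.

    Both default paths are hypothetical placeholders.
    """
    return (
        "[Service]\n"
        # If validation exits non-zero, systemd will not start the service.
        f"ExecStartPre={patroni_bin} --validate-config {config_path}\n"
    )
```

The rendered text would go in a .conf file under the unit’s drop-in directory, followed by a systemctl daemon-reload so systemd picks it up.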
Patroni 3.x also introduces a new ‘check timeline’ feature. I’ll let their documentation explain it:
By default, when running leader elections, Patroni does not take into account
the current timeline of replicas, what in some cases could be undesirable
behavior. You can prevent the node not having the same timeline as a former
leader become the new leader by changing the value of check_timeline
parameter to true.
In a future release, after more testing and a better understanding of how it behaves, we might enable this by default. For this initial release, we’ll simply be wiring this up so that users can enable it if they desire. There’s a reason it is not enabled upstream, so we want to tread with care.
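A toy version of the check, to show what flipping the setting changes. This is a sketch under the assumption that a replica is only ruled out when its timeline is older than the last known leader’s; the function and argument names are hypothetical.

```python
def may_become_leader(replica_timeline, last_leader_timeline, check_timeline=False):
    """Toy model of the extra eligibility test during a leader election."""
    if not check_timeline:
        # Default behaviour: timelines are not considered at all.
        return True
    # Assumption: a replica on an older timeline than the former leader is excluded.
    return replica_timeline >= last_leader_timeline
```

With the default of false, a replica stuck on timeline 3 can still win an election after a leader on timeline 4 disappears; with true, it cannot.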
Finally, we’re going to start enabling the watchdog functionality of Patroni. This isn’t a Patroni 3.x feature, but it’s something we’ve previously not taken advantage of (even though we did have support for it wired in). With our new release, we’re going to start loading the softdog kernel module, setting up /dev/watchdog for Patroni’s use, and enabling the watchdog configuration in Patroni. Users will be able to disable this, of course, but we have enough ‘in the field’ experience now to justify enabling this.
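On the Patroni side, the watchdog lives in its own configuration section. The sketch below builds that section as a plain dictionary that could be merged into the rest of the configuration before it is written out as YAML. The helper name is hypothetical, and the choice of the automatic mode is my assumption; the release may well use a different mode.

```python
def watchdog_section(enabled=True, device="/dev/watchdog"):
    """Build the 'watchdog' part of a Patroni configuration as a dict.

    Assumption: 'automatic' is used when enabled; Patroni also accepts
    'required' and 'off' for the mode.
    """
    mode = "automatic" if enabled else "off"
    return {"watchdog": {"mode": mode, "device": device}}
```

Calling watchdog_section(enabled=False) models the opt-out mentioned above: the section stays in place but the mode is switched off.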
As with any release, we’ll also be taking the opportunity to clean up cruft around our Patroni role and our Patroni configuration. There are currently a lot of configuration options wired up that we never use and customers never ask about, so we’ll be pruning these unused items. This should be a great release and a nice step forward for the product, and I’m excited to deliver CPA 2.3.0 later this year!
:wq