Expose synchronous replication status through API
Hi,
I started a 3-node cluster for testing, and I think the current setup does not allow me to scale reads and backups.
My desired scenario:
- Master - writes and reads
- Synchronous slave - reads only, to offload the master
- Potential synchronous slave - backups only
The current HAProxy config and Patroni API allow the following:
backend be_db_rw
    option httpchk GET /master
    http-check expect status 200
    ...
backend be_db_ro
    option httpchk GET /
    http-check expect string replica
    ...
However, this results in one master and two read-only servers, with no way to tell the synchronous replica apart from the potential one.
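To make the limitation concrete, here is a small model of the two health checks above. It is not HAProxy itself, and the responses are hypothetical: it assumes `GET /master` returns 200 only on the leader (503 otherwise) and that `GET /` returns a body containing the node's role. Both replicas pass the `be_db_ro` check identically, so nothing distinguishes the sync replica from the potential one.

```python
# Minimal model of the be_db_rw / be_db_ro checks (not HAProxy itself).
# Assumed responses: GET /master -> 200 only on the leader; GET / -> body with the role.

def in_rw_backend(master_status: int) -> bool:
    # be_db_rw: "http-check expect status 200" against GET /master
    return master_status == 200

def in_ro_backend(root_body: str) -> bool:
    # be_db_ro: "http-check expect string replica" against GET /
    return "replica" in root_body

# Hypothetical responses for a 3-node cluster: one leader, two replicas.
nodes = {
    "node1": {"master_status": 200, "root_body": '{"role": "master"}'},
    "node2": {"master_status": 503, "root_body": '{"role": "replica"}'},
    "node3": {"master_status": 503, "root_body": '{"role": "replica"}'},
}

rw = [name for name, r in nodes.items() if in_rw_backend(r["master_status"])]
ro = [name for name, r in nodes.items() if in_ro_backend(r["root_body"])]
print(rw, ro)  # ['node1'] ['node2', 'node3']
```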
I was thinking of adding another key-value pair to the API response containing the result of:
postgres=# select client_addr, sync_state from pg_stat_replication;
 client_addr | sync_state
-------------+------------
 10.0.0.1    | sync
 10.0.0.2    | potential
This information is visible only on the current master, so I can't simply add it to each node's API response. Instead, I was thinking of storing each client_addr with its sync_state in etcd and then exposing them through the API. Each node would match its own IP so that the correct value is displayed on the correct server. This way, the HAProxy setup for sync and potential slaves becomes straightforward.
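A minimal sketch of the proposed flow, assuming a plain dict stands in for etcd and using a hypothetical key layout (`/service/<cluster>/sync_state`): the master publishes the `client_addr` to `sync_state` mapping from the query above, and each node looks up its own IP.

```python
from typing import Dict, List, Optional, Tuple

# A plain dict stands in for etcd; the key layout is hypothetical.
def sync_state_key(cluster: str) -> str:
    return f"/service/{cluster}/sync_state"

def publish_sync_state(dcs: Dict[str, Dict[str, str]], cluster: str,
                       rows: List[Tuple[str, str]]) -> None:
    """Run on the master with the rows returned by
    SELECT client_addr, sync_state FROM pg_stat_replication;"""
    dcs[sync_state_key(cluster)] = {addr: state for addr, state in rows}

def my_sync_state(dcs: Dict[str, Dict[str, str]], cluster: str,
                  my_ip: str) -> Optional[str]:
    """Run on any node: return this node's sync_state, or None if the
    master did not list its address."""
    return dcs.get(sync_state_key(cluster), {}).get(my_ip)

etcd: Dict[str, Dict[str, str]] = {}
publish_sync_state(etcd, "demo", [("10.0.0.1", "sync"), ("10.0.0.2", "potential")])
print(my_sync_state(etcd, "demo", "10.0.0.2"))  # potential
```

Matching on IP assumes each replica connects to the master from the address it also uses for its own API; NAT or multiple interfaces would break that.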
Maybe I am missing a Patroni feature that shows which replica is sync and which is potential, or maybe there is a better way of doing this. Ideas?
- Created 7 years ago
- Comments: 10 (5 by maintainers)
Top GitHub Comments
Yes, I think the idea of filtering the GET endpoints based on whether the replica is synchronous should be implemented.
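One way to picture endpoint filtering is a handler that returns 200 only when the node matches the requested role and sync state. The paths `/sync` and `/potential` are hypothetical illustrations, not a confirmed Patroni API, and `sync_state` is assumed to be known to the node (for example via the etcd approach discussed above).

```python
from typing import Optional

def check_status(path: str, role: str, sync_state: Optional[str]) -> int:
    """Hypothetical health-check routing: 200 if the node matches, 503 if not,
    404 for unknown paths. Endpoint names are illustrative only."""
    if path == "/master":
        ok = role == "master"
    elif path == "/replica":
        ok = role == "replica"
    elif path == "/sync":
        ok = role == "replica" and sync_state == "sync"
    elif path == "/potential":
        ok = role == "replica" and sync_state == "potential"
    else:
        return 404
    return 200 if ok else 503

print(check_status("/sync", "replica", "sync"))       # 200
print(check_status("/sync", "replica", "potential"))  # 503
```

With endpoints like these, a read-offload backend could check `GET /sync` and a backup backend `GET /potential`, each with `http-check expect status 200`.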
Hi. You are right: currently there is no way to tell which replica is synchronous and which is potential. I even know a better way to identify replicas (without relying on client_addr): we just need to set application_name in the primary_conninfo parameter. But will it really help you? First, you receive information about replica status asynchronously. By the time you get this information, the in-sync replica may already have changed. Postgres switches sync replicas very easily and quickly. For example, I was running a test cluster with two replicas, postgresql1 and postgresql2, where postgresql1 was the in-sync replica. I stopped it and started it again a few moments later. As expected, the master almost immediately switched the in-sync replica to postgresql2, and once postgresql1 became available again, it switched back to it.
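Setting application_name means the master's pg_stat_replication (and synchronous_standby_names) can refer to each replica by name instead of by IP. A sketch of building such a primary_conninfo string follows; it applies libpq's keyword/value quoting rules, and the host, port, and user values are hypothetical.

```python
from typing import Dict, Union

def quote_conninfo_value(value: str) -> str:
    """Quote a value for libpq keyword/value connection strings.

    Empty values and values containing whitespace must be single-quoted;
    single quotes and backslashes inside a value are escaped with a backslash.
    """
    if value == "" or any(c.isspace() for c in value) or "'" in value or "\\" in value:
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return value

def build_primary_conninfo(params: Dict[str, Union[str, int]]) -> str:
    """Join keyword=value pairs into a primary_conninfo string."""
    return " ".join(f"{key}={quote_conninfo_value(str(val))}" for key, val in params.items())

# Hypothetical connection details; application_name is set to the node's name.
conninfo = build_primary_conninfo({
    "host": "10.0.0.10",
    "port": 5432,
    "user": "replicator",
    "application_name": "postgresql1",
})
print(conninfo)
# host=10.0.0.10 port=5432 user=replicator application_name=postgresql1
```

On the master, `SELECT application_name, sync_state FROM pg_stat_replication;` then reports `postgresql1` regardless of the address the replica connects from.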
Well, let's assume the in-sync replica does not change very often and you are reading from the "right" one. But what does the PostgreSQL documentation say about synchronous replication? http://www.postgresql.org/docs/9.5/static/warm-standby.html#SYNCHRONOUS-REPLICATION
Synchronous replication serves a different purpose: it strengthens protection against data loss, but it gives no guarantee that you will read exactly the same data as from the master.
Unfortunately, there is no easy way to scale a read workload. Your application should be aware that it may read stale data from a replica (even from the in-sync replica).
Anyway, I like your idea of exposing this information; it could be very useful, for example, for monitoring.
P.S. There is a better way to identify replicas:
We could even consider excluding replicas from load balancing if they are more than N bytes behind the master (N could be passed as a request parameter). Keep in mind, though, that the master updates its position in etcd only periodically (every 10 seconds by default).
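A sketch of the lag-based exclusion, assuming a hypothetical `max_lag` query parameter (for example `GET /replica?max_lag=1048576` in an HAProxy `option httpchk` line) and an LSN for the master read from etcd. PostgreSQL prints an LSN as two hexadecimal halves, so it is converted to a byte position before subtracting.

```python
from urllib.parse import parse_qs, urlsplit

def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' (two hex halves) to bytes."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lag_check_status(url: str, master_lsn: str, replica_lsn: str) -> int:
    """Return 200 if the replica is within ?max_lag=<bytes> of the master, else 503.

    max_lag is a hypothetical request parameter; without it no lag limit applies.
    master_lsn is assumed to come from etcd, so it can be up to one update
    interval (10 s by default) old and the computed lag may be underestimated.
    """
    query = parse_qs(urlsplit(url).query)
    lag = lsn_to_int(master_lsn) - lsn_to_int(replica_lsn)
    if "max_lag" in query and lag > int(query["max_lag"][0]):
        return 503
    return 200

print(lag_check_status("/replica?max_lag=1024", "0/3000460", "0/3000060"))  # 200
print(lag_check_status("/replica?max_lag=1024", "0/3000461", "0/3000060"))  # 503
```

Because the master position in etcd lags behind reality, a replica can look closer than it is, so N should be chosen with some margin.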