Discussion: Replication Cluster Monitoring
Hi All,
Please bear with me as I’m not a DBA and I’m new to Postgres. I’m writing a Java application to monitor a streaming replication cluster (Windows). I want to monitor the Master and initiate failover if necessary (something like a scaled-down version of pgpool). I also want to monitor the standby and terminate synchronous replication in the event of a failure. At this point, my app is polling the Master every N seconds and triggering a failover if the wait is too long or it receives a connection error. I’m worried that this method of assessing server health could lead to false failovers. Any suggestions as to specific health checks I could run or issues I should watch out for? Thanks!
CONFIDENTIALITY : This e-mail and any attachments are confidential and may be privileged. If you are not a named recipient, please notify the sender immediately and do not disclose the contents to another person, use it for any purpose or store or copy the information in any medium.
I think you should follow this approach:
http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
HEMPLEMAN Matthew <matthew.hempleman@alstom.com> wrote:

> I’m writing a Java application to monitor a streaming
> replication cluster (Windows). I want to monitor the Master and
> initiate failover if necessary (something like a scaled down
> version of pgpool). I also want to monitor the standby and
> terminate synchronous replication in the event of a failure. At
> this point, my app is polling the Master every N seconds and
> triggering a failover if the wait is too long or it receives a
> connection error. I’m worried that this method of assessing
> server health could lead to false-failovers. Any suggestions as
> to specific health checks I could run or issues I should watch
> out for?

Such an approach has many race conditions that can cause problems. You may want to do web searches on the terms "split-brain syndrome", STONITH, fencing, and heartbeat (as they apply to computing). It is not trivial to get this right, and if it's not right it can easily cause more down time than it prevents. (That's not unique to PostgreSQL; it's the nature of automating fail-over.)

Be sure to consider what happens for transient network failures on each machine and combination of machines, or if a machine temporarily has a load that causes it not to respond for seconds or minutes.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
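
To illustrate the point about transient failures, here is a minimal Java sketch of one common mitigation: treat the Master as down only after several consecutive failed probes, rather than on the first timeout or connection error. The class name `FailureDetector`, the method `record`, and the threshold value are hypothetical and chosen only for this example; nothing here comes from pgpool or PostgreSQL itself. It reduces false failovers from short blips, but it does not address split-brain or fencing, which still have to be handled separately.

```java
/**
 * Hypothetical helper (not part of any library): reports the monitored
 * server as down only after `threshold` consecutive failed probes.
 */
final class FailureDetector {
    private final int threshold;        // consecutive failures required; an assumed tunable
    private int consecutiveFailures = 0;

    FailureDetector(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    /**
     * Records the result of one probe.
     * Returns true once the number of consecutive failures has reached the threshold.
     * Any successful probe resets the counter.
     */
    synchronized boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveFailures = 0;
        } else {
            consecutiveFailures++;
        }
        return consecutiveFailures >= threshold;
    }
}
```

In the polling loop described in the original question, the app would call `record(...)` once every N seconds with the outcome of its probe (for example, whether a trivial query returned within a timeout), and only consider fencing and promotion when it returns true. Note that the monitor itself can be the machine that lost its network, so a second vantage point is worth considering before acting.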