Re: Timeout and Synch Rep
From | Fujii Masao |
---|---|
Subject | Re: Timeout and Synch Rep |
Date | |
Msg-id | AANLkTikPjD6ji461ckcgfG859KuzNMeQ9TAauwPAeat3@mail.gmail.com |
In response to | Timeout and Synch Rep (Josh Berkus <josh@agliodbs.com>) |
List | pgsql-hackers |
On Fri, Oct 8, 2010 at 4:50 AM, Josh Berkus <josh@agliodbs.com> wrote:
> In my effort to make the discussion around the design decisions of synch
> rep less opaque, I'm starting a separate thread about what has developed
> to be one of the more contentious issues.
>
> I'm going to champion timeouts because I plan to use them. In fact, I
> plan to deploy synch rep with a timeout if it's available within 2 weeks
> of 9.1 being released. Without a timeout (i.e. "wait forever" is the
> only mode), that project will probably never use synch rep.
>
> Let me give you my use-case so that you can understand why I want a timeout.
>
> Client is a telecommunications service provider. They have a primary
> server and a failover server for data updates. They also have two async
> slaves on older machines for reporting purposes. The failover
> currently does NOT accept any queries in order to keep it as current as
> possible.
>
> They would like the failover to be synchronous so that they can
> guarantee no data loss in the event of a master failure. However, zero
> data loss is less important to them than uptime ... they have a five9's
> SLA with their clients, and the hardware on the master is very good.
>
> So, if something happens to the standby, and it cannot return an ack in
> 30 seconds, they would like it to degrade to asynch mode. At that
> point, they would also like to trigger a nagios alert which will wake up
> the sysadmin with flashing red lights. Once he has resolved the
> problem, he would like to promote the now-asynch standby back to synch
> standby.
>
> Yes, this means that, in the event of a standby failure, they have a
> window where any failure on the master will mean data loss. The user
> regards this risk as acceptable, given that both the master and the
> failover are located in the same data center in any case, so there is
> always a risk of a sufficient disaster wiping out all data back to the
> daily backup.

This explains very well why some systems require the timeout.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
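
For readers following the thread, the degrade-on-timeout behaviour requested in the quoted use case can be modelled roughly as below. This is only a toy sketch in Python, not PostgreSQL code and not a proposed design: `CommitGate`, `standby_ack`, `promote_to_sync`, and the `on_degrade` callback (standing in for the Nagios alert) are all hypothetical names, and the integer `lsn` values are just illustrative log positions.

```python
import queue
import time
from enum import Enum


class Mode(Enum):
    SYNC = "sync"
    ASYNC = "async"


class CommitGate:
    """Hypothetical model of a primary that waits for a standby ack."""

    def __init__(self, ack_timeout_s, on_degrade):
        self.ack_timeout_s = ack_timeout_s  # e.g. 30 seconds in the use case
        self.on_degrade = on_degrade        # assumed hook, e.g. raise an alert
        self.mode = Mode.SYNC
        self._acks = queue.Queue()

    def standby_ack(self, lsn):
        """Called when the standby confirms everything up to `lsn`."""
        self._acks.put(lsn)

    def commit(self, lsn):
        """Wait for the standby unless we have already degraded."""
        if self.mode is Mode.ASYNC:
            return "committed-async"
        deadline = time.monotonic() + self.ack_timeout_s
        while True:
            remaining = deadline - time.monotonic()
            try:
                if remaining <= 0:
                    raise queue.Empty
                acked = self._acks.get(timeout=remaining)
            except queue.Empty:
                # No ack within the timeout: degrade and notify the operator.
                self.mode = Mode.ASYNC
                self.on_degrade(lsn)
                return "committed-async"
            if acked >= lsn:
                return "committed-sync"

    def promote_to_sync(self):
        """Operator action once the standby problem has been resolved."""
        self.mode = Mode.SYNC


alerts = []
gate = CommitGate(ack_timeout_s=0.05, on_degrade=alerts.append)
gate.standby_ack(1)
first = gate.commit(1)   # ack already queued, so this commits synchronously
second = gate.commit(2)  # no ack arrives, so the gate degrades after the timeout
```

The point of the sketch is the trade-off discussed in the thread: after the timeout, commits stop waiting, so uptime is preserved at the cost of a window in which a master failure would lose data, until the operator switches the standby back to synchronous mode.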