Re: postmaster recovery and automatic restart suppression
From | Fujii Masao |
---|---|
Subject | Re: postmaster recovery and automatic restart suppression |
Date | |
Msg-id | 3f0b79eb0906170036j13f643afjf53c9b134453b3c0@mail.gmail.com |
In response to | Re: postmaster recovery and automatic restart suppression ("Czichy, Thoralf (NSN - FI/Helsinki)" <thoralf.czichy@nsn.com>) |
List | pgsql-hackers |
Hi,

On Wed, Jun 17, 2009 at 12:22 AM, Czichy, Thoralf (NSN - FI/Helsinki) <thoralf.czichy@nsn.com> wrote:
> [STONITH is not always best strategy if failures can be declared as
> user-space software problem only, limit STONITH to HW/OS failures]
>
> The isolation of the failing Postgres instance does not require a STONITH
> - mainly as there's also other software running on the same node that we'd
> not want to automatically switchover (e.g. because it takes longer to do or
> the functionality is more critical or less critical). Also we generally trust
> the HW, OS kernel and cluster middleware to behave correctly. These functions
> also follow the principle of fail-fast-and-safe. This trust might be an
> assumption that not everybody agrees with, though. So, if the failure originated
> from HW/OS/Clusterware it clearly is a STONITH situation, but if it's a
> user-space problem - the default assumption is that isolation can be implemented on
> OS-level and that's a guarantee that the clusterware gives (using a separate
> Quorum mechanism to avoid split-brain situations).

HW-level STONITH seems to be too much for your case. How about making your HA middleware shut down the dying postgres before doing the switchover, for example with "pg_ctl -mi stop"? In that case, the other software can keep running on the original node after the switchover.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
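The suggestion above could be sketched roughly as follows: the HA middleware first stops the failing instance in immediate mode and only then starts the switchover. This is a minimal illustration, not how any particular clusterware does it. The function names, the `switch_over` callback, and the choice to skip the switchover when the stop fails are all assumptions made for the example; only the `pg_ctl stop -m immediate` invocation (the long form of "-mi") comes from the message.

```python
import subprocess
from typing import Callable, List


def build_immediate_stop_command(pgdata: str) -> List[str]:
    """Return the pg_ctl invocation for an immediate-mode shutdown.

    "-m immediate" is the long form of the "-mi" used in the message;
    "-D" names the data directory of the instance to stop.
    """
    return ["pg_ctl", "stop", "-D", pgdata, "-m", "immediate"]


def stop_then_switch_over(
    pgdata: str,
    switch_over: Callable[[], None],
    runner=subprocess.run,
) -> bool:
    """Stop the local postgres, then call the (hypothetical) switchover hook.

    Returns True if the stop succeeded and the switchover was triggered.
    Returns False if pg_ctl could not be run or exited non-zero; what the
    middleware does in that case (for example escalating to node-level
    fencing) is outside this sketch.
    """
    try:
        result = runner(build_immediate_stop_command(pgdata))
    except OSError:
        # pg_ctl missing from PATH or not executable.
        return False
    if result.returncode != 0:
        return False
    # Only the postgres instance was stopped; other software on the
    # node keeps running while the standby takes over.
    switch_over()
    return True
```

A middleware hook might call `stop_then_switch_over("/path/to/pgdata", promote_standby)`, where both the path and `promote_standby` are placeholders for whatever the cluster framework provides.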