Re: Slave promotion problem...
From | marin@kset.org |
---|---|
Subject | Re: Slave promotion problem... |
Date | |
Msg-id | 02c9f0b656add4fae12ec2453fdc6b84@kset.org |
In reply to | Re: Slave promotion problem... (Martín Marqués <martin@2ndquadrant.com>) |
Responses |
Re: Slave promotion problem...
|
List | pgsql-general |
On 2015-08-31 14:38, Martín Marqués wrote:
> On 31/08/15 at 03:29, marin@kset.org wrote:
>> Last week we had some problems on the master server which caused a
>> failover on the slave (the master was completely unresponsive due to
>> reasons still unknown). The slave received the promote signal (pg_ctl
>> promote) and logged that in the logs:
>> 2015-08-28 23:05:10 UTC [6]: [50-1] user=,db= LOG: received promote
>> request
>> 2015-08-28 23:05:10 UTC [467]: [2-1] user=,db= FATAL: terminating
>> walreceiver process due to administrator command
>>
>> 5 hours later the slave still didn't promote. Meanwhile we fixed the
>> master and restarted it. The slave was restarted and it behaved just
>> like the promote signal didn't arrive, connecting to the master as a
>> regular slave.
>
> Aren't there any further logs after the walreceiver termination?
> Up to here everything looks fine, but we have no idea on what was
> logged afterwards.

There are logs (quite a few, about 5 hours' worth), with something like this every second:

2015-08-28 23:05:12 UTC [79867]: [1-1] user=[unknown],db=[unknown] LOG: connection received: host=[local]
2015-08-28 23:05:12 UTC [79867]: [2-1] user=postgres,db=postgres LOG: connection authorized: user=postgres database=postgres

These entries come from the process that probes whether the server is alive. I was expecting to see something like:

redo done at xxxxx
last completed transaction was at log time xxxxxxx

But those lines had not appeared after 5 hours. As I understand it, these are written before the server uses the restore_command to get WAL and history files from the archive.

>
>> I am unsure if this promotion failure is a bug/glitch, but the promote
>> procedure is automated and tested a couple of hundred times so I am
>> certain we initiated the promote correctly.
>
> Are you using homemade scripts? Maybe you need to test them more
> thoroughly, with different environment parameters.
We use a custom script for the restore_command, but it seems that it was not invoked.

Regards,
Mladen Marinović
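
For anyone who wants to automate the check described above, here is a minimal sketch, not a tested tool. It scans server log lines for the "received promote request" entry and then looks for the two messages I expected to follow it ("redo done at" and "last completed transaction was at log time"). The function name `promotion_status`, the 5-minute `timeout`, and the assumption that every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp (as in the excerpts above) are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta

# Message fragments quoted in this thread; adjust to your own log format.
PROMOTE_REQUEST = "received promote request"
COMPLETION_MARKERS = (
    "redo done at",
    "last completed transaction was at log time",
)

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if absent."""
    try:
        return datetime.strptime(line[:19], TS_FORMAT)
    except ValueError:
        return None


def promotion_status(lines, now, timeout=timedelta(minutes=5)):
    """Classify a log excerpt (hypothetical helper, assumed log prefix).

    Returns one of: "no promote request", "promoted", "in progress",
    "stalled". "stalled" means a promote request was logged, no
    completion marker followed it, and more than `timeout` has passed.
    """
    requested_at = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip continuation lines without a timestamp
        if PROMOTE_REQUEST in line:
            requested_at = ts
        elif requested_at is not None and any(
            marker in line for marker in COMPLETION_MARKERS
        ):
            return "promoted"
    if requested_at is None:
        return "no promote request"
    if now - requested_at > timeout:
        return "stalled"
    return "in progress"


if __name__ == "__main__":
    excerpt = [
        "2015-08-28 23:05:10 UTC [6]: [50-1] user=,db= LOG: received promote request",
        "2015-08-28 23:05:10 UTC [467]: [2-1] user=,db= FATAL: terminating walreceiver process due to administrator command",
        "2015-08-28 23:05:12 UTC [79867]: [1-1] user=[unknown],db=[unknown] LOG: connection received: host=[local]",
    ]
    # Five hours after the request, as in the incident described above.
    print(promotion_status(excerpt, now=datetime(2015, 8, 29, 4, 5, 10)))
```

Run against the excerpt from our incident, this would have flagged the promotion as stalled within minutes instead of after five hours. It only reads logs; it does not tell you why the promotion stopped.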