Re: BUG #15346: Replica fails to start after the crash
From | Michael Paquier |
---|---|
Subject | Re: BUG #15346: Replica fails to start after the crash |
Date | |
Msg-id | 20180829121051.GC5903@paquier.xyz |
In response to | Re: BUG #15346: Replica fails to start after the crash (Alexander Kukushkin <cyberdemn@gmail.com>) |
Responses | Re: BUG #15346: Replica fails to start after the crash |
List | pgsql-bugs |
On Wed, Aug 29, 2018 at 08:59:16AM +0200, Alexander Kukushkin wrote:
> Why the block 72478 of index relfile doesn't meet our expectations
> (contains so few tuples)?
> The answer to this question is in the page header. LSN, written in the
> indexpage header is AB3/56BF3B68.
> That has only one meaning, while the postgres was working before the
> crash it managed to apply WAL stream til at least AB3/56BF3B68, what
> is far ahead of "Minimum recovery ending location: AB3/4A1B3118".

Yeah, that's the pinpoint. Do you know by chance what was the content
of the control file for each standby you have upgraded to 9.6.10 before
starting them with the new binaries? You mentioned a cluster of three
nodes, so I guess that you have two standbys, and that one of them did
not see the symptoms discussed here, while the other saw them. Do you
still have the logs of the recovery just after starting the other
standby with 9.4.10 which did not see the symptom?

All your standbys are using the background worker which would cause the
btree deletion code to be scanned, right? I am trying to work on a
reproducer with a bgworker starting once recovery has been reached,
without success yet. Does your cluster generate some
XLOG_PARAMETER_CHANGE records? In some cases, 9.4.8 could have updated
minRecoveryPoint to go backward, which is something that 8d68ee6 has
been working on addressing.

Did you also try to use local WAL segments up where AB3/56BF3B68 is
applied, and also have a restore_command so as extra WAL segment
fetches from the archive would happen?
--
Michael
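
The LSN comparison quoted above can be checked with a small sketch. It assumes the usual textual LSN form, two hexadecimal halves separated by a slash (high 32 bits, then low 32 bits). The helper name `parse_lsn` is hypothetical and used only for illustration; the two LSN values are the ones quoted in the message.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as 'AB3/56BF3B68' into a 64-bit WAL byte position.

    Assumption: the text is '<high 32 bits in hex>/<low 32 bits in hex>'.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# LSN found in the index page header (value quoted in the thread).
page_lsn = parse_lsn("AB3/56BF3B68")

# "Minimum recovery ending location" from the control file (value quoted in the thread).
min_recovery_point = parse_lsn("AB3/4A1B3118")

# A page LSN greater than minRecoveryPoint means the page on disk reflects WAL
# that lies beyond the point recorded in the control file.
page_is_ahead = page_lsn > min_recovery_point
gap_bytes = page_lsn - min_recovery_point

print(f"page is ahead of minRecoveryPoint: {page_is_ahead}")
print(f"gap: {gap_bytes} bytes ({gap_bytes / 1024**2:.1f} MiB)")
```

Both LSNs share the high half `AB3`, so the gap comes entirely from the low halves; it works out to roughly 202 MiB of WAL between the recorded minimum recovery point and the page's LSN.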