Re: BUG #15346: Replica fails to start after the crash
From | Dmitry Dolgov |
---|---|
Subject | Re: BUG #15346: Replica fails to start after the crash |
Date | |
Msg-id | CA+q6zcVjv1Lp-3=prBbpq2CbBioK91SHarfw3F8FHuUN4EwcUA@mail.gmail.com |
In reply to | Re: BUG #15346: Replica fails to start after the crash (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses | Re: BUG #15346: Replica fails to start after the crash |
List | pgsql-bugs |
> On Wed, 22 Aug 2018 at 17:08, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2018-Aug-22, Alexander Kukushkin wrote:
>
> > 2018-08-22 16:44 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> > >
> > > Sounds likely. I suggest to have a look at what's going on inside the
> > > postmaster process when it gets stuck.
> >
> > Well, it doesn't get stuck, it aborts start with the message:
> > 2018-08-22 14:26:42.073 UTC,,,28485,,5b7d7282.6f45,23,,2018-08-22
> > 14:26:10 UTC,1/0,0,WARNING,01000,"page 179503104 of relation
> > base/18055/212875 does not exist",,,,,"xlog redo at AB3/50323E78 for
> > Btree/DELETE: 182 items",,,,""
> > 2018-08-22 14:26:42.073 UTC,,,28485,,5b7d7282.6f45,24,,2018-08-22
> > 14:26:10 UTC,1/0,0,PANIC,XX000,"WAL contains references to invalid
> > pages",,,,,"xlog redo at AB3/50323E78 for Btree/DELETE: 182
> > items",,,,""
> > 2018-08-22 14:26:42.214 UTC,,,28483,,5b7d7282.6f43,3,,2018-08-22
> > 14:26:10 UTC,,0,LOG,00000,"startup process (PID 28485) was terminated
> > by signal 6: Aborted",,,,,,,,,""
>
> Oh, that's weird ... sounds like the fact that the bgworker starts
> somehow manages to corrupt the list of invalid pages in the startup
> process. That doesn't make any sense ...

We can see that the crash itself happened because in XLogReadBufferExtended,
at `if (PageIsNew(page))` (xlogutils.c:512), we got a page that apparently
wasn't initialized yet, and, since we had already reached a consistent state,
log_invalid_page panics.

> ENOTIME for a closer look ATM, though, sorry. Maybe you could try
> running under valgrind?

Could you please elaborate on what we could find using valgrind in this case?
Some memory leaks? In any case, there is a chance that everything will be ok,
since even slow tracing under gdb is enough to make this race condition
disappear.
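
To make the crash path described above easier to follow, here is a minimal toy model of the decision it describes: a page that still looks new is only remembered as invalid before consistency is reached, but leads to a PANIC afterwards. This is a sketch and not the PostgreSQL source. `ToyPage`, `ToyResult`, `toy_page_is_new` and `toy_read_page_for_redo` are hypothetical names invented for illustration, and the "page is new" test is reduced to a single header field being zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page header: only the field the check reads. */
typedef struct
{
    unsigned short upper;   /* 0 means the page was never initialized */
} ToyPage;

/* Hypothetical outcomes of reading a page during WAL redo. */
typedef enum
{
    TOY_PAGE_OK,            /* page is initialized, redo can proceed */
    TOY_PAGE_REMEMBERED,    /* invalid page noted, to be resolved later */
    TOY_PAGE_PANIC          /* invalid page after consistency: abort */
} ToyResult;

/* Assumption: a page counts as "new" when its upper offset is still zero. */
static bool
toy_page_is_new(const ToyPage *page)
{
    return page->upper == 0;
}

/*
 * Toy version of the decision described in the message: an uninitialized
 * page is tolerated (and only remembered) before the consistent state is
 * reached, but is treated as fatal once consistency has been reached.
 */
static ToyResult
toy_read_page_for_redo(const ToyPage *page, bool reached_consistency)
{
    assert(page != NULL);

    if (!toy_page_is_new(page))
        return TOY_PAGE_OK;

    if (reached_consistency)
        return TOY_PAGE_PANIC;

    return TOY_PAGE_REMEMBERED;
}
```

In this model the same uninitialized page gives `TOY_PAGE_REMEMBERED` when `reached_consistency` is false and `TOY_PAGE_PANIC` when it is true, which matches the observation that the abort only occurs after the consistent state has been reached.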