Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Thomas Munro
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date:
Message-ID: CA+hUKG+cXwfAk0dEbD5CZ76p1uADGywuvb6_N4Uhziek54FZHg@mail.gmail.com
In reply to: Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Noah Misch <noah@leadboat.com>)
Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs

On Mon, Sep 25, 2023 at 12:58 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Sep 25, 2023 at 09:02:35AM +1300, Thomas Munro wrote:
> > I see there was a failure on 16 on the very slow AIX box, and I have
> > access so looking into that...
>
> Lucky you, if I may say ;)

FTR, anyone involved with an open source project can get an account on
the GCC compile farm machines.  That particular machine is so
overloaded that it's practically unusable (~8 hours to run the test,
hard to run vi, etc.).

> A bunch of architectures that are not Intel are failing.  Here is a
> summary based on the buildfarm reports:
> topminnow, mips64el with gcc 4.9.2
> mereswine, ARMv7 with gcc 10.2.1
> sungazer, ppc64 with gcc 8.3.0
> frogfish, mips64el with gcc 4.6.3
> mamba, macppc with gcc 10.4.0
> gull, ARMv7 with clang 13.0.0
> grison, ARMv7 with gcc 4.6.3
> copperhead, riscv64 with gcc 10.X
>
> The only thing close to that I have close by is tanager on Armv7 (it
> has not reported to the buildfarm for a few weeks as it has
> overheated because of the summer here, but I've put it back online
> now).  However, it has passed a few hundred cycles with both gcc and
> clang yesterday, on top of having a clean buildfarm run.

One thing that the failing systems have in common is that they are
extremely slow: 3 to 8 hours to complete the tests.  turaco is an
armv7 system that doesn't fail, but it's much faster.  At a guess,
it's probably something like an armv8 CPU that is just running 32-bit
armv7 software, not a real old-school armv7 chip.

Which gives me the idea to try these tests under qemu...

> With sungazer now failing on REL_16_STABLE, it feels to me that we are
> actually looking at two bugs?  One on HEAD, and one in stable
> branches?  For HEAD and the 2PC failure, the records up to PREPARE
> TRANSACTION should be replayed by the standby getting promoted, but
> I'd rather dig into that with a host that's able to report the
> failure.

Oh, right, yeah, that is quite different and could even be unrelated.


