On Thu, Jan 20, 2022 at 6:24 PM Andres Freund <andres@anarazel.de> wrote:
> I wonder if the easiest way to make this test reliable would be to make the
> table a temporary one? That now uses very aggressive horizons, there's no
> bgwriter that could pin the page, etc.
Good idea, thanks. I pushed that minimal change.
Skipping over some other unrelated recoveryCheck failures showing in
the BF today[1], next up we have another kind of failure on these
Linux sparc animals:
2022-01-19 16:39:48.768 CET [9703:4] DETAIL: Last completed transaction was at log time 2022-01-19 16:39:48.669624+01.
2022-01-19 16:39:54.629 CET [9703:5] LOG: restartpoint starting: wal
2022-01-19 16:39:55.180 CET [9705:5] LOG: incorrect resource manager data checksum in record at 0/AD445A8
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2022-01-19%2015%3A44%3A59
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2022-01-19%2015%3A14%3A06
That would have kept me busy for a long time, if Noah hadn't recently
diagnosed an ext4-on-sparc bug where you sometimes read bogus zeroes
from a file that is being concurrently written[2], which of course
breaks streaming replication and thus this test. To turn that green, I
guess we'll need to switch to another filesystem, or wait for a
patched kernel.
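
For anyone who wants to check whether a given machine and filesystem
show that class of symptom, here is a rough sketch of the access
pattern: one thread appends non-zero bytes to a file while another
re-reads only the range the writer has already reported as written, and
counts any read that contains a zero byte. This is not Noah's
reproducer from [2]. The function name, sizes and byte pattern are made
up for illustration, and it assumes a Unix-like system where os.pread
is available. A result of 0 shows only that this run did not hit the
problem.

```python
import os
import tempfile
import threading


def count_bogus_zero_reads(path, total_bytes=1 << 20, chunk=8192):
    """Append 0xff bytes to `path` while a second thread re-reads the
    already-written prefix. Return the number of suspicious reads
    (reads containing a zero byte, or unexpectedly empty reads)."""
    payload = b"\xff" * chunk
    state = {"written": 0, "bad": 0}
    done = threading.Event()

    wfd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    rfd = os.open(path, os.O_RDONLY)

    def writer():
        try:
            while state["written"] < total_bytes:
                # Publish the new length only after write() has returned.
                state["written"] += os.write(wfd, payload)
        finally:
            done.set()

    def reader():
        offset = 0
        while True:
            finished = done.is_set()
            limit = state["written"]
            while offset < limit:
                data = os.pread(rfd, min(chunk, limit - offset), offset)
                if not data:
                    # Bytes below `limit` were already written, so an
                    # empty read is itself suspicious; stop here.
                    state["bad"] += 1
                    return
                if b"\x00" in data:
                    state["bad"] += 1
                offset += len(data)
            if finished and offset >= state["written"]:
                return

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.close(wfd)
        os.close(rfd)
    return state["bad"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        print(count_bogus_zero_reads(os.path.join(tmpdir, "probe.dat")))
```

On a healthy filesystem this should print 0. The pattern is loosely
analogous to a standby replaying WAL that is still being written: if
part of a record came back as zeroes, the record's CRC would no longer
match, which would fit the "incorrect resource manager data checksum"
message above.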
[1] https://www.postgresql.org/message-id/CA%2BhUKGKV6fOHvfiPt8%3DdOKzvswjAyLoFoJF1iQXMNpi7%2BhD1JQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20220116210241.GC756210%40rfd.leadboat.com