Re: logical decoding and replication of sequences
From | Tomas Vondra |
---|---|
Subject | Re: logical decoding and replication of sequences |
Date | |
Msg-id | e9896eb9-48c6-2b8e-9375-ec339affb980@enterprisedb.com |
In reply to | Re: logical decoding and replication of sequences (Noah Misch <noah@leadboat.com>) |
Responses | Re: logical decoding and replication of sequences |
List | pgsql-hackers |
On 8/7/22 02:36, Noah Misch wrote:
> On Thu, Apr 07, 2022 at 08:34:50PM +0200, Tomas Vondra wrote:
>> I've pushed a revert of all the commits related to this - decoding of
>> sequences and test_decoding / built-in replication changes.
>
> Two July buildfarm runs failed with PANIC during standby promotion:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2022-07-19%2004%3A13%3A18
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2022-07-31%2011%3A33%3A13
>
> The attached patch hacks things so an ordinary x86_64 GNU/Linux machine
> reproduces this consistently. "git bisect" then traced the regression to the
> above revert commit (2c7ea57e56ca5f668c32d4266e0a3e45b455bef5). The pg_ctl
> test suite passes under this hack in all supported branches, and it passed on
> v15 until that revert. Would you investigate?
>
> The buildfarm animal uses keep_error_builds. From kept data directories, I
> deduced these events:
>
> - After the base backup, auto-analyze ran on the primary and wrote WAL.
> - Standby streamed and wrote up to 0/301FFF.
> - Standby received the promote signal. Terminated streaming. WAL page at
>   0/302000 remained all-zeros.
> - Somehow, end-of-recovery became a PANIC.
>

I think it'd be really bizarre if this was due to the revert, as that
simply undoes minor WAL changes (and none of this should affect what
happens at WAL page boundary etc.). It just restores WAL as it was
before 0da92dc, nothing particularly complicated. I did go through all
of the changes again and I haven't spotted anything particularly
suspicious, but I'll give it another try tomorrow.

However, I did try bisecting this using the attached patch, and that
does not suggest the issue is in the revert commit. It actually fails
all the way back to 5dc0418fab2, and it starts working on 9553b4115f1.

    ...
    6392f2a0968 Try to silence "-Wmissing-braces" complaints in ...
 => 5dc0418fab2 Prefetch data referenced by the WAL, take II.
    9553b4115f1 Fix warning introduced in 5c279a6d350.
    ...

This is merely 10 commits before the revert, and it seems way more
related to WAL. Also, adding this to the two nodes in 003_standby.pl
makes the issue go away, it seems:

    $node_standby->append_conf('postgresql.conf',
        qq(recovery_prefetch = off));

I'd bet it's about WAL prefetching, not the revert, and the bisect was a
bit incorrect, because the commits are close and the failures happen to
be rare. (Presumably you first did the bisect and then wrote the patch
that reproduces this, right?)

Adding Thomas Munro to the thread, he's the WAL prefetching expert ;-)

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
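For readers following the LSNs in the quoted report: the two values, 0/301FFF and 0/302000, appear to sit on either side of a WAL page boundary. The sketch below checks that arithmetic. It assumes the default WAL page size of 8192 bytes (XLOG_BLCKSZ), which the thread does not state, and the helper names parse_lsn and page_offset are made up for illustration; they are not PostgreSQL functions.

```python
# Assumption: default WAL page size (XLOG_BLCKSZ); not stated in the thread.
XLOG_BLCKSZ = 8192


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' (two hex numbers) to one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def page_offset(lsn: int, page_size: int = XLOG_BLCKSZ) -> int:
    """Byte offset of an LSN within its WAL page."""
    return lsn % page_size


streamed_up_to = parse_lsn("0/301FFF")  # from the quoted event list
zero_page_at = parse_lsn("0/302000")    # from the quoted event list

# Offsets within the page, and the distance between the two positions.
print(page_offset(streamed_up_to))
print(page_offset(zero_page_at))
print(zero_page_at - streamed_up_to)
```

Under that assumption, 0/302000 is page-aligned and immediately follows 0/301FFF, which matches the "WAL page boundary" remark in the reply.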