Re: Unstable tests for recovery conflict handling
From: Andres Freund
Subject: Re: Unstable tests for recovery conflict handling
Date:
Msg-id: 20220726181611.4xw3blxigqzsz4d4@alap3.anarazel.de
In reply to: Re: Unstable tests for recovery conflict handling (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi,

On 2022-07-26 13:57:53 -0400, Tom Lane wrote:
> I happened to notice that while skink continues to fail off-and-on
> in 031_recovery_conflict.pl, the symptoms have changed!  What
> we're getting now typically looks like [1]:
>
> [10:45:11.475](0.023s) ok 14 - startup deadlock: lock acquisition is waiting
> Waiting for replication conn standby's replay_lsn to pass 0/33FB8B0 on primary
> done
> timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 367.
>
> where absolutely nothing happens in the standby log, until we time out:
>
> 2022-07-24 10:45:11.452 UTC [1468367][client backend][2/4:0] LOG:  statement: SELECT * FROM test_recovery_conflict_table2;
> 2022-07-24 10:45:11.472 UTC [1468547][client backend][3/2:0] LOG:  statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
> 2022-07-24 10:48:15.860 UTC [1468362][walreceiver][:0] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
>
> So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
> the startup process hasn't detected the buffer conflict in the first
> place.

I wonder if this, at least partially, could be due to the elog thing I was complaining about nearby. I.e. we decide to FATAL as part of a recovery conflict interrupt, and then during that ERROR out as part of another recovery conflict interrupt (because nothing holds interrupts as part of FATAL).

Greetings,

Andres Freund
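[Editor's note: to make the hypothesized interleaving concrete, below is a minimal standalone sketch, not PostgreSQL source. Names like check_for_interrupts(), fatal_exit(), run_exit_callbacks() and interrupt_holdoff are illustrative stand-ins for CHECK_FOR_INTERRUPTS(), the FATAL error/exit path and InterruptHoldoffCount. It shows how, if nothing holds interrupts while a FATAL recovery-conflict exit is in progress, a second conflict interrupt can raise an ERROR that longjmps out and leaves the FATAL exit unfinished.]

/*
 * Standalone illustration (NOT PostgreSQL code) of the failure mode
 * described above: a backend decides to FATAL because of a recovery
 * conflict, but while that exit is being processed another conflict
 * interrupt raises an ERROR, because nothing holds interrupts.
 */
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static volatile bool conflict_pending = false;  /* set "by the startup process" */
static int  interrupt_holdoff = 0;              /* stand-in for InterruptHoldoffCount */
static bool already_fatal = false;
static jmp_buf error_context;                   /* stand-in for the error longjmp target */

static void check_for_interrupts(void);

/* Cleanup that runs while the FATAL exit is still in progress. */
static void
run_exit_callbacks(void)
{
    printf("running exit callbacks...\n");
    conflict_pending = true;        /* a second conflict signal arrives here */
    check_for_interrupts();         /* nothing holds interrupts: can longjmp away! */
    printf("exit callbacks finished\n");
}

static void
fatal_exit(const char *msg)
{
    printf("FATAL: %s\n", msg);
    already_fatal = true;
    run_exit_callbacks();
    exit(1);
}

static void
check_for_interrupts(void)
{
    if (conflict_pending && interrupt_holdoff == 0)
    {
        conflict_pending = false;
        if (already_fatal)
        {
            /* ERROR raised while the FATAL was still being processed. */
            printf("ERROR: canceling statement due to conflict with recovery\n");
            longjmp(error_context, 1);
        }
        fatal_exit("terminating connection due to conflict with recovery");
    }
}

int
main(void)
{
    if (setjmp(error_context) != 0)
    {
        /* We land here instead of ever completing the FATAL exit. */
        printf("ERROR handled; the FATAL exit never completed\n");
        return 2;
    }

    conflict_pending = true;        /* first conflict interrupt */
    check_for_interrupts();
    return 0;
}

In the sketch, incrementing interrupt_holdoff before entering fatal_exit() would make the nested check a no-op; that is the "nothing holds interrupts as part of FATAL" gap the message above points at.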