Re: pgsql: Allow using the updated tuple while moving it to a different par
От | Tom Lane |
---|---|
Тема | Re: pgsql: Allow using the updated tuple while moving it to a different par |
Дата | |
Msg-id | 11887.1531432634@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: pgsql: Allow using the updated tuple while moving it to adifferent par (Amit Kapila <amit.kapila16@gmail.com>) |
Список | pgsql-committers |
Amit Kapila <amit.kapila16@gmail.com> writes: > On Thu, Jul 12, 2018 at 1:29 PM, Michael Paquier <michael@paquier.xyz> wrote: >> Looks unrelated, but flaviventris (a snake?) has just complained: >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2018-07-12%2007%3A36%3A01 > Yeah, I am confused as to how this commit can lead to that failure. It didn't; looks like just a random failure to me. The test is timing out: ok 26 - catalog xmin of cascaded slot still null with hs_feedback reset # re-enabling hot_standby_feedback and disabling while stopped ### Reloading node "standby_2" # Running: pg_ctl -D /home/bf/build/buildfarm-flaviventris/HEAD/pgsql.build/src/test/recovery/tmp_check/t_001_stream_rep_standby_2_data/pgdata reload server signaled Waiting for replication conn standby_1's replay_lsn to pass '0/30A1720' on master done Waiting for replication conn standby_2's replay_lsn to pass '0/30A1720' on standby_1 done ### Stopping node "standby_2" using mode fast # Running: pg_ctl -D /home/bf/build/buildfarm-flaviventris/HEAD/pgsql.build/src/test/recovery/tmp_check/t_001_stream_rep_standby_2_data/pgdata -mfast stop waiting for server to shut down.... done server stopped # No postmaster PID for node "standby_2" # Timed out waiting for slot xmins to advance at t/001_stream_rep.pl line 157. This looks to me like it's probably a low-probability timing problem in the test script itself. The relevant part of the script is $node_master->safe_psql('postgres', qq[INSERT INTO tab_int VALUES (11000);]); replay_check(); $node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = off;'); $node_standby_2->stop; ($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2, "xmin IS NOT NULL"); isnt($xmin, '', 'xmin of cascaded slot non-null with postgres shut down'); where the failure is occurring because get_slot_xmins() never sees the expected state. I'm suspicious that this is happening because the script is making no attempt to ensure the relative timing of events on the three servers --- notably, issuing a "reload" does nothing to guarantee that the server has actually seen and responded to the parameter change. As an extreme case, I can reproduce the failure exactly with this: diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index a0d3e8f..0824184 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -286,7 +286,7 @@ is($catalog_xmin, '', note "re-enabling hot_standby_feedback and disabling while stopped"; $node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = on;'); -$node_standby_2->reload; +# $node_standby_2->reload; $node_master->safe_psql('postgres', qq[INSERT INTO tab_int VALUES (11000);]); replay_check(); so I'm thinking that what we're seeing is an effect of the second standby sometimes being too slow to turn on hot_standby_feedback. We could maybe hack around that with a short "sleep" in this script, but a less cruddy solution would be to change PostgresNode::reload so that it somehow waits for the server to process the reload signal. regards, tom lane
В списке pgsql-committers по дате отправления: