Обсуждение: Instability of pg_walsummary/002_blocks.pl due to timing
Hello hackers, A couple of the 002.blocks test's failures occurred during past three months: [1], [2] with the following diagnostics: # Failed test 'WAL summarizer generates statistics for WAL reads' # at /home/bf/bf-build/culicidae/REL_18_STABLE/pgsql/src/bin/pg_walsummary/t/002_blocks.pl line 54. # got: 'f' # expected: 't' # Looks like you failed 1 test of 8. pgsql.build/testrun/pg_walsummary/002_blocks/log/regress_log_002_blocks [12:29:12.131](0.351s) ok 1 - WAL summarization caught up after insert [12:29:12.196](0.065s) not ok 2 - WAL summarizer generates statistics for WAL reads [12:29:12.198](0.002s) # Failed test 'WAL summarizer generates statistics for WAL reads' # at /home/bf/bf-build/culicidae/REL_18_STABLE/pgsql/src/bin/pg_walsummary/t/002_blocks.pl line 54. [12:29:12.198](0.000s) # got: 'f' # expected: 't' [12:29:12.267](0.069s) # after insert, summarized through 0/1821510 [12:29:12.507](0.240s) ok 3 - got new WAL summary after update This test case is rather new, it was added by f4694e0f3 (from 2025-03-05). I could reproduce this failure within 20 test runs with the following modification: --- a/src/backend/postmaster/walsummarizer.c +++ b/src/backend/postmaster/walsummarizer.c @@ -1544,6 +1544,7 @@ summarizer_read_local_xlog_page(XLogReaderState *state, * so we don't tight-loop. */ ProcessWalSummarizerInterrupts(); +pg_usleep(1000000); summarizer_wait_for_wal(); Michael, as you added the test case, could you please have a look? [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2025-04-09%2007%3A36%3A05 [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2025-07-01%2010%3A23%3A38 Best regards, Alexander
Hello Michail,
07.07.2025 03:18, Michael Paquier wrote:
07.07.2025 03:18, Michael Paquier wrote:
I'm failing to reproduce it, unfortunately. It looks like just a timing issue with the reports, so the best option I can think of here would be to switch the test to do a wait until the stats have been generated, leading to the attached. Do you still see the problem with that in place?
I'm sorry for not being accurate enough -- I forgot to mention that I
replicated the config from culicidae, and now I see that "fsync=on"
is needed to reproduce the failure for me (though maybe).
With:
./configure -q --enable-cassert --enable-debug --enable-tap-tests && make -s -j8
echo "fsync=on" >/tmp/temp.config
for i in {1..100}; do echo "ITERATION $i"; TEMP_CONFIG=/tmp/temp.config PROVE_TESTS="t/002*" make -s check -C src/bin/pg_walsummary/ || break; done
I got failures on iterations 3, 5, 1. With your patch applied, I got 100
iterations passed.
Thank you for the fix!
Best regards,
Alexander