Обсуждение: Instability of pg_walsummary/002_blocks.pl due to timing

Поиск
Список
Период
Сортировка

Instability of pg_walsummary/002_blocks.pl due to timing

От
Alexander Lakhin
Дата:
Hello hackers,

A couple of the 002.blocks test's failures occurred during past three
months: [1], [2] with the following diagnostics:
#   Failed test 'WAL summarizer generates statistics for WAL reads'
#   at /home/bf/bf-build/culicidae/REL_18_STABLE/pgsql/src/bin/pg_walsummary/t/002_blocks.pl line 54.
#          got: 'f'
#     expected: 't'
# Looks like you failed 1 test of 8.

pgsql.build/testrun/pg_walsummary/002_blocks/log/regress_log_002_blocks
[12:29:12.131](0.351s) ok 1 - WAL summarization caught up after insert
[12:29:12.196](0.065s) not ok 2 - WAL summarizer generates statistics for WAL reads
[12:29:12.198](0.002s) #   Failed test 'WAL summarizer generates statistics for WAL reads'
#   at /home/bf/bf-build/culicidae/REL_18_STABLE/pgsql/src/bin/pg_walsummary/t/002_blocks.pl line 54.
[12:29:12.198](0.000s) #          got: 'f'
#     expected: 't'
[12:29:12.267](0.069s) # after insert, summarized through 0/1821510
[12:29:12.507](0.240s) ok 3 - got new WAL summary after update

This test case is rather new, it was added by f4694e0f3 (from 2025-03-05).

I could reproduce this failure within 20 test runs with the following
modification:
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -1544,6 +1544,7 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
                                  * so we don't tight-loop.
                                  */
                                 ProcessWalSummarizerInterrupts();
+pg_usleep(1000000);
                                 summarizer_wait_for_wal();

Michael, as you added the test case, could you please have a look?

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2025-04-09%2007%3A36%3A05
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2025-07-01%2010%3A23%3A38

Best regards,
Alexander



Re: Instability of pg_walsummary/002_blocks.pl due to timing

От
Alexander Lakhin
Дата:
Hello Michail,

07.07.2025 03:18, Michael Paquier wrote:
I'm failing to reproduce it, unfortunately.  It looks like just a
timing issue with the reports, so the best option I can think of here
would be to switch the test to do a wait until the stats have been
generated, leading to the attached.  Do you still see the problem with
that in place?

I'm sorry for not being accurate enough -- I forgot to mention that I
replicated the config from culicidae, and now I see that "fsync=on"
is needed to reproduce the failure for me (though maybe).

With:
./configure -q --enable-cassert --enable-debug --enable-tap-tests && make -s -j8
echo "fsync=on" >/tmp/temp.config
for i in {1..100}; do echo "ITERATION $i"; TEMP_CONFIG=/tmp/temp.config PROVE_TESTS="t/002*" make -s check -C src/bin/pg_walsummary/ || break; done
I got failures on iterations 3, 5, 1. With your patch applied, I got 100
iterations passed.

Thank you for the fix!

Best regards,
Alexander