Re: stats test intermittent failure
От | Alexander Lakhin |
---|---|
Тема | Re: stats test intermittent failure |
Дата | |
Msg-id | 8d526f29-c269-dc3d-38ac-10fa3a0a7fb3@gmail.com обсуждение исходный текст |
Ответ на | stats test intermittent failure (Melanie Plageman <melanieplageman@gmail.com>) |
Список | pgsql-hackers |
Hi Melanie,
10.07.2023 21:35, Melanie Plageman wrote:
10.07.2023 21:35, Melanie Plageman wrote:
Hi, Jeff pointed out that one of the pg_stat_io tests has failed a few times over the past months (here on morepork [1] and more recently here on francolin [2]). Failing test diff for those who prefer not to scroll: +++ /home/bf/bf-build/francolin/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/stats.out 2023-07-07 18:48:25.976313231 +0000 @@ -1415,7 +1415,7 @@ :io_sum_vac_strategy_after_reuses > :io_sum_vac_strategy_before_reuses; ?column? | ?column? ----------+---------- - t | t + t | f My theory about the test failure is that, when there is enough demand for shared buffers, the flapping test fails because it expects buffer access strategy *reuses* and concurrent queries already flushed those buffers before they could be reused. Attached is a patch which I think will fix the test while keeping some code coverage. If we count evictions and reuses together, those should have increased.
I managed to reproduce that failure with the attached patch applied
(on master) and with the following script (that effectively multiplies
probability of the failure by 360):
CPPFLAGS="-O0" ./configure -q --enable-debug --enable-cassert --enable-tap-tests && make -s -j`nproc` && make -s check -C src/test/recovery
mkdir -p src/test/recovery00/t
cp src/test/recovery/t/027_stream_regress.pl src/test/recovery00/t/
cp src/test/recovery/Makefile src/test/recovery00/
for ((i=1;i<=9;i++)); do cp -r src/test/recovery00/ src/test/recovery$i; done
for ((i=1;i<=10;i++)); do echo "iteration $i"; NO_TEMP_INSTALL=1 parallel --halt now,fail=1 -j9 --linebuffer --tag make -s check -C src/test/{} ::: recovery1 recovery2 recovery3 recovery4 recovery5 recovery6 recovery7 recovery8 recovery9 || break; done
Without your patch, I get:
iteration 2
...
recovery5 # Failed test 'regression tests pass'
recovery5 # at t/027_stream_regress.pl line 92.
recovery5 # got: '256'
recovery5 # expected: '0'
...
src/test/recovery5/tmp_check/log/regress_log_027_stream_regress contains:
--- .../src/test/regress/expected/stats.out 2023-07-11 20:05:10.536059706 +0300
+++ .../src/test/recovery5/tmp_check/results/stats.out 2023-07-11 20:30:46.790551305 +0300
@@ -1418,7 +1418,7 @@
:io_sum_vac_strategy_after_reuses > :io_sum_vac_strategy_before_reuses;
?column? | ?column?
----------+----------
- t | t
+ t | f
(1 row)
With your patch applied, 10 iterations performed successfully for me.
So it looks like your theory and your fix are correct.
Best regards,
Alexander
Вложения
В списке pgsql-hackers по дате отправления: