Re: Stabilizing the test_decoding checks, take N

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: Stabilizing the test_decoding checks, take N
Дата
Msg-id CAFiTN-vFrtusp4hYC+e06EAhK3OxZrr48d9VxXMmJkQi2ZFeNA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Stabilizing the test_decoding checks, take N  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Stabilizing the test_decoding checks, take N  (Dilip Kumar <dilipbalaut@gmail.com>)
Список pgsql-hackers
On Mon, Apr 18, 2022 at 11:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Apr 16, 2022 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> My pet dinosaur prairiedog just failed in the contrib/test_decoding
> tests [1]:
>
> diff -U3 /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out
> --- /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out 2022-04-15 07:59:17.000000000 -0400
> +++ /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out  2022-04-15 09:06:36.000000000 -0400
> @@ -77,10 +77,12 @@
>   streaming change for transaction
>   streaming change for transaction
>   streaming change for transaction
> + closing a streamed block for transaction
> + opening a streamed block for transaction
>   streaming change for transaction
>   closing a streamed block for transaction
>   committing streamed transaction
> -(13 rows)
> +(15 rows)
>
> Looking at the postmaster log, it's obvious where this extra transaction
> came from: auto-analyze ran on pg_type concurrently with the test step
> just before this one.  That could only happen if the tests ran long enough
> for autovacuum_naptime to elapse, but prairiedog is a pretty slow machine.
> (And I hasten to point out that some other animals, such as those running
> valgrind or CLOBBER_CACHE_ALWAYS, are even slower.)
>
> We've seen this sort of problem before [2], and attempted to fix it [3]
> by making these tests ignore empty transactions.  But of course
> auto-analyze's transaction wasn't empty, so that didn't help.
>

The possible reason here is that this extra (auto-analyze) transaction
causes the logical decoding work mem to reach before the last change
of the test's transaction. As can be seen from the logs, it just
closed the stream before the last change and then opened a new stream
for the last change. Now, it is true that the auto-analyze changes
won't be decoded as they don't perform DML operation on any
non-catalog table but it could generate some invalidation message
which needs to be processed even though we won't send anything related
to it to the downstream.

This analysis seems right to me.
 
This needs to be verified once by doing some manual testing as it may
not be easily reproducible every time. If this happens to be true then
I think your suggestion related to increasing autovacuum_naptime would
work.

I will try to reproduce this, maybe by reducing the autovacuum_naptime or parallelly running some script that continuously performs DDL-only transactions.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: vignesh C
Дата:
Сообщение: Re: Skipping schema changes in publication
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: Crash in new pgstats code