Re: long-standing data loss bug in initial sync of logical replication
От | vignesh C |
---|---|
Тема | Re: long-standing data loss bug in initial sync of logical replication |
Дата | |
Msg-id | CALDaNm14MFRQ9vkTXY1QTJdWHmHhXZPFj3Zv3PcqF9CdJukbUQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: long-standing data loss bug in initial sync of logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: long-standing data loss bug in initial sync of logical replication
|
Список | pgsql-hackers |
On Tue, 9 Jul 2024 at 17:05, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jul 1, 2024 at 10:51 AM vignesh C <vignesh21@gmail.com> wrote: > > > > > > This issue is present in all supported versions. I was able to > > reproduce it using the steps recommended by Andres and Tomas's > > scripts. I also conducted a small test through TAP tests to verify the > > problem. Attached is the alternate_lock_HEAD.patch, which includes the > > lock modification(Tomas's change) and the TAP test. > > > > @@ -1568,7 +1568,7 @@ OpenTableList(List *tables) > /* Allow query cancel in case this takes a long time */ > CHECK_FOR_INTERRUPTS(); > > - rel = table_openrv(t->relation, ShareUpdateExclusiveLock); > + rel = table_openrv(t->relation, ShareRowExclusiveLock); > > The comment just above this code ("Open, share-lock, and check all the > explicitly-specified relations") needs modification. It would be > better to explain the reason of why we would need SRE lock here. Updated comments for the same. > > To reproduce the issue in the HEAD version, we cannot use the same > > test as in the alternate_lock_HEAD patch because the behavior changes > > slightly after the fix to wait for the lock until the open transaction > > completes. > > > > But won't the test that reproduces the problem in HEAD be successful > after the code change? If so, can't we use the same test instead of > slight modification to verify the lock mode? Before the patch fix, the ALTER PUBLICATION command would succeed immediately. Now, the ALTER PUBLICATION command waits until it acquires the ShareRowExclusiveLock. This change means that in test cases, previously we waited until the table was added to the publication, whereas now, after applying the patch, we wait until the ALTER PUBLICATION command is actively waiting for the ShareRowExclusiveLock. This waiting step ensures consistent execution and sequencing of tests each time. > > The attached issue_reproduce_testcase_head.patch can be > > used to reproduce the issue through TAP test in HEAD. > > The changes made in the HEAD version do not directly apply to older > > branches. For PG14, PG13, and PG12 branches, you can use the > > alternate_lock_PG14.patch. > > > > Why didn't you include the test in the back branches? If it is due to > background psql stuff, then won't commit > (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=187b8991f70fc3d2a13dc709edd408a8df0be055) > can address it? Indeed, I initially believed it wasn't available. Currently, I haven't incorporated the back branch patch, but I plan to include it in a subsequent version once there are no review comments on the HEAD patch. The updated v2 version patch has the fix for the comments. Regards, Vignesh
Вложения
В списке pgsql-hackers по дате отправления: