Re: Column Filtering in Logical Replication
From | Tomas Vondra
---|---
Subject | Re: Column Filtering in Logical Replication
Date |
Msg-id | bd11879e-b5a7-1dce-78d8-2649779d7554@enterprisedb.com
In reply to | Re: Column Filtering in Logical Replication (Amit Kapila <amit.kapila16@gmail.com>)
List | pgsql-hackers
On 3/29/22 12:00, Amit Kapila wrote:
> On Sun, Mar 20, 2022 at 4:53 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 3/20/22 07:23, Amit Kapila wrote:
>>> On Sun, Mar 20, 2022 at 8:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>
>>>> On Fri, Mar 18, 2022 at 10:42 PM Tomas Vondra
>>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>
>>>>> So the question is why those two sync workers never complete - I guess
>>>>> there's some sort of lock wait (deadlock?) or infinite loop.
>>>>>
>>>>
>>>> It would be a bit tricky to reproduce this even if the above theory is
>>>> correct, but I'll try it today or tomorrow.
>>>>
>>>
>>> I am able to reproduce it with the help of a debugger. First, I
>>> added a LOG message and some while (true) loops to debug the sync and
>>> apply workers. Test setup:
>>>
>>> Node-1:
>>> create table t1(c1 int);
>>> create table t2(c1 int);
>>> insert into t1 values(1);
>>> create publication pub1 for table t1;
>>> create publication pub2;
>>>
>>> Node-2:
>>> change max_sync_workers_per_subscription to 1 in postgresql.conf
>>> create table t1(c1 int);
>>> create table t2(c1 int);
>>> create subscription sub1 connection 'dbname = postgres' publication pub1;
>>>
>>> Up to this point, just let the debuggers in both workers continue.
>>>
>>> Node-1:
>>> alter publication pub1 add table t2;
>>> insert into t1 values(2);
>>>
>>> Here, we have to debug the apply worker such that when it tries to
>>> apply the insert, we stop the debugger in apply_handle_insert()
>>> after it has done begin_replication_step().
>>>
>>> Node-2:
>>> alter subscription sub1 set publication pub1, pub2;
>>>
>>> Now, continue the debugger of the apply worker; it should first start the
>>> sync worker and then exit because of the parameter change. All of these
>>> debugging steps are just to ensure that it first starts the sync worker
>>> and then exits. After this point, the table sync worker never finishes
>>> and the log is filled with the message "reached
>>> max_sync_workers_per_subscription limit" (a message newly added by me
>>> in the attached debug patch).
>>>
>>> Now, it is not completely clear to me how exactly '013_partition.pl'
>>> leads to this situation, but there is a possibility based on the LOGs
>>> it shows.
>>>
>>
>> Thanks, I'll take a look later.
>>
>
> This is still failing [1][2].
>
> [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2022-03-28%2005%3A16%3A53
> [2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2022-03-24%2013%3A13%3A08
>

AFAICS we've concluded this is a pre-existing issue, not something
introduced by a recently committed patch, and I don't think there's any
proposal for how to fix it. So I've put it on the back burner until after
the current CF.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
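For anyone replaying the steps above, one way to observe the subscriber-side symptom (a table sync that never completes) is to query the subscription catalogs on Node-2. This is only a minimal sketch, not part of Amit's debug patch, and it assumes the t1/t2/sub1 names from the steps above:

-- On Node-2: per-table sync state; in the stuck scenario t2 never
-- reaches srsubstate 'r' (ready).
SELECT srrelid::regclass AS tab, srsubstate
FROM pg_subscription_rel
ORDER BY 1;

-- On Node-2: workers currently running for the subscription; a non-NULL
-- relid identifies a table sync worker that has not finished.
SELECT subname, pid, relid::regclass AS sync_rel
FROM pg_stat_subscription;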