RE: pg_get_publication_tables() output duplicate relid
From | houzj.fnst@fujitsu.com |
---|---|
Subject | RE: pg_get_publication_tables() output duplicate relid |
Date | |
Msg-id | OS0PR01MB5716C551FC5464AE47BD2F8594759@OS0PR01MB5716.jpnprd01.prod.outlook.com |
In reply to | Re: pg_get_publication_tables() output duplicate relid (Amit Kapila <amit.kapila16@gmail.com>) |
Replies | RE: pg_get_publication_tables() output duplicate relid |
List | pgsql-hackers |
On Sat, Nov 20, 2021 7:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > The problematic case is attaching the partition *after* the
> > > subscriber has already marked the root parent as synced and/or ready
> > > for replication. Refreshing the subscription doesn't help it
> > > discover the newly attached partition, because a
> > > publish_via_partition_root only ever tells about the root parent,
> > > which would be already synced, so the subscriber would think there's
> > > nothing to copy.
> > >
> >
> > Okay, I see this could be a problem but I haven't tried to reproduce it.
>
> One more thing you mentioned is that the initial sync won't work after refresh
> but later changes will be replicated but I noticed that later changes also don't
> get streamed till we restart the subscriber server.
> I am not sure but we might not be invalidating apply workers cache due to
> which it didn't notice the same.

I investigated this bug recently, and I think the cause is that when the walsender receives a relcache invalidation message, its callback function [1] only resets the schema_sent status and does not reset the replicate_valid flag. As a result, the walsender never rebuilds the publication actions of the relation.

[1]

    static void
    rel_sync_cache_relation_cb(Datum arg, Oid relid)
    ...
        /*
         * Reset schema sent status as the relation definition may have changed.
         * Also free any objects that depended on the earlier definition.
         */
        if (entry != NULL)
        {
            entry->schema_sent = false;
            list_free(entry->streamed_txns);
            ...

Similarly, when you DETACH a partition, the publication actions are not rebuilt either, for the same reason, which could cause unexpected behavior if we modify the detached table's data. The bug occurs regardless of whether pubviaroot is set.
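To make the mechanism concrete, here is a minimal C sketch under heavily simplified assumptions. `SyncEntry`, `get_entry_actions()`, `relation_cb_current()`, `relation_cb_proposed()`, `simulate_change()` and the `catalog_published` flag are all hypothetical stand-ins, not PostgreSQL code. A single `pubinsert` boolean stands in for the cached publication actions, and the callbacks receive the entry directly instead of looking it up by relid. The sketch compares a callback that only resets `schema_sent` with a variant that also clears `replicate_valid` (the fix proposed in this message), for both ATTACH and DETACH.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for pgoutput's per-relation
 * cache entry (the real RelationSyncEntry has many more fields). */
typedef struct SyncEntry
{
    unsigned int relid;     /* arbitrary relation OID, unused by the model */
    bool schema_sent;       /* schema already sent downstream? */
    bool replicate_valid;   /* are the cached publication actions valid? */
    bool pubinsert;         /* cached action: publish INSERTs for this relation */
} SyncEntry;

/* Hypothetical catalog state: is the relation currently covered by the
 * publication (e.g. as a partition attached to a published root)? */
static bool catalog_published = false;

/* How many times the publication actions were recomputed. */
static int rebuild_count = 0;

/* Recompute the cached actions only when the entry is marked invalid. */
static void
get_entry_actions(SyncEntry *entry)
{
    if (!entry->replicate_valid)
    {
        entry->pubinsert = catalog_published;
        entry->replicate_valid = true;
        rebuild_count++;
    }
}

/* Behaviour described above: only schema_sent is reset. */
static void
relation_cb_current(SyncEntry *entry)
{
    if (entry != NULL)
        entry->schema_sent = false;
}

/* Proposed fix: also invalidate the cached publication actions. */
static void
relation_cb_proposed(SyncEntry *entry)
{
    if (entry != NULL)
    {
        entry->schema_sent = false;
        entry->replicate_valid = false;
    }
}

/* Cache the actions, change the catalog (ATTACH: false -> true,
 * DETACH: true -> false), fire the invalidation callback, then look the
 * actions up again. Returns the action used for the next change. */
static bool
simulate_change(void (*relation_cb)(SyncEntry *),
                bool published_before, bool published_after, int *rebuilds)
{
    SyncEntry entry = {.relid = 16384};  /* remaining fields start as false */

    rebuild_count = 0;
    catalog_published = published_before;
    get_entry_actions(&entry);           /* first change caches the actions */

    catalog_published = published_after; /* partition attached or detached */
    relation_cb(&entry);                 /* relcache invalidation arrives */
    get_entry_actions(&entry);           /* next change for this relation */

    *rebuilds = rebuild_count;
    return entry.pubinsert;
}
```

With `relation_cb_current`, the ATTACH case `simulate_change(relation_cb_current, false, true, &n)` returns `false` and the DETACH case `simulate_change(relation_cb_current, true, false, &n)` returns `true`: the stale actions survive the invalidation. With `relation_cb_proposed`, both cases follow the new catalog state, at the cost of one extra rebuild (`n` is 2 instead of 1). That extra rebuild is the invalidation-frequency concern raised about this approach.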
For the fix: I think the bug can be fixed if we also reset the replicate_valid flag in rel_sync_cache_relation_cb. I have some hesitation about this approach, because it could increase how often the publication actions are invalidated and rebuilt, but I haven't come up with a better approach.

Another possibility would be to add a syscache callback function for the pg_inherits table, but we currently don't have a syscache for pg_inherits. We would need to add that cache first, which doesn't seem better than the above approach. What do you think?

Attached is an initial patch that resets the replicate_valid flag in rel_sync_cache_relation_cb and adds some reproduction tests in 013_partition.pl.

Best regards,
Hou zj