On 06/02/2021 06:07, Amit Kapila wrote:
> On Sat, Feb 6, 2021 at 6:22 AM Peter Smith <smithpb2250@gmail.com> wrote:
>> On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
>> <petr.jelinek@enterprisedb.com> wrote:
>>>> +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
>>>> +{
>>>> + if (syncslotname)
>>>> + sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
>>>> + else
>>>> + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
>>>> +
>>>> + return syncslotname;
>>>> +}
>>> Given that we are now explicitly dropping slots, what happens here if we
>>> have 2 different downstreams that happen to get same suboid and reloid,
>>> will one of the drop the slot of the other one? Previously with the
>>> cleanup being left to temp slot we'd at maximum got error when creating
>>> it but with the new logic in LogicalRepSyncTableStart it feels like we
>>> could get into situation where 2 downstreams are fighting over slot no?
>>>
> I think so. See, if the alternative suggested below works or if you
> have any other suggestions for the same?
>
>> The PG docs [1] says "there is only one copy of pg_subscription per
>> cluster, not one per database". IIUC that means it is not possible for
>> 2 different subscriptions to have the same suboid.
>>
> I think he is talking about two different clusters having separate
> subscriptions but point to the same publisher. In different clusters,
> we can get the same subid/relid. I think we need a cluster-wide unique
> identifier to distinguish among different subscribers. How about using
> the system_identifier stored in the control file (we can use
> GetSystemIdentifier to retrieve it). I think one concern could be
> that adding that to slot name could exceed the max length of slot
> (NAMEDATALEN -1) but I don't think that is the case here
> (pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0')). Note last
> is system_identifier in this scheme.
Yep that's what I mean and system_identifier seems like a good choice to me.
> Do you guys think that works or let me know if you have any other
> better idea? Petr, is there a reason why such an identifier is not
> considered originally, is there any risk in it?
Originally it was not considered likely because it's all based on
pglogical/BDR work where ids are hashes of stuff that's unique across
group of instances, not counter based like Oids in PostgreSQL and I
simply didn't realize it could be a problem until reading this patch :)
--
Petr Jelinek