Обсуждение: Re: There is a defect in the ReplicationSlotCreate() function where it iterates through ReplicationSlotCtl->replication_slots[max_replication_slots] to find a slot but does not break out of the loop when a slot is found.

Поиск
Список
Период
Сортировка
> On 14 Jan 2025, at 05:24, lxiaogang5 <lxiaogang5@gmail.com> wrote:

> There is a logical defect in the function ReplicationSlotCreate (creating a new slot) when it internally traverses
theReplicationSlotCtl->replication_slots[max_replication_slots] array. When an available slot is found (slot = s), it
shouldbreak the current for loop. This issue still exists in the latest code, resulting in wasted CPU resources. 

We could exit early, but any system which max_replication_slots set high enough
that the savings are measurable in a non-hot codepath is likely having other
performance problems as well (max_replication_slots is by default 10).

If you feel strongly about I suggest creating a patch and submitting.  Also,
please don't send 6Mb images of text when simply copy/paste of the text would
have worked even better.

--
Daniel Gustafsson




Daniel Gustafsson <daniel@yesql.se> writes:
> On 14 Jan 2025, at 05:24, lxiaogang5 <lxiaogang5@gmail.com> wrote:
>> There is a logical defect in the function ReplicationSlotCreate (creating a new slot) when it internally traverses
theReplicationSlotCtl->replication_slots[max_replication_slots] array. When an available slot is found (slot = s), it
shouldbreak the current for loop. This issue still exists in the latest code, resulting in wasted CPU resources. 

> We could exit early, but any system which max_replication_slots set high enough
> that the savings are measurable in a non-hot codepath is likely having other
> performance problems as well (max_replication_slots is by default 10).

Are we reading the same code?

    for (i = 0; i < max_replication_slots; i++)
    {
        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

        if (s->in_use && strcmp(name, NameStr(s->data.name)) == 0)
            ereport(ERROR,
                    (errcode(ERRCODE_DUPLICATE_OBJECT),
                     errmsg("replication slot \"%s\" already exists", name)));
        if (!s->in_use && slot == NULL)
            slot = s;
    }

In the first place, breaking early would be wrong: we would fail to
scan all slots to check for duplicate name.  In the second place,
the code does choose the first not-in-use slot, because of the
check for "slot == NULL" before changing the value of "slot".

If we intended to support hundreds or thousands of replication
slots, it'd perhaps be worthwhile to replace this with a hashtable
lookup.  But that doesn't seem like a very credible use-case.
For typical numbers of slots this way is likely faster than
messing with a hashtable.

            regards, tom lane



> On 14 Jan 2025, at 15:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> On 14 Jan 2025, at 05:24, lxiaogang5 <lxiaogang5@gmail.com> wrote:
>>> There is a logical defect in the function ReplicationSlotCreate (creating a new slot) when it internally traverses
theReplicationSlotCtl->replication_slots[max_replication_slots] array. When an available slot is found (slot = s), it
shouldbreak the current for loop. This issue still exists in the latest code, resulting in wasted CPU resources. 
>
>> We could exit early, but any system which max_replication_slots set high enough
>> that the savings are measurable in a non-hot codepath is likely having other
>> performance problems as well (max_replication_slots is by default 10).
>
> Are we reading the same code?
>
>    for (i = 0; i < max_replication_slots; i++)
>    {
>        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
>
>        if (s->in_use && strcmp(name, NameStr(s->data.name)) == 0)
>            ereport(ERROR,
>                    (errcode(ERRCODE_DUPLICATE_OBJECT),
>                     errmsg("replication slot \"%s\" already exists", name)));
>        if (!s->in_use && slot == NULL)
>            slot = s;
>    }
>
> In the first place, breaking early would be wrong: we would fail to
> scan all slots to check for duplicate name.  In the second place,
> the code does choose the first not-in-use slot, because of the
> check for "slot == NULL" before changing the value of "slot".

My interpretation of the proposal was that once the first not-in-use slot
chosen and slot set to s, the for loop can be exited since no further slot
assignment will happen.  You are right though that we still need to inspect all
replication slots to check for duplicates.

--
Daniel Gustafsson