Re: Copy function for logical replication slots

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Copy function for logical replication slots
Дата
Msg-id CAD21AoCKK2wh269J40kL2mwmAf2MgLYfPC5cy71NSOGVRu2Xeg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Copy function for logical replication slots  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Copy function for logical replication slots  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On Thu, Jul 5, 2018 at 4:47 PM, Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Jul 03, 2018 at 05:27:07PM +0900, Masahiko Sawada wrote:
>> On Tue, Jul 3, 2018 at 1:01 PM, Michael Paquier <michael@paquier.xyz> wrote:
>>> One property which seems important to me is to make sure that the target
>>> slot has the same data as the origin slot once the caller knows that the
>>> copy has completed, and not that the target slot may perhaps have the
>>> same data as the origin while creating the target.  Do you see the
>>> difference?  Your patch does the latter, because it creates the new slot
>>> after releasing the lock it held on the origin, while I would think that
>>> the former is more important, aka keep the lock for the duration of the
>>> copy.
>>
>> Did you mean "the caller" is clients who call
>> pg_copy_logical_replication_slot function? If so, I think it's very
>> difficult to ensure the former because the origin slot can advance
>> during returning the result to the client. Also if we keep the lock on
>> the origin slot during the copy I think that one problem is that we
>> will own two replication slots at a time. But there are a lot of code
>> that premise a process holds at most one replication slot.
>
> When I mean the caller, I mean the client in charge of the slot
> creation.  Anyway, this bit is really worrying me in your patch:
> -   ReplicationSlotReserveWal();
> +   /* Find start location to read WAL if not specified */
> +   if (XLogRecPtrIsInvalid(start_lsn))
> +       ReplicationSlotReserveWal();
> +   else
> +   {
> +       SpinLockAcquire(&slot->mutex);
> +       slot->data.restart_lsn = start_lsn;
> +       SpinLockRelease(&slot->mutex);
> +   }
> Your patch simply increases the window mentioned here in slot.c because
> there is no interlock between the checkpointer and the process creating
> the slot.  Please see here:
> /*
>  * The replication slot mechanism is used to prevent removal of required
>  * WAL. As there is no interlock between this routine and checkpoints, WAL
>  * segments could concurrently be removed when a now stale return value of
>  * ReplicationSlotsComputeRequiredLSN() is used. In the unlikely case that
>  * this happens we'll just retry.
>  */
> So, slots created without a non-arbitrary start position have at least
> the will to restart if a checkpointer is running in parallel, but your
> patch increases the window used, and does not even retry nor does it
> fail to create the slot if the restart LSN points to a segment not here
> anymore.  The trick I think here is that the slot copy should not cause
> checkpoint slowdowns, so more thoughts are needed here, and this is not
> really trivial.

Yes, you're right. To guarantee that restart LSN of copied slot is
available, it seems to me that it's better to copy new slot while
holding the origin slot as you mentioned before. Since the replication
slot creation code assumes that a process creating a new slot doesn't
have any slots we should save origin slot temporary and create new
one, and then restore it. It might be a bit tricky but would work
fine.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS
Следующее
От: Srinivas Karthik V
Дата:
Сообщение: Re: Bulk Insert into PostgreSQL