Re: [HACKERS] make async slave to wait for lsn to be replayed

Поиск
Список
Период
Сортировка
От Kartyshov Ivan
Тема Re: [HACKERS] make async slave to wait for lsn to be replayed
Дата
Msg-id 8b67e6514c9ac3562ed675393208506a@postgrespro.ru
обсуждение исходный текст
Ответ на Re: [HACKERS] make async slave to wait for lsn to be replayed  (Kartyshov Ivan <i.kartyshov@postgrespro.ru>)
Список pgsql-hackers
Thank you for your feedback.

On 2024-03-20 12:11, Alexander Korotkov wrote:
> On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan
> <i.kartyshov@postgrespro.ru> wrote:
>> > 4.2 With an unreasonably high future LSN, BEGIN command waits
>> > unboundedly, shouldn't we check if the specified LSN is more than
>> > pg_last_wal_receive_lsn() error out?
> 
> I think limiting wait lsn by current received lsn would destroy the
> whole value of this feature.  The value is to wait till given LSN is
> replayed, whether it's already received or not.

Ok sounds reasonable, I`ll rollback the changes.

> But I don't see a problem here. On the replica, it's out of our
> control to check which lsn is good and which is not.  We can't check
> whether the lsn, which is in future for the replica, is already issued
> by primary.
> 
> For the case of wrong lsn, which could cause potentially infinite
> wait, there is the timeout and the manual query cancel.

Fully agree with this take.

>> > 4.3 With an unreasonably high wait time, BEGIN command waits
>> > unboundedly, shouldn't we restrict the wait time to some max
> value,
>> > say a day or so?
>> > SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
>> > BEGIN AFTER :'future_receive_lsn' WITHIN 100000;
>> 
>> Good idea, I put it 1 day. But this limit we should to discuss.
> 
> Do you think that specifying timeout in milliseconds is suitable?  I
> would prefer to switch to seconds (with ability to specify fraction of
> second).  This was expressed before by Alexander Lakhin.

It sounds like an interesting idea. Please review the result.

>> > https://github.com/macdice/redo-bench or similar tools?
> 
> Ivan, could you do this?

Yes, test redo-bench/crash-recovery.sh
This patch on master
91.327, 1.973
105.907, 3.338
98.412, 4.579
95.818, 4.19

REL_13-STABLE
116.645, 3.005
113.212, 2.568
117.644, 3.183
111.411, 2.782

master
124.712, 2.047
117.012, 1.736
116.328, 2.035
115.662, 1.797

Strange behavior, patched version is faster then REL_13-STABLE and 
master.

> I don't see this change in the patch.  Normally if a process gets a
> signal, that causes WaitLatch() to exit immediately.  It also exists
> immediately on query cancel.  IIRC, this 1 minute timeout is needed to
> handle some extreme cases when an interrupt is missing.  Other places
> have it equal to 1 minute. I don't see why we should have it
> different.

Ok, I`ll rollback my changes.

>> 4) added and expanded sections in the documentation
> 
> I don't see this in the patch.  I see only a short description in
> func.sgml, which is definitely not sufficient.  We need at least
> everything we have in the docs before to be adjusted with the current
> approach of procedure.

I didn't find another section where to add the description of 
pg_wait_lsn().
So I extend description on the bottom of the table.

>> 5) add default variant of timeout
>> pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
>> example: pg_wait_lsn('0/31B1B60') equal pg_wait_lsn('0/31B1B60', 0)
> 
> Does zero here mean no timeout?  I think this should be documented.
> Also, I would prefer to see the timeout by default.  Probably one
> minute would be good for default.

Lets discuss this point. Loop in function WaitForLSN is made that way,
if we choose delay=0, only then we can wait infinitely to wait LSN
without timeout. So default must be 0.

Please take one more look on the patch.

-- 
Ivan Kartyshov
Postgres Professional: www.postgrespro.com
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: documentation structure
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Cannot find a working 64-bit integer type on Illumos