Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From	Petr Jelinek
Subject	Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date	14 October 2020, 05:04:24
Msg-id	4a37c0e0-88a5-5d09-19c6-390b8412d3e6@2ndquadrant.com
In reply to	BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop (PG Bug reporting form <noreply@postgresql.org>)
List	pgsql-bugs

Hi,

On 14/10/2020 03:12, Alvaro Herrera wrote:
> On 2020-Oct-12, Petr Jelinek wrote:
> 
>>> However, and this is one reason why I'd welcome Petr/Peter thoughts on
>>> this, I don't really understand what happens in LogicalRepApplyLoop
>>> afterwards with a tablesync worker; are we actually doing anything
>>> useful there, considering that the actual data copy seems to have
>>> occurred in the CopyFrom() call in copy_table?  In other words, by the
>>> time we return control to ApplyWorkerMain with a slot name, isn't the
>>> work all done, and the only thing we need is to synchronize protocol and
>>> close the connection?
>>
>> There are 2 possible states at that point, either tablesync is ahead (when
>> main apply lags or nothing is happening on publication side) or it's behind
>> the main apply. When tablesync is ahead we are indeed done and just need to
>> update the state of the table (which is what the code you removed did, but
>> LogicalRepApplyLoop should do it as well, just a bit later). When it's
>> behind we need to do catchup for that table only which still happens in the
>> tablesync worker. See the explanation at the beginning of tablesync.c, it
>> probably needs some small adjustments after the changes in your first patch.
> 
> ... Ooh, things start to make some sense now.  So how about the
> attached?  There are some not really related cleanups.  (Changes to
> protocol.sgml are still pending.)
> 

It would be nice if the new sentences at the beginning of tablesync.c 
started with uppercase, but that's about as nitpicky as I can be :)

> If I understand correctly, the early exit in tablesync.c is not saving *a
> lot* of time (we don't actually skip replaying any WAL), even if it's
> saving execution of a bunch of code.  So I stand by my position that
> removing the code is better because it's clearer about what is actually
> happening.
> 

I don't really have any problems with the simplification you propose. 
The saved time is probably on the order of hundreds of ms, which for 
table sync is insignificant.
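
To make the two states from the quoted explanation above concrete, here is a minimal sketch of the decision the tablesync worker faces once the initial copy is finished. This is not the code in tablesync.c: `lsn_t`, `sync_action` and `classify_tablesync` are hypothetical names used only for illustration, and treating "equal positions" as "ahead" is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (illustration only). */
typedef uint64_t lsn_t;

/* The two outcomes described in the thread (names are hypothetical). */
typedef enum
{
    SYNC_DONE_NO_CATCHUP,   /* tablesync is at or ahead of the main apply */
    SYNC_NEEDS_CATCHUP      /* tablesync is behind the main apply */
} sync_action;

/*
 * Decide what the tablesync worker still has to do after the copy.
 *
 * copy_lsn:  position the table data corresponds to after the copy
 * apply_lsn: position the main apply worker has reached
 *
 * Assumption: equal positions count as "ahead", i.e. nothing to replay.
 */
static sync_action
classify_tablesync(lsn_t copy_lsn, lsn_t apply_lsn)
{
    if (copy_lsn >= apply_lsn)
        return SYNC_DONE_NO_CATCHUP;    /* only the table state needs updating */

    return SYNC_NEEDS_CATCHUP;          /* replay changes for this table only */
}
```

In the first case only the table's state has to be updated; in the second the worker keeps applying changes for that one table until it reaches the main apply's position.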

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
