Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From | Petr Jelinek |
---|---|
Subject | Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop |
Date | |
Msg-id | 4a37c0e0-88a5-5d09-19c6-390b8412d3e6@2ndquadrant.com |
In reply to | BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop (PG Bug reporting form <noreply@postgresql.org>) |
List | pgsql-bugs |
Hi,

On 14/10/2020 03:12, Alvaro Herrera wrote:
> On 2020-Oct-12, Petr Jelinek wrote:
>
>>> However, and this is one reason why I'd welcome Petr/Peter thoughts on
>>> this, I don't really understand what happens in LogicalRepApplyLoop
>>> afterwards with a tablesync worker; are we actually doing anything
>>> useful there, considering that the actual data copy seems to have
>>> occurred in the CopyFrom() call in copy_table? In other words, by the
>>> time we return control to ApplyWorkerMain with a slot name, isn't the
>>> work all done, and the only thing we need is to synchronize protocol and
>>> close the connection?
>>
>> There are 2 possible states at that point, either tablesync is ahead (when
>> main apply lags or nothing is happening on publication side) or it's behind
>> the main apply. When tablesync is ahead we are indeed done and just need to
>> update the state of the table (which is what the code you removed did, but
>> LogicalRepApplyLoop should do it as well, just a bit later). When it's
>> behind we need to do catchup for that table only which still happens in the
>> tablesync worker. See the explanation at the beginning of tablesync.c, it
>> probably needs some small adjustments after the changes in your first patch.
>
> ... Ooh, things start to make some sense now. So how about the
> attached? There are some not really related cleanups. (Changes to
> protocol.sgml are still pending.)
>

It would be nice if the new sentences at the beginning of tablesync.c
started with uppercase, but that's about as nitpicky as I can be :)

> If I understand correctly, the early exit in tablesync.c is not saving *a
> lot* of time (we don't actually skip replaying any WAL), even if it's
> saving execution of a bunch of code. So I stand by my position that
> removing the code is better because it's clearer about what is actually
> happening.
>

I don't really have any problems with the simplification you propose.
The saved time is probably on the order of hundreds of ms, which for table
sync is insignificant.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
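
The two post-copy cases described in the message above (tablesync ahead of, or behind, the main apply worker) can be sketched roughly as follows. This is only an illustration, not the code in tablesync.c: the names `Lsn`, `SyncAction`, and `decide_after_copy` are hypothetical, WAL positions are modeled as plain 64-bit integers, and treating equal positions as "ahead" is an assumption made for the sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; not PostgreSQL's XLogRecPtr. */
typedef uint64_t Lsn;

/* Hypothetical outcome of the comparison made once the initial copy is done. */
typedef enum
{
    SYNC_MARK_DONE,     /* tablesync is ahead: only the table state needs updating */
    SYNC_CATCH_UP       /* tablesync is behind: replay changes for this table only */
} SyncAction;

/*
 * copy_lsn:  position the tablesync worker reached with its initial copy.
 * apply_lsn: position the main apply worker has reached.
 *
 * Assumption: equal positions count as "ahead", so no catchup is needed.
 */
static SyncAction
decide_after_copy(Lsn copy_lsn, Lsn apply_lsn)
{
    return (copy_lsn >= apply_lsn) ? SYNC_MARK_DONE : SYNC_CATCH_UP;
}
```

In the first case the worker has nothing left to replay; in the second, the catchup still runs inside the tablesync worker, and only for that one table.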