Re: Logical replication timeout problem

From: Andres Freund
Subject: Re: Logical replication timeout problem
Msg-id: 20230208200235.esfoggsmuvf4pugt@awork3.anarazel.de
In response to: Re: Logical replication timeout problem  (Andres Freund <andres@anarazel.de>)
Responses: Rework LogicalOutputPluginWriterUpdateProgress (WAS Re: Logical replication timeout ...)
           Re: Logical replication timeout problem
List: pgsql-hackers

Hi,

On 2023-02-08 10:30:37 -0800, Andres Freund wrote:
> On 2023-02-08 10:18:41 -0800, Andres Freund wrote:
> > I don't think the syncrep logic in WalSndUpdateProgress really works as-is -
> > consider what happens if e.g. the origin filter filters out entire
> > transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases
> > we'll be lucky because we'll return quickly to XLogSendLogical(), but not
> > reliably.
>
> Is it actually the right thing to check SyncRepRequested() in that logic? It's
> quite common to set up syncrep so that individual users or transactions opt
> into syncrep, but to leave the default disabled.
>
> I don't really see an alternative to making this depend solely on
> sync_standbys_defined.

While hacking on a rough prototype of how I think this should look instead, I
had a few questions / remarks:

- We probably need to call UpdateProgress from a bunch of places in decode.c
  as well? Indicating that we're lagging by a lot, just because all
  transactions were in another database seems decidedly suboptimal.

- Why should lag tracking only be updated at commit-like points? That seems
  like it adds odd discontinuities?

- The mix of skipped_xact and ctx->end_xact in WalSndUpdateProgress() seems
  somewhat odd. They have very overlapping meanings IMO.

- There are no UpdateProgress calls in pgoutput_stream_abort(), but ISTM there
  should be? It's legit progress.

- That's from 6912acc04f0: I find LagTrackerRead(), LagTrackerWrite() quite
  confusing, naming-wise. IIUC "reading" is about receiving confirmation
  messages, "writing" about the time the record was generated.  ISTM that the
  current time is a quite poor approximation in XLogSendPhysical(), but pretty
  much meaningless in WalSndUpdateProgress()? Am I missing something?

- Aren't the wal_sender_timeout / 2 checks in WalSndUpdateProgress(),
  WalSndWriteData() missing wal_sender_timeout <= 0 checks?

- I don't really understand why f95d53eded55 added !end_xact to the if
  condition for ProcessPendingWrites(). Is the theory that we'll end up in an
  outer loop soon?
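
To illustrate the wal_sender_timeout <= 0 point above, here is a minimal
standalone sketch. It is not the walsender code: ts_usec and
half_timeout_elapsed() are hypothetical names made up for this example, and
the only thing taken from PostgreSQL is that wal_sender_timeout is given in
milliseconds and that a value of zero disables the timeout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a timestamp in microseconds. */
typedef int64_t ts_usec;

/*
 * Hypothetical helper: has at least half of the timeout elapsed since the
 * last reply from the receiver?
 *
 * timeout_ms <= 0 is treated as "timeout disabled", so the answer is always
 * false. Without that guard the deadline would be last_reply itself (or
 * earlier), and the check would succeed on every call.
 */
static bool
half_timeout_elapsed(ts_usec now, ts_usec last_reply, int timeout_ms)
{
    if (timeout_ms <= 0)
        return false;

    /* Convert half of the timeout from milliseconds to microseconds. */
    ts_usec deadline = last_reply + ((ts_usec) timeout_ms * 1000) / 2;

    return now >= deadline;
}
```

With the guard in place, a caller that only wants to do extra work once half
the timeout has passed skips that work entirely when the timeout is disabled.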


Attached is a current, quite rough, prototype. It addresses some of the points
raised, but far from all. There are also several XXXs/FIXMEs in it.  I changed
the file extension to .txt to avoid hijacking the CF entry.

Greetings,

Andres Freund
