Re: Logical decoding and walsender timeouts

From	Andres Freund
Subject	Re: Logical decoding and walsender timeouts
Date	31 October 2016, 08:52:36
Msg-id	20161031085223.zjexqkuau5t32bfl@alap3.anarazel.de
In reply to	Logical decoding and walsender timeouts (Craig Ringer <craig@2ndquadrant.com>)
Responses	Re: Logical decoding and walsender timeouts
List	pgsql-hackers

Hi,

On 2016-10-31 16:34:38 +0800, Craig Ringer wrote:
> TL;DR: Logical decoding clients need to generate their own keepalives
> and not rely on the server requesting them to prevent timeouts. Or
> admins should raise the wal_sender_timeout by a LOT when using logical
> decoding on DBs with any big rows.

Unconvinced.

> When sending a big message, WalSndWriteData() notices that it's
> approaching timeout and tries to send a keepalive request, but the
> request just gets buffered behind the remaining output plugin data and
> isn't seen by the client until the client has received the rest of the
> pending data.

Only for individual messages, not the entire transaction, though. Are
you sure the problem at hand is that we're sending a keepalive, but it's
too late? It might very well be that the actual issue is that we're
never sending keepalives, because the network is fast enough / the TCP
window is large enough. IIRC we only send a keepalive if we're blocked
on network I/O?

> So: We could ask output plugins to deal with this for us, by chunking
> up their data in small pieces and calling OutputPluginPrepareWrite()
> and OutputPluginWrite() more than once per output plugin callback if
> they expect to send a big message. But this pushes the complexity of
> splitting up and handling big rows, and big Datums, onto each plugin.
> It's awkward to do well and hard to avoid splitting things up
> unnecessarily.

There's a decent reason for doing that independently though, namely that
it's a lot more efficient from a memory management POV.

I don't think the "unrequested keepalive" approach really solves the
problem on a fundamental enough level.
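
To make the "unrequested keepalive" idea concrete: the client would send
a Standby Status Update ('r') message on its own timer instead of waiting
for the server's keepalive request. Below is a minimal sketch of building
that message body, following the field layout in the streaming
replication protocol documentation. The function names and the "send
after half the timeout" rule are assumptions made for illustration, not
anything proposed in this thread, and a real client would still have to
wrap the bytes in a CopyData message.

```python
import struct
from datetime import datetime, timezone

# PostgreSQL timestamps count microseconds since 2000-01-01 00:00:00 UTC.
PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def pg_timestamp_us(now=None):
    """Return microseconds since the PostgreSQL epoch (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    delta = now - PG_EPOCH
    return (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds


def standby_status_update(written, flushed, applied, now=None,
                          reply_requested=False):
    """Build the body of a Standby Status Update message.

    Layout: 'r', three 64-bit LSNs (written, flushed, applied), the client
    clock as a 64-bit timestamp, and a one-byte "reply requested" flag,
    all in network byte order.
    """
    return struct.pack("!cQQQqB", b"r", written, flushed, applied,
                       pg_timestamp_us(now), 1 if reply_requested else 0)


def feedback_due(last_sent, now, wal_sender_timeout, fraction=0.5):
    """Decide whether to send feedback unprompted.

    Assumption: send once `fraction` of wal_sender_timeout (in seconds)
    has passed since the last status update.
    """
    return (now - last_sent) >= wal_sender_timeout * fraction
```

Note that this only helps if the client gets a chance to run the timer
while it is still receiving a large message, which is the part the
thread is arguing about.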

> (A separate issue is that we can also time out when doing logical
> _replication_ if the downstream side blocks on a lock, since it's not
> safe to send on a socket from a signal handler ... )

Strictly speaking, that's not true: write() and sendmsg() are
async-signal-safe functions. There are good reasons not to do that,
however, namely that the non-signal-handler code might be busy writing
data itself.
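
The usual way around that is for the handler to only record that a
keepalive is wanted and let the normal code path send it at a message
boundary, so the keepalive never lands in the middle of a partially
written message. A minimal sketch of the pattern follows. The class
name, the `send` callable and the `<keepalive>` placeholder payload are
all hypothetical; this is not walsender or apply-worker code.

```python
class DeferredKeepalive:
    """Hypothetical helper: defer keepalives requested from a signal handler."""

    def __init__(self, send):
        self._send = send        # callable that writes bytes to the socket
        self._pending = False    # set by the handler, cleared by the main code

    def on_signal(self, signum, frame):
        # Handler body: only record the request. The socket is not touched
        # here, because the main code may be halfway through a message.
        self._pending = True

    def write_message(self, chunks):
        # Write every chunk of one message, then service any deferred
        # keepalive once the message is complete.
        for chunk in chunks:
            self._send(chunk)
        if self._pending:
            self._pending = False
            self._send(b"<keepalive>")
```

On a Unix system the handler could be installed with
signal.signal(signal.SIGALRM, dk.on_signal). The trade-off is the one
already discussed above: the keepalive waits until the current message
has been written.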

Greetings,

Andres Freund
