Обсуждение: outdated references to replication timeout

Поиск
Список
Период
Сортировка

outdated references to replication timeout

От
John Naylor
Дата:
Hi,

The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages seem to refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the old language with more specific references.

--
Вложения

Re: outdated references to replication timeout

От
Fujii Masao
Дата:
On Wed, Jan 13, 2021 at 5:39 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> Hi,
>
> The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages
seemto refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the
oldlanguage with more specific references. 

Thanks for the patch! I think this change makes sense.

-                               (errmsg("terminating walsender process
due to replication timeout")));
+                               (errmsg("terminating walsender process
due to WAL sender timeout")));

Isn't it a bit strange to include different expressions "walsender" and
"WAL sender" for the same thing in one message?


This is a bit related, but different topic, though. If we change the above
message about walsender timeout, I also want to change the message about
walreceiver timeout, so as to make them more consistent. For example,

- (errmsg("terminating walreceiver due to timeout")));
+ (errmsg("terminating WAL receiver process due to WAL receiver timeout")));

Regards,

--
Fujii Masao



Re: outdated references to replication timeout

От
John Naylor
Дата:

On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> Thanks for the patch! I think this change makes sense.
>
> -                               (errmsg("terminating walsender process
> due to replication timeout")));
> +                               (errmsg("terminating walsender process
> due to WAL sender timeout")));
>
> Isn't it a bit strange to include different expressions "walsender" and
> "WAL sender" for the same thing in one message?

It is strange, now that I think about it. My thinking was that the former wording of "replication timeout" was a less literal reference to the replication_timeout parameter, so I did the same for wal_sender_timeout. A quick look shows we are not consistent in the documentation as far as walsender vs. WAL sender. For the purpose of the patch I agree it should be consistent within a single message. Maybe the parameter should be spelled exactly as is, with underscores? I'll take a broader look and send an updated patch.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: outdated references to replication timeout

От
Fujii Masao
Дата:
On Wed, Jan 13, 2021 at 10:51 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
>
> On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > Thanks for the patch! I think this change makes sense.
> >
> > -                               (errmsg("terminating walsender process
> > due to replication timeout")));
> > +                               (errmsg("terminating walsender process
> > due to WAL sender timeout")));
> >
> > Isn't it a bit strange to include different expressions "walsender" and
> > "WAL sender" for the same thing in one message?
>
> It is strange, now that I think about it. My thinking was that the former wording of "replication timeout" was a less
literalreference to the replication_timeout parameter, so I did the same for wal_sender_timeout. A quick look shows we
arenot consistent in the documentation as far as walsender vs. WAL sender. For the purpose of the patch I agree it
shouldbe consistent within a single message. Maybe the parameter should be spelled exactly as is, with underscores? 

I'm ok with that. But there seems no other timeout messages using
the parameter name.

src/backend/replication/logical/worker.c: (errmsg("terminating logical
replication worker due to timeout")));
src/backend/replication/walreceiver.c: (errmsg("terminating
walreceiver due to timeout")));
src/backend/replication/walsender.c: (errmsg("terminating walsender
process due to replication timeout")));
src/backend/tcop/postgres.c: errmsg("canceling authentication due to
timeout")));
src/backend/tcop/postgres.c: errmsg("canceling statement due to lock
timeout")));
src/backend/tcop/postgres.c: errmsg("canceling statement due to
statement timeout")));
src/backend/tcop/postgres.c: errmsg("terminating connection due to
idle-in-transaction timeout")));
src/backend/tcop/postgres.c: errmsg("terminating connection due to
idle-session timeout")));
src/backend/utils/misc/timeout.c: errmsg("cannot add more timeout reasons")));


> I'll take a broader look and send an updated patch.

Thanks!

Regards,

--
Fujii Masao



Re: outdated references to replication timeout

От
Michael Paquier
Дата:
On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote:
> On Wed, Jan 13, 2021 at 10:51 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>> It is strange, now that I think about it. My thinking was that the
>> former wording of "replication timeout" was a less literal
>> reference to the replication_timeout parameter, so I did the same
>> for wal_sender_timeout. A quick look shows we are not consistent
>> in the documentation as far as walsender vs. WAL sender. For the
>> purpose of the patch I agree it should be consistent within a
>> single message. Maybe the parameter should be spelled exactly as
>> is, with underscores?
>
> I'm ok with that. But there seems no other timeout messages using
> the parameter name.

Could it be that nothing needs to change here because this refers to
timeouts with the replication protocol?  The current state of things
on HEAD does not sound completely incorrect to me either.
--
Michael

Вложения

Re: outdated references to replication timeout

От
John Naylor
Дата:


On Thu, Jan 14, 2021 at 1:55 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote:
> > On Wed, Jan 13, 2021 at 10:51 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> >> It is strange, now that I think about it. My thinking was that the
> >> former wording of "replication timeout" was a less literal
> >> reference to the replication_timeout parameter, so I did the same
> >> for wal_sender_timeout. A quick look shows we are not consistent
> >> in the documentation as far as walsender vs. WAL sender. For the
> >> purpose of the patch I agree it should be consistent within a
> >> single message. Maybe the parameter should be spelled exactly as
> >> is, with underscores?
> >
> > I'm ok with that. But there seems no other timeout messages using
> > the parameter name.
>
> Could it be that nothing needs to change here because this refers to
> timeouts with the replication protocol?  The current state of things
> on HEAD does not sound completely incorrect to me either.

It was off enough to cause brief confusion during a customer emergency. To show a bit more context around one of the proposed corrections:

 timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
 wal_sender_timeout);

if (wal_sender_timeout > 0 && last_processing >= timeout)
{
/*
* Since typically expiration of replication timeout means
* communication problem, we don't send the error message to the
* standby.
*/
ereport(COMMERROR,
(errmsg("terminating walsender process due to replication timeout")));

My reading is, this is a case where the message didn't keep up with the change in parameter name.

--
John Naylor
EDB: http://www.enterprisedb.com