Обсуждение: outdated references to replication timeout
Hi,
The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages seem to refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the old language with more specific references.
--
The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages seem to refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the old language with more specific references.
Вложения
On Wed, Jan 13, 2021 at 5:39 AM John Naylor <john.naylor@enterprisedb.com> wrote: > > Hi, > > The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages seemto refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the oldlanguage with more specific references. Thanks for the patch! I think this change makes sense. - (errmsg("terminating walsender process due to replication timeout"))); + (errmsg("terminating walsender process due to WAL sender timeout"))); Isn't it a bit strange to include different expressions "walsender" and "WAL sender" for the same thing in one message? This is a bit related, but different topic, though. If we change the above message about walsender timeout, I also want to change the message about walreceiver timeout, so as to make them more consistent. For example, - (errmsg("terminating walreceiver due to timeout"))); + (errmsg("terminating WAL receiver process due to WAL receiver timeout"))); Regards, -- Fujii Masao
On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> Thanks for the patch! I think this change makes sense.
>
> - (errmsg("terminating walsender process
> due to replication timeout")));
> + (errmsg("terminating walsender process
> due to WAL sender timeout")));
>
> Isn't it a bit strange to include different expressions "walsender" and
> "WAL sender" for the same thing in one message?
It is strange, now that I think about it. My thinking was that the former wording of "replication timeout" was a less literal reference to the replication_timeout parameter, so I did the same for wal_sender_timeout. A quick look shows we are not consistent in the documentation as far as walsender vs. WAL sender. For the purpose of the patch I agree it should be consistent within a single message. Maybe the parameter should be spelled exactly as is, with underscores? I'll take a broader look and send an updated patch.
--
John Naylor
EDB: http://www.enterprisedb.com
On Wed, Jan 13, 2021 at 10:51 PM John Naylor <john.naylor@enterprisedb.com> wrote: > > > On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > > > Thanks for the patch! I think this change makes sense. > > > > - (errmsg("terminating walsender process > > due to replication timeout"))); > > + (errmsg("terminating walsender process > > due to WAL sender timeout"))); > > > > Isn't it a bit strange to include different expressions "walsender" and > > "WAL sender" for the same thing in one message? > > It is strange, now that I think about it. My thinking was that the former wording of "replication timeout" was a less literalreference to the replication_timeout parameter, so I did the same for wal_sender_timeout. A quick look shows we arenot consistent in the documentation as far as walsender vs. WAL sender. For the purpose of the patch I agree it shouldbe consistent within a single message. Maybe the parameter should be spelled exactly as is, with underscores? I'm ok with that. But there seems no other timeout messages using the parameter name. src/backend/replication/logical/worker.c: (errmsg("terminating logical replication worker due to timeout"))); src/backend/replication/walreceiver.c: (errmsg("terminating walreceiver due to timeout"))); src/backend/replication/walsender.c: (errmsg("terminating walsender process due to replication timeout"))); src/backend/tcop/postgres.c: errmsg("canceling authentication due to timeout"))); src/backend/tcop/postgres.c: errmsg("canceling statement due to lock timeout"))); src/backend/tcop/postgres.c: errmsg("canceling statement due to statement timeout"))); src/backend/tcop/postgres.c: errmsg("terminating connection due to idle-in-transaction timeout"))); src/backend/tcop/postgres.c: errmsg("terminating connection due to idle-session timeout"))); src/backend/utils/misc/timeout.c: errmsg("cannot add more timeout reasons"))); > I'll take a broader look and send an updated patch. Thanks! Regards, -- Fujii Masao
On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote: > On Wed, Jan 13, 2021 at 10:51 PM John Naylor <john.naylor@enterprisedb.com> wrote: >> It is strange, now that I think about it. My thinking was that the >> former wording of "replication timeout" was a less literal >> reference to the replication_timeout parameter, so I did the same >> for wal_sender_timeout. A quick look shows we are not consistent >> in the documentation as far as walsender vs. WAL sender. For the >> purpose of the patch I agree it should be consistent within a >> single message. Maybe the parameter should be spelled exactly as >> is, with underscores? > > I'm ok with that. But there seems no other timeout messages using > the parameter name. Could it be that nothing needs to change here because this refers to timeouts with the replication protocol? The current state of things on HEAD does not sound completely incorrect to me either. -- Michael
Вложения
On Thu, Jan 14, 2021 at 1:55 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote:
> > On Wed, Jan 13, 2021 at 10:51 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> >> It is strange, now that I think about it. My thinking was that the
> >> former wording of "replication timeout" was a less literal
> >> reference to the replication_timeout parameter, so I did the same
> >> for wal_sender_timeout. A quick look shows we are not consistent
> >> in the documentation as far as walsender vs. WAL sender. For the
> >> purpose of the patch I agree it should be consistent within a
> >> single message. Maybe the parameter should be spelled exactly as
> >> is, with underscores?
> >
> > I'm ok with that. But there seems no other timeout messages using
> > the parameter name.
>
> Could it be that nothing needs to change here because this refers to
> timeouts with the replication protocol? The current state of things
> on HEAD does not sound completely incorrect to me either.
It was off enough to cause brief confusion during a customer emergency. To show a bit more context around one of the proposed corrections:
timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
wal_sender_timeout);
if (wal_sender_timeout > 0 && last_processing >= timeout)
{
/*
* Since typically expiration of replication timeout means
* communication problem, we don't send the error message to the
* standby.
*/
ereport(COMMERROR,
(errmsg("terminating walsender process due to replication timeout")));
My reading is, this is a case where the message didn't keep up with the change in parameter name.
--
John Naylor
EDB: http://www.enterprisedb.com
wal_sender_timeout);
if (wal_sender_timeout > 0 && last_processing >= timeout)
{
/*
* Since typically expiration of replication timeout means
* communication problem, we don't send the error message to the
* standby.
*/
ereport(COMMERROR,
(errmsg("terminating walsender process due to replication timeout")));
My reading is, this is a case where the message didn't keep up with the change in parameter name.
--
John Naylor
EDB: http://www.enterprisedb.com