Re: Postgresql 11: terminating walsender process due to replication timeout
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Postgresql 11: terminating walsender process due to replication timeout |
Date | |
Msg-id | 20210909.155635.1680635236525675012.horikyota.ntt@gmail.com |
In reply to | Postgresql 11: terminating walsender process due to replication timeout (Abhishek Bhola <abhishek.bhola@japannext.co.jp>) |
Responses | Re: Postgresql 11: terminating walsender process due to replication timeout |
List | pgsql-general |
At Thu, 9 Sep 2021 14:52:25 +0900, Abhishek Bhola <abhishek.bhola@japannext.co.jp> wrote in
> I have found some questions about the same error, but didn't find any of
> them answering my problem.
>
> The setup is that I have two Postgres11 clusters (A and B) and they are
> making use of publication and subscription features to copy data from A to
> B.
>
> A (source DB- publication) --------------> B (target DB - subscription)
>
> This works fine, but often (not always) when the data volume being inserted
> on a table in node A increases, it gives the following error.
>
> "terminating walsender process due to replication timeout"
>
> The data volume at the moment being entered is about 30K rows per second
> continuously for hours through COPY command.
>
> Earlier the wal_sender_timeout was set to 5 sec and I would see this error
> much often. I then increased it to 1 min and the frequency of this error
> reduced. But I don't want to keep increasing it without understanding what
> is causing it. I looked at the code of walsender.c and know the exact lines
> where it's coming from.
>
> But I am still not clear which parameter is making the sender assume that
> the receiver node is inactive and therefore it should stop the wal_sender.
>
> Can anyone please suggest what changes I should make to remove this error?

Which minor version is the Postgres server you mentioned? PostgreSQL 11
received the following fix in 11.6, which could be related to the trouble.

https://www.postgresql.org/docs/11/release-11-6.html

> Fix timeout handling in logical replication walreceiver processes
> (Julien Rouhaud)
>
> Erroneous logic prevented wal_receiver_timeout from working in
> logical replication deployments.

The details of the fix are here.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3f60f690fac1bf375b92cf2f8682e8fe8f69098

> Fix timeout handling in logical replication worker
>
> The timestamp tracking the last moment a message is received in a
> logical replication worker was initialized in each loop checking if a
> message was received or not, causing wal_receiver_timeout to be ignored
> in basically any logical replication deployments. This also broke the
> ping sent to the server when reaching half of wal_receiver_timeout.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
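
For illustration only, the bug described in the quoted commit message can be sketched as a small simulation. This is not the PostgreSQL source (which is C); the function name `worker_times_out`, its parameters, and the fake millisecond clock are all assumptions made for this sketch. It only shows why re-initializing the "last message received" timestamp on every loop pass keeps a timeout check from ever firing.

```python
def worker_times_out(reset_each_iteration, timeout_ms=1000, tick_ms=100, max_ticks=50):
    """Simulate a receive loop in which no message ever arrives.

    Hypothetical model, not PostgreSQL code. Returns True if the loop
    notices the timeout, False if it never does within max_ticks passes.
    """
    now = 0              # fake clock, in milliseconds
    last_recv = now      # initialised once, before the loop
    for _ in range(max_ticks):
        if reset_each_iteration:
            # The bug as described: the timestamp is re-initialised on
            # every pass, so the elapsed time never exceeds one tick.
            last_recv = now
        # A real worker would try to receive a message here; in this
        # simulation nothing ever arrives, so last_recv is not advanced.
        now += tick_ms   # time passes while waiting
        if now - last_recv >= timeout_ms:
            return True  # timeout detected
    return False


buggy_detects_timeout = worker_times_out(reset_each_iteration=True)
fixed_detects_timeout = worker_times_out(reset_each_iteration=False)
```

In this model the variant that resets the timestamp on each pass never reports a timeout, while the variant that sets it once before the loop does, which matches the behavior the commit message describes for wal_receiver_timeout.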