Discussion: canceling/terminating statement due to conflict with recovery in Replica/DR instances
Hi Team,
We are running PostgreSQL 16.9 in production with a large database of about 25 TB. We have a Patroni setup with a replica instance and a DR Patroni setup with Patroni streaming.
We have a high volume of frequent commits in the database. There are a few large tables for which we asked the client to run queries on the DR/replica instances, but these queries have started failing with the errors "canceling statement due to conflict with recovery" and "terminating statement due to conflict with recovery".
As I understand it, this behavior is correct, but we need to get rid of this issue.
I have gone through old posts and some documentation and learned that the parameters below can help reduce this error:
max_standby_streaming_delay
max_standby_archive_delay
hot_standby_feedback = off
Our queries run for a long time, so I would have to set these values to minutes or hours (say 900s), which is not feasible for production because it will start impacting replication lag. Also, the queries will still fail once they reach the configured thresholds.
If I set these parameters to "-1" (disabled), there will be a direct impact on replication lag, which will in turn affect other queries on the replica node and the DR cluster.
Can you please advise whether there is a better solution for this scenario?
Thanks & Regards,
Ishan Joshi
On Tue, 2025-09-30 at 05:59 +0000, Ishan joshi wrote:
> We are using Postgresql 16.9 in production and with large database about 25TB
> of size. We have patroni setup with replica instance and DR patroni setup with
> patroni streaming.
>
> We have high volume and frequent commit in the database. There are few large
> tables for which we asked client to execute queries on DR/Replica instances but
> these queries are start getting failed with "canceling statement due to conflict
> with recovery" and "terminating statement due to conflict with recovery" error.
>
> As I understand the behavior is correct but we need to get rid of this issue.
>
> I gone through the old posts and some documentation and got to know that below
> parameters can help to reduce this error.
>
> max_standby_streaming_delay
> max_standby_archive_delay
> hot_standby_feedback = off
>
> Our queries are running for long period that makes me to set this value to some
> minutes/hours (lets set 900s) which is not feasible for production as it will
> start impacting the replication lag. Also, the queries will fail if it reaches
> to mentioned thresholds.
>
> If I set these parameters to "-1" (disable) then there will be direct impact on
> replication lag which will impact further queries on replica node and DR cluster.
>
> Can you please guide If any other better solution present for such scenario?
No, there is no better solution.
You can reduce replication conflicts by turning on "hot_standby_feedback" and by
turning off "vacuum_truncate", but you probably won't be able to get rid of all
replication conflicts.
You can either have a small replay delay and canceled queries or no canceled
queries, but the occasional replay delay.
If you need both no delay and no canceled queries, the only clean solution is
to have two standby servers.
Yours,
Laurenz Albe
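
[Editor's sketch] The two mitigations named in the reply above could look roughly like this on PostgreSQL 16, where "vacuum_truncate" is a per-table storage parameter. The table name big_table is a placeholder and does not come from the thread. In a Patroni-managed cluster, settings are normally changed through Patroni's own configuration rather than ALTER SYSTEM, so treat these statements as an illustration of the parameters, not as a recommended procedure.

```sql
-- On the standby that serves the long queries:
-- report the standby's oldest snapshot to the primary so that VACUUM
-- there keeps row versions the standby queries still need.
-- Trade-off: this can cause table bloat on the primary.
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();  -- the setting takes effect on reload

-- On the primary:
-- stop VACUUM from truncating empty pages at the end of the table.
-- Truncation takes an ACCESS EXCLUSIVE lock, which is replayed on the
-- standby and conflicts with queries reading the table there.
-- "big_table" is a hypothetical name.
ALTER TABLE big_table SET (vacuum_truncate = off);
```

As the reply notes, this reduces conflicts but is unlikely to eliminate them.
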
Hi Laurenz
Thanks for all the answers you give on this list.
Could you elaborate on why two or more standby servers would help in this case?
Kind regards
Peter Gram
Sæbyholmsvej 18
2500 Valby
Mobile: (+45) 5374 7107
Email: peter.m.gram@gmail.com
On Tue, 30 Sept 2025 at 08:17, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2025-09-30 at 05:59 +0000, Ishan joshi wrote:
> We are using Postgresql 16.9 in production and with large database about 25TB
> of size. We have patroni setup with replica instance and DR patroni setup with
> patroni streaming.
>
> We have high volume and frequent commit in the database. There are few large
> tables for which we asked client to execute queries on DR/Replica instances but
> these queries are start getting failed with "canceling statement due to conflict
> with recovery" and "terminating statement due to conflict with recovery" error.
>
> As I understand the behavior is correct but we need to get rid of this issue.
>
> I gone through the old posts and some documentation and got to know that below
> parameters can help to reduce this error.
>
> max_standby_streaming_delay
> max_standby_archive_delay
> hot_standby_feedback = off
>
> Our queries are running for long period that makes me to set this value to some
> minutes/hours (lets set 900s) which is not feasible for production as it will
> start impacting the replication lag. Also, the queries will fail if it reaches
> to mentioned thresholds.
>
> If I set these parameters to "-1" (disable) then there will be direct impact on
> replication lag which will impact further queries on replica node and DR cluster.
>
> Can you please guide If any other better solution present for such scenario?
No, there is no better solution.
You can reduce replication conflicts by turning on "hot_standby_feedback" and by
turning off "vacuum_truncate", but you probably won't be able to get rid of all
replication conflicts.
You can either have a small replay delay and canceled queries or no canceled
queries, but the occasional replay delay.
If you need both no delay and no canceled queries, the only clean solution is
to have two standby servers.
Yours,
Laurenz Albe
On Tue, 2025-09-30 at 09:58 +0200, Peter Gram wrote:
> On Tue, 30 Sept 2025 at 08:17, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Tue, 2025-09-30 at 05:59 +0000, Ishan joshi wrote:
> > > There are few large
> > > tables for which we asked client to execute queries on DR/Replica instances but
> > > these queries are start getting failed with "canceling statement due to conflict
> > > with recovery" and "terminating statement due to conflict with recovery" error.
> > >
> > > As I understand the behavior is correct but we need to get rid of this issue.
> > >
> > > I gone through the old posts and some documentation and got to know that below
> > > parameters can help to reduce this error.
> > >
> > > max_standby_streaming_delay
> > > max_standby_archive_delay
> > > hot_standby_feedback = off
> > >
> > > Our queries are running for long period that makes me to set this value to some
> > > minutes/hours (lets set 900s) which is not feasible for production as it will
> > > start impacting the replication lag. Also, the queries will fail if it reaches
> > > to mentioned thresholds.
> > >
> > > If I set these parameters to "-1" (disable) then there will be direct impact on
> > > replication lag which will impact further queries on replica node and DR cluster.
> > >
> > > Can you please guide If any other better solution present for such scenario?
> >
> > No, there is no better solution.
> >
> > If you need both no delay and no canceled queries, the only clean solution is
> > to have two standby servers.
>
> Could you elaborate on why two or more standby servers would help in this case ?
One of the standby servers would have "max_standby_streaming_delay = 0" or
"hot_standby = off", that one would be for high availability.
The other one would have "max_standby_streaming_delay = -1" and would be used for
queries.
Yours,
Laurenz Albe
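
[Editor's sketch] The split described above might be configured along these lines. The labels "standby A" and "standby B" are only for illustration, and using ALTER SYSTEM is an assumption; with Patroni, the same parameters would typically be set per node in Patroni's configuration.

```sql
-- Standby A (high availability): never hold back WAL replay for queries.
-- Conflicting queries are canceled right away, so replay stays current.
ALTER SYSTEM SET max_standby_streaming_delay = 0;
SELECT pg_reload_conf();

-- Standby B (reporting): let WAL replay wait indefinitely for
-- conflicting queries. Queries are not canceled, but this standby can
-- fall arbitrarily far behind while a long query runs.
ALTER SYSTEM SET max_standby_streaming_delay = -1;
SELECT pg_reload_conf();

-- max_standby_archive_delay is the counterpart that applies while WAL
-- is being restored from an archive instead of streamed.
```

The design choice is that each standby makes one side of the trade-off absolute: A gives up queries for freshness, B gives up freshness for queries.
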
Hi Ishan,
I assume you have partitioning and the right types of indexes in place for those tables. Also, has this 25 TB grown over many years, or is the database only a few years old? Parameter tuning can help but won't be a permanent solution. I believe having multiple replicas makes sense at this point.
Thanks,
Imran
On Tue, Sep 30, 2025, 8:59 AM Ishan joshi <ishanjoshi@live.com> wrote:
Hi Team,
We are using Postgresql 16.9 in production and with large database about 25TB of size. We have patroni setup with replica instance and DR patroni setup with patroni streaming.
We have high volume and frequent commit in the database. There are few large tables for which we asked client to execute queries on DR/Replica instances but these queries are start getting failed with "canceling statement due to conflict with recovery" and "terminating statement due to conflict with recovery" error.
As I understand the behavior is correct but we need to get rid of this issue.
I gone through the old posts and some documentation and got to know that below parameters can help to reduce this error.
max_standby_streaming_delay
max_standby_archive_delay
hot_standby_feedback = off
Our queries are running for long period that makes me to set this value to some minutes/hours (lets set 900s) which is not feasible for production as it will start impacting the replication lag. Also, the queries will fail if it reaches to mentioned thresholds.
If I set these parameters to "-1" (disable) then there will be direct impact on replication lag which will impact further queries on replica node and DR cluster.
Can you please guide If any other better solution present for such scenario?
Thanks & Regards,
Ishan Joshi
Hi Laurenz,
Thanks for your explanations. Having another replica instance makes sense, but in our case it is not possible to add one given the huge database size.
We will evaluate the impact of allowing more replay delay and act accordingly.
Thanks & Regards,
Ishan Joshi
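
[Editor's sketch] One way to observe the impact mentioned above is to watch replay delay and conflict counts on the standby while adjusting the delay settings. The queries below use built-in PostgreSQL functions and views; how often to sample them and what thresholds to alert on are left open.

```sql
-- Run on the standby.

-- Approximate replay delay: time since the commit of the last replayed
-- transaction. Note that this value also grows when the primary is idle.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;

-- Queries canceled due to recovery conflicts, per database and by cause.
SELECT datname,
       confl_snapshot,    -- row versions removed by VACUUM on the primary
       confl_lock,        -- lock conflicts, e.g. ACCESS EXCLUSIVE locks
       confl_bufferpin,
       confl_tablespace,
       confl_deadlock
FROM pg_stat_database_conflicts;
```

Comparing these numbers before and after a settings change shows which side of the trade-off (delay versus canceled queries) actually moved.
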
From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: 30 September 2025 15:10
To: Peter Gram <peter.m.gram@gmail.com>
Cc: Ishan joshi <ishanjoshi@live.com>; pgsql-admin@postgresql.org <pgsql-admin@postgresql.org>
Subject: Re: canceling/terminating statement due to conflict with recovery in Replica/DR instances
On Tue, 2025-09-30 at 09:58 +0200, Peter Gram wrote:
> On Tue, 30 Sept 2025 at 08:17, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Tue, 2025-09-30 at 05:59 +0000, Ishan joshi wrote:
> > > There are few large
> > > tables for which we asked client to execute queries on DR/Replica instances but
> > > these queries are start getting failed with "canceling statement due to conflict
> > > with recovery" and "terminating statement due to conflict with recovery" error.
> > >
> > > As I understand the behavior is correct but we need to get rid of this issue.
> > >
> > > I gone through the old posts and some documentation and got to know that below
> > > parameters can help to reduce this error.
> > >
> > > max_standby_streaming_delay
> > > max_standby_archive_delay
> > > hot_standby_feedback = off
> > >
> > > Our queries are running for long period that makes me to set this value to some
> > > minutes/hours (lets set 900s) which is not feasible for production as it will
> > > start impacting the replication lag. Also, the queries will fail if it reaches
> > > to mentioned thresholds.
> > >
> > > If I set these parameters to "-1" (disable) then there will be direct impact on
> > > replication lag which will impact further queries on replica node and DR cluster.
> > >
> > > Can you please guide If any other better solution present for such scenario?
> >
> > No, there is no better solution.
> >
> > If you need both no delay and no canceled queries, the only clean solution is
> > to have two standby servers.
>
> Could you elaborate on why two or more standby servers would help in this case ?
One of the standby servers would have "max_standby_streaming_delay = 0" or
"hot_standby = off", that one would be for high availability.
The other one would have "max_standby_streaming_delay = -1" and would be used for
queries.
Yours,
Laurenz Albe
Hi Imran,
Thanks for your reply.
We migrated this 25 TB database from Oracle to Postgres. Because the storage footprint is so large, we are not in a position to create a new replica instance/cluster.
Yes, I also believe that tuning the parameters is not a long-term solution, but we will check the impact and validate it.
Thanks & Regards,
Ishan Joshi
From: Imran Khan <imran.k.23@gmail.com>
Sent: 30 September 2025 18:06
To: Ishan joshi <ishanjoshi@live.com>
Cc: pgsql-admin <pgsql-admin@postgresql.org>
Subject: Re: canceling/terminating statement due to conflict with recovery in Replica/DR instances
Hi Isha,
I believe you have partitions and correct type of indexes created for those tables. Also, is this 25 TB size grown over many years or just few years old? Parameters tuning can help but won't be a permanent solution. Having multiple replicas I believe can make sense at this point.
Thanks,
Imran
On Tue, Sep 30, 2025, 8:59 AM Ishan joshi <ishanjoshi@live.com> wrote:
Hi Team,
We are using Postgresql 16.9 in production and with large database about 25TB of size. We have patroni setup with replica instance and DR patroni setup with patroni streaming.
We have high volume and frequent commit in the database. There are few large tables for which we asked client to execute queries on DR/Replica instances but these queries are start getting failed with "canceling statement due to conflict with recovery" and "terminating statement due to conflict with recovery" error.
As I understand the behavior is correct but we need to get rid of this issue.
I gone through the old posts and some documentation and got to know that below parameters can help to reduce this error.
max_standby_streaming_delay
max_standby_archive_delay
hot_standby_feedback = off
Our queries are running for long period that makes me to set this value to some minutes/hours (lets set 900s) which is not feasible for production as it will start impacting the replication lag. Also, the queries will fail if it reaches to mentioned thresholds.
If I set these parameters to "-1" (disable) then there will be direct impact on replication lag which will impact further queries on replica node and DR cluster.
Can you please guide If any other better solution present for such scenario?
Thanks & Regards,
Ishan Joshi