Обсуждение: canceling/terminating statement due to conflict with recovery in Replica/DR instances

Поиск
Список
Период
Сортировка

canceling/terminating statement due to conflict with recovery in Replica/DR instances

От
Ishan joshi
Дата:
Hi Team,

We are using Postgresql 16.9 in production and with large database about 25TB of size. We have patroni setup with replica instance and DR patroni setup with patroni streaming.

We have high volume and frequent commit in the database. There are few large tables for which we asked client to execute queries on DR/Replica instances but these queries are start getting failed with "canceling statement due to conflict with recovery" and "terminating statement due to conflict with recovery" error.

As I understand the behavior is correct but we need to get rid of this issue.

I gone through the old posts and some documentation and got to know that below parameters can help to reduce this error. 

max_standby_streaming_delay 
max_standby_archive_delay 
hot_standby_feedback = off

Our queries are running for long period that makes me to set this value to some minutes/hours (lets set 900s) which is not feasible for production as it will start impacting the replication lag. Also, the queries will fail if it reaches to mentioned thresholds.

If I set these parameters to "-1" (disable) then there will be direct impact on replication lag which will impact further queries on replica node and DR cluster.

Can you please guide If any other better solution present for such scenario?

Thanks & Regards,
Ishan Joshi

Re: canceling/terminating statement due to conflict with recovery in Replica/DR instances

От
Laurenz Albe
Дата:
On Tue, 2025-09-30 at 05:59 +0000, Ishan joshi wrote:
> We are using Postgresql 16.9 in production and with large database about 25TB
> of size. We have patroni setup with replica instance and DR patroni setup with
> patroni streaming.
>
> We have high volume and frequent commit in the database. There are few large
> tables for which we asked client to execute queries on DR/Replica instances but
> these queries are start getting failed with "canceling statement due to conflict
> with recovery" and "terminating statement due to conflict with recovery" error.
>
> As I understand the behavior is correct but we need to get rid of this issue.
>
> I gone through the old posts and some documentation and got to know that below
> parameters can help to reduce this error. 
>
> max_standby_streaming_delay 
> max_standby_archive_delay 
> hot_standby_feedback = off
>
> Our queries are running for long period that makes me to set this value to some
> minutes/hours (lets set 900s) which is not feasible for production as it will
> start impacting the replication lag. Also, the queries will fail if it reaches
> to mentioned thresholds.
>
> If I set these parameters to "-1" (disable) then there will be direct impact on
> replication lag which will impact further queries on replica node and DR cluster.
>
> Can you please guide If any other better solution present for such scenario?

No, there is no better solution.

You can reduce replication conflicts by turning on "hot_standby_feedback" and by
turning off "vacuum_truncate", but you probably won't be able to get rid of all
replication conflicts.

You can either have a small replay delay and canceled queries or no canceled
queries, but the occasional replay delay.

If you need both no delay and no canceled queries, the only clean solution is
to have two standby servers.

Yours,
Laurenz Albe



Hi Laurenz

Thanks for all the answers you give on this list.
Could you elaborate on why two or more standby servers would help in this case ? 

Med venlig hilsen

Peter Gram
Sæbyholmsvej 18 
2500 Valby

Mobile: (+45) 5374 7107



On Tue, 30 Sept 2025 at 08:17, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2025-09-30 at 05:59 +0000, Ishan joshi wrote:
> We are using Postgresql 16.9 in production and with large database about 25TB
> of size. We have patroni setup with replica instance and DR patroni setup with
> patroni streaming.
>
> We have high volume and frequent commit in the database. There are few large
> tables for which we asked client to execute queries on DR/Replica instances but
> these queries are start getting failed with "canceling statement due to conflict
> with recovery" and "terminating statement due to conflict with recovery" error.
>
> As I understand the behavior is correct but we need to get rid of this issue.
>
> I gone through the old posts and some documentation and got to know that below
> parameters can help to reduce this error. 
>
> max_standby_streaming_delay 
> max_standby_archive_delay 
> hot_standby_feedback = off
>
> Our queries are running for long period that makes me to set this value to some
> minutes/hours (lets set 900s) which is not feasible for production as it will
> start impacting the replication lag. Also, the queries will fail if it reaches
> to mentioned thresholds.
>
> If I set these parameters to "-1" (disable) then there will be direct impact on
> replication lag which will impact further queries on replica node and DR cluster.
>
> Can you please guide If any other better solution present for such scenario?

No, there is no better solution.

You can reduce replication conflicts by turning on "hot_standby_feedback" and by
turning off "vacuum_truncate", but you probably won't be able to get rid of all
replication conflicts.

You can either have a small replay delay and canceled queries or no canceled
queries, but the occasional replay delay.

If you need both no delay and no canceled queries, the only clean solution is
to have two standby servers.

Yours,
Laurenz Albe


Re: canceling/terminating statement due to conflict with recovery in Replica/DR instances

От
Laurenz Albe
Дата:
On Tue, 2025-09-30 at 09:58 +0200, Peter Gram wrote:
> On Tue, 30 Sept 2025 at 08:17, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Tue, 2025-09-30 at 05:59 +0000, Ishan joshi wrote:
> > > There are few large
> > > tables for which we asked client to execute queries on DR/Replica instances but
> > > these queries are start getting failed with "canceling statement due to conflict
> > > with recovery" and "terminating statement due to conflict with recovery" error.
> > >
> > > As I understand the behavior is correct but we need to get rid of this issue.
> > >
> > > I gone through the old posts and some documentation and got to know that below
> > > parameters can help to reduce this error. 
> > >
> > > max_standby_streaming_delay 
> > > max_standby_archive_delay 
> > > hot_standby_feedback = off
> > >
> > > Our queries are running for long period that makes me to set this value to some
> > > minutes/hours (lets set 900s) which is not feasible for production as it will
> > > start impacting the replication lag. Also, the queries will fail if it reaches
> > > to mentioned thresholds.
> > >
> > > If I set these parameters to "-1" (disable) then there will be direct impact on
> > > replication lag which will impact further queries on replica node and DR cluster.
> > >
> > > Can you please guide If any other better solution present for such scenario?
> >
> > No, there is no better solution.
> >
> > If you need both no delay and no canceled queries, the only clean solution is
> > to have two standby servers.
>
> Could you elaborate on why two or more standby servers would help in this case ?

One of the standby servers would have "max_standby_streaming_delay = 0" or
"hot_standby = off", that one would be for high availability.

The other one would have "max_standby_streaming_delay = -1" and would be used for
queries.

Yours,
Laurenz Albe



Hi Isha, 

 I believe you have partitions and correct type of indexes created for those tables. Also, is this 25 TB size grown over many years or just few years old? Parameters tuning can help but won't be a permanent solution. Having multiple replicas I believe can make sense at this point.

Thanks, 
Imran

On Tue, Sep 30, 2025, 8:59 AM Ishan joshi <ishanjoshi@live.com> wrote:
Hi Team,

We are using Postgresql 16.9 in production and with large database about 25TB of size. We have patroni setup with replica instance and DR patroni setup with patroni streaming.

We have high volume and frequent commit in the database. There are few large tables for which we asked client to execute queries on DR/Replica instances but these queries are start getting failed with "canceling statement due to conflict with recovery" and "terminating statement due to conflict with recovery" error.

As I understand the behavior is correct but we need to get rid of this issue.

I gone through the old posts and some documentation and got to know that below parameters can help to reduce this error. 

max_standby_streaming_delay 
max_standby_archive_delay 
hot_standby_feedback = off

Our queries are running for long period that makes me to set this value to some minutes/hours (lets set 900s) which is not feasible for production as it will start impacting the replication lag. Also, the queries will fail if it reaches to mentioned thresholds.

If I set these parameters to "-1" (disable) then there will be direct impact on replication lag which will impact further queries on replica node and DR cluster.

Can you please guide If any other better solution present for such scenario?

Thanks & Regards,
Ishan Joshi
Hi Laurenz,

Thanks,  for your explanations. It makes sense for having another replica instance but in our case, it is not possible to have another replica instance with huge database size.

We will see the impact with delaying the reply lag and act accordingly.


Thanks & Regards,
Ishan Joshi


From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: 30 September 2025 15:10
To: Peter Gram <peter.m.gram@gmail.com>
Cc: Ishan joshi <ishanjoshi@live.com>; pgsql-admin@postgresql.org <pgsql-admin@postgresql.org>
Subject: Re: canceling/terminating statement due to conflict with recovery in Replica/DR instances
 
On Tue, 2025-09-30 at 09:58 +0200, Peter Gram wrote:
> On Tue, 30 Sept 2025 at 08:17, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Tue, 2025-09-30 at 05:59 +0000, Ishan joshi wrote:
> > > There are few large
> > > tables for which we asked client to execute queries on DR/Replica instances but
> > > these queries are start getting failed with "canceling statement due to conflict
> > > with recovery" and "terminating statement due to conflict with recovery" error.
> > >
> > > As I understand the behavior is correct but we need to get rid of this issue.
> > >
> > > I gone through the old posts and some documentation and got to know that below
> > > parameters can help to reduce this error. 
> > >
> > > max_standby_streaming_delay 
> > > max_standby_archive_delay 
> > > hot_standby_feedback = off
> > >
> > > Our queries are running for long period that makes me to set this value to some
> > > minutes/hours (lets set 900s) which is not feasible for production as it will
> > > start impacting the replication lag. Also, the queries will fail if it reaches
> > > to mentioned thresholds.
> > >
> > > If I set these parameters to "-1" (disable) then there will be direct impact on
> > > replication lag which will impact further queries on replica node and DR cluster.
> > >
> > > Can you please guide If any other better solution present for such scenario?
> >
> > No, there is no better solution.
> >
> > If you need both no delay and no canceled queries, the only clean solution is
> > to have two standby servers.
>
> Could you elaborate on why two or more standby servers would help in this case ?

One of the standby servers would have "max_standby_streaming_delay = 0" or
"hot_standby = off", that one would be for high availability.

The other one would have "max_standby_streaming_delay = -1" and would be used for
queries.

Yours,
Laurenz Albe
Hi Imran,

Thanks for your reply.

We have migrated from Oracle to Postgres these 25TB database. As the storage is huge we are not in position to create new replica instance/cluster.

Yes, I also believe the tuning the parameter is not long-term solution but we will check the impact and validate the same.

 

Thanks & Regards,
Ishan Joshi


From: Imran Khan <imran.k.23@gmail.com>
Sent: 30 September 2025 18:06
To: Ishan joshi <ishanjoshi@live.com>
Cc: pgsql-admin <pgsql-admin@postgresql.org>
Subject: Re: canceling/terminating statement due to conflict with recovery in Replica/DR instances
 
Hi Isha, 

 I believe you have partitions and correct type of indexes created for those tables. Also, is this 25 TB size grown over many years or just few years old? Parameters tuning can help but won't be a permanent solution. Having multiple replicas I believe can make sense at this point.

Thanks, 
Imran

On Tue, Sep 30, 2025, 8:59 AM Ishan joshi <ishanjoshi@live.com> wrote:
Hi Team,

We are using Postgresql 16.9 in production and with large database about 25TB of size. We have patroni setup with replica instance and DR patroni setup with patroni streaming.

We have high volume and frequent commit in the database. There are few large tables for which we asked client to execute queries on DR/Replica instances but these queries are start getting failed with "canceling statement due to conflict with recovery" and "terminating statement due to conflict with recovery" error.

As I understand the behavior is correct but we need to get rid of this issue.

I gone through the old posts and some documentation and got to know that below parameters can help to reduce this error. 

max_standby_streaming_delay 
max_standby_archive_delay 
hot_standby_feedback = off

Our queries are running for long period that makes me to set this value to some minutes/hours (lets set 900s) which is not feasible for production as it will start impacting the replication lag. Also, the queries will fail if it reaches to mentioned thresholds.

If I set these parameters to "-1" (disable) then there will be direct impact on replication lag which will impact further queries on replica node and DR cluster.

Can you please guide If any other better solution present for such scenario?

Thanks & Regards,
Ishan Joshi