Обсуждение: Re: pgsql: Preserve conflict-relevant data during logical replication.

Поиск
Список
Период
Сортировка

Re: pgsql: Preserve conflict-relevant data during logical replication.

От
Robert Haas
Дата:
On Wed, Jul 23, 2025 at 11:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> The fix looks good to me. I'll push your patch in sometime.

The tests in this patch are insufficient to prove that this logic
works properly. I tried with this patch:

diff --git a/src/backend/access/transam/twophase.c
b/src/backend/access/transam/twophase.c
index 7918176fc58..7c1d0ef07b5 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -2325,6 +2325,16 @@ RecordTransactionCommitPrepared(TransactionId xid,
     TimestampTz committs;
     bool        replorigin;

+    /*
+     * Note it is important to set committs value after marking ourselves as
+     * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is because
+     * we want to ensure all transactions that have acquired commit timestamp
+     * are finished before we allow the logical replication client to advance
+     * its xid which is used to hold back dead rows for conflict detection.
+     * See comments atop worker.c.
+     */
+    committs = GetCurrentTimestamp();
+
     /*
      * Are we using the replication origins feature?  Or, in other words, are
      * we replaying remote actions?
@@ -2344,16 +2354,6 @@ RecordTransactionCommitPrepared(TransactionId xid,
      */
     pg_write_barrier();

-    /*
-     * Note it is important to set committs value after marking ourselves as
-     * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is because
-     * we want to ensure all transactions that have acquired commit timestamp
-     * are finished before we allow the logical replication client to advance
-     * its xid which is used to hold back dead rows for conflict detection.
-     * See comments atop worker.c.
-     */
-    committs = GetCurrentTimestamp();
-
     /*
      * Emit the XLOG commit record. Note that we mark 2PC commits as
      * potentially having AccessExclusiveLocks since we don't know whether or

If it is in fact important to acquire the commit timestamp after
setting delayChkptFlags, you'd hope this would lead to a test failure,
but it doesn't for me. I understand it probably requires an injection
point to be certain of hitting the race condition, but I think that
would be worth doing. Otherwise, if something gets broken here by
accident, it might be a long time before anyone notices.

--
Robert Haas
EDB: http://www.enterprisedb.com



RE: pgsql: Preserve conflict-relevant data during logical replication.

От
"Zhijie Hou (Fujitsu)"
Дата:
On Tuesday, September 2, 2025 11:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
> 
> On Wed, Jul 23, 2025 at 11:44 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > The fix looks good to me. I'll push your patch in sometime.
> 
> The tests in this patch are insufficient to prove that this logic works properly. I
> tried with this patch:
> 
> diff --git a/src/backend/access/transam/twophase.c
> b/src/backend/access/transam/twophase.c
> index 7918176fc58..7c1d0ef07b5 100644
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -2325,6 +2325,16 @@ RecordTransactionCommitPrepared(TransactionId
> xid,
>      TimestampTz committs;
>      bool        replorigin;
> 
> +    /*
> +     * Note it is important to set committs value after marking ourselves as
> +     * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is
> because
> +     * we want to ensure all transactions that have acquired commit
> timestamp
> +     * are finished before we allow the logical replication client to advance
> +     * its xid which is used to hold back dead rows for conflict detection.
> +     * See comments atop worker.c.
> +     */
> +    committs = GetCurrentTimestamp();
> +
>      /*
>       * Are we using the replication origins feature?  Or, in other words, are
>       * we replaying remote actions?
> @@ -2344,16 +2354,6 @@ RecordTransactionCommitPrepared(TransactionId
> xid,
>       */
>      pg_write_barrier();
> 
> -    /*
> -     * Note it is important to set committs value after marking ourselves as
> -     * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is
> because
> -     * we want to ensure all transactions that have acquired commit
> timestamp
> -     * are finished before we allow the logical replication client to advance
> -     * its xid which is used to hold back dead rows for conflict detection.
> -     * See comments atop worker.c.
> -     */
> -    committs = GetCurrentTimestamp();
> -
>      /*
>       * Emit the XLOG commit record. Note that we mark 2PC commits as
>       * potentially having AccessExclusiveLocks since we don't know whether
> or
> 
> If it is in fact important to acquire the commit timestamp after setting
> delayChkptFlags, you'd hope this would lead to a test failure, but it doesn't for
> me. I understand it probably requires an injection point to be certain of hitting
> the race condition, but I think that would be worth doing. Otherwise, if
> something gets broken here by accident, it might be a long time before anyone
> notices.

Thanks for pointing it out!

I agree that adding a test is valuable to mitigate the risk of future code
changes. We added a similar safeguard for the RecordTransactionCommit() function
by adding Assert(xactStopTimestamp == 0) after marking the DELAY_CHKPT_xxx flag,
and did not do any precautionary check for RecordTransactionCommitPrepared as
the code to acquire the timestamp and setting flag was close by and had explicit
comments.

I'll prepare a test case that uses the injection point and share it in the
original thread.

Best Regards,
Hou zj

Re: pgsql: Preserve conflict-relevant data during logical replication.

От
Amit Kapila
Дата:
On Wed, Sep 3, 2025 at 5:10 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, September 2, 2025 11:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Wed, Jul 23, 2025 at 11:44 PM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> >
> > If it is in fact important to acquire the commit timestamp after setting
> > delayChkptFlags, you'd hope this would lead to a test failure, but it doesn't for
> > me. I understand it probably requires an injection point to be certain of hitting
> > the race condition, but I think that would be worth doing. Otherwise, if
> > something gets broken here by accident, it might be a long time before anyone
> > notices.
>
> Thanks for pointing it out!
>
> I agree that adding a test is valuable to mitigate the risk of future code
> changes. We added a similar safeguard for the RecordTransactionCommit() function
> by adding Assert(xactStopTimestamp == 0) after marking the DELAY_CHKPT_xxx flag,
> and did not do any precautionary check for RecordTransactionCommitPrepared as
> the code to acquire the timestamp and setting flag was close by and had explicit
> comments.
>
> I'll prepare a test case that uses the injection point and share it in the
> original thread.
>

The test for this case is added in commit
6456c6e2c4ad1cf9752e09cce37bfcfe2190c5e0.

--
With Regards,
Amit Kapila.



Re: pgsql: Preserve conflict-relevant data during logical replication.

От
Robert Haas
Дата:
On Wed, Sep 10, 2025 at 5:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> The test for this case is added in commit
> 6456c6e2c4ad1cf9752e09cce37bfcfe2190c5e0.

Thanks!

--
Robert Haas
EDB: http://www.enterprisedb.com