Обсуждение: BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead

Поиск
Список
Период
Сортировка

BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      18241
Logged by:          Fei Changhong
Email address:      feichanghong@qq.com
PostgreSQL version: 16.1
Operating system:   Operating system: centos 7,Kernel version: 5.10.13
Description:

In scenarios involving subtransactions, we've discovered that Standby might
mark index items as DEAD, which contravenes the design principle that
"killed tuples" should not emerge during recovery, as stated in the comments
of the RelationGetIndexScan function:
    /*
     * During recovery we ignore killed tuples and don't bother to kill them
     * either. We do this because the xmin on the primary node could easily
be
     * later than the xmin on the standby node, so that what the primary
     * thinks is killed is supposed to be visible on standby. So for correct
     * MVCC for queries during recovery we must ignore these hints and check
     * all tuples. Do *not* set ignore_killed_tuples to true when running in
a
     * transaction that was started during recovery. xactStartedInRecovery
     * should not be altered by index AMs.
     */

The issue can be reproduced through the following steps:
(1) Initialize data on the Primary
```
create table t(a int, b int);
alter table t alter column b type bigint;
```
(2)Standby gdb breakpoint at _bt_killitems
```
begin;
savepoint savepoint_a;
explain analyze select * from t;
```
We can observe that the query enters the _bt_killitems function, and an
index item is marked as dead by the ItemIdMarkDead.

We have analyzed the cause of the problem: The value of
IndexScanDesc->xactStartedInRecovery is a key condition for determining
whether an index item can be marked as DEAD. And it depends on
CurrentTransactionState->startedInRecovery. However, PushTransaction does
not assign a value to startedInRecovery when modifying
CurrentTransactionState.

Additionally, I have another question I'd like to ask: What specific
problems might arise from a Standby marking index items as DEAD?


Re:BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead

От
"feichanghong"
Дата:
> We have analyzed the cause of the problem: The value of
> IndexScanDesc->xactStartedInRecovery is a key condition for determining
> whether an index item can be marked as DEAD. And it depends on
> CurrentTransactionState->startedInRecovery. However, PushTransaction does
> not assign a value to startedInRecovery when modifying
> CurrentTransactionState.
The attached patch has been verified to resolve the mentioned issue.
Вложения

Re: BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead

От
Kyotaro Horiguchi
Дата:
At Mon, 11 Dec 2023 20:13:23 +0800, "feichanghong" <feichanghong@qq.com> wrote in 
> > We have analyzed the cause of the problem: The value of
> > IndexScanDesc->xactStartedInRecovery is a key condition for determining
> > whether an index item can be marked as DEAD. And it depends on
> > CurrentTransactionState->startedInRecovery. However, PushTransaction does
> > not assign a value to startedInRecovery when modifying
> > CurrentTransactionState.
> The attached patch has been verified to resolve the mentioned issue.

This appears to be a bug that has existed for a long time since commit
efc16ea520 (in 2009). Your fix looks correct to me, but as for me, the
comment is not particularly necessary, and it would be sufficient to
insert the new line in the location according to the member order
within TransactionStateData.

I have briefly checked and found no issues with other struct members
of TransactionStateData.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead

От
Michael Paquier
Дата:
On Tue, Dec 12, 2023 at 01:49:00PM +0900, Kyotaro Horiguchi wrote:
> This appears to be a bug that has existed for a long time since commit
> efc16ea520 (in 2009). Your fix looks correct to me, but as for me, the
> comment is not particularly necessary, and it would be sufficient to
> insert the new line in the location according to the member order
> within TransactionStateData.

Oops.  It's surprising that this has never been diagnosed but at the
same time I don't really see subtransactions being a common pattern in
a read-only workload for a standby, and it can easily cause MVCC
issues by removing tuples too eagerly and a standby may still need
them.  An issue if that this could cause problems if you do catalog
scans, which may explain a few bugs I recall seeing over the years,
even if these did not directly mention the use of ssavepoints.  I'd
need to double-check the archives.  If going through a driver layer
like the ODBC driver that enforces savepoints for each query, that
would be bad.

Your fix sounds good to me (no need for a comment), I'll take care of
it after looking a bit more at the area.
--
Michael

Вложения

Re: BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead

От
Alvaro Herrera
Дата:
On 2023-Dec-12, Michael Paquier wrote:

> An issue if that this could cause problems if you do catalog
> scans, which may explain a few bugs I recall seeing over the years,
> even if these did not directly mention the use of ssavepoints.  I'd
> need to double-check the archives.  If going through a driver layer
> like the ODBC driver that enforces savepoints for each query, that
> would be bad. 

What a horrible bug.  Thanks for pushing the fix.  It looks OK to me:
nowhere else do we create a TransactionStateData that would need the
initialization.

I wonder if this can cause data corruption, if you happen to have a
broken standby that improperly kills some index entry and is later
promoted.  Maybe this bug explains these mysterious cases where an index
seems to lack a pointer to some heap tuple for no known reason --
especially notorious when you try to REINDEX a unique index and that
turns out not to work due to duplicates.

I would be happier if the order of initialization of the struct member
followed the order to the struct, with comments to note the missing
members.  That might make it more visible to future patches that would
add more members.  Something like this:

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8442c5e6a7..b5874df821 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5291,17 +5291,25 @@ PushTransaction(void)
      */
     s->fullTransactionId = InvalidFullTransactionId;    /* until assigned */
     s->subTransactionId = currentSubTransactionId;
-    s->parent = p;
-    s->nestingLevel = p->nestingLevel + 1;
-    s->gucNestLevel = NewGUCNestLevel();
+    /* s->name = NULL; */
     s->savepointLevel = p->savepointLevel;
     s->state = TRANS_DEFAULT;
     s->blockState = TBLOCK_SUBBEGIN;
+    s->nestingLevel = p->nestingLevel + 1;
+    s->gucNestLevel = NewGUCNestLevel();
+    /* s->curTransactionContext = NULL; */
+    /* s->curTransactionOwner = NULL */
+    /* s->childXids = NULL; */
+    /* s->nChildXids = 0; */
+    /* s->maxChildXids = 0; */
     GetUserIdAndSecContext(&s->prevUser, &s->prevSecContext);
     s->prevXactReadOnly = XactReadOnly;
     s->startedInRecovery = p->startedInRecovery;
+    /* s->didLogXid = false; */
     s->parallelModeLevel = 0;
+    /* s->chain = false; */
     s->topXidLogged = false;
+    s->parent = p;
 
     CurrentTransactionState = s;
 

Also, the way this function checks for overflow is strange.  Why
increment then check, leading to a forced free and decrement, instead of
just testing and throwing error right away?  It's less code:

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8442c5e6a7..fac9141b16 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5272,18 +5272,14 @@ PushTransaction(void)
         MemoryContextAllocZero(TopTransactionContext,
                                sizeof(TransactionStateData));
 
-    /*
-     * Assign a subtransaction ID, watching out for counter wraparound.
-     */
-    currentSubTransactionId += 1;
-    if (currentSubTransactionId == InvalidSubTransactionId)
-    {
-        currentSubTransactionId -= 1;
-        pfree(s);
+    /* Check for overflow before incrementing the counter */
+    if (currentSubTransactionId + 1 == InvalidSubTransactionId)
         ereport(ERROR,
                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                  errmsg("cannot have more than 2^32-1 subtransactions in a transaction")));
-    }
+
+    /* Now we can increment it without fear */
+    currentSubTransactionId += 1;
 
     /*
      * We can now stack a minimally valid subtransaction without fear of

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")



Re: BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead

От
Kyotaro Horiguchi
Дата:
At Tue, 12 Dec 2023 18:11:44 +0100, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in 
> On 2023-Dec-12, Michael Paquier wrote:
> 
> > An issue if that this could cause problems if you do catalog
> > scans, which may explain a few bugs I recall seeing over the years,
> > even if these did not directly mention the use of ssavepoints.  I'd
> > need to double-check the archives.  If going through a driver layer
> > like the ODBC driver that enforces savepoints for each query, that
> > would be bad. 
> 
> What a horrible bug.  Thanks for pushing the fix.  It looks OK to me:
> nowhere else do we create a TransactionStateData that would need the
> initialization.
> 
> I wonder if this can cause data corruption, if you happen to have a
> broken standby that improperly kills some index entry and is later
> promoted.  Maybe this bug explains these mysterious cases where an index
> seems to lack a pointer to some heap tuple for no known reason --
> especially notorious when you try to REINDEX a unique index and that
> turns out not to work due to duplicates.
> 
> I would be happier if the order of initialization of the struct member
> followed the order to the struct, with comments to note the missing
> members.  That might make it more visible to future patches that would
> add more members.  Something like this:

The thought briefly crossed my mind; perhaps we could also comment out
parallelModeLevel (the same can be done for topXidLogged, but I'm not
sure it's worthwhile).

> Also, the way this function checks for overflow is strange.  Why
> increment then check, leading to a forced free and decrement, instead of
> just testing and throwing error right away?  It's less code:

The current code seems to primarily focus on reducing the number of
add operations in the normal path.
> 
> diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
> index 8442c5e6a7..fac9141b16 100644
> --- a/src/backend/access/transam/xact.c
> +++ b/src/backend/access/transam/xact.c
> @@ -5272,18 +5272,14 @@ PushTransaction(void)
>          MemoryContextAllocZero(TopTransactionContext,
>                                 sizeof(TransactionStateData));
>  
> -    /*
> -     * Assign a subtransaction ID, watching out for counter wraparound.
> -     */
> -    currentSubTransactionId += 1;
> -    if (currentSubTransactionId == InvalidSubTransactionId)
> -    {
> -        currentSubTransactionId -= 1;
> -        pfree(s);
> +    /* Check for overflow before incrementing the counter */
> +    if (currentSubTransactionId + 1 == InvalidSubTransactionId)
>          ereport(ERROR,
>                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>                   errmsg("cannot have more than 2^32-1 subtransactions in a transaction")));
> -    }
> +
> +    /* Now we can increment it without fear */
> +    currentSubTransactionId += 1;
>  
>      /*
>       * We can now stack a minimally valid subtransaction without fear of
> 
> -- 

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead

От
Michael Paquier
Дата:
On Tue, Dec 12, 2023 at 06:11:44PM +0100, Alvaro Herrera wrote:
> What a horrible bug.  Thanks for pushing the fix.  It looks OK to me:
> nowhere else do we create a TransactionStateData that would need the
> initialization.

Yep.

> I wonder if this can cause data corruption, if you happen to have a
> broken standby that improperly kills some index entry and is later
> promoted.  Maybe this bug explains these mysterious cases where an index
> seems to lack a pointer to some heap tuple for no known reason --
> especially notorious when you try to REINDEX a unique index and that
> turns out not to work due to duplicates.

AFAIK, this only impacts the visibility of the data retrieved in
transactions began when recovery was still running when doing an index
scan, and, even if the HOT chains would get inconsistent, we'd still
clean up them one the xmin gets updated on the standby because there
are no snapshots requiring it.  REINDEX after promotion does not looks
like an issue to me as we'd take a lock or wait for the necessary
snapshots to be gone in the concurrent case.  Or perhaps my
imagination with the HOT chains is too limited?

> I would be happier if the order of initialization of the struct member
> followed the order to the struct, with comments to note the missing
> members.  That might make it more visible to future patches that would
> add more members.  Something like this:

That crossed my mind, but it is not like we have other code paths
where we assume some fields to be initialized based on a palloc0() or
similar, so I don't see why we'd need that for this case.

> Also, the way this function checks for overflow is strange.  Why
> increment then check, leading to a forced free and decrement, instead of
> just testing and throwing error right away?  It's less code:

Saying that, agreed that this suggestion is an improvement compared to
what's on HEAD.  So +1 from me to change this code as you are
suggesting.
--
Michael

Вложения