Discussion: Fix missing EvalPlanQual recheck for TID scans
Hi all,

I learned that the TID Scan node does not properly implement EvalPlanQual recheck, which has already been noted in a comment added in 2002 (commit 6799a6ca21). This does seem to have the potential to manifest in observable and surprising ways when queries filter on `ctid`, for example for optimistic concurrency control as in https://stackoverflow.com/q/78487441.

Attached is an attempt from me at fixing this issue for TID Scan as well as TID Range Scan, which appears to have the same issue. This is my first time looking at the Postgres source (nor do I work with C on a regular basis), so I will not be surprised to hear that I've done things wrong, but I'm hopeful that this is usable as written. Running `meson test` passes for me, including the added isolation tests, and I confirmed that both of the tests that now result in a negative recheck were previously failing.

In my implementation I am calling TidRangeEval every time TidRangeRecheck is called; let me know if this is a performance concern. It didn't seem that any of the existing TidRangeScanState flags are quite right for knowing whether the bounds have been initialized, but I am happy to add one.

The original issue appears to have been present since the introduction of TID scans in 1999 (commit 6f9ff92cc0), so I imagine it may make sense to backport the fix, although evidently not many people are running into this.

Thanks,
Sophie
Attachments
On Mon, 8 Sept 2025 at 10:31, Sophie Alpert <pg@sophiebits.com> wrote:
> I learned that the TID Scan node does not properly implement EvalPlanQual recheck, which has already been noted in a comment added in 2002 (commit 6799a6ca21). This does seem to have the potential to manifest in observable and surprising ways when queries filter on `ctid`, for example for optimistic concurrency control as in https://stackoverflow.com/q/78487441.

Thanks for the report and the patch. I'll summarise this issue, as you provided an external link which might one day disappear:

drop table if exists test_optimistic_lock;
create table public.test_optimistic_lock (
    id bigserial primary key,
    name varchar,
    rest_count integer,
    update_user varchar,
    update_time timestamp without time zone,
    create_time timestamp without time zone
);

insert into public.test_optimistic_lock (name, rest_count, update_time, create_time)
values ('Product1', 1, current_timestamp, current_timestamp);

S1: begin;
S1: select ctid, rest_count from public.test_optimistic_lock where name = 'Product1';
S2: begin;
S2: select ctid, rest_count from public.test_optimistic_lock where name = 'Product1';
S1: update public.test_optimistic_lock set rest_count = rest_count - 1, update_user = 'A', update_time = current_timestamp where name = 'Product1' and ctid = '(0,1)';
S2: update public.test_optimistic_lock set rest_count = rest_count - 1, update_user = 'A', update_time = current_timestamp where name = 'Product1' and ctid = '(0,1)'; -- waits for S1
S1: commit;
S2: Gets UPDATE 1 when it should get UPDATE 0!!

I agree that this is a bug and we should fix it. The behaviour should match what you'd get if you ran it with "SET enable_tidscan = 0;". When that's done, the Seq Scan will have the ctid-related expressions within the qual and it will correctly filter out the non-matching tuple.

> Attached is an attempt from me at fixing this issue for TID Scan as well as TID Range Scan, which appears to have the same issue.
I think this is generally pretty good. Here's a quick review:

1. This part is quite annoying:

+ if (node->tss_TidList == NULL)
+     TidListEval(node);

Of course, it's required since ExecReScanTidScan() wiped out that list.

Maybe we can add a boolean variable named tts_inScan to TidScanState after tss_isCurrentOf (so that it consumes the existing padding bytes and does not enlarge that struct), and have ExecReScanTidScan() set that to false rather than wipe out the tss_TidList. Then just reuse the existing list in TidRecheck(). That should save on the extra overhead of having to rebuild the list each time there's an EPQ recheck.

2. For TidRangeRecheck, I don't see why this part is needed:

+ if (!TidRangeEval(node))
+     return false;

The TID range is preserved already and shouldn't need to be recalculated.

3. In TidRecheck(), can you make "match" an "ItemPointer" and do:

match = (ItemPointer) bsearch(...);

4. Can you put this comment above the "if"?

+ if (node->tss_isCurrentOf)
+     /* WHERE CURRENT OF always intends to resolve to the latest tuple */
+     return true;

What you've got there doesn't meet the style we use. Technically there should be braces because there are multiple lines (per our style, not C's), but it's a bit needless to do that as the comment makes the same amount of sense if it goes before the "if".

5. Can you make TidRangeRecheck() look like this?

static bool
TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
{
    /* Recheck the ctid is still within range */
    if (ItemPointerCompare(&slot->tts_tid, &node->trss_mintid) < 0 ||
        ItemPointerCompare(&slot->tts_tid, &node->trss_maxtid) > 0)
        return false;

    return true;
}

The reason being that it's closer to how heap_getnextslot_tidrange() does it, and I'd rather the conditions were kept as similar as possible to that function.

6. For the tests, it should be ok to make the TID Range Scan test do ctid BETWEEN '(0,1)' AND '(0,1)'. I feel this is more aligned to the TID Range Scan version of what you're doing in the TID Scan test.

I'll need to give the tests a closer inspection later.

David
Thanks for taking a look.

On Sun, Sep 7, 2025 at 9:51 PM, David Rowley <dgrowleyml@gmail.com> wrote:
> I agree that this is a bug and we should fix it. The behaviour should
> match what you'd get if you ran it with "SET enable_tidscan = 0;".

I've confirmed that my new tests already pass on current master if `SET enable_tidscan = off;` is added to both session setups.

> 1. This part is quite annoying:
>
> + if (node->tss_TidList == NULL)
> +     TidListEval(node);
>
> Of course, it's required since ExecReScanTidScan() wiped out that list.
>
> Maybe we can add a boolean variable named tts_inScan to TidScanState
> after tss_isCurrentOf (so as it consumes the existing padding bytes
> and does not enlarge that struct), and have ExecReScanTidScan() set
> that to false rather than wipe out the tss_TidList. Then just reuse
> the existing list in TidRecheck(). That should save on the extra
> overhead of having to rebuild the list each time there's an EPQ
> recheck.
>
> 2. For TidRangeRecheck, I don't see why this part is needed:
>
> + if (!TidRangeEval(node))
> +     return false;
>
> The TID range is preserved already and shouldn't need to be recalculated.

I am a bit unclear on the exact logic around what order Next, ReScan, and Recheck are called in, so I may have written the code too defensively. Concretely: for ExecReScanTidScan wiping out tss_TidList, I was thinking that when ExecReScanTidScan is called, there may be changed params that require the list to be recalculated. I also wasn't sure if Recheck can be called without a preceding Next and/or ReScan having been called with the same params, or if it can always rely on those having been called immediately prior. (I had reviewed IndexRecheck as I figured it seems the most likely to be a correct example here, but I wasn't able to infer from how it's written.) If you could help clarify the guarantees here I'd appreciate it.
For what it's worth, if I test removing the tss_TidList clearing in ExecReScanTidScan and the recomputation in TidRecheck (of course this is not correct in general), then my `tid1 tidsucceed2 c1 c2` test now fails due to TidRecheck incorrectly returning false.

I'm actually digging in more now and if I'm reading the code correctly, EvalPlanQualStart calls ExecInitNode to create an entirely separate PlanState tree for EPQ, so I'm unsure if any of the state is carried over to the Recheck calls? If I'm understanding correctly, then it seems the best we can do in this patch is to leave TidRecheck as I had it, but for TidRangeRecheck I can add a node->trss_boundsInitialized flag so that we recompute the bounds once per EPQ instead of per tuple checked.

(I'll be happy to make the stylistic changes you mentioned.)

Appreciate your time,
Sophie
On Mon, 8 Sept 2025 at 18:57, Sophie Alpert <pg@sophiebits.com> wrote:
> I'm actually digging in more now and if I'm reading the code correctly, EvalPlanQualStart calls ExecInitNode to create an entirely separate PlanState tree for EPQ, so I'm unsure if any of the state is carried over to the Recheck calls?

Oh, you're right. The EPQ executor state isn't the same as the normal executor state, so the recalc seems to be needed.

David
Updated patch attached.

On Sun, Sep 7, 2025 at 9:51 PM, David Rowley <dgrowleyml@gmail.com> wrote:
> 1. This part is quite annoying:
>
> + if (node->tss_TidList == NULL)
> +     TidListEval(node);
>
> Of course, it's required since ExecReScanTidScan() wiped out that list.

Given that EPQ uses a separate PlanState, I've left this as is.

> 2. For TidRangeRecheck, I don't see why this part is needed:
>
> + if (!TidRangeEval(node))
> +     return false;
>
> The TID range is preserved already and shouldn't need to be recalculated.

I've added a new trss_boundsInitialized flag such that we calculate the range once per EPQ rescan. In order to preserve the semantics when the min or max is NULL, I'm setting trss_mintid/trss_maxtid to invalid ItemPointers as a sentinel in the cases where TidRangeEval returns false. I added an ItemPointerIsValid assertion given that it's now more relevant to correctness, but I can remove it if it feels superfluous. Let me know if there is a more idiomatic way to treat this.

> 3. In TidRecheck(), can you make "match" an "ItemPointer" and do:
> match = (ItemPointer) bsearch(...);
> 4. Can you put this comment above the "if"?
> 5. Can you make TidRangeRecheck() look like this?
> 6. For the tests. It should be ok to make the Tid range scan test do
> ctid BETWEEN '(0,1)' AND '(0,1)'. I feel this is more aligned to the
> TID Range Scan version of what you're doing in the TID Scan test.

All done. Do let me know if other changes would be helpful.

Sophie
Attachments
HighGo Software Co., Ltd.
https://www.highgo.com/
Hi Chao,

Thanks for taking a look.

On Tue, Sep 9, 2025 at 12:52 AM, Chao Li <li.evan.chao@gmail.com> wrote:
> However, I noticed that, when "TidRecheck()" is called, it is actually passed "epqstate->recheckplanstate", which is always newly created, which means we repeatedly call "TidListEval(node)" and run "bsearch()" when there are multiple rows involved in the update. If the update qual is just "ctid = <a single ctid>", that should be fine. But if we do "update .. where ctid in (a list of ctids)", then repeatedly calling "TidListEval()" might not be a good thing.

Based on my current understanding, the EPQ PlanState is shared across all EPQ checks for a given plan, and ReScan is called at similar times as it is for the initial access path, with ExecScanFetch delegating to the appropriate *Next or *Recheck method as each row is processed. Therefore, my patch does not recompute the TidList for every row.

> As "TidRecheck(TidScanState *node, TupleTableSlot *slot)" gets the second parameter "slot" with the updated slot, I am thinking the other solution could be that we just check if the updated slot is visible to the current transaction. So I made a local change like this:
>
> ...
> tuple = ExecCopySlotHeapTuple(slot);
> visible = HeapTupleSatisfiesVisibility(tuple, GetActiveSnapshot(), slot->tts_tableOid);

This is semantically different from the patch I've proposed. My test `permutation tid1 tidsucceed2 c1 c2 read` fails against your code, despite passing with the `SET enable_tidscan = OFF;` flag that I understand we want to match the behavior of. (In addition, I understand HeapTupleSatisfiesVisibility is specific to the heap access method and can't be used here, though perhaps tuple_fetch_row_version could be used instead.)

Sophie
On Sep 10, 2025, at 13:18, Chao Li <li.evan.chao@gmail.com> wrote:
> With my proposal, my theory is that TidNext() fetches a tuple by ctid; if another concurrent transaction updates/deletes the tuple, the tuple's visibility changes.
On Tue, Sep 9, 2025 at 10:18 PM, Chao Li <li.evan.chao@gmail.com> wrote:
> No, that's not true. If you extend David's procedure and use 3 sessions to reproduce the problem (s1 updates (0,1), s2 updates (0,2), and s3 updates (0,1) or (0,2)) and trace the backend process of s3, you will see that every time TidRecheck() is called, "node" is brand new, so it has to call TidListEval(node) every time.

After more investigation I agree you are correct here; in a single query, ReScan is called once for each subsequent tuple being rechecked. (ExecUpdate calls EvalPlanQualBegin, which always sets rcplanstate->chgParam.)

On Wed, Sep 10, 2025 at 10:30 PM, Chao Li <li.evan.chao@gmail.com> wrote:
> But why can't we make TidRecheck() simply return FALSE?
>
> It looks like the only case where TidRecheck() is called is when a concurrent transaction updated the row, and in that case, the ctid must have changed.

Although returning FALSE is clever, it is only correct if several other unstated preconditions for the function hold, which I am loath to rely on.

And indeed, like I mentioned in my previous message, my isolation test `permutation tid1 tidsucceed2 c1 c2 read` from eval-plan-qual.spec in my patch will fail if Recheck were to return false in this case. Though somewhat contrived, you can imagine this happening with multiple sessions driven by the same application:

setup: two rows exist with ctid=(0,1) and (0,2)
S1: BEGIN;
S2: BEGIN;
S1: UPDATE WHERE ctid=(0,1) RETURNING ctid;
-- returns (0,3), which the application uses in the next query from another session:
S2: UPDATE WHERE ctid=(0,1) OR ctid=(0,3); -- statement snapshot sees (0,1); recheck will see (0,3) and should pass
S1: COMMIT;
S2: COMMIT;

I would not defend this as good code from an application developer, but the behavior is observable. So I understand it would be best to match the enable_tidscan = off behavior, which my existing strategy more verifiably does. Of course, if the team disagrees with me then I will defer to everyone's better judgement.

I hope it is rare that many tuples need to be rechecked in a single query, but if this is common, then I would suggest it would be better to rework the generic EPQ machinery so that EPQ nodes do not need to be reset for every row, rather than depending on subtle implicit guarantees in the TidRecheck code.

Sophie
On Sat, Sep 13, 2025 at 3:12 PM, Sophie Alpert <pg@sophiebits.com> wrote:
> And indeed, like I mentioned in my previous message, my isolation test
> `permutation tid1 tidsucceed2 c1 c2 read` from eval-plan-qual.spec in
> my patch will fail if Recheck were to return false in this case. Though
> somewhat contrived, you can imagine this happening with multiple
> sessions driven by the same application:

Another case where returning FALSE does not give the correct behavior is when two relations are involved, only one of which is modified:

S1: BEGIN;
S2: BEGIN;
S1: UPDATE accounts SET balance = balance + 100 WHERE ctid = '(0,1)' RETURNING accountid, balance;
S2: SELECT * FROM accounts JOIN accounts_ext USING (accountid) WHERE accounts_ext.ctid = '(0,1)' FOR UPDATE OF accounts;
S1: COMMIT;
S2: COMMIT;

In my patch the S2 query correctly returns one row, whereas with your proposed change it incorrectly returns none.

accountid|balance|balance2|balance|other|newcol|newcol2
---------+-------+--------+-------+-----+------+-------
checking |    700|   1400|    600|other|    42|

Sophie
On Sun, Sep 14, 2025 at 6:49 PM, Chao Li <li.evan.chao@gmail.com> wrote:
> This is a wrong example and (0,3) should NOT be updated. According to the definition of "read committed":
> https://www.postgresql.org/docs/current/transaction-iso.html#XACT-READ-COMMITTED
> "A query sees only data committed before the query began".

You paraphrased the docs here but did so incorrectly: the actual quote is "a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began". We are not discussing the behavior of a plain SELECT query, so this description is not relevant. For Update and LockRows, the expected EvalPlanQual behavior is that rows are checked against the predicate twice — once as of the statement snapshot and once as of locking time — and the rows that match both times are used.

In my example with ctid (0,3), the row matches the 'ctid = (0,1) OR ctid = (0,3)' predicate both times. The row is not newly created, so the newly-created row in your example is not analogous.

I continue to believe that my implementation of TidRecheck plainly satisfies the contract for what the scan recheck is meant to do; the fact that it matches the enable_tidscan=OFF behavior is further corroboration of that fact.

Sophie
On Mon, 15 Sept 2025 at 13:49, Chao Li <li.evan.chao@gmail.com> wrote:
> This is a wrong example and (0,3) should NOT be updated. According to the definition of "read committed":
>
> 13.2. Transaction Isolation
> postgresql.org
>
> "A query sees only data committed before the query began".

I'm afraid your understanding of the documentation isn't correct. Maybe you've observed some translated version which isn't correct, or you've made a mistake in your translation back to English. As for the quoted line, I'm not quite sure where that comes from, but I don't see it as-is in the documents.

The particular part of the documentation that's relevant is in [1]. The following is the relevant text:

"If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row."

> And this example also proves my solution of checking visibility works.

The specific part that Sophie aims to fix is the incorrect behaviour regarding the "re-evaluation" of the updated row. The correct way to fix that is to ensure the updated tuple matches the WHERE clause of the statement. Adding some visibility checks in place of that is complete nonsense. What does need to happen is validation that the ctid of the updated tuple matches the WHERE clause of the UPDATE statement.

The reason anything special must happen for TID Scan and TID Range Scan and not Seq Scan is that in Seq Scan the ctid qual will be part of the scan's qual, whereas in the TID Scan variants, that's been extracted, as it's assumed the scan itself will handle fetching only the required rows. It's just that that's currently not enforced during the recheck.

If you review the ExecScan() comments, you'll see the recheck function's role is the following:

 * A 'recheck method' must also be provided that can check an
 * arbitrary tuple of the relation against any qual conditions
 * that are implemented internal to the access method.

If you need guidance about how this should be behaving, try with "SET enable_tidscan = 0;" and see what that does.

David

[1] https://www.postgresql.org/docs/current/transaction-iso.html#XACT-READ-COMMITTED
Read Committed is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed by concurrent transactions during the query's execution. In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run. However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time.
It says that UPDATE will only find target rows that were committed as of the command start time. I think the statement implies that an “update” statement will never update a “future” tuple.
evantest=*# update t set b=30 where id = 1;
evantest=*# update t set b = 30;   -- blocks here

 1 | 5 | 30   <=== only the old tuple is updated
On Mon, 15 Sept 2025 at 17:32, Chao Li <li.evan.chao@gmail.com> wrote:
> UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time.
>
> It says that UPDATE will only find target rows that were committed as of the command start time. I think the statement implies that an "update" statement will never update a "future" tuple.

I think you've only read the first sentence in that paragraph. Please read the entire paragraph. You'll see it goes on to explain "it will attempt to apply its operation to the updated version of the row".

David
On Sep 15, 2025, at 16:12, Chao Li <li.evan.chao@gmail.com> wrote:
> Then the solution of TidListEval() is correct, and the only issue is repeatedly calling TidListEval() on every recheck.
On Tue, 9 Sept 2025 at 04:36, Sophie Alpert <pg@sophiebits.com> wrote:
> I've added a new trss_boundsInitialized flag such that we calculate the range once per EPQ rescan. In order to preserve the semantics when the min or max is NULL, I'm setting trss_mintid/trss_maxtid to invalid ItemPointers as a sentinel in the cases where TidRangeEval returns false. I added an ItemPointerIsValid assertion given that it's now more relevant to correctness, but I can remove it if it feels superfluous. Let me know if there is a more idiomatic way to treat this.

This part seems a bit pointless, as EPQ queries will only return 1 row anyway (check EvalPlanQual() where it calls EvalPlanQualNext()). There will be a rescan, which will wipe out the range before the recheck function is ever called again.

I think if we want to do better at not continuously recalculating the TID List or range, then something more generic might be better. If the planner were to note down which ParamIds exist in the TID exprs, we could just mark that the TID List / Range needs to be recalculated whenever one of those parameters changes. That should help in non-EPQ cases too.

For the v2 patch, I've hacked on that a bit and stripped out the trss_boundsInitialized stuff and just made it so we recalculate the TID List/Range on every recheck. I also added another isolation test permutation to have s1 roll back and ensure that s2 updates the ctid = '(0,1)' tuple.

The attached v3-0001 is the updated v2 patch, and v3-0002 is a POC of what I described above. Seems there is something to it, as the performance is better. It is a very contrived test case, however:

create table empty(a int);
create table million (a int);
insert into million select generate_series(1,1000000);
set max_parallel_workers_per_gather=0;
set enable_seqscan=0;
set enable_material=0;
set jit=0;
select sum(c) from million m left join lateral (select count(*) c from empty where ctid in ('(1,1)','(1,2)','(1,3)','(1,4)','(1,5)','(1,6)','(1,7)','(1,8)','(1,9)','(1,10)','(1,11)','(1,12)','(1,13)','(1,14)','(1,15)','(1,16)','(1,17)','(1,18)','(1,19)','(1,20)','(1,21)','(1,22)','(1,23)','(1,24)','(1,25)','(1,26)','(1,27)','(1,28)','(1,29)','(1,30)','(1,31)','(1,32)','(1,33)','(1,34)','(1,35)','(1,36)','(1,37)','(1,38)','(1,39)','(1,40)','(1,41)','(1,42)','(1,43)','(1,44)','(1,45)','(1,46)','(1,47)','(1,48)','(1,49)','(1,50)','(1,51)','(1,52)','(1,53)','(1,54)','(1,55)','(1,56)','(1,57)','(1,58)','(1,59)','(1,60)','(1,61)','(1,62)','(1,63)','(1,64)','(1,65)','(1,66)','(1,67)','(1,68)','(1,69)','(1,70)','(1,71)','(1,72)','(1,73)','(1,74)','(1,75)','(1,76)','(1,77)','(1,78)','(1,79)','(1,80)','(1,81)','(1,82)','(1,83)','(1,84)','(1,85)','(1,86)','(1,87)','(1,88)','(1,89)','(1,90)','(1,91)','(1,92)','(1,93)','(1,94)','(1,95)','(1,96)','(1,97)','(1,98)','(1,99)','(1,100)')) on 1=1;

master:
Time: 613.541 ms
Time: 621.037 ms
Time: 623.430 ms

master + v3-0002:
Time: 298.863 ms
Time: 298.015 ms
Time: 297.172 ms

Overall, I'm keen to get moving on fixing the initially reported problem. Maybe the 0002 part can be done in master or not at all. 0002 modifies the TidScan and TidRangeScan, which we can't really do in the back branches anyway.

David
Attachments
On Mon, Sep 15, 2025 at 4:23 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> For the v2 patch, I've hacked on that a bit and stripped out the
> trss_boundsInitialized stuff and just make it so we recalculate the
> TID List/Range on every recheck. I also added another isolation test
> permutation to have s1 rollback and ensure that s2 updates the ctid =
> '(0,1)' tuple.

Thanks, this seems sensible given the reality that rescan happens once per tuple. The `if (node->tss_TidList == NULL)` check you kept from me seems likewise pointless in v3-0001 but not particularly intrusive (though of course you use it in v3-0002). Otherwise v3-0001 lgtm.

(I also took a look at v3-0002 and don't see any immediate issues, though neither would I really consider myself qualified to review it.)

Sophie
On Tue, 16 Sept 2025 at 05:42, Sophie Alpert <pg@sophiebits.com> wrote:
> Thanks, this seems sensible given the reality that rescan happens once per tuple. The `if (node->tss_TidList == NULL)` check you kept from me seems likewise pointless in v3-0001 but not particularly intrusive (though of course you use it in v3-0002).

That was a defensive coding choice. It could have been an Assert(node->tss_TidList == NULL);, but I just prefer it as an "if".

> Otherwise v3-0001 lgtm. (I also took a look at v3-0002 and don't see any immediate issues, though neither would I really consider myself qualified to review it.)

Thanks for looking, and for the report and patches. I've pushed and backpatched the fixes. v13 doesn't have TID Range Scans, so I opted to break the fix into 2 parts and apply them to the relevant branches to try and keep the commit messages as accurate as possible.

David