Discussion: Is this expected concurrency behaviour for EvalPlanQual and ctid?
Hi all,
I've run into a concurrency issue and am hoping someone here with more
PostgreSQL internals knowledge than I have can explain why it happens.
My scenario is keeping a sync table up to date with the latest message
per recipient key. If I use ctid, the concurrent query loses all of its
updates once the first query releases its locks. If I use an ordinary
column, it works as expected.
I'm hopeful that the recently accepted patch [1] from Sophie addresses
the same issue, but I'm not knowledgeable enough to tell. I tried
setting enable_tidscan to 0 based on that discussion, but it made no
difference, so I thought I'd best ask. If this behaviour is expected,
I'd really like to understand why, so I can reason better about ctid
in concurrent updates.
To test, I created 10 sync records and 200 messages for each of them.
I processed 1000 messages at a time concurrently in two sessions, and I
duplicated the test for the ctid case and the key column (k) case. The
only difference is the final WHERE clause of the update. The expected
output is each key synced to 200.
--setup
CREATE TABLE msg(k INT, m INT);
CREATE TABLE msg2(k INT, m INT);
CREATE TABLE sync(k INT PRIMARY KEY, m INT DEFAULT 0);
CREATE TABLE sync2(k INT PRIMARY KEY, m INT DEFAULT 0);
INSERT INTO sync(k) SELECT i FROM generate_series(1, 10) i;
INSERT INTO sync2(k) SELECT i FROM generate_series(1, 10) i;
INSERT INTO msg SELECT k, i FROM sync, generate_series(1, 200) i
ORDER BY random();
INSERT INTO msg2 SELECT k, i FROM sync2, generate_series(1, 200) i
ORDER BY random();
--hold the lock so the two test sessions start together
SELECT pg_advisory_lock(1);
--run in two new sessions
--each session blocks here until the setup session releases the lock,
--so the two big queries below run concurrently
SELECT pg_advisory_lock_shared(1);
SELECT pg_advisory_unlock_shared(1);
WITH
--by ctid
lock_msg AS (
SELECT ctid FROM msg FOR UPDATE SKIP LOCKED LIMIT 1000),
delete_msg AS (
DELETE FROM msg m USING lock_msg l WHERE m.ctid = l.ctid
RETURNING k, m),
msg_max AS (
SELECT DISTINCT ON (k) k, m FROM delete_msg ORDER BY k, m DESC),
lock_sync AS (
SELECT s.ctid, k, m.m FROM msg_max m JOIN sync s USING (k)
ORDER BY k FOR UPDATE OF s),
update_sync AS (
UPDATE sync s SET m = l.m FROM lock_sync l
WHERE s.ctid = l.ctid AND l.m > s.m --by ctid
RETURNING s.k, s.m, s.ctid),
--by k
lock_msg2 AS (
SELECT ctid FROM msg2 FOR UPDATE SKIP LOCKED LIMIT 1000),
delete_msg2 AS (
DELETE FROM msg2 m USING lock_msg2 l WHERE m.ctid = l.ctid
RETURNING k, m),
msg_max2 AS (
SELECT DISTINCT ON (k) k, m FROM delete_msg2 ORDER BY k, m DESC),
lock_sync2 AS (
SELECT s.ctid, k, m.m FROM msg_max2 m JOIN sync2 s USING (k)
ORDER BY k FOR UPDATE OF s),
update_sync2 AS (
UPDATE sync2 s SET m = l.m FROM lock_sync2 l
WHERE s.k = l.k AND l.m > s.m --by k
RETURNING s.k, s.m, s.ctid)
--results
SELECT 'ctid', * FROM update_sync UNION ALL
SELECT 'k', * FROM update_sync2;
--run last in original session
SELECT pg_advisory_unlock(1);
All the records in both msg tables are deleted. In the sync tables,
the 'k' variant is updated as expected across the two sessions, but
the 'ctid' variant lost all of its updates in the second session.
Session 1
ctid 1 199 (0,11)
ctid 2 200 (0,12)
ctid 3 200 (0,13)
ctid 4 199 (0,14)
ctid 5 200 (0,15)
ctid 6 200 (0,16)
ctid 7 199 (0,17)
ctid 8 200 (0,18)
ctid 9 200 (0,19)
ctid 10 200 (0,20)
k 1 200 (0,11)
k 2 195 (0,12)
k 3 200 (0,13)
k 4 200 (0,14)
k 5 199 (0,15)
k 6 199 (0,16)
k 7 196 (0,17)
k 8 194 (0,18)
k 9 200 (0,19)
k 10 196 (0,20)
Session 2
k 2 200 (0,21)
k 5 200 (0,22)
k 6 200 (0,23)
k 7 200 (0,24)
k 8 200 (0,25)
k 10 200 (0,26)
Thanks,
Bernice
[1] https://www.postgresql.org/message-id/flat/4a6268ff-3340-453a-9bf5-c98d51a6f729%40app.fastmail.com
Here's a much simpler example. This one works when I turn off
enable_tidscan, so I'm very hopeful Sophie's patch applies. It's very
similar to the bug's example, but loses an update instead of gaining
one. It seems to me that both happen because the ctid value isn't
being rechecked in EvalPlanQual.
--session 1
CREATE TABLE t(p BOOL, q BOOL);
INSERT INTO t DEFAULT VALUES;
SELECT pg_advisory_lock(1);
--session 2
SELECT pg_advisory_lock_shared(1);
SELECT pg_advisory_unlock_shared(1);
WITH lock_t AS (SELECT ctid FROM t FOR UPDATE)
UPDATE t SET p = TRUE FROM lock_t l WHERE t.ctid = l.ctid
RETURNING p, q, t.ctid;
--session 3
SELECT pg_advisory_lock_shared(1);
SELECT pg_advisory_unlock_shared(1);
WITH lock_t AS (SELECT ctid FROM t FOR UPDATE)
UPDATE t SET q = TRUE FROM lock_t l WHERE t.ctid = l.ctid
RETURNING p, q, t.ctid;
--session 1
SELECT pg_advisory_unlock(1);
Only one of the updates succeeds.
This leads me to conclude that for concurrency and ctid:
safe:
- deleting a record, because one of the deletes will succeed
- updating a record with SKIP LOCKED, because the lock will never be rechecked
unsafe (currently):
- updating a record without SKIP LOCKED
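To double-check the safe alternative, here's a minimal keyed variant of
the sessions above (the tk table and its id column are hypothetical
additions of mine; everything else follows the same advisory-lock
handshake):
CREATE TABLE tk(id INT PRIMARY KEY, p BOOL, q BOOL);
INSERT INTO tk VALUES (1, NULL, NULL);
--session 2, after the same advisory-lock handshake as above
WITH lock_t AS (SELECT id FROM tk FOR UPDATE)
UPDATE tk SET p = TRUE FROM lock_t l WHERE tk.id = l.id
RETURNING p, q, tk.ctid;
--session 3, likewise
WITH lock_t AS (SELECT id FROM tk FOR UPDATE)
UPDATE tk SET q = TRUE FROM lock_t l WHERE tk.id = l.id
RETURNING p, q, tk.ctid;
--both updates succeed: EvalPlanQual rechecks tk.id = l.id against the
--new row version, and id, unlike ctid, is unchanged by the concurrent update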
The documentation for both UPDATE and DELETE includes a FOR UPDATE
with ctid example, so I think this isn't expected behaviour and will
be fixed in the next release. Thanks, Sophie!
Thanks,
Bernice
For the record, I was wrong: the patch in [1] doesn't affect lost
concurrent updates with ctid. It was applied in 17.7, and after
looking much harder at the thread and the code, I can see it's
unrelated.
I've worked out what's happening, and it has nothing to do with ctid.
Here's a much simplified example:
create table t(i int);
insert into t(i) values (1);
--s1
begin;
update t set i = 2 from (select i from t) x where t.i = x.i;
UPDATE 1
--s2
update t set i = 2 from (select i from t) x where t.i = x.i;
UPDATE 0 (after s1 commits)
The same thing happens with
update t set i = 2 from (select i from t for update) x where t.i = x.i;
x.i is not updated when s1 releases the lock, so s2 is still looking
for x.i = 1. Based on [2], I'm guessing that because the where clause
is indirect, it doesn't qualify for re-evaluation after the lock is
released, so it continues to use the version of the record from the
start of the transaction. But I don't know nearly enough about the
internals to give a proper explanation.
Here's the ctid version that helped me figure this out.
with x as (select ctid from t for update),
y as (update t set i = 2 from x where t.ctid = x.ctid returning t.ctid)
select 'x', ctid from x union select 'y', ctid from y;
--s1
x (0,1)
y (0,2)
--s2
x (0,2)
Even though x is updated in s2, the updated version isn't what's used
by y. I suspect the x version is only updated here because of the
final select.
So don't use this pattern to avoid deadlocks if this is a one-shot
update:
with x as (select ctid from t where ... order by id for update)
update t set ... from x where t.ctid = x.ctid;
Use an immutable unique column, or retry deadlocks.
Thanks,
Bernice
[1] https://www.postgresql.org/message-id/flat/4a6268ff-3340-453a-9bf5-c98d51a6f729%40app.fastmail.com
[2] https://www.postgresql.org/docs/current/transaction-iso.html#XACT-READ-COMMITTED
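For illustration, here's a minimal sketch of that keyed pattern (the
t2 table and its id column are hypothetical; the order by keeps the
deterministic lock ordering that motivated the ctid version):
create table t2(id int primary key, i int);
insert into t2 values (1, 1);
--run concurrently in two sessions
with x as (select id from t2 order by id for update)
update t2 set i = i + 1 from x where t2.id = x.id;
--neither increment is lost (i ends at 3): the second session's
--recheck of t2.id = x.id still matches the new row version, because
--id never changes, unlike ctid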
Bernice Southey <bernice.southey@gmail.com> writes:
> I've worked out what's happening, and it has nothing to do with ctid.
> Here's a much simplified example:
> create table t(i int);
> insert into t(i) values (1);
> --s1
> begin;
> update t set i = 2 from (select i from t) x where t.i = x.i;
> UPDATE 1
> --s2
> update t set i = 2 from (select i from t) x where t.i = x.i;
> UPDATE 0 (after s1 commits)
I'm reminded of Markus Winand's talk at PGConf.EU 2019 [1], where
he was handing out coasters that said "Notice: All employees must
wash hands after using self-joins". (Think I've still got mine,
somewhere.) This is a mess because the initial table scans find
the row version with i = 1, so the join qual succeeds, and then the
UPDATE step blocks because it notices that the target row has an
uncommitted update pending. After waiting for that to commit,
we run the "EvalPlanQual" logic, which rechecks the updated version
of the target row to see if it still satisfies the WHERE condition.
But it doesn't, because as you say, we still think that x.i is 1.
(EvalPlanQual is careful to re-use the same row versions from
other tables that were used the first time.)
> The same thing happens with
> update t set i = 2 from (select i from t for update) x where t.i = x.i;
Right, the common advice if you need to make such scenarios work
is to add FOR UPDATE to the non-target relations. But here, that
just breaks in the opposite direction: the sub-select blocks
waiting for the concurrent commit and then returns x.i = 2.
But the UPDATE's initial scan of t only sees t.i = 1, so the join
fails before we ever get to EvalPlanQual. I'm not sure there is
any way to make self-join cases work when the concurrent update
is changing the join column.
> So don't use this pattern to avoid deadlocks if this is a one-shot update:
> with x as (select ctid from t where ... order by id for update)
> update t set ... from x where t.ctid = x.ctid;
> Use an immutable unique column, or retry deadlocks.
Yup. True SERIALIZABLE mode with retries on serialization failures is
the best recipe if you need to deal with concurrent-update situations
this complicated. In this particular example though, I wonder if you
couldn't get rid of the self-join.
regards, tom lane
[1]
https://www.postgresql.eu/events/pgconfeu2019/schedule/session/2644-more-than-a-query-language-sql-in-the-21st-century/
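For reference, a minimal sketch of that serializable recipe against
the simplified example (the retry itself has to happen in the client):
begin isolation level serializable;
update t set i = 2 from (select i from t) x where t.i = x.i;
commit;
--if any statement, or the commit, fails with SQLSTATE 40001
--(serialization_failure), roll back and rerun the whole transaction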
On Wed, Nov 19, 2025 at 8:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > The same thing happens with
> > update t set i = 2 from (select i from t for update) x where t.i = x.i;
>
> Right, the common advice if you need to make such scenarios work
> is to add FOR UPDATE to the non-target relations. But here, that
> just breaks in the opposite direction: the sub-select blocks
> waiting for the concurrent commit and then returns x.i = 2.
> But the UPDATE's initial scan of t only sees t.i = 1, so the join
> fails before we ever get to EvalPlanQual.
Thanks for the clear explanation. When I tried to think this through
I couldn't fill in the details.