TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c
From | Masahiko Sawada |
---|---|
Subject | TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c |
Date | |
Msg-id | CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com |
Replies | Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c; Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c |
List | pgsql-bugs |
Hi all,

Kristian Lejao (colleague, in CC) has found the following assertion failure in postgres_fdw.c when rechecking the result tuple via EvalPlanQual():

TRAP: failed Assert("outerPlan != NULL"), File: "postgres_fdw.c", Line: 2366, PID: 2043518

Here are the reproduction steps, simplified from the ones Kristian originally created:

1. Set up the local node:

    create extension postgres_fdw;
    create server srv foreign data wrapper postgres_fdw options (host 'localhost', port '5433', dbname 'postgres');
    create user mapping for public server srv;
    create table a (i int primary key);
    create foreign table b (i int) server srv;
    create foreign table c (i int) server srv;
    insert into a values (1);

2. Set up the remote node:

    create table b (i int);
    create table c (i int);
    insert into b values (1);
    insert into c values (1);

3. Attach gdb to the backend process started on the local node (say conn1) and set a breakpoint at table_tuple_lock().

4. Run the following query on conn1 (it stops before locking the result tuple):

    select a.i, (select 1 from b, c where a.i = b.i and b.i = c.i) from a for update;

5. In another session, update the tuple concurrently:

    update a set i = i + 1; -- updates 1 tuple

6. Continue the query on conn1; the server crashes due to the assertion failure.
The plan of the FOR UPDATE query that leads to this issue is:

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 LockRows  (cost=0.00..615886.00 rows=2550 width=14)
   Output: a.i, ((SubPlan 1)), a.ctid
   ->  Seq Scan on public.a  (cost=0.00..615860.50 rows=2550 width=14)
         Output: a.i, (SubPlan 1), a.ctid
         SubPlan 1
           ->  Foreign Scan  (cost=100.00..241.50 rows=225 width=4)
                 Output: 1
                 Relations: (public.b) INNER JOIN (public.c)
                 Remote SQL: SELECT NULL FROM (public.b r1 INNER JOIN public.c r2 ON (((r2.i = $1::integer)) AND ((r1.i = $1::integer))))
(9 rows)

The point is that the inner join in the target-list subquery is pushed down to the foreign server. In postgresGetForeignJoinPaths(), we prepare a local join path for the EvalPlanQual() check (used by postgresRecheckForeignScan()) only if the query is a DELETE, an UPDATE, or a SELECT with FOR UPDATE/SHARE, as shown below. Here we skip it, because the subquery is planned as a plain SELECT without rowMarks, which leaves fdw_outerpath of the ForeignScan node NULL:

    /*
     * If there is a possibility that EvalPlanQual will be executed, we need
     * to be able to reconstruct the row using scans of the base relations.
     * GetExistingLocalJoinPath will find a suitable path for this purpose in
     * the path list of the joinrel, if one exists. We must be careful to
     * call it before adding any ForeignPath, since the ForeignPath might
     * dominate the only suitable local path available. We also do it before
     * calling foreign_join_ok(), since that function updates fpinfo and marks
     * it as pushable if the join is found to be pushable.
     */
    if (root->parse->commandType == CMD_DELETE ||
        root->parse->commandType == CMD_UPDATE ||
        root->rowMarks)
    {
        epq_path = GetExistingLocalJoinPath(joinrel);

Therefore, if the tuple is updated concurrently before we lock it, we recheck the traversed tuple via EvalPlanQual(), but we hit the assertion failure because no local join plan was prepared for that recheck.

The attached patch includes a draft fix and regression tests (using injection points). I don't have enough experience with the planner and FDW code to judge whether the patch takes the right approach, so feedback is very welcome.

I've confirmed that the same scenario triggers this assertion failure on all supported branches.

In addition, I noticed that none of the regression tests exercise postgresRecheckForeignScan() [1]. I think we need to add regression tests that cover that function.

Regards,

[1] https://coverage.postgresql.org/contrib/postgres_fdw/postgres_fdw.c.gcov.html#2354()

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
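The planner decision quoted in the report, and its consequence at recheck time, can be modeled in a few lines of C. This is a hypothetical, heavily simplified sketch, not postgres_fdw code: the types ModelQuery, ModelLocalJoin, and ModelForeignScan and the functions needs_epq_path(), plan_foreign_join(), and recheck_has_outer_plan() are invented for illustration, and an int counter stands in for the root->rowMarks list. It shows that the condition is evaluated against the query level being planned, so a target-list subquery that is a plain SELECT with no row marks gets no local join plan even when the outer query uses FOR UPDATE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for planner types; NOT the real PostgreSQL structs. */
typedef enum { MODEL_CMD_SELECT, MODEL_CMD_UPDATE, MODEL_CMD_DELETE } ModelCmdType;

typedef struct {
    ModelCmdType commandType;
    int          n_row_marks;   /* stands in for root->rowMarks (FOR UPDATE/SHARE) */
} ModelQuery;

typedef struct {
    const char *desc;           /* description of the local (non-pushed-down) join */
} ModelLocalJoin;

typedef struct {
    const ModelLocalJoin *outer_plan;   /* stands in for fdw_outerpath / outerPlan */
} ModelForeignScan;

static const ModelLocalJoin local_join = { "local join of b and c" };

/* Same shape as the condition quoted from postgresGetForeignJoinPaths(). */
static bool needs_epq_path(const ModelQuery *parse)
{
    return parse->commandType == MODEL_CMD_DELETE ||
           parse->commandType == MODEL_CMD_UPDATE ||
           parse->n_row_marks > 0;
}

/* Plan a pushed-down join; attach a local plan only when EPQ looks possible. */
static ModelForeignScan plan_foreign_join(const ModelQuery *parse)
{
    ModelForeignScan scan = { NULL };

    assert(parse != NULL);
    if (needs_epq_path(parse))
        scan.outer_plan = &local_join;
    return scan;
}

/*
 * The reported crash is Assert("outerPlan != NULL") firing during the
 * EvalPlanQual() recheck; this model returns the condition instead of
 * asserting, so callers can observe it without aborting.
 */
static bool recheck_has_outer_plan(const ModelForeignScan *scan)
{
    return scan->outer_plan != NULL;
}
```

In this model, a query level with row marks (like the outer SELECT ... FOR UPDATE) would get a local join plan, while the subquery in SubPlan 1, whose own parse is a plain SELECT with no row marks, yields a scan with outer_plan == NULL. That mirrors the state the failed assertion reports when EvalPlanQual() rechecks the foreign join.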
Attachments