Обсуждение: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

Поиск
Список
Период
Сортировка

TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Masahiko Sawada
Дата:
Hi all,

Kristian Lejao (colleague, in CC) has found the following assertion
failure in postgres_fdw.c when rechecking the result tuple via
EvalPlanQual():

TRAP: failed Assert("outerPlan != NULL"), File: "postgres_fdw.c",
Line: 2366, PID: 2043518

Here is the reproducible steps that I've simplified from the one
Kristian originally created:

1. setup local node:

create extension postgres_fdw;
create server srv foreign data wrapper postgres_fdw options (host
'localhost', port '5433', dbname 'postgres');
create user mapping for public server srv;
create table a (i int primary key);
create foreign table b (i int) server srv;
create foreign table c (i int) server srv;
insert into a values (1);

2. setup remote node:

create table b (i int);
create table c (i int);
insert into b values (1);
insert into c values (1);

3. attach to the backend process started on the local node (say conn1)
using gdb and set breakpoint at table_tuple_lock().

4. run the following query on conn1 (which stops before locking the
result tuple):

select a.i,
       (select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update;

5. on another session, update the tuple concurrently:

update a set i = i + 1; -- update 1 tuple

6. continue the query on conn1, the server crashes due to the assertion failure.


The plan of the FOR UPDATE query lead to this issue is:

                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 LockRows  (cost=0.00..615886.00 rows=2550 width=14)
   Output: a.i, ((SubPlan 1)), a.ctid
   ->  Seq Scan on public.a  (cost=0.00..615860.50 rows=2550 width=14)
         Output: a.i, (SubPlan 1), a.ctid
         SubPlan 1
           ->  Foreign Scan  (cost=100.00..241.50 rows=225 width=4)
                 Output: 1
                 Relations: (public.b) INNER JOIN (public.c)
                 Remote SQL: SELECT NULL FROM (public.b r1 INNER JOIN
public.c r2 ON (((r2.i = $1::integer)) AND ((r1.i = $1::integer))))
(9 rows)

The point is that in the subquery in the target list we pushed the
inner join to the foreign server. In postgresGetForeignJoinPaths(), we
prepare the join path for EvalPlanQual() check (and used in
postgresRecheckForeignScan()) if the query is DELETE, UPDATE, or FOR
UPDATE/SHARE (as shown below) but we skip it since the subquery itself
is parsed as a normal SELECT query without rowMarks, leaving
fdw_outerpath of the ForeignScan node NULL:

    /*
     * If there is a possibility that EvalPlanQual will be executed, we need
     * to be able to reconstruct the row using scans of the base relations.
     * GetExistingLocalJoinPath will find a suitable path for this purpose in
     * the path list of the joinrel, if one exists.  We must be careful to
     * call it before adding any ForeignPath, since the ForeignPath might
     * dominate the only suitable local path available.  We also do it before
     * calling foreign_join_ok(), since that function updates fpinfo and marks
     * it as pushable if the join is found to be pushable.
     */
    if (root->parse->commandType == CMD_DELETE ||
        root->parse->commandType == CMD_UPDATE ||
        root->rowMarks)
    {
        epq_path = GetExistingLocalJoinPath(joinrel);

Therefore, if the tuple is concurrently updated before taking a lock,
we recheck the traversed tuple via EvalPlanQual() but we end up with
the assertion failure since we didn't prepare the join plan for that.

The attached patch includes the draft fix and regression tests (using
injection points).

I don't have enough experience with the planner and FDW code area to
evaluate whether the patch fixes the issue in the right approach.
Feedback is very welcome. I've confirmed this assertion could happen
with the same scenario on all supported branches.

In addition to that, I realized that none of the regression tests
execute postgresRecheckForeignScan()[1]. I think we need to add
regression tests to cover that function.

Regards,

[1] https://coverage.postgresql.org/contrib/postgres_fdw/postgres_fdw.c.gcov.html#2354()

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Etsuro Fujita
Дата:
Sawada-san,

On Wed, Aug 6, 2025 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Kristian Lejao (colleague, in CC) has found the following assertion
> failure in postgres_fdw.c when rechecking the result tuple via
> EvalPlanQual():
>
> TRAP: failed Assert("outerPlan != NULL"), File: "postgres_fdw.c",
> Line: 2366, PID: 2043518
>
> Here is the reproducible steps that I've simplified from the one
> Kristian originally created:

[...]

> The point is that in the subquery in the target list we pushed the
> inner join to the foreign server. In postgresGetForeignJoinPaths(), we
> prepare the join path for EvalPlanQual() check (and used in
> postgresRecheckForeignScan()) if the query is DELETE, UPDATE, or FOR
> UPDATE/SHARE (as shown below) but we skip it since the subquery itself
> is parsed as a normal SELECT query without rowMarks, leaving
> fdw_outerpath of the ForeignScan node NULL:
>
>     /*
>      * If there is a possibility that EvalPlanQual will be executed, we need
>      * to be able to reconstruct the row using scans of the base relations.
>      * GetExistingLocalJoinPath will find a suitable path for this purpose in
>      * the path list of the joinrel, if one exists.  We must be careful to
>      * call it before adding any ForeignPath, since the ForeignPath might
>      * dominate the only suitable local path available.  We also do it before
>      * calling foreign_join_ok(), since that function updates fpinfo and marks
>      * it as pushable if the join is found to be pushable.
>      */
>     if (root->parse->commandType == CMD_DELETE ||
>         root->parse->commandType == CMD_UPDATE ||
>         root->rowMarks)
>     {
>         epq_path = GetExistingLocalJoinPath(joinrel);
>
> Therefore, if the tuple is concurrently updated before taking a lock,
> we recheck the traversed tuple via EvalPlanQual() but we end up with
> the assertion failure since we didn't prepare the join plan for that.
>
> The attached patch includes the draft fix and regression tests (using
> injection points).
>
> I don't have enough experience with the planner and FDW code area to
> evaluate whether the patch fixes the issue in the right approach.
> Feedback is very welcome. I've confirmed this assertion could happen
> with the same scenario on all supported branches.

Will review.  Thank you for the report and patch!

> In addition to that, I realized that none of the regression tests
> execute postgresRecheckForeignScan()[1]. I think we need to add
> regression tests to cover that function.

Yeah, I think so too.

Best regards,
Etsuro Fujita



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Michael Paquier
Дата:
On Tue, Aug 05, 2025 at 11:00:54AM -0700, Masahiko Sawada wrote:
> The attached patch includes the draft fix and regression tests (using
> injection points).

+$psql_session->query_safe(qq[
+    select injection_points_set_local();
+    select injection_points_attach('heapam_lock_tuple-before-lock', 'wait');
+]);

It seems to me that an isolation test would be a better fit here.  TAP
tests are usually a good fit for injection points if you need to do
direct node manipulations, like restarts or stops.  The whole test
posted only does SQL-ish things, and you could rely on a loopback
server as we do in the SQL tests of postgres_fdw?
--
Michael

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Masahiko Sawada
Дата:
On Wed, Aug 6, 2025 at 8:39 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Aug 05, 2025 at 11:00:54AM -0700, Masahiko Sawada wrote:
> > The attached patch includes the draft fix and regression tests (using
> > injection points).
>
> +$psql_session->query_safe(qq[
> +    select injection_points_set_local();
> +    select injection_points_attach('heapam_lock_tuple-before-lock', 'wait');
> +]);
>
> It seems to me that an isolation test would be a better fit here.  TAP
> tests are usually a good fit for injection points if you need to do
> direct node manipulations, like restarts or stops.  The whole test
> posted only does SQL-ish things, and you could rely on a loopback
> server as we do in the SQL tests of postgres_fdw?

Yes, it's definitely possible to create the test using isolation. I
wasn't sure how to invoke isolation tests only when injection points
are enabled.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Michael Paquier
Дата:
On Thu, Aug 07, 2025 at 10:29:41AM -0700, Masahiko Sawada wrote:
> Yes, it's definitely possible to create the test using isolation. I
> wasn't sure how to invoke isolation tests only when injection points
> are enabled.

For meson, I think that I would tweak the isolation test list based on
get_option('injection_points'), and do a similar thing for Makefile.
--
Michael

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Masahiko Sawada
Дата:
On Thu, Aug 7, 2025 at 4:39 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Aug 07, 2025 at 10:29:41AM -0700, Masahiko Sawada wrote:
> > Yes, it's definitely possible to create the test using isolation. I
> > wasn't sure how to invoke isolation tests only when injection points
> > are enabled.
>
> For meson, I think that I would tweak the isolation test list based on
> get_option('injection_points'), and do a similar thing for Makefile.

Thank you for the advice! I've changed the regression tests to use
isolation tests.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Michael Paquier
Дата:
On Fri, Aug 08, 2025 at 12:15:19AM -0700, Masahiko Sawada wrote:
> On Thu, Aug 7, 2025 at 4:39 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Thu, Aug 07, 2025 at 10:29:41AM -0700, Masahiko Sawada wrote:
> > > Yes, it's definitely possible to create the test using isolation. I
> > > wasn't sure how to invoke isolation tests only when injection points
> > > are enabled.
> >
> > For meson, I think that I would tweak the isolation test list based on
> > get_option('injection_points'), and do a similar thing for Makefile.
>
> Thank you for the advice! I've changed the regression tests to use
> isolation tests.

That's enough to reproduce the MULL pointer dereference, thanks!

I would suggest to add a description at the top of foreign_recheck,
documenting the purpose of the test.

+# "s0_lock" execute a FOR UPDATE query but it stops before locking the result
s/execute/executes/, with an 's'.

Fujita-san, are you planning to look at the proposal?
--
Michael

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Michael Paquier
Дата:
On Mon, Aug 18, 2025 at 03:58:45PM +0900, Michael Paquier wrote:
> That's enough to reproduce the MULL pointer dereference, thanks!
>
> I would suggest to add a description at the top of foreign_recheck,
> documenting the purpose of the test.
>
> +# "s0_lock" execute a FOR UPDATE query but it stops before locking the result
> s/execute/executes/, with an 's'.

Another thing that I've noticed has been forgotten: postgres_fdw's
.gitignore needs entries for output_iso/ and tmp_check_iso/.  Based on
git-prompt, GIT_PS1_SHOWUNTRACKEDFILES reports these paths after a
test run.
--
Michael

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Etsuro Fujita
Дата:
Hi,

On Mon, Aug 18, 2025 at 3:58 PM Michael Paquier <michael@paquier.xyz> wrote:
> Fujita-san, are you planning to look at the proposal?

I have just started reviewing the patch.  I will provide my first
comments by this Friday at the very latest.

Best regards,
Etsuro Fujita



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Michael Paquier
Дата:
On Mon, Aug 18, 2025 at 05:42:58PM +0900, Etsuro Fujita wrote:
> I have just started reviewing the patch.  I will provide my first
> comments by this Friday at the very latest.

Great, thanks a lot!
--
Michael

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Masahiko Sawada
Дата:
On Mon, Aug 18, 2025 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Aug 18, 2025 at 03:58:45PM +0900, Michael Paquier wrote:
> > That's enough to reproduce the MULL pointer dereference, thanks!
> >
> > I would suggest to add a description at the top of foreign_recheck,
> > documenting the purpose of the test.
> >
> > +# "s0_lock" execute a FOR UPDATE query but it stops before locking the result
> > s/execute/executes/, with an 's'.
>
> Another thing that I've noticed has been forgotten: postgres_fdw's
> .gitignore needs entries for output_iso/ and tmp_check_iso/.  Based on
> git-prompt, GIT_PS1_SHOWUNTRACKEDFILES reports these paths after a
> test run.

Thank you for reviewing the patch! I'll incorporate these comments in
the next version patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Masahiko Sawada
Дата:
On Fri, Aug 22, 2025 at 3:59 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>
> On Wed, Aug 6, 2025 at 8:30 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > On Wed, Aug 6, 2025 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > The attached patch includes the draft fix and regression tests (using
> > > injection points).
> > >
> > > I don't have enough experience with the planner and FDW code area to
> > > evaluate whether the patch fixes the issue in the right approach.
> > > Feedback is very welcome. I've confirmed this assertion could happen
> > > with the same scenario on all supported branches.
> >
> > Will review.  Thank you for the report and patch!
>
> First, my apologies for the delay.
>
> I reviewed the postgres_fdw.c part of the fix:
>
> @@ -6313,9 +6314,18 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
>      * calling foreign_join_ok(), since that function updates fpinfo and marks
>      * it as pushable if the join is found to be pushable.
>      */
> -   if (root->parse->commandType == CMD_DELETE ||
> -       root->parse->commandType == CMD_UPDATE ||
> -       root->rowMarks)
> +   for (PlannerInfo *proot = root; proot != NULL; proot = proot->parent_root)
> +   {
> +       if (proot->parse->commandType == CMD_DELETE ||
> +           proot->parse->commandType == CMD_UPDATE ||
> +           proot->rowMarks)
> +       {
> +           need_epq = true;
> +           break;
> +       }
> +   }
> +
> +   if (need_epq)
>     {
>         epq_path = GetExistingLocalJoinPath(joinrel);
>         if (!epq_path)
>
> I think this successfully avoids the assertion failure and produces
> the correct result, but sorry, I don't think it's the right way to go.
> I think the root cause of this issue is in the EPQ handling of
> foreign/custom joins in ExecScanFetch() in execScan.h: it runs the
> recheck method function for a given foreign/custom join whenever it is
> called for EPQ rechecking, but that is not 100% correct.  I think the
> correct handling is: run the recheck method function for the join if
> it is called for EPQ rechecking and the join is a *descendant* join in
> the EPQ plan tree; otherwise run the access method function for the
> join even if it is called for EPQ rechecking, like the attached (where
> I used the epqParam of the given EPQState to determine whether the
> join is a descendant join or not, which localizes the fix pretty
> well).  For the SELECT FOR UPDATE query shown upthread, when doing an
> EPQ recheck, the fix evaluates the sub-select expression in the target
> list by doing a remote join, not a local join, so it would work more
> efficiently than the fix you proposed.

Thank you for reviewing the patch! I've confirmed that your patch
fixes the issue too.

If I understand your proposed fix correctly, the reported problem is
fixed by not rechecking the test tuple by ForeignRecheck() (performing
a local join), but instead we simply call ForeignNext() and get the
next tuple. I think while this fix would have to have a regression
test like I've proposed, it's probably a good time to add some
regression tests to cover postgresRecheckForeignScan() too possibly as
a separate patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Etsuro Fujita
Дата:
On Sat, Aug 23, 2025 at 5:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Aug 22, 2025 at 3:59 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > I reviewed the postgres_fdw.c part of the fix:
> >
> > @@ -6313,9 +6314,18 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
> >      * calling foreign_join_ok(), since that function updates fpinfo and marks
> >      * it as pushable if the join is found to be pushable.
> >      */
> > -   if (root->parse->commandType == CMD_DELETE ||
> > -       root->parse->commandType == CMD_UPDATE ||
> > -       root->rowMarks)
> > +   for (PlannerInfo *proot = root; proot != NULL; proot = proot->parent_root)
> > +   {
> > +       if (proot->parse->commandType == CMD_DELETE ||
> > +           proot->parse->commandType == CMD_UPDATE ||
> > +           proot->rowMarks)
> > +       {
> > +           need_epq = true;
> > +           break;
> > +       }
> > +   }
> > +
> > +   if (need_epq)
> >     {
> >         epq_path = GetExistingLocalJoinPath(joinrel);
> >         if (!epq_path)
> >
> > I think this successfully avoids the assertion failure and produces
> > the correct result, but sorry, I don't think it's the right way to go.
> > I think the root cause of this issue is in the EPQ handling of
> > foreign/custom joins in ExecScanFetch() in execScan.h: it runs the
> > recheck method function for a given foreign/custom join whenever it is
> > called for EPQ rechecking, but that is not 100% correct.  I think the
> > correct handling is: run the recheck method function for the join if
> > it is called for EPQ rechecking and the join is a *descendant* join in
> > the EPQ plan tree; otherwise run the access method function for the
> > join even if it is called for EPQ rechecking, like the attached (where
> > I used the epqParam of the given EPQState to determine whether the
> > join is a descendant join or not, which localizes the fix pretty
> > well).  For the SELECT FOR UPDATE query shown upthread, when doing an
> > EPQ recheck, the fix evaluates the sub-select expression in the target
> > list by doing a remote join, not a local join, so it would work more
> > efficiently than the fix you proposed.
>
> Thank you for reviewing the patch! I've confirmed that your patch
> fixes the issue too.

Thanks for testing!

> If I understand your proposed fix correctly, the reported problem is
> fixed by not rechecking the test tuple by ForeignRecheck() (performing
> a local join), but instead we simply call ForeignNext() and get the
> next tuple.

That's right.

> I think while this fix would have to have a regression
> test like I've proposed, it's probably a good time to add some
> regression tests to cover postgresRecheckForeignScan() too possibly as
> a separate patch.

Agreed.  Actually, we have fixed many EvalPlanQual issues with
postgres_fdw so far, but have always left adding test cases for future
work, so let's do that now!

The test case you showed upthread and added to the patch is useful, so
I think we should add it as well, but my question about it is: is it
really a good idea to use injection points?  Why don't you just use
BEGIN/COMMIT statements like this:

-- session 1
begin isolation level read committed;
update a set i = i + 1;

-- session 2
begin isolation level read committed;
select a.i,
       (select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update; -- waits for the transaction in session 1 to complete

-- session 1
commit;

-- session 2
select a.i,
       (select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update; -- produces results after doing an EvalPlanQual recheck
 i | ?column?
---+----------
 2 |
(1 row)

Again, my apologies for the late response.

Best regards,
Etsuro Fujita



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Michael Paquier
Дата:
On Wed, Sep 17, 2025 at 07:42:36PM +0900, Etsuro Fujita wrote:
> The test case you showed upthread and added to the patch is useful, so
> I think we should add it as well, but my question about it is: is it
> really a good idea to use injection points?

> Why don't you just use BEGIN/COMMIT statements like this:
>
> -- session 1
> begin isolation level read committed;
> update a set i = i + 1;
>
> -- session 2
> begin isolation level read committed;
> select a.i,
>        (select 1 from b, c where a.i = b.i and b.i = c.i)
> from a
> for update; -- waits for the transaction in session 1 to complete
>
> -- session 1
> commit;
>
> -- session 2
> select a.i,
>        (select 1 from b, c where a.i = b.i and b.i = c.i)
> from a
> for update; -- produces results after doing an EvalPlanQual recheck
>  i | ?column?
> ---+----------
>  2 |
> (1 row)
>
> Again, my apologies for the late response.

As far as I can see, this causes the SELECT FOR UPDATE of session 2
that's waiting for the commit of session 1 to crash, if we don't have
the fix, of course.  Removing the dependency with injection points is
nice if we don't require it, so we can just tweak the isolation test
proposed upthread to use the same schema, but the queries you are
suggesting.

As a whole, +1 for your suggestion.
--
Michael

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Masahiko Sawada
Дата:
On Wed, Sep 17, 2025 at 5:25 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Sep 17, 2025 at 07:42:36PM +0900, Etsuro Fujita wrote:
> > The test case you showed upthread and added to the patch is useful, so
> > I think we should add it as well, but my question about it is: is it
> > really a good idea to use injection points?
>
> > Why don't you just use BEGIN/COMMIT statements like this:
> >
> > -- session 1
> > begin isolation level read committed;
> > update a set i = i + 1;
> >
> > -- session 2
> > begin isolation level read committed;
> > select a.i,
> >        (select 1 from b, c where a.i = b.i and b.i = c.i)
> > from a
> > for update; -- waits for the transaction in session 1 to complete
> >
> > -- session 1
> > commit;
> >
> > -- session 2
> > select a.i,
> >        (select 1 from b, c where a.i = b.i and b.i = c.i)
> > from a
> > for update; -- produces results after doing an EvalPlanQual recheck
> >  i | ?column?
> > ---+----------
> >  2 |
> > (1 row)
> >
> > Again, my apologies for the late response.
>
> As far as I can see, this causes the SELECT FOR UPDATE of session 2
> that's waiting for the commit of session 1 to crash, if we don't have
> the fix, of course.  Removing the dependency with injection points is
> nice if we don't require it, so we can just tweak the isolation test
> proposed upthread to use the same schema, but the queries you are
> suggesting.

+1

I changed the regression tests and used the fix proposed by
Fujita-san. Please review the attached new version patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Michael Paquier
Дата:
On Tue, Sep 23, 2025 at 02:41:45PM -0700, Masahiko Sawada wrote:
> I changed the regression tests and used the fix proposed by
> Fujita-san. Please review the attached new version patch.

Thanks for the updated patch.  Confirmed that I can still blow up the
backend as expected when not using the fix, only the test.
--
Michael

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Etsuro Fujita
Дата:
On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I changed the regression tests and used the fix proposed by
> Fujita-san. Please review the attached new version patch.

Thanks for updating the patch!

I took a quick glance at the patch.  My initial comment is: it only
includes the test case discussed here, but I think it's a good idea to
add more cases in it (that exercise code paths in
postgresRecheckForeignScan), as I mentioned upthread.  What do you
think about that?  If no objections, I'd like to update it to include
such cases.

Anyway I'll review it in more detail.

Best regards,
Etsuro Fujita



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Masahiko Sawada
Дата:
On Wed, Sep 24, 2025 at 3:28 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I changed the regression tests and used the fix proposed by
> > Fujita-san. Please review the attached new version patch.
>
> Thanks for updating the patch!
>
> I took a quick glance at the patch.  My initial comment is: it only
> includes the test case discussed here, but I think it's a good idea to
> add more cases in it (that exercise code paths in
> postgresRecheckForeignScan), as I mentioned upthread.  What do you
> think about that?  If no objections, I'd like to update it to include
> such cases.

+1 to add more test cases for the code paths in postgresRecheckForeignScan.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Michael Paquier
Дата:
On Wed, Sep 24, 2025 at 09:45:26AM -0700, Masahiko Sawada wrote:
> On Wed, Sep 24, 2025 at 3:28 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>> I took a quick glance at the patch.  My initial comment is: it only
>> includes the test case discussed here, but I think it's a good idea to
>> add more cases in it (that exercise code paths in
>> postgresRecheckForeignScan), as I mentioned upthread.  What do you
>> think about that?  If no objections, I'd like to update it to include
>> such cases.
>
> +1 to add more test cases for the code paths in postgresRecheckForeignScan.

+1.  Would these be included in the same isolation test?  That may be
cleaner, especially as it seems like the EPQ re-evaluations should be
able to happen with the same schema as the one set up by the test.
--
Michael

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Etsuro Fujita
Дата:
On Thu, Sep 25, 2025 at 8:15 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Sep 24, 2025 at 09:45:26AM -0700, Masahiko Sawada wrote:
> > On Wed, Sep 24, 2025 at 3:28 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> >> I took a quick glance at the patch.  My initial comment is: it only
> >> includes the test case discussed here, but I think it's a good idea to
> >> add more cases in it (that exercise code paths in
> >> postgresRecheckForeignScan), as I mentioned upthread.  What do you
> >> think about that?  If no objections, I'd like to update it to include
> >> such cases.
> >
> > +1 to add more test cases for the code paths in postgresRecheckForeignScan.
>
> +1.  Would these be included in the same isolation test?  That may be
> cleaner, especially as it seems like the EPQ re-evaluations should be
> able to happen with the same schema as the one set up by the test.

Yes, I will add those cases to the same isolation test, as you mentioned.

Best regards,
Etsuro Fujita



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Álvaro Herrera
Дата:
On 2025-Sep-24, Etsuro Fujita wrote:

> I took a quick glance at the patch.  My initial comment is: it only
> includes the test case discussed here, but I think it's a good idea to
> add more cases in it (that exercise code paths in
> postgresRecheckForeignScan), as I mentioned upthread.  What do you
> think about that?  If no objections, I'd like to update it to include
> such cases.

I would like to suggest a different approach: because this patch is
known to fix a server crash in a known case, it would be better to get
the fix (and its corresponding test) pushed as is without further delay.
Additional tests are an excellent idea, but I think we shouldn't make
this patch wait for them.  If further fixes are deemed necessary for
those other (thus far hypothetical) tests, we can still make them
afterwards.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Etsuro Fujita
Дата:
On Fri, Sep 26, 2025 at 10:34 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> On 2025-Sep-24, Etsuro Fujita wrote:
>
> > I took a quick glance at the patch.  My initial comment is: it only
> > includes the test case discussed here, but I think it's a good idea to
> > add more cases in it (that exercise code paths in
> > postgresRecheckForeignScan), as I mentioned upthread.  What do you
> > think about that?  If no objections, I'd like to update it to include
> > such cases.
>
> I would like to suggest a different approach: because this patch is
> known to fix a server crash in a known case, it would be better to get
> the fix (and its corresponding test) pushed as is without further delay.
> Additional tests are an excellent idea, but I think we shouldn't make
> this patch wait for them.  If further fixes are deemed necessary for
> those other (thus far hypothetical) tests, we can still make them
> afterwards.

The 2-step approach sounds reasonable, so I'll review the proposed
isolation test and report the results first.

Thanks!

Best regards,
Etsuro Fujita



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Etsuro Fujita
Дата:
On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I changed the regression tests and used the fix proposed by
> Fujita-san. Please review the attached new version patch.

I reviewed the test part.  Here are my comments about it.

You renamed the spec file to concurrent_update.spec.  It's a broader
name to cover the test case discussed here, but seems a bit vague to
me.  How about eval_plan_qual.spec like
src/test/isolation/specs/eval-plan-qual.spec, instead?  Which I think
is more specific.

In the setup block in the test, you created schemas:

+    CREATE SCHEMA sch1;
+    CREATE TABLE sch1.a (i int);
+    CREATE FOREIGN TABLE sch1.b (i int) SERVER loopback OPTIONS
(schema_name 'sch2', table_name 'b');
+    CREATE FOREIGN TABLE sch1.c (i int) SERVER loopback OPTIONS
(schema_name 'sch2', table_name 'c');
+
+    CREATE SCHEMA sch2;
+    CREATE TABLE sch2.b (i int);
+    CREATE TABLE sch2.c (i int);

But actually, we don't need these schemas.  As I'm planning to add
more permutations using the same setup, and setup is executed once per
permutation, I'd like to propose to save cycles by creating all these
tables in the current schema like this:

    CREATE TABLE a (i int);
    CREATE TABLE b (i int);
    CREATE TABLE c (i int);
    CREATE FOREIGN TABLE fb (i int) SERVER loopback OPTIONS (table_name 'b');
    CREATE FOREIGN TABLE fc (i int) SERVER loopback OPTIONS (table_name 'c');

+step "s1_lock" {
+    SELECT a.i,
+       (SELECT 1 FROM sch1.b AS b, sch1.c AS c WHERE a.i = b.i AND b.i = c.i)
+    FROM sch1.a as a
+    FOR UPDATE;
+}

I think the important point in the test case discussed here is that
the sub-select has a foreign-join plan (without an alternative local
join plan).  So I'd like to propose to add an EXPLAIN command to this
step, to confirm that it has such a plan.

Also, how about s/s1_lock/s1_tuplock/?  This is just my taste, though.

+# "s1_lock" waits for concurrent update on the tuple on sch1.a table by
+# "s0_update", and once s0 transaction is committed it resumes and does EPQ
+# recheck for the locked tuple, which should not use postgresRecheckForeignScan
+# as the remote join is not a descendant join in the EPQ plan tree.
+permutation "s0_begin" "s0_update" "s1_begin" "s1_lock" "s0_commit"

Let's add to the permutation s1_commit as well, which I think is a
good practice.

Considering the permutation is a typical sequence to exercise an
EvalPlanQual recheck, I don't feel the need for the first part of the
comments, sorry.  Also, IMO I think the second part is a bit too
detailed.  How about simplifying the comments to something like this
(I used a comment for a similar test in
src/test/isolation/specs/eval-plan-qual.spec.):

"This test exercises EvalPlanQual with a SubLink sub-select (which
should be unaffected by any EPQ recheck behavior in the outer query)"

Best regards,
Etsuro Fujita



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Etsuro Fujita
Дата:
On Wed, Oct 1, 2025 at 7:53 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I changed the regression tests and used the fix proposed by
> > Fujita-san. Please review the attached new version patch.
>
> I reviewed the test part.  Here are my comments about it.
>
> You renamed the spec file to concurrent_update.spec.  It's a broader
> name to cover the test case discussed here, but seems a bit vague to
> me.  How about eval_plan_qual.spec like
> src/test/isolation/specs/eval-plan-qual.spec, instead?  Which I think
> is more specific.
>
> In the setup block in the test, you created schemas:
>
> +    CREATE SCHEMA sch1;
> +    CREATE TABLE sch1.a (i int);
> +    CREATE FOREIGN TABLE sch1.b (i int) SERVER loopback OPTIONS
> (schema_name 'sch2', table_name 'b');
> +    CREATE FOREIGN TABLE sch1.c (i int) SERVER loopback OPTIONS
> (schema_name 'sch2', table_name 'c');
> +
> +    CREATE SCHEMA sch2;
> +    CREATE TABLE sch2.b (i int);
> +    CREATE TABLE sch2.c (i int);
>
> But actually, we don't need these schemas.  As I'm planning to add
> more permutations using the same setup, and setup is executed once per
> permutation, I'd like to propose to save cycles by creating all these
> tables in the current schema like this:
>
>     CREATE TABLE a (i int);
>     CREATE TABLE b (i int);
>     CREATE TABLE c (i int);
>     CREATE FOREIGN TABLE fb (i int) SERVER loopback OPTIONS (table_name 'b');
>     CREATE FOREIGN TABLE fc (i int) SERVER loopback OPTIONS (table_name 'c');
>
> +step "s1_lock" {
> +    SELECT a.i,
> +       (SELECT 1 FROM sch1.b AS b, sch1.c AS c WHERE a.i = b.i AND b.i = c.i)
> +    FROM sch1.a as a
> +    FOR UPDATE;
> +}
>
> I think the important point in the test case discussed here is that
> the sub-select has a foreign-join plan (without an alternative local
> join plan).  So I'd like to propose to add an EXPLAIN command to this
> step, to confirm that it has such a plan.
>
> Also, how about s/s1_lock/s1_tuplock/?  This is just my taste, though.
>
> +# "s1_lock" waits for concurrent update on the tuple on sch1.a table by
> +# "s0_update", and once s0 transaction is committed it resumes and does EPQ
> +# recheck for the locked tuple, which should not use postgresRecheckForeignScan
> +# as the remote join is not a descendant join in the EPQ plan tree.
> +permutation "s0_begin" "s0_update" "s1_begin" "s1_lock" "s0_commit"
>
> Let's add to the permutation s1_commit as well, which I think is a
> good practice.
>
> Considering the permutation is a typical sequence to exercise an
> EvalPlanQual recheck, I don't feel the need for the first part of the
> comments, sorry.  Also, IMO I think the second part is a bit too
> detailed.  How about simplifying the comments to something like this
> (I used a comment for a similar test in
> src/test/isolation/specs/eval-plan-qual.spec.):
>
> "This test exercises EvalPlanQual with a SubLink sub-select (which
> should be unaffected by any EPQ recheck behavior in the outer query)"

Some more comments:

+teardown
+{
+}

I think it's good to add cleanup statements here; otherwise let's
remove this for now, to suppress a "teardown failed:" error.

--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,6 +16,8 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql
postgres_fdw--1.1--1.2.sql

+ISOLATION = concurrent_update
+

I'd like to propose to add "ISOLATION_OPTS =
--load-extension=postgres_fdw", by which we don't need to do CREATE
EXTENSION in the setup block.

That's it.

Best regards,
Etsuro Fujita



Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Masahiko Sawada
Дата:
On Thu, Oct 2, 2025 at 4:01 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>
> On Wed, Oct 1, 2025 at 7:53 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > I changed the regression tests and used the fix proposed by
> > > Fujita-san. Please review the attached new version patch.
> >
> > I reviewed the test part.  Here are my comments about it.
> >
> > You renamed the spec file to concurrent_update.spec.  It's a broader
> > name to cover the test case discussed here, but seems a bit vague to
> > me.  How about eval_plan_qual.spec like
> > src/test/isolation/specs/eval-plan-qual.spec, instead?  Which I think
> > is more specific.
> >
> > In the setup block in the test, you created schemas:
> >
> > +    CREATE SCHEMA sch1;
> > +    CREATE TABLE sch1.a (i int);
> > +    CREATE FOREIGN TABLE sch1.b (i int) SERVER loopback OPTIONS
> > (schema_name 'sch2', table_name 'b');
> > +    CREATE FOREIGN TABLE sch1.c (i int) SERVER loopback OPTIONS
> > (schema_name 'sch2', table_name 'c');
> > +
> > +    CREATE SCHEMA sch2;
> > +    CREATE TABLE sch2.b (i int);
> > +    CREATE TABLE sch2.c (i int);
> >
> > But actually, we don't need these schemas.  As I'm planning to add
> > more permutations using the same setup, and setup is executed once per
> > permutation, I'd like to propose to save cycles by creating all these
> > tables in the current schema like this:
> >
> >     CREATE TABLE a (i int);
> >     CREATE TABLE b (i int);
> >     CREATE TABLE c (i int);
> >     CREATE FOREIGN TABLE fb (i int) SERVER loopback OPTIONS (table_name 'b');
> >     CREATE FOREIGN TABLE fc (i int) SERVER loopback OPTIONS (table_name 'c');
> >
> > +step "s1_lock" {
> > +    SELECT a.i,
> > +       (SELECT 1 FROM sch1.b AS b, sch1.c AS c WHERE a.i = b.i AND b.i = c.i)
> > +    FROM sch1.a as a
> > +    FOR UPDATE;
> > +}
> >
> > I think the important point in the test case discussed here is that
> > the sub-select has a foreign-join plan (without an alternative local
> > join plan).  So I'd like to propose to add an EXPLAIN command to this
> > step, to confirm that it has such a plan.
> >
> > Also, how about s/s1_lock/s1_tuplock/?  This is just my taste, though.
> >
> > +# "s1_lock" waits for concurrent update on the tuple on sch1.a table by
> > +# "s0_update", and once s0 transaction is committed it resumes and does EPQ
> > +# recheck for the locked tuple, which should not use postgresRecheckForeignScan
> > +# as the remote join is not a descendant join in the EPQ plan tree.
> > +permutation "s0_begin" "s0_update" "s1_begin" "s1_lock" "s0_commit"
> >
> > Let's add to the permutation s1_commit as well, which I think is a
> > good practice.
> >
> > Considering the permutation is a typical sequence to exercise an
> > EvalPlanQual recheck, I don't feel the need for the first part of the
> > comments, sorry.  Also, IMO I think the second part is a bit too
> > detailed.  How about simplifying the comments to something like this
> > (I used a comment for a similar test in
> > src/test/isolation/specs/eval-plan-qual.spec.):
> >
> > "This test exercises EvalPlanQual with a SubLink sub-select (which
> > should be unaffected by any EPQ recheck behavior in the outer query)"
>
> Some more comments:
>
> +teardown
> +{
> +}
>
> I think it's good to add cleanup statements here; otherwise let's
> remove this for now, to suppress a "teardown failed:" error.
>
> --- a/contrib/postgres_fdw/Makefile
> +++ b/contrib/postgres_fdw/Makefile
> @@ -16,6 +16,8 @@ SHLIB_LINK_INTERNAL = $(libpq)
>  EXTENSION = postgres_fdw
>  DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql
> postgres_fdw--1.1--1.2.sql
>
> +ISOLATION = concurrent_update
> +
>
> I'd like to propose to add "ISOLATION_OPTS =
> --load-extension=postgres_fdw", by which we don't need to do CREATE
> EXTENSION in the setup block.
>
> That's it.
>

Thank you for reviewing the patch!

I've updated the patch based on your comments. Please find the attached patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Michael Paquier
Дата:
On Thu, Oct 02, 2025 at 03:20:32PM -0700, Masahiko Sawada wrote:
> I've updated the patch based on your comments. Please find the attached patch.

Looks sensible here in shape, a few style nits noticed while reading.
x
+# This test exerices EvalPlanQual with a SubLink sub-select (which should
+# be unaffected by any EPQ recheck behavior in the outer query)

The setup phase has a mix of tabs (used in the DO block) and
spaces used, that may render weirdly depending on one's setup when
showing the diffs.  It looks to me that your intention is to use
spaces.

s/exerices/exercises/  And I would add a short explanation about the
reason why this script exists at the top of eval_plan_qual.spec.  If
you don't feel strongly about that, feel free to ignore me.

One other bit I have noticed on the way, due to the extra whitespaces
at the bottom of the file:
$ git diff master --check
contrib/postgres_fdw/specs/eval_plan_qual.spec:52: new blank line at EOF.
--
Michael

Вложения

Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

От
Etsuro Fujita
Дата:
On Fri, Oct 3, 2025 at 7:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've updated the patch based on your comments. Please find the attached patch.

Thanks for updating the patch!

+step "s1_tuplock" {
+    EXPLAIN (COSTS OFF, ANALYZE ON, TIMING OFF, SUMMARY OFF, BUFFERS OFF)
+    SELECT a.i,
+       (SELECT 1 FROM fb, fc WHERE a.i = fb.i AND fb.i = fc.i)
+    FROM a as a
+    FOR UPDATE;
+}

Maybe my comment about this step was not enough, but I'm wondering we
should run EXPLAIN and then SELECT here like below, rather than
running EXPLAIN ANALYZE, as that seems more usual to me:

step "s1_tuplock" {
    EXPLAIN (VERBOSE, COSTS OFF)
    SELECT a.i,
       (SELECT 1 FROM fb, fc WHERE a.i = fb.i AND fb.i = fc.i)
    FROM a
    FOR UPDATE;
    SELECT a.i,
       (SELECT 1 FROM fb, fc WHERE a.i = fb.i AND fb.i = fc.i)
    FROM a
    FOR UPDATE;
}

I added the VERBOSE option to show the remote query, and removed the
alias for a.

BTW: you added quotation marks around a name for each session or step
like "s1_tuplock".  Do we really need them?  This is nitpicking,
though.

Best regards,
Etsuro Fujita