Обсуждение: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

Поиск
Список
Период
Сортировка

BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17344
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 14.1
Operating system:   Ubuntu 20.04
Description:

When executing the following query:
create extension postgres_fdw;
do $d$
    begin
        execute $$create server loopback foreign data wrapper postgres_fdw
            options (dbname '$$||current_database()||$$',
                     port '$$||current_setting('port')||$$',
                     async_capable 'true'
            )$$;
    end;
$d$;
create user mapping for current_user server loopback;

create table tab1 (a int);
insert into tab1 values(1), (2), (3);
create foreign table ftab1 (b int) server loopback options (table_name
'tab1');

create foreign table ftab2 () inherits (ftab1) server loopback;
select a from tab1 where a in (select a from ftab1);

I get an assertion failed with the stacktrace:
Core was generated by `postgres: law regression [local] SELECT
                        '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f2a2f9fa859 in __GI_abort () at abort.c:79
#2  0x0000557dece3c5d8 in ExceptionalCondition
(conditionName=conditionName@entry=0x7f2a225a8da7 "areq != pendingAreq", 
    errorType=errorType@entry=0x7f2a225a76aa "FailedAssertion",
fileName=0x7ffde2262540 "\271\305\343\354}U", 
    fileName@entry=0x7f2a225a88af "postgres_fdw.c",
lineNumber=lineNumber@entry=6974) at assert.c:69
#3  0x00007f2a225a4e34 in produce_tuple_asynchronously (areq=<optimized
out>, fetch=<optimized out>)
    at postgres_fdw.c:6974
#4  0x0000557decb6e152 in ExecAsyncRequest (areq=0x557ded812670) at
execAsync.c:38
#5  0x0000557decb94e4e in ExecAppendAsyncBegin (node=0x557ded7f2040) at
nodeAppend.c:907
#6  ExecAppend (pstate=0x557ded7f2040) at nodeAppend.c:317
#7  0x0000557decbb0183 in ExecProcNode (node=0x557ded7f2040) at
../../../src/include/executor/executor.h:257
#8  ExecResult (pstate=0x557ded7f1e38) at nodeResult.c:115
#9  0x0000557decbb263d in ExecProcNode (node=0x557ded7f1e38) at
../../../src/include/executor/executor.h:257
#10 ExecScanSubPlan (isNull=0x557ded81ba35, econtext=0x557ded81b7e8,
node=0x557ded81c2e0) at nodeSubplan.c:323
#11 ExecSubPlan (node=node@entry=0x557ded81c2e0,
econtext=econtext@entry=0x557ded81b7e8, isNull=0x557ded81ba35)
    at nodeSubplan.c:89
#12 0x0000557decb77beb in ExecEvalSubPlan (econtext=0x557ded81b7e8,
op=<optimized out>, state=0x557ded81ba30)
    at execExprInterp.c:3932
#13 ExecInterpExpr (state=0x557ded81ba30, econtext=0x557ded81b7e8,
isnull=<optimized out>) at execExprInterp.c:1564
#14 0x0000557decb865d5 in ExecEvalExprSwitchContext (isNull=0x7ffde2262ad7,
econtext=0x557ded81b7e8, 
    state=0x557ded81ba30) at ../../../src/include/executor/executor.h:339
#15 ExecQual (econtext=0x557ded81b7e8, state=0x557ded81ba30) at
../../../src/include/executor/executor.h:408
#16 ExecScan (node=0x557ded7f3748, accessMtd=0x557decbb0ad0 <SeqNext>,
recheckMtd=0x557decbb0ac0 <SeqRecheck>)
    at execScan.c:227
#17 0x0000557decb7cc7a in ExecProcNode (node=0x557ded7f3748) at
../../../src/include/executor/executor.h:257
#18 ExecutePlan (execute_once=<optimized out>, dest=0x557ded80a428,
direction=<optimized out>, numberTuples=0, 
    sendTuples=<optimized out>, operation=CMD_SELECT,
use_parallel_mode=<optimized out>, planstate=0x557ded7f3748, 
    estate=0x557ded7f1b90) at execMain.c:1551
#19 standard_ExecutorRun (queryDesc=0x557ded800c20, direction=<optimized
out>, count=0, execute_once=<optimized out>)
    at execMain.c:361
#20 0x0000557decd18b9c in PortalRunSelect (portal=0x557ded756860,
forward=<optimized out>, count=0, 
    dest=<optimized out>) at pquery.c:921
#21 0x0000557decd1a2cb in PortalRun (portal=portal@entry=0x557ded756860,
count=count@entry=9223372036854775807, 
    isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true,
dest=dest@entry=0x557ded80a428, 
    altdest=altdest@entry=0x557ded80a428, qc=0x7ffde2262d80) at
pquery.c:765
#22 0x0000557decd1614b in exec_simple_query (
    query_string=0x557ded6e7960 "select a from tab1 where a in (select a
from ftab1);") at postgres.c:1214
#23 0x0000557decd17d51 in PostgresMain (argc=argc@entry=1,
argv=argv@entry=0x7ffde22631f0, dbname=<optimized out>, 
    username=<optimized out>) at postgres.c:4486
#24 0x0000557decc83c8d in BackendRun (port=0x557ded70b3d0,
port=0x557ded70b3d0) at postmaster.c:4530
#25 BackendStartup (port=0x557ded70b3d0) at postmaster.c:4252
#26 ServerLoop () at postmaster.c:1745
#27 0x0000557decc84c31 in PostmasterMain (argc=<optimized out>,
argv=<optimized out>) at postmaster.c:1417
#28 0x0000557dec9b24b2 in main (argc=3, argv=0x557ded6e1990) at main.c:209

With async_capable = 'false' I get expected results.
But when ftab1 defined as
create foreign table ftab1 (a int) server loopback options (table_name
'tab1');
The select fails with:
psql:query.sql:19: ERROR:  could not connect to server "loopback"
DETAIL:  connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:
sorry, too many clients already
CONTEXT:  remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
...
with or without async_capable.


Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Dmitry Dolgov
Дата:
> On Sat, Dec 25, 2021 at 10:00:01AM +0000, PG Bug reporting form wrote:
>
> create extension postgres_fdw;
> do $d$
>     begin
>         execute $$create server loopback foreign data wrapper postgres_fdw
>             options (dbname '$$||current_database()||$$',
>                      port '$$||current_setting('port')||$$',
>                      async_capable 'true'
>             )$$;
>     end;
> $d$;
> create user mapping for current_user server loopback;
>
> create table tab1 (a int);
> insert into tab1 values(1), (2), (3);
> create foreign table ftab1 (b int) server loopback options (table_name
> 'tab1');
>
> create foreign table ftab2 () inherits (ftab1) server loopback;
> select a from tab1 where a in (select a from ftab1);
>
> With async_capable = 'false' I get expected results.
> But when ftab1 defined as
> create foreign table ftab1 (a int) server loopback options (table_name
> 'tab1');
> The select fails with:
> psql:query.sql:19: ERROR:  could not connect to server "loopback"
> DETAIL:  connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:
> sorry, too many clients already
> CONTEXT:  remote SQL command: SELECT a FROM public.ftab2
> remote SQL command: SELECT a FROM public.ftab2
> remote SQL command: SELECT a FROM public.ftab2
> remote SQL command: SELECT a FROM public.ftab2

Apparently in both cases the second foreign scan of append ends up in
recursion:

    ->  Foreign Scan on public.ftab2 ftab1_2
        Remote SQL: SELECT NULL FROM public.ftab2

This leads to a chain of the same cursor declarations:

    DECLARE c1 CURSOR FOR SELECT a FROM public.ftab4

With a pattern:

    bind -> new backend with the same declare -> execute -> bind -> ...

In the case of async foreign scan it hits limitation when no new
requests be sent while there is a pending one. In the second case it
hits connection limit. Shouldn't this type of foreign paths be avoided
by FDW?



Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Etsuro Fujita
Дата:
On Sun, Dec 26, 2021 at 8:26 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On Sat, Dec 25, 2021 at 10:00:01AM +0000, PG Bug reporting form wrote:
> >
> > create extension postgres_fdw;
> > do $d$
> >     begin
> >         execute $$create server loopback foreign data wrapper postgres_fdw
> >             options (dbname '$$||current_database()||$$',
> >                      port '$$||current_setting('port')||$$',
> >                      async_capable 'true'
> >             )$$;
> >     end;
> > $d$;
> > create user mapping for current_user server loopback;
> >
> > create table tab1 (a int);
> > insert into tab1 values(1), (2), (3);
> > create foreign table ftab1 (b int) server loopback options (table_name
> > 'tab1');
> >
> > create foreign table ftab2 () inherits (ftab1) server loopback;
> > select a from tab1 where a in (select a from ftab1);
> >
> > With async_capable = 'false' I get expected results.
> > But when ftab1 defined as
> > create foreign table ftab1 (a int) server loopback options (table_name
> > 'tab1');
> > The select fails with:
> > psql:query.sql:19: ERROR:  could not connect to server "loopback"
> > DETAIL:  connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:
> > sorry, too many clients already
> > CONTEXT:  remote SQL command: SELECT a FROM public.ftab2
> > remote SQL command: SELECT a FROM public.ftab2
> > remote SQL command: SELECT a FROM public.ftab2
> > remote SQL command: SELECT a FROM public.ftab2

Reproduced here.

> Apparently in both cases the second foreign scan of append ends up in
> recursion:
>
>     ->  Foreign Scan on public.ftab2 ftab1_2
>         Remote SQL: SELECT NULL FROM public.ftab2
>
> This leads to a chain of the same cursor declarations:
>
>     DECLARE c1 CURSOR FOR SELECT a FROM public.ftab4
>
> With a pattern:
>
>     bind -> new backend with the same declare -> execute -> bind -> ...
>
> In the case of async foreign scan it hits limitation when no new
> requests be sent while there is a pending one.

Yeah, but when executing a simple query “select NULL from ftab1”,
which would essentially be the same as the sublink subquery used in
the first case, I don’t get the assertion failure.  (The simple query
only throws the connection limit error.)  The root cause of the
assertion failure in the first case might be something other than the
limitation.  I’ll look into this in more detail.

Thanks, Alexander and Dmitry!

Best regards,
Etsuro Fujita



Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Etsuro Fujita
Дата:
On Tue, Dec 28, 2021 at 10:14 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> The root cause of the
> assertion failure in the first case might be something other than the
> limitation.  I’ll look into this in more detail.

I think the root cause is that we fail to process a pending async
request (if any) in postgresReScanForeignScan() in the case where we
just reset the next_tuple counter in that function, without
destroying/recreating or rewinding the cursor.  (In the case where we
destroy/recreate or rewind the cursor in that function, the pending
async request would be processed by pgfdw_exec_query() called from
that function.)  This breaks the assumption about ExecAppend() that
when called for the first time after ReScan, there are no pending
async requests made for async subplans for the Append node, causing
the assertion failure in postgresForeignAsyncRequest() called from
that ExecAppend().  My oversight in commit 27e1f1456.  :-(

To fix, I modified postgresReScanForeignScan() so that we always
process a pending async request (if any) before restarting the foreign
scan.  Attached is a patch for that.  I tested the patch with the
first case, and it addresses the assertion failure.

Best regards,
Etsuro Fujita

Вложения

Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Dmitry Dolgov
Дата:
> On Fri, Dec 31, 2021 at 04:36:42PM +0900, Etsuro Fujita wrote:
> On Tue, Dec 28, 2021 at 10:14 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > The root cause of the
> > assertion failure in the first case might be something other than the
> > limitation.  I’ll look into this in more detail.
>
> I think the root cause is that we fail to process a pending async
> request (if any) in postgresReScanForeignScan() in the case where we
> just reset the next_tuple counter in that function, without
> destroying/recreating or rewinding the cursor.  (In the case where we
> destroy/recreate or rewind the cursor in that function, the pending
> async request would be processed by pgfdw_exec_query() called from
> that function.)  This breaks the assumption about ExecAppend() that
> when called for the first time after ReScan, there are no pending
> async requests made for async subplans for the Append node, causing
> the assertion failure in postgresForeignAsyncRequest() called from
> that ExecAppend().  My oversight in commit 27e1f1456.  :-(
>
> To fix, I modified postgresReScanForeignScan() so that we always
> process a pending async request (if any) before restarting the foreign
> scan.  Attached is a patch for that.  I tested the patch with the
> first case, and it addresses the assertion failure.

Yep, makes sense now, thank you. The fix works for me, but I'm curious
about the requestee condition:

    fsstate->conn_state->pendingAreq->requestee == (PlanState *) node)

You've mentioned that in those cases where the cursor will be
destroyed/recreated or rewind, pgfdw_exec_query will take care of
processing pending requests, and looks like it will do this without
checking the requestee. Just for me to understand, why is this condition
necessary?



Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Alexander Lakhin
Дата:
Hello Etsuro-san,
31.12.2021 10:36, Etsuro Fujita wrote:
> On Tue, Dec 28, 2021 at 10:14 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>> The root cause of the
>> assertion failure in the first case might be something other than the
>> limitation.  I’ll look into this in more detail.
> To fix, I modified postgresReScanForeignScan() so that we always
> process a pending async request (if any) before restarting the foreign
> scan.  Attached is a patch for that.  I tested the patch with the
> first case, and it addresses the assertion failure.
Thanks for the fix! I can confirm that it eliminates the failure and
propose a simple test case for the issue. Maybe you will find it useful.

Best regards,
Alexander

Вложения

Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Etsuro Fujita
Дата:
On Fri, Dec 31, 2021 at 9:32 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On Fri, Dec 31, 2021 at 04:36:42PM +0900, Etsuro Fujita wrote:
> > I think the root cause is that we fail to process a pending async
> > request (if any) in postgresReScanForeignScan() in the case where we
> > just reset the next_tuple counter in that function, without
> > destroying/recreating or rewinding the cursor.  (In the case where we
> > destroy/recreate or rewind the cursor in that function, the pending
> > async request would be processed by pgfdw_exec_query() called from
> > that function.)  This breaks the assumption about ExecAppend() that
> > when called for the first time after ReScan, there are no pending
> > async requests made for async subplans for the Append node, causing
> > the assertion failure in postgresForeignAsyncRequest() called from
> > that ExecAppend().  My oversight in commit 27e1f1456.  :-(
> >
> > To fix, I modified postgresReScanForeignScan() so that we always
> > process a pending async request (if any) before restarting the foreign
> > scan.  Attached is a patch for that.  I tested the patch with the
> > first case, and it addresses the assertion failure.
>
> Yep, makes sense now, thank you. The fix works for me, but I'm curious
> about the requestee condition:
>
>     fsstate->conn_state->pendingAreq->requestee == (PlanState *) node)
>
> You've mentioned that in those cases where the cursor will be
> destroyed/recreated or rewind, pgfdw_exec_query will take care of
> processing pending requests, and looks like it will do this without
> checking the requestee. Just for me to understand, why is this condition
> necessary?

The reason is that the pending request might not need to be processed.
Consider this query:

alter server loopback1 options (add async_capable 'true');
alter server loopback1 options (add use_remote_estimate 'true');
create table a (aa int, constraint a_pkey primary key (aa));
create table loct (aa int, bb text, constraint loct_pkey primary key (aa));
create foreign table b (bb text) inherits (a) server loopback1 options
(table_name 'loct');
insert into a select i from generate_series(1, 10000) i;
insert into b select i, to_char(i, 'FM00000') from generate_series(1, 10000) i;
analyze loct;
analyze a;

explain verbose select * from a t1, a t2 where t1.aa = t2.aa limit 1;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..54.64 rows=1 width=8)
   Output: t1.aa, t2.aa
   ->  Nested Loop  (cost=0.29..2174178.00 rows=40000 width=8)
         Output: t1.aa, t2.aa
         ->  Append  (cost=0.00..700.00 rows=20000 width=4)
               ->  Seq Scan on public.a t1_1  (cost=0.00..145.00
rows=10000 width=4)
                     Output: t1_1.aa
               ->  Async Foreign Scan on public.b t1_2
(cost=100.00..455.00 rows=10000 width=4)
                     Output: t1_2.aa
                     Remote SQL: SELECT aa FROM public.loct
         ->  Append  (cost=0.29..108.65 rows=2 width=4)
               ->  Index Only Scan using a_pkey on public.a t2_1
(cost=0.29..0.31 rows=1 width=4)
                     Output: t2_1.aa
                     Index Cond: (t2_1.aa = t1.aa)
               ->  Async Foreign Scan on public.b t2_2
(cost=100.30..108.33 rows=1 width=4)
                     Output: t2_2.aa
                     Remote SQL: SELECT aa FROM public.loct WHERE
(($1::integer = aa))
(17 rows)

Note that when starting the scan for the outer side, we’ll begin an
asynchronous fetch for the foreign table on the outer side, but we
won’t wait for the results of it until after we get all tuples from
the local table on the outer side (see the documentation note about
the async_capable option for postgres_fdw [1]).  So when rescanning
the inner side after getting the first tuple from the local table on
the outer side, the asynchronous fetch begun for the foreign table on
the outer side will still be in progress, and we’ll have it as
pendingAreq when called from postgresReScanForeignScan() for
rescanning the foreign table on the inner side.  BUT it would be
useless to process the pending request there, because even without
processing the pending request, we can get a matching tuple for the
outer tuple from the local table on the inner side, as the local table
will be scanned first even in the inner side, and then return the
query result.  I think the condition mentioned above is useful to
avoid processing such a useless pending request and return result
tuples more quickly.

Thanks for reviewing!

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/docs/14/postgres-fdw.html#id-1.11.7.44.11



Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Etsuro Fujita
Дата:
Hi Alexander,

On Mon, Jan 3, 2022 at 4:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> 31.12.2021 10:36, Etsuro Fujita wrote:
> > To fix, I modified postgresReScanForeignScan() so that we always
> > process a pending async request (if any) before restarting the foreign
> > scan.  Attached is a patch for that.  I tested the patch with the
> > first case, and it addresses the assertion failure.

> Thanks for the fix! I can confirm that it eliminates the failure and

Good to know.  Thanks for testing!

> propose a simple test case for the issue. Maybe you will find it useful.

Thanks, but in my environment, the test case doesn’t cause any
failure.  How about something like the attached, which is made by
modifying the original test case to avoid the connection-limit-error
issue.  In the attached, I also added a comment to a function to match
other places.

Best regards,
Etsuro Fujita

Вложения

Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Alexander Lakhin
Дата:
07.01.2022 09:15, Etsuro Fujita wrote:
> On Mon, Jan 3, 2022 at 4:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
>> propose a simple test case for the issue. Maybe you will find it useful.
> Thanks, but in my environment, the test case doesn’t cause any
> failure. 
It's interesting. I perform on REL_14_STABLE (61c8da50) the following:
wget
https://www.postgresql.org/message-id/attachment/129455/posgres_fdw.sql.patch
-O - | git apply
./configure --enable-debug --enable-cassert >/dev/null && make -j8
>/dev/null && make -j8 -C contrib >/dev/null && make check -C
contrib/postgres_fdw/
and get:
test postgres_fdw                 ... FAILED (test process exited with
exit code 2)     1116 ms

>  How about something like the attached, which is made by
> modifying the original test case to avoid the connection-limit-error
> issue.  In the attached, I also added a comment to a function to match
> other places.
Your modified test fails with unpatched postgres_fdw.c too (as
expected). I would prefer your test as more elaborated.

Best regards,
Alexander



Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Etsuro Fujita
Дата:
On Sat, Jan 8, 2022 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> 07.01.2022 09:15, Etsuro Fujita wrote:
> > On Mon, Jan 3, 2022 at 4:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> >> propose a simple test case for the issue. Maybe you will find it useful.
> > Thanks, but in my environment, the test case doesn’t cause any
> > failure.
> It's interesting. I perform on REL_14_STABLE (61c8da50) the following:
> wget
> https://www.postgresql.org/message-id/attachment/129455/posgres_fdw.sql.patch
> -O - | git apply
> ./configure --enable-debug --enable-cassert >/dev/null && make -j8
> >/dev/null && make -j8 -C contrib >/dev/null && make check -C
> contrib/postgres_fdw/
> and get:
> test postgres_fdw                 ... FAILED (test process exited with
> exit code 2)     1116 ms

Reproduced.  Sorry for that.  Maybe, I tested this on a
non-assertion-enabled build by mistake.  :-(

> >  How about something like the attached, which is made by
> > modifying the original test case to avoid the connection-limit-error
> > issue.  In the attached, I also added a comment to a function to match
> > other places.
> Your modified test fails with unpatched postgres_fdw.c too (as
> expected). I would prefer your test as more elaborated.

Ok, but I modified the added test case a bit further so that foreign
tables are well-defined like in your second test case, which I think
is good.  Attached is an updated version of the patch.

Thanks!

Best regards,
Etsuro Fujita

Вложения

Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Alexander Lakhin
Дата:
Hello Etsuro-san,
08.01.2022 11:57, Etsuro Fujita wrote:
> Ok, but I modified the added test case a bit further so that foreign
> tables are well-defined like in your second test case, which I think
> is good.  Attached is an updated version of the patch.
Thank you! The updated version looks even better to me.

Best regards,
Alexander



Re: BUG #17344: Assert failed on queiring async_capable foreign table with inheritance

От
Etsuro Fujita
Дата:
On Sun, Jan 9, 2022 at 4:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> 08.01.2022 11:57, Etsuro Fujita wrote:
> > Ok, but I modified the added test case a bit further so that foreign
> > tables are well-defined like in your second test case, which I think
> > is good.  Attached is an updated version of the patch.
> Thank you! The updated version looks even better to me.

Thanks for reviewing!  I've committed the patch.

Best regards,
Etsuro Fujita