Thread: Server crashed with dense_rank on partition table.

Server crashed with dense_rank on partition table.

From: Rajkumar Raghuwanshi

Hi,

I am getting a server crash with the query below.

CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000', '0001', '0002', '0003');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0004', '0005', '0006', '0007');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0008', '0009', '0010', '0011');
INSERT INTO pagg_tab SELECT i % 20, i % 30, to_char(i % 12, 'FM0000') FROM generate_series(0, 36) i;
ANALYZE pagg_tab;
SELECT dense_rank(b) WITHIN GROUP (ORDER BY a) FROM pagg_tab GROUP BY b ORDER BY 1;

postgres=# SELECT dense_rank(b) WITHIN GROUP (ORDER BY a) FROM pagg_tab GROUP BY b ORDER BY 1;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The log file has this:

2018-06-12 21:29:54.930 IST [69580] STATEMENT:  drop table pagg_tab;
TRAP: BadArgument("!(((context) != ((void *)0) && (((((const Node*)((context)))->type) == T_AllocSetContext) || ((((const Node*)((context)))->type) == T_SlabContext) || ((((const Node*)((context)))->type) == T_GenerationContext))))", File: "mcxt.c", Line: 775)
2018-06-12 21:29:55.552 IST [69571] LOG:  server process (PID 69580) was terminated by signal 6: Aborted
2018-06-12 21:29:55.552 IST [69571] DETAIL:  Failed process was running: SELECT dense_rank(b) WITHIN GROUP (ORDER BY a) FROM pagg_tab GROUP BY b ORDER BY 1;
2018-06-12 21:29:55.552 IST [69571] LOG:  terminating any other active server processes

And here is the core file content:

Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `postgres: edb postgres [local] SELECT                   '.
Program terminated with signal 6, Aborted.
#0  0x0000003dd2632495 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-23.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x0000003dd2632495 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003dd2633c75 in abort () at abort.c:92
#2  0x0000000000a32622 in ExceptionalCondition (
    conditionName=0xc99320 "!(((context) != ((void *)0) && (((((const Node*)((context)))->type) == T_AllocSetContext) || ((((const Node*)((context)))->type) == T_SlabContext) || ((((const Node*)((context)))->type) == T_Generatio"..., errorType=0xc99312 "BadArgument", fileName=0xc993f5 "mcxt.c", lineNumber=775) at assert.c:54
#3  0x0000000000a6be3a in MemoryContextAlloc (context=0x134a708, size=8) at mcxt.c:775
#4  0x0000000000a6d0a6 in MemoryContextStrdup (context=0x134a708, string=0xc60413 "integer") at mcxt.c:1153
#5  0x0000000000a6d0e9 in pstrdup (in=0xc60413 "integer") at mcxt.c:1163
#6  0x0000000000927328 in format_type_extended (type_oid=23, typemod=-1, flags=0) at format_type.c:224
#7  0x00000000009275c4 in format_type_be (type_oid=23) at format_type.c:330
#8  0x00000000006d41ed in CheckVarSlotCompatibility (slot=0x134a6a8, attnum=1, vartype=23) at execExprInterp.c:1883
#9  0x00000000006d4062 in CheckExprStillValid (state=0x1388370, econtext=0x134a6a8) at execExprInterp.c:1823
#10 0x00000000006d3f5e in ExecInterpExprStillValid (state=0x1388370, econtext=0x134a6a8, isNull=0x7ffe988f3907) at execExprInterp.c:1780
#11 0x000000000098a116 in ExecEvalExprSwitchContext (state=0x1388370, econtext=0x134a6a8, isNull=0x7ffe988f3907) at ../../../../src/include/executor/executor.h:303
#12 0x000000000098a191 in ExecQual (state=0x1388370, econtext=0x134a6a8) at ../../../../src/include/executor/executor.h:372
#13 0x000000000098a1e3 in ExecQualAndReset (state=0x1388370, econtext=0x134a6a8) at ../../../../src/include/executor/executor.h:389
#14 0x000000000098cb96 in hypothetical_dense_rank_final (fcinfo=0x7ffe988f3a40) at orderedsetaggs.c:1389
#15 0x00000000006f3a5d in finalize_aggregate (aggstate=0x1368f98, peragg=0x1382a38, pergroupstate=0x1382be8, resultVal=0x13829f8, resultIsNull=0x1382a18) at nodeAgg.c:965
#16 0x00000000006f3ff0 in finalize_aggregates (aggstate=0x1368f98, peraggs=0x1382a38, pergroup=0x1382be8) at nodeAgg.c:1172
#17 0x00000000006f516a in agg_retrieve_direct (aggstate=0x1368f98) at nodeAgg.c:1887
#18 0x00000000006f4a6d in ExecAgg (pstate=0x1368f98) at nodeAgg.c:1551
#19 0x0000000000718972 in ExecProcNode (node=0x1368f98) at ../../../src/include/executor/executor.h:237
#20 0x0000000000718abe in ExecSort (pstate=0x1368e80) at nodeSort.c:107
#21 0x00000000006e6b26 in ExecProcNodeFirst (node=0x1368e80) at execProcnode.c:445
#22 0x00000000006dbd61 in ExecProcNode (node=0x1368e80) at ../../../src/include/executor/executor.h:237
#23 0x00000000006de71b in ExecutePlan (estate=0x1368c68, planstate=0x1368e80, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0,
    direction=ForwardScanDirection, dest=0x137a928, execute_once=true) at execMain.c:1726
#24 0x00000000006dc34b in standard_ExecutorRun (queryDesc=0x1354318, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#25 0x00000000006dc167 in ExecutorRun (queryDesc=0x1354318, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:306
#26 0x00000000008cadd2 in PortalRunSelect (portal=0x12f0c28, forward=true, count=0, dest=0x137a928) at pquery.c:932
#27 0x00000000008caa60 in PortalRun (portal=0x12f0c28, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x137a928, altdest=0x137a928,
    completionTag=0x7ffe988f43a0 "") at pquery.c:773
#28 0x00000000008c4a37 in exec_simple_query (query_string=0x128b798 "SELECT dense_rank(b) WITHIN GROUP (ORDER BY a) FROM pagg_tab GROUP BY b ORDER BY 1;") at postgres.c:1122
#29 0x00000000008c8d07 in PostgresMain (argc=1, argv=0x12b52a0, dbname=0x12b5100 "postgres", username=0x1288298 "edb") at postgres.c:4153
#30 0x00000000008264f7 in BackendRun (port=0x12ad060) at postmaster.c:4361
#31 0x0000000000825c65 in BackendStartup (port=0x12ad060) at postmaster.c:4033
#32 0x0000000000822047 in ServerLoop () at postmaster.c:1706
#33 0x0000000000821979 in PostmasterMain (argc=3, argv=0x12861f0) at postmaster.c:1379
#34 0x0000000000748bc4 in main (argc=3, argv=0x12861f0) at main.c:228

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

Re: Server crashed with dense_rank on partition table.

From: Michael Paquier

On Wed, Jun 13, 2018 at 11:08:38AM +0530, Rajkumar Raghuwanshi wrote:
> postgres=# SELECT dense_rank(b) WITHIN GROUP (ORDER BY a) FROM pagg_tab
> GROUP BY b ORDER BY 1;
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

Indeed, thanks for the test case.  This used to work in v10 but this is
failing with v11 so I am adding an open item.  The plans of the pre-10
query and the query on HEAD are rather similar, and the memory context
at execution time looks messed up.
--
Michael

Re: Server crashed with dense_rank on partition table.

From: David Rowley

On 13 June 2018 at 17:55, Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Jun 13, 2018 at 11:08:38AM +0530, Rajkumar Raghuwanshi wrote:
>> postgres=# SELECT dense_rank(b) WITHIN GROUP (ORDER BY a) FROM pagg_tab
>> GROUP BY b ORDER BY 1;
>> server closed the connection unexpectedly
>>     This probably means the server terminated abnormally
>>     before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>
> Indeed, thanks for the test case.  This used to work in v10 but this is
> failing with v11 so I am adding an open item.  The plans of the pre-10
> query and the query on HEAD are rather similar, and the memory context
> at execution time looks messed up.

Looks like some memory is being stomped on somewhere.

4b9094eb6 (Adapt to LLVM 7+ Orc API changes.) appears to be the first
bad commit.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Server crashed with dense_rank on partition table.

From: Amit Langote

Hi.

On 2018/06/13 14:55, Michael Paquier wrote:
> On Wed, Jun 13, 2018 at 11:08:38AM +0530, Rajkumar Raghuwanshi wrote:
>> postgres=# SELECT dense_rank(b) WITHIN GROUP (ORDER BY a) FROM pagg_tab
>> GROUP BY b ORDER BY 1;
>> server closed the connection unexpectedly
>>     This probably means the server terminated abnormally
>>     before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
> 
> Indeed, thanks for the test case.  This used to work in v10 but this is
> failing with v11 so I am adding an open item.  The plans of the pre-10
> query and the query on HEAD are rather similar, and the memory context
> at execution time looks messed up.

Fwiw, I see that the crash can also occur even when using a
non-partitioned table in the query, as shown in the following example
which reuses Rajkumar's test data and query:

create table foo (a int, b int, c text);
postgres=# insert into foo select i%20, i%30, to_char(i%12, 'FM0000') from
generate_series(0, 36) i;

select dense_rank(b) within group (order by a) from foo group by b order by 1;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The following query in the regression test suite can also be made to crash by
adding a group by clause:

select dense_rank(3) within group (order by x) from (values
(1),(1),(2),(2),(3),(3),(4)) v(x) group by (x);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Looking at the core dump of this, it seems the following commit may be
relevant:

commit bf6c614a2f2c58312b3be34a47e7fb7362e07bcb
Author: Andres Freund <andres@anarazel.de>
Date:   Thu Feb 15 21:55:31 2018 -0800

    Do execGrouping.c via expression eval machinery, take two.

Thanks,
Amit



Re: Server crashed with dense_rank on partition table.

From: Amit Langote

On 2018/06/13 16:35, Amit Langote wrote:
> Fwiw, I see that the crash can also occur even when using a
> non-partitioned table in the query, as shown in the following example
> which reuses Rajkumar's test data and query:
> 
> create table foo (a int, b int, c text);
> postgres=# insert into foo select i%20, i%30, to_char(i%12, 'FM0000') from
> generate_series(0, 36) i;
> 
> select dense_rank(b) within group (order by a) from foo group by b order by 1;
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> 
> The following query in the regression test suite can also be made to crash by
> adding a group by clause:
> 
> select dense_rank(3) within group (order by x) from (values
> (1),(1),(2),(2),(3),(3),(4)) v(x) group by (x);
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> 
> Looking at the core dump of this, it seems the following commit may be
> relevant:
> 
> commit bf6c614a2f2c58312b3be34a47e7fb7362e07bcb
> Author: Andres Freund <andres@anarazel.de>
> Date:   Thu Feb 15 21:55:31 2018 -0800
> 
>     Do execGrouping.c via expression eval machinery, take two.

I studied this a bit and found a bug that's causing the crash.

The above mentioned commit has this hunk:

@@ -1309,6 +1311,9 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
         PG_RETURN_INT64(rank);

     osastate = (OSAPerGroupState *) PG_GETARG_POINTER(0);
+    econtext = osastate->qstate->econtext;
+    if (!econtext)
+        osastate->qstate->econtext = econtext = CreateStandaloneExprContext();

In CreateStandaloneExprContext(), we have this:

    econtext->ecxt_per_query_memory = CurrentMemoryContext;

    /*
     * Create working memory for expression evaluation in this context.
     */
    econtext->ecxt_per_tuple_memory =
        AllocSetContextCreate(CurrentMemoryContext,
                              "ExprContext",
                              ALLOCSET_DEFAULT_SIZES);

I noticed when debugging the crashing query that CurrentMemoryContext is
actually the per-tuple memory context of some expression context of the
calling code, which gets reset before we get here again.  So it is
wrong for hypothetical_dense_rank_final to call CreateStandaloneExprContext
without first switching to an actual per-query context.

Attached patch seems to fix the crash.
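
The lifetime problem described above can be illustrated with a small
self-contained toy model in C.  It is only a sketch and does not use
PostgreSQL's real memory-context code: ToyContext, ToyExprContext,
ToyQueryState and all toy_* functions are hypothetical names invented for
illustration.  A context reset is modeled by bumping a generation counter,
so a cached expression context that was created under the per-tuple context
can be detected as stale on the next group instead of corrupting memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory context.  A reset bumps the generation,
 * which invalidates everything "allocated" in the context before it. */
typedef struct ToyContext {
    const char *name;
    unsigned generation;
} ToyContext;

/* Toy stand-in for an ExprContext: it remembers the context it was
 * created in and that context's generation at creation time. */
typedef struct ToyExprContext {
    ToyContext *parent;
    unsigned created_in_generation;
} ToyExprContext;

/* Plays the role of CurrentMemoryContext. */
static ToyContext *current_context;

/* Make ctx current and return the previously current context. */
static ToyContext *toy_switch_to(ToyContext *ctx) {
    ToyContext *old = current_context;
    current_context = ctx;
    return old;
}

static void toy_reset(ToyContext *ctx) { ctx->generation++; }

/* Like the quoted code, this uses whatever context is current. */
static ToyExprContext toy_create_expr_context(void) {
    assert(current_context != NULL);
    ToyExprContext e = { current_context, current_context->generation };
    return e;
}

static bool toy_is_valid(const ToyExprContext *e) {
    return e->parent->generation == e->created_in_generation;
}

/* Per-query state that caches one expression context across groups. */
typedef struct ToyQueryState {
    ToyContext *qcontext;      /* long-lived, per-query context */
    ToyExprContext econtext;   /* cached on first use */
    bool has_econtext;
} ToyQueryState;

/* Called once per group.  Returns false if the cached expression
 * context lives in memory that has been reset since it was created. */
static bool toy_final(ToyQueryState *q, bool switch_to_query_context) {
    if (!q->has_econtext) {
        ToyContext *old = toy_switch_to(
            switch_to_query_context ? q->qcontext : current_context);
        q->econtext = toy_create_expr_context();
        q->has_econtext = true;
        toy_switch_to(old);    /* restore the caller's context */
    }
    return toy_is_valid(&q->econtext);
}

/* Runs ngroups groups, resetting the per-tuple context after each one,
 * and returns how many final-function calls saw a stale context. */
int toy_count_stale_calls(int ngroups, bool switch_to_query_context) {
    ToyContext per_query = { "per-query", 0 };
    ToyContext per_tuple = { "per-tuple", 0 };
    ToyQueryState q = { &per_query, { NULL, 0 }, false };
    int stale = 0;

    current_context = &per_tuple;  /* the caller runs in per-tuple memory */
    for (int g = 0; g < ngroups; g++) {
        if (!toy_final(&q, switch_to_query_context))
            stale++;
        toy_reset(&per_tuple);
    }
    return stale;
}
```

With three groups, toy_count_stale_calls(3, false) reports two stale calls
(every group after the first), while toy_count_stale_calls(3, true) reports
none.  With a single group the unfixed variant also reports none, which is
consistent with the regression-test query only crashing once a GROUP BY
clause was added.  In the real server the switch would presumably be done
with MemoryContextSwitchTo() around the CreateStandaloneExprContext() call;
which context to switch to is determined by the attached patch, which is not
reproduced here.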

Thanks,
Amit

Re: Server crashed with dense_rank on partition table.

From: Andres Freund

On 2018-06-13 16:35:58 +0900, Amit Langote wrote:
> Hi.
> 
> On 2018/06/13 14:55, Michael Paquier wrote:
> > On Wed, Jun 13, 2018 at 11:08:38AM +0530, Rajkumar Raghuwanshi wrote:
> >> postgres=# SELECT dense_rank(b) WITHIN GROUP (ORDER BY a) FROM pagg_tab
> >> GROUP BY b ORDER BY 1;
> >> server closed the connection unexpectedly
> >>     This probably means the server terminated abnormally
> >>     before or while processing the request.
> >> The connection to the server was lost. Attempting reset: Failed.
> > 
> > Indeed, thanks for the test case.  This used to work in v10 but this is
> > failing with v11 so I am adding an open item.  The plans of the pre-10
> > query and the query on HEAD are rather similar, and the memory context
> > at execution time looks messed up.
> 
> Fwiw, I see that the crash can also occur even when using a
> non-partitioned table in the query, as shown in the following example
> which reuses Rajkumar's test data and query:
> 
> create table foo (a int, b int, c text);
> postgres=# insert into foo select i%20, i%30, to_char(i%12, 'FM0000') from
> generate_series(0, 36) i;
> 
> select dense_rank(b) within group (order by a) from foo group by b order by 1;
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> 
> The following query in the regression test suite can also be made to crash by
> adding a group by clause:
> 
> select dense_rank(3) within group (order by x) from (values
> (1),(1),(2),(2),(3),(3),(4)) v(x) group by (x);
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> 
> Looking at the core dump of this, it seems the following commit may be
> relevant:
> 
> commit bf6c614a2f2c58312b3be34a47e7fb7362e07bcb
> Author: Andres Freund <andres@anarazel.de>
> Date:   Thu Feb 15 21:55:31 2018 -0800
> 
>     Do execGrouping.c via expression eval machinery, take two.

Andres, with RMT hat on: Andres, this needs looking at ASAP.
Andres, without RMT hat on: Oh, I had first missed it, and then was
  distracted reviewing pluggable storage.
Andres, with RMT hat on: that's not really an excuse
Andres, without RMT hat on: sorry, will start looking now.

Greetings,

Andres Freund


Re: Server crashed with dense_rank on partition table.

From: Andres Freund

On 2018-07-02 17:14:14 +0900, Amit Langote wrote:
> I studied this a bit and found a bug that's causing the crash.
> 
> The above mentioned commit has this hunk:
> 
> @@ -1309,6 +1311,9 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
>          PG_RETURN_INT64(rank);
> 
>      osastate = (OSAPerGroupState *) PG_GETARG_POINTER(0);
> +    econtext = osastate->qstate->econtext;
> +    if (!econtext)
> +        osastate->qstate->econtext = econtext = CreateStandaloneExprContext();
> 
> In CreateStandaloneExprContext(), we have this:
> 
>     econtext->ecxt_per_query_memory = CurrentMemoryContext;
> 
>     /*
>      * Create working memory for expression evaluation in this context.
>      */
>     econtext->ecxt_per_tuple_memory =
>         AllocSetContextCreate(CurrentMemoryContext,
>                               "ExprContext",
>                               ALLOCSET_DEFAULT_SIZES);
> 
> I noticed when debugging the crashing query that CurrentMemoryContext is
> actually the per-tuple memory context of some expression context of the
> calling code, which gets reset before we get here again.  So it is
> wrong for hypothetical_dense_rank_final to call CreateStandaloneExprContext
> without first switching to an actual per-query context.
> 
> Attached patch seems to fix the crash.

Thanks, that looks correct. Pushed!

- Andres


Re: Server crashed with dense_rank on partition table.

From: Amit Langote

On 2018/07/05 9:40, Andres Freund wrote:
> On 2018-07-02 17:14:14 +0900, Amit Langote wrote:
>> I studied this a bit and found a bug that's causing the crash.
>>
>> The above mentioned commit has this hunk:
>>
>> @@ -1309,6 +1311,9 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
>>          PG_RETURN_INT64(rank);
>>
>>      osastate = (OSAPerGroupState *) PG_GETARG_POINTER(0);
>> +    econtext = osastate->qstate->econtext;
>> +    if (!econtext)
>> +        osastate->qstate->econtext = econtext = CreateStandaloneExprContext();
>>
>> In CreateStandaloneExprContext(), we have this:
>>
>>     econtext->ecxt_per_query_memory = CurrentMemoryContext;
>>
>>     /*
>>      * Create working memory for expression evaluation in this context.
>>      */
>>     econtext->ecxt_per_tuple_memory =
>>         AllocSetContextCreate(CurrentMemoryContext,
>>                               "ExprContext",
>>                               ALLOCSET_DEFAULT_SIZES);
>>
>> I noticed when debugging the crashing query that CurrentMemoryContext is
>> actually the per-tuple memory context of some expression context of the
>> calling code, which gets reset before we get here again.  So it is
>> wrong for hypothetical_dense_rank_final to call CreateStandaloneExprContext
>> without first switching to an actual per-query context.
>>
>> Attached patch seems to fix the crash.
> 
> Thanks, that looks correct. Pushed!

Thank you.

Regards,
Amit