Discussion: Server crashed with TRAP: FailedAssertion("!(parallel_workers > 0)") when partitionwise_aggregate true.


Server crashed with TRAP: FailedAssertion("!(parallel_workers > 0)") when partitionwise_aggregate true.

From
Rajkumar Raghuwanshi
Date:
Hi,

The test case below crashes when enable_partitionwise_aggregate is set to true.

CREATE TABLE part (c1 INTEGER,c2 INTEGER,c3 CHAR(10)) PARTITION BY RANGE(c1);
CREATE TABLE part_p1 PARTITION OF part FOR VALUES FROM (MINVALUE) TO (500);
CREATE TABLE part_p2 PARTITION OF part FOR VALUES FROM (500) TO (1000);
CREATE TABLE part_p3 PARTITION OF part FOR VALUES FROM (1000) TO (MAXVALUE);
INSERT INTO part SELECT i,i % 250, to_char(i % 4, 'FM0000') FROM GENERATE_SERIES(1,1500,2)i;
ANALYSE part;

ALTER TABLE part_p1 SET (parallel_workers = 0);
ALTER TABLE part_p2 SET (parallel_workers = 0);
ALTER TABLE part_p3 SET (parallel_workers = 0);

SET enable_partitionwise_join to on;

set enable_partitionwise_aggregate to off;
EXPLAIN (COSTS OFF)
SELECT AVG(t2.c1),SUM(t1.c1) FROM part t1 INNER JOIN part t2 ON (t1.c1 = t2.c1) GROUP BY t1.c1, t2.c1 HAVING SUM(t1.c1) % 125 = 0 ORDER BY 1,2;

set enable_partitionwise_aggregate to on;
EXPLAIN (COSTS OFF)
SELECT AVG(t2.c1),SUM(t1.c1) FROM part t1 INNER JOIN part t2 ON (t1.c1 = t2.c1) GROUP BY t1.c1, t2.c1 HAVING SUM(t1.c1) % 125 = 0 ORDER BY 1,2;

/*
postgres=# set enable_partitionwise_aggregate to off;
SET
postgres=# EXPLAIN (COSTS OFF)
postgres-# SELECT AVG(t2.c1),SUM(t1.c1) FROM part t1 INNER JOIN part t2 ON (t1.c1 = t2.c1) GROUP BY t1.c1, t2.c1 HAVING SUM(t1.c1) % 125 = 0 ORDER BY 1,2;
                       QUERY PLAN                      
--------------------------------------------------------
 Sort
   Sort Key: (avg(t2.c1)), (sum(t1.c1))
   ->  HashAggregate
         Group Key: t1.c1, t2.c1
         Filter: ((sum(t1.c1) % '125'::bigint) = 0)
         ->  Append
               ->  Hash Join
                     Hash Cond: (t1.c1 = t2.c1)
                     ->  Seq Scan on part_p1 t1
                     ->  Hash
                           ->  Seq Scan on part_p1 t2
               ->  Hash Join
                     Hash Cond: (t1_1.c1 = t2_1.c1)
                     ->  Seq Scan on part_p2 t1_1
                     ->  Hash
                           ->  Seq Scan on part_p2 t2_1
               ->  Hash Join
                     Hash Cond: (t1_2.c1 = t2_2.c1)
                     ->  Seq Scan on part_p3 t1_2
                     ->  Hash
                           ->  Seq Scan on part_p3 t2_2
(21 rows)

postgres=#
postgres=# set enable_partitionwise_aggregate to on;
SET
postgres=# EXPLAIN (COSTS OFF)
postgres-# SELECT AVG(t2.c1),SUM(t1.c1) FROM part t1 INNER JOIN part t2 ON (t1.c1 = t2.c1) GROUP BY t1.c1, t2.c1 HAVING SUM(t1.c1) % 125 = 0 ORDER BY 1,2;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q
*/

--logfile
TRAP: FailedAssertion("!(parallel_workers > 0)", File: "allpaths.c", Line: 1630)
2018-06-14 23:24:58.375 IST [69650] LOG:  server process (PID 69660) was terminated by signal 6: Aborted
2018-06-14 23:24:58.375 IST [69650] DETAIL:  Failed process was running: EXPLAIN (COSTS OFF)
        SELECT AVG(t2.c1),SUM(t1.c1) FROM part t1 INNER JOIN part t2 ON (t1.c1 = t2.c1) GROUP BY t1.c1, t2.c1 HAVING SUM(t1.c1) % 125 = 0 ORDER BY 1,2;


--core.file
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `postgres: edb postgres [local] EXPLAIN                  '.
Program terminated with signal 6, Aborted.
#0  0x0000003dd2632495 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-23.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x0000003dd2632495 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003dd2633c75 in abort () at abort.c:92
#2  0x0000000000a326da in ExceptionalCondition (conditionName=0xc1a970 "!(parallel_workers > 0)", errorType=0xc1a426 "FailedAssertion", fileName=0xc1a476 "allpaths.c",
    lineNumber=1630) at assert.c:54
#3  0x0000000000797bda in add_paths_to_append_rel (root=0x1d6ff08, rel=0x1d45d80, live_childrels=0x0) at allpaths.c:1630
#4  0x00000000007d37e1 in create_partitionwise_grouping_paths (root=0x1d6ff08, input_rel=0x1da5380, grouped_rel=0x1d43520, partially_grouped_rel=0x1d45d80,
    agg_costs=0x7ffceb18dd20, gd=0x0, patype=PARTITIONWISE_AGGREGATE_FULL, extra=0x7ffceb18dbe0) at planner.c:7120
#5  0x00000000007ce58d in create_ordinary_grouping_paths (root=0x1d6ff08, input_rel=0x1da5380, grouped_rel=0x1d43520, agg_costs=0x7ffceb18dd20, gd=0x0, extra=0x7ffceb18dbe0,
    partially_grouped_rel_p=0x7ffceb18dc70) at planner.c:4011
#6  0x00000000007ce14b in create_grouping_paths (root=0x1d6ff08, input_rel=0x1da5380, target=0x1d446d0, target_parallel_safe=true, agg_costs=0x7ffceb18dd20, gd=0x0)
    at planner.c:3783
#7  0x00000000007cb344 in grouping_planner (root=0x1d6ff08, inheritance_update=false, tuple_fraction=0) at planner.c:2037
#8  0x00000000007c94e6 in subquery_planner (glob=0x1d6fe70, parse=0x1d2a658, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:966
#9  0x00000000007c80a3 in standard_planner (parse=0x1d2a658, cursorOptions=256, boundParams=0x0) at planner.c:405
#10 0x00000000007c7dcb in planner (parse=0x1d2a658, cursorOptions=256, boundParams=0x0) at planner.c:263
#11 0x00000000008c4576 in pg_plan_query (querytree=0x1d2a658, cursorOptions=256, boundParams=0x0) at postgres.c:809
#12 0x000000000064a1d0 in ExplainOneQuery (query=0x1d2a658, cursorOptions=256, into=0x0, es=0x1d24460,
    queryString=0x1c68798 "EXPLAIN (COSTS OFF)\nSELECT AVG(t2.c1),SUM(t1.c1) FROM part t1 INNER JOIN part t2 ON (t1.c1 = t2.c1) GROUP BY t1.c1, t2.c1 HAVING SUM(t1.c1) % 125 = 0 ORDER BY 1,2;", params=0x0, queryEnv=0x0) at explain.c:365
#13 0x0000000000649ed2 in ExplainQuery (pstate=0x1c8be28, stmt=0x1d34b08,
    queryString=0x1c68798 "EXPLAIN (COSTS OFF)\nSELECT AVG(t2.c1),SUM(t1.c1) FROM part t1 INNER JOIN part t2 ON (t1.c1 = t2.c1) GROUP BY t1.c1, t2.c1 HAVING SUM(t1.c1) % 125 = 0 ORDER BY 1,2;", params=0x0, queryEnv=0x0, dest=0x1c8bd90) at explain.c:254
#14 0x00000000008ccd99 in standard_ProcessUtility (pstmt=0x1d34bd8,
    queryString=0x1c68798 "EXPLAIN (COSTS OFF)\nSELECT AVG(t2.c1),SUM(t1.c1) FROM part t1 INNER JOIN part t2 ON (t1.c1 = t2.c1) GROUP BY t1.c1, t2.c1 HAVING SUM(t1.c1) % 125 = 0 ORDER BY 1,2;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1c8bd90, completionTag=0x7ffceb18e450 "") at utility.c:672
#15 0x00000000008cc520 in ProcessUtility (pstmt=0x1d34bd8,
    queryString=0x1c68798 "EXPLAIN (COSTS OFF)\nSELECT AVG(t2.c1),SUM(t1.c1) FROM part t1 INNER JOIN part t2 ON (t1.c1 = t2.c1) GROUP BY t1.c1, t2.c1 HAVING SUM(t1.c1) % 125 = 0 ORDER BY 1,2;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1c8bd90, completionTag=0x7ffceb18e450 "") at utility.c:360
#16 0x00000000008cb4ce in PortalRunUtility (portal=0x1ccdc28, pstmt=0x1d34bd8, isTopLevel=true, setHoldSnapshot=true, dest=0x1c8bd90, completionTag=0x7ffceb18e450 "")
    at pquery.c:1178
#17 0x00000000008cb1c5 in FillPortalStore (portal=0x1ccdc28, isTopLevel=true) at pquery.c:1038
#18 0x00000000008caaf6 in PortalRun (portal=0x1ccdc28, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1d6d9e8, altdest=0x1d6d9e8,
    completionTag=0x7ffceb18e650 "") at pquery.c:768
#19 0x00000000008c4aef in exec_simple_query (
    query_string=0x1c68798 "EXPLAIN (COSTS OFF)\nSELECT AVG(t2.c1),SUM(t1.c1) FROM part t1 INNER JOIN part t2 ON (t1.c1 = t2.c1) GROUP BY t1.c1, t2.c1 HAVING SUM(t1.c1) % 125 = 0 ORDER BY 1,2;") at postgres.c:1122
#20 0x00000000008c8dbf in PostgresMain (argc=1, argv=0x1c922a0, dbname=0x1c92100 "postgres", username=0x1c65298 "edb") at postgres.c:4153
#21 0x0000000000826703 in BackendRun (port=0x1c8a060) at postmaster.c:4361
#22 0x0000000000825e71 in BackendStartup (port=0x1c8a060) at postmaster.c:4033
#23 0x0000000000822253 in ServerLoop () at postmaster.c:1706
#24 0x0000000000821b85 in PostmasterMain (argc=3, argv=0x1c631f0) at postmaster.c:1379
#25 0x0000000000748d64 in main (argc=3, argv=0x1c631f0) at main.c:228

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


On Mon, Jun 18, 2018 at 5:02 PM, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
Hi,

The test case below crashes when enable_partitionwise_aggregate is set to true.

I will have a look at this.

Thanks for reporting.
 




--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

On 2018-06-18 17:10:12 +0530, Jeevan Chalke wrote:
> On Mon, Jun 18, 2018 at 5:02 PM, Rajkumar Raghuwanshi <
> rajkumar.raghuwanshi@enterprisedb.com> wrote:
> 
> > Hi,
> >
> > Below test case crashed, when set enable_partitionwise_aggregate to true.
> >
> 
> I will have a look over this.
> 
> Thanks for reporting.

I've added a v11 open-items entry.

- Andres




On Mon, Jun 18, 2018 at 9:27 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-06-18 17:10:12 +0530, Jeevan Chalke wrote:
> On Mon, Jun 18, 2018 at 5:02 PM, Rajkumar Raghuwanshi <
> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
> > Hi,
> >
> > Below test case crashed, when set enable_partitionwise_aggregate to true.
> >
>
> I will have a look over this.

In the reported test case, parallel_workers is set to 0 for all partition (child) relations, which means partial parallel paths are not possible for the child rels. However, the parent can have partial parallel paths. Thus, when a full partitionwise aggregate is possible (because the GROUP BY clause matches the partition key), we end up in a situation where we do create a partially_grouped_rel for the parent, but there are no partially grouped live children.

The current code was calling add_paths_to_append_rel() without making sure that any live children are present (sorry, that was my fault). This causes an assertion failure in add_paths_to_append_rel(), since that function assumes it is given a non-NIL live_childrels list.

I have fixed the partitionwise aggregate code that calls add_paths_to_append_rel() so that it checks the live children list correctly. For robustness, I have also added an Assert() in add_paths_to_append_rel().

I have verified the callers of add_paths_to_append_rel(), and all but one call it only after making sure they have a non-NIL live children list. The one that calls add_paths_to_append_rel() directly is set_append_rel_pathlist(), and I think that at that place we can never have an empty live children list, though I may be wrong. If that ever turns out to be possible, the newly added Assert() in add_paths_to_append_rel() will fail and catch the programming error in that code path.

The attached patch fixes the crash and also adds simple test coverage for it.

Let me know if I missed anything.
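
For illustration only, here is a minimal standalone C model of the shape of that fix (this is not PostgreSQL source; NUM_CHILDREN, child_has_partial_path and build_append() are hypothetical stand-ins for the planner's live_childrels handling):

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_CHILDREN 3

/* Does child i have a partial (parallel) path?  All false models the reported
 * test case, where every partition has parallel_workers = 0. */
static bool child_has_partial_path[NUM_CHILDREN] = {false, false, false};

/* Stand-in for add_paths_to_append_rel(): it insists on a non-empty child list. */
static void build_append(int nlive)
{
    assert(nlive > 0);          /* analogue of the Assert() described above */
    printf("built a partial Append over %d children\n", nlive);
}

int main(void)
{
    int nlive = 0;

    for (int i = 0; i < NUM_CHILDREN; i++)
        if (child_has_partial_path[i])
            nlive++;

    /* The guard: skip the append step entirely when no child qualified. */
    if (nlive > 0)
        build_append(nlive);
    else
        printf("no partially grouped children; skip the partial append\n");

    return 0;
}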

Thanks
 

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments


On Tue, Jun 19, 2018 at 2:13 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:



Rajkumar reported another issue off-list, related to data loss. If some of the partitions, but not all, have parallel_workers = 0, then the partitionwise aggregate (PWA) plan ended up with children that all had parallel_workers != 0, so the partitions with parallel_workers = 0 were not scanned.

This is fixed in the attached version of the patch.
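
As a rough standalone model of the hazard (not PostgreSQL source; the per-child flags and row counts below are made up), an Append built only from the children that happen to have partial paths silently skips the remaining partitions, so the partial Append must be abandoned unless every child qualifies:

#include <stdbool.h>
#include <stdio.h>

#define NUM_CHILDREN 3

/* Child 0 models a partition with parallel_workers = 0. */
static bool has_partial_path[NUM_CHILDREN] = {false, true, true};
static int  rows_in_child[NUM_CHILDREN]    = {250, 250, 250};

int main(void)
{
    bool all_partial = true;
    int  rows_scanned_by_buggy_plan = 0, rows_total = 0;

    for (int i = 0; i < NUM_CHILDREN; i++)
    {
        rows_total += rows_in_child[i];
        if (has_partial_path[i])
            rows_scanned_by_buggy_plan += rows_in_child[i];
        else
            all_partial = false;
    }

    if (all_partial)
        printf("every child has a partial path: a partial Append is safe\n");
    else
        printf("a partial Append over only those children would scan %d of %d rows\n",
               rows_scanned_by_buggy_plan, rows_total);
    return 0;
}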
 

Thanks
 




--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments


On Tue, Jun 19, 2018 at 7:14 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:


There have been a few commits in this area, due to which the patch no longer applies cleanly.

A rebased patch is attached.

Thanks

 

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments
On Tue, Jun 19, 2018 at 2:13 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
>
>
> In the reported testcase, parallel_workers is set to 0 for all partition
> (child) relations. Which means partial parallel paths are not possible for
> child rels. However, the parent can have partial parallel paths. Thus, when
> we have a full partitionwise aggregate possible (as the group by clause
> matches with the partition key), we end-up in a situation where we do create
> a partially_grouped_rel for the parent but there won't be any
> partially_grouped_live_children.
>
> The current code was calling add_paths_to_append_rel() without making sure
> of any live children present or not (sorry, it was my fault). This causes an
> Assertion failure in add_paths_to_append_rel() as this function assumes that
> it will have non-NIL live_childrels list.
>

Thanks for the detailed explanation.

> I have fixed partitionwise aggregate code which is calling
> add_paths_to_append_rel() by checking the live children list correctly. And
> for robustness, I have also added an Assert() in add_paths_to_append_rel().
>
> I have verified the callers of add_paths_to_append_rel() and except one, all
> are calling it by making sure that they have a non-NIL live children list.

I actually thought about moving the if (live_childrels != NIL) test inside
this function, but then every caller does something different when that
condition occurs, so it doesn't help much.

> The one which is calling add_paths_to_append_rel() directly is
> set_append_rel_pathlist(). And I think, at that place, we will never have an
> empty live children list,

Yes, I think so too. That's because set_append_rel_size() should have
marked such a parent as dummy and subsequent set_rel_pathlist() would
not create any paths for it.

Here are some review comments.

+    /* We should end-up here only when we have few live child rels. */

I think the right wording is "... we have at least one child." I was actually
wondering whether we should just return from here when live_children == NIL, but
then we would lose an opportunity to catch a bug earlier than set_cheapest().

+     * However, if there are no live children, then we cannot create any append
+     * path.

I think "no live children" is kind of misleading here. We usually use that term
to mean at least one non-dummy child. But in this case there is no child at
all, so we might want to describe this situation better. Maybe also explain
when this condition can happen.

+    if (patype == PARTITIONWISE_AGGREGATE_FULL && grouped_live_children != NIL)

I think for this to happen, the parent grouped relation and the underlying scan
itself should be dummy. So, would an Assert be better? I think we have
discussed this earlier, but I cannot spot the mail.


+-- Test when partition tables has parallel_workers = 0 but not the parent

This would be better worded as "Test when parent can produce parallel paths but not
any of its children". parallel_workers = 0 is just a means to test that. However, the
EXPLAIN output below doesn't really reflect that the parent can have parallel
plans. Is it possible to create a scenario to show that?

I see that you have posted some newer versions of this patch, but I
think those still need to address these comments.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




On Wed, Jun 20, 2018 at 7:11 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Tue, Jun 19, 2018 at 2:13 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
>
>
> In the reported testcase, parallel_workers is set to 0 for all partition
> (child) relations. Which means partial parallel paths are not possible for
> child rels. However, the parent can have partial parallel paths. Thus, when
> we have a full partitionwise aggregate possible (as the group by clause
> matches with the partition key), we end-up in a situation where we do create
> a partially_grouped_rel for the parent but there won't be any
> partially_grouped_live_children.
>
> The current code was calling add_paths_to_append_rel() without making sure
> of any live children present or not (sorry, it was my fault). This causes an
> Assertion failure in add_paths_to_append_rel() as this function assumes that
> it will have non-NIL live_childrels list.
>

Thanks for the detailed explanation.

> I have fixed partitionwise aggregate code which is calling
> add_paths_to_append_rel() by checking the live children list correctly. And
> for robustness, I have also added an Assert() in add_paths_to_append_rel().
>
> I have verified the callers of add_paths_to_append_rel() and except one, all
> are calling it by making sure that they have a non-NIL live children list.

I actually thought about moving the if (live_childrels != NIL) test inside
this function, but then every caller does something different when that
condition occurs, so it doesn't help much.

> The one which is calling add_paths_to_append_rel() directly is
> set_append_rel_pathlist(). And I think, at that place, we will never have an
> empty live children list,

Yes, I think so too. That's because set_append_rel_size() should have
marked such a parent as dummy and subsequent set_rel_pathlist() would
not create any paths for it.

Here are some review comments.

+    /* We should end-up here only when we have few live child rels. */

I think the right wording is "... we have at least one child." I was actually
wondering whether we should just return from here when live_children == NIL, but
then we would lose an opportunity to catch a bug earlier than set_cheapest().

Done.
 

+     * However, if there are no live children, then we cannot create any append
+     * path.

I think "no live children" is kind of misleading here. We usually use that term
to mean at least one non-dummy child. But in this case there is no child at
all, so we might want to describe this situation better. Maybe also explain
when this condition can happen.

Done. Tried re-phrasing it. Please check.


+    if (patype == PARTITIONWISE_AGGREGATE_FULL && grouped_live_children != NIL)

I think for this to happen, the parent grouped relation and the underlying scan
itself should be dummy. So, would an Assert be better? I think we have
discussed this earlier, but I cannot spot the mail.

Yep, Assert() will be better. Done.



+-- Test when partition tables has parallel_workers = 0 but not the parent

This would be better worded as "Test when parent can produce parallel paths but not
any of its children". parallel_workers = 0 is just a means to test that. However, the
EXPLAIN output below doesn't really reflect that the parent can have parallel
plans. Is it possible to create a scenario to show that?

Added test.
 

I see that you have posted some newer versions of this patch, but I
think those still need to address these comments.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments
Hi,

Off-list, Ashutosh Bapat suggested using a flag instead of counting the number of
dummy rels and then acting on that count. That is simpler and smoother.

I agree with his suggestion and have updated my patch accordingly.
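
A minimal standalone sketch of that flag-based idea (not PostgreSQL source; the struct, the field names and the flag partial_grouping_ok are hypothetical): the flag starts out true and is cleared as soon as a live child lacks partial paths, so a partially grouped Append is attempted only if every live child qualified:

#include <stdbool.h>
#include <stdio.h>

#define NUM_CHILDREN 3

struct child_rel
{
    bool is_dummy;              /* child proven empty, e.g. by constraint exclusion */
    bool has_partial_paths;     /* child can be scanned in parallel */
};

int main(void)
{
    struct child_rel children[NUM_CHILDREN] = {
        {false, false},         /* e.g. a partition with parallel_workers = 0 */
        {false, true},
        {false, true},
    };

    bool partial_grouping_ok = true;
    int  live_children = 0;

    for (int i = 0; i < NUM_CHILDREN; i++)
    {
        if (children[i].is_dummy)
            continue;           /* dummy children are skipped, no counting needed */

        live_children++;
        if (!children[i].has_partial_paths)
            partial_grouping_ok = false;    /* one miss invalidates partial grouping */
    }

    if (live_children > 0 && partial_grouping_ok)
        printf("build a partially grouped Append over %d children\n", live_children);
    else
        printf("skip partial grouping for the parent\n");

    return 0;
}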

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments

Re: Server crashed with TRAP: FailedAssertion("!(parallel_workers > 0)") when partitionwise_aggregate true.

From
Rajkumar Raghuwanshi
Date:
On Fri, Jun 22, 2018 at 11:15 AM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Hi,

Off-list, Ashutosh Bapat suggested using a flag instead of counting the number of
dummy rels and then acting on that count. That is simpler and smoother.

I agree with his suggestion and have updated my patch accordingly.

I have applied the patch and checked the reported issue. The patch applied cleanly and
the issue is not reproducible any more.
 
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
On Fri, Jun 22, 2018 at 2:26 AM, Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
> I have applied patch and checked reported issue. Patch applied cleanly and
> issues not reproducible any more.

Committed, with a few changes:

- Renamed found_partially_grouped_child to partial_grouping_valid.
The old name seemed to me to be a poor choice, because it sounds from
the name like it gets set to true whenever we've found at least one
partially grouped child, whereas really it gets set to false whenever
we've failed to find at least one partially grouped child.  The new
name is also more like the names in add_paths_to_append_rel.

- Modified the wording of the comment.

- Omitted the new assertion in add_paths_to_append_rel.  I doubt
whether that's correct.  I don't see any obvious reason why
live_childrels can't be NIL there, and further down I see this:

        /*
         * If we found unparameterized paths for all children, build
an unordered,
         * unparameterized Append path for the rel.  (Note: this is correct even
         * if we have zero or one live subpath due to constraint exclusion.)
         */

If it's not possible to have no live_childrels, then that comment is
highly suspect.

Also, even if this assertion *is* correct, I think it needs a better
comment explaining why it's correct, because there isn't anything
obvious in set_append_rel_pathlist that keeps IS_DUMMY_REL() from
being true for every child.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




On Fri, Jun 22, 2018 at 6:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jun 22, 2018 at 2:26 AM, Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
> I have applied patch and checked reported issue. Patch applied cleanly and
> issues not reproducible any more.

Committed, with a few changes:

Thanks Robert.
 

- Renamed found_partially_grouped_child to partial_grouping_valid.
The old name seemed to me to be a poor choice, because it sounds from
the name like it gets set to true whenever we've found at least one
partially grouped child, whereas really it gets set to false whenever
we've failed to find at least one partially grouped child.  The new
name is also more like the names in add_paths_to_append_rel.

- Modified the wording of the comment.

- Omitted the new assertion in add_paths_to_append_rel.  I doubt
whether that's correct.  I don't see any obvious reason why
live_childrels can't be NIL there, and further down I see this:

        /*
         * If we found unparameterized paths for all children, build
an unordered,
         * unparameterized Append path for the rel.  (Note: this is correct even
         * if we have zero or one live subpath due to constraint exclusion.)
         */

If it's not possible to have no live_childrels, then that comment is
highly suspect.

 
OK. Do these comments also hold true for partial_subpaths?

If live_childrels is NIL, then the Assert() inside the "if (partial_subpaths_valid)" block will be hit, which I think is wrong.

We either need to remove those Asserts altogether, or we should enter that block only when we have a non-NIL partial_subpaths list. Also, if we don't have any partial_subpaths, then the variable partial_subpaths_valid should be false. Currently it defaults to true, due to which we end up inside that if block and then hit an Assert.
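
A tiny standalone model of that concern (not PostgreSQL source; child_lacks_partial_path() and the counts are hypothetical): a validity flag that defaults to true is never cleared when the child list is empty, so the flag alone does not keep us out of the block:

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for "does child i lack a partial path?"; illustrative only. */
static bool child_lacks_partial_path(int i)
{
    (void) i;
    return false;
}

int main(void)
{
    int  nchildren = 0;            /* imagine live_childrels == NIL */
    bool partial_paths_ok = true;  /* optimistic default, like partial_subpaths_valid */

    for (int i = 0; i < nchildren; i++)
        if (child_lacks_partial_path(i))
            partial_paths_ok = false;

    /*
     * With zero children the loop never runs, the flag keeps its default, and
     * we enter the block anyway; an Assert on the child count placed inside it
     * would therefore fire.
     */
    if (partial_paths_ok)
        printf("entered the partial-path block with %d children\n", nchildren);

    return 0;
}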
 
Also, even if this assertion *is* correct, I think it needs a better
comment explaining why it's correct, because there isn't anything
obvious in set_append_rel_pathlist that keeps IS_DUMMY_REL() from
being true for every child.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company