Обсуждение: join pushdown and issue with foreign update
Hi. There's issue with join pushdown after commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Wed Mar 31 11:52:34 2021 -0400 Rework planning and execution of UPDATE and DELETE To make sure that join pushdown path selected, one can patch contrib/postgres_fdw/postgres_fdw.c in the following way: diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index c48a421e88b..c2bf6833050 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -5959,6 +5959,8 @@ postgresGetForeignJoinPaths(PlannerInfo *root, /* Estimate costs for bare join relation */ estimate_path_cost_size(root, joinrel, NIL, NIL, NULL, &rows, &width, &startup_cost, &total_cost); + + startup_cost = total_cost = 0; /* Now update this information in the joinrel */ joinrel->rows = rows; joinrel->reltarget->width = width; Now, this simple test shows the issue: create extension postgres_fdw; DO $d$ BEGIN EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname '$$||current_database()||$$', port '$$||current_setting('port')||$$')$$; END; $d$; CREATE USER MAPPING FOR CURRENT_USER SERVER loopback; CREATE TABLE base_tbl (a int, b int); CREATE FOREIGN TABLE remote_tbl (a int, b int) SERVER loopback OPTIONS (table_name 'base_tbl'); insert into remote_tbl select generate_series(1,100), generate_series(1,100); explain verbose update remote_tbl d set a= case when current_timestamp> '2012-02-02'::timestamp then 5 else 6 end FROM remote_tbl AS t (a, b) WHERE d.a = (t.a); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Update on public.remote_tbl d (cost=0.00..42.35 rows=0 width=0) Remote SQL: UPDATE public.base_tbl SET a = $2 WHERE ctid = $1 -> Foreign Scan (cost=0.00..42.35 rows=8470 width=74) Output: CASE WHEN (CURRENT_TIMESTAMP > '2012-02-02 00:00:00'::timestamp without time zone) THEN 5 ELSE 6 END, d.ctid, d.*, t.* Relations: (public.remote_tbl d) INNER JOIN (public.remote_tbl t) Remote SQL: SELECT r1.ctid, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.a, r1.b) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.a, r2.b) END FROM (public.base_tbl r1 INNER JOIN public.base_tbl r2 ON (((r1.a = r2.a)))) FOR UPDATE OF r1 -> Merge Join (cost=433.03..566.29 rows=8470 width=70) Output: d.ctid, d.*, t.* Merge Cond: (d.a = t.a) -> Sort (cost=211.00..214.10 rows=1241 width=42) Output: d.ctid, d.*, d.a Sort Key: d.a -> Foreign Scan on public.remote_tbl d (cost=100.00..147.23 rows=1241 width=42) Output: d.ctid, d.*, d.a Remote SQL: SELECT a, b, ctid FROM public.base_tbl FOR UPDATE -> Sort (cost=222.03..225.44 rows=1365 width=36) Output: t.*, t.a Sort Key: t.a -> Foreign Scan on public.remote_tbl t (cost=100.00..150.95 rows=1365 width=36) Output: t.*, t.a Remote SQL: SELECT a, b FROM public.base_tbl update remote_tbl d set a= case when current_timestamp> '2012-02-02'::timestamp then 5 else 6 end FROM remote_tbl AS t (a, b) WHERE d.a = (t.a); You'll get ERROR: input of anonymous composite types is not implemented CONTEXT: whole-row reference to foreign table "remote_tbl" make_tuple_from_result_row() (called by fetch_more_data()), will try to call InputFunctionCall() for ROW(r1.a, r1.b) and will get error in record_in(). Here ROW(r2.a, r2.b) would have attribute type id, corresponding to remote_tbl, but ROW(r1.a, r1.b) would have atttypid 2249 (RECORD). Before 86dc90056dfdbd9d1b891718d2e5614e3e432f35 the plan would be different and looked like ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Update on public.remote_tbl d (cost=0.00..73.54 rows=14708 width=46) Remote SQL: UPDATE public.base_tbl SET a = $2 WHERE ctid = $1 -> Foreign Scan (cost=0.00..73.54 rows=14708 width=46) Output: CASE WHEN (CURRENT_TIMESTAMP > '2012-02-02 00:00:00'::timestamp without time zone) THEN d.a ELSE 6 END, d.b, d.ctid, t.* Relations: (public.remote_tbl d) INNER JOIN (public.remote_tbl t) Remote SQL: SELECT r1.a, r1.b, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.a, r2.b) END FROM (public.base_tbl r1 INNER JOIN public.base_tbl r2 ON (((r1.a = r2.a)))) FOR UPDATE OF r1 -> Merge Join (cost=516.00..747.39 rows=14708 width=46) Output: d.a, d.b, d.ctid, t.* Merge Cond: (d.a = t.a) -> Sort (cost=293.97..299.35 rows=2155 width=14) Output: d.a, d.b, d.ctid Sort Key: d.a -> Foreign Scan on public.remote_tbl d (cost=100.00..174.65 rows=2155 width=14) Output: d.a, d.b, d.ctid Remote SQL: SELECT a, b, ctid FROM public.base_tbl FOR UPDATE -> Sort (cost=222.03..225.44 rows=1365 width=36) Output: t.*, t.a Sort Key: t.a -> Foreign Scan on public.remote_tbl t (cost=100.00..150.95 rows=1365 width=36) Output: t.*, t.a Remote SQL: SELECT a, b FROM public.base_tbl Here ROW(r2.a, r2.b) would have attribute type id, corresponding to remote_tbl. -- Best regards, Alexander Pyhalov, Postgres Professional
Alexander Pyhalov писал 2021-05-31 15:39: > Hi. > > There's issue with join pushdown after > > commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35 > Author: Tom Lane <tgl@sss.pgh.pa.us> > Date: Wed Mar 31 11:52:34 2021 -0400 > ... > You'll get > ERROR: input of anonymous composite types is not implemented > CONTEXT: whole-row reference to foreign table "remote_tbl" > > make_tuple_from_result_row() (called by fetch_more_data()), will try > to call InputFunctionCall() for ROW(r1.a, r1.b) and will get error in > record_in(). > > Here ROW(r2.a, r2.b) would have attribute type id, corresponding to > remote_tbl, but ROW(r1.a, r1.b) would have atttypid 2249 (RECORD). > The issue seems to be that add_row_identity_columns() adds RECORD var to the query. Adding var with table's relation type fixes this issue, but breaks update of partitioned tables, as we add "wholerow" with type of one child relation and then try to use it with another child (of different table type). -- Best regards, Alexander Pyhalov, Postgres Professional
Hi, On Tue, Jun 1, 2021 at 1:04 AM Alexander Pyhalov <a.pyhalov@postgrespro.ru> wrote: > > Alexander Pyhalov писал 2021-05-31 15:39: > > Hi. > > > > There's issue with join pushdown after > > > > commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35 > > Author: Tom Lane <tgl@sss.pgh.pa.us> > > Date: Wed Mar 31 11:52:34 2021 -0400 > > > ... > > You'll get > > ERROR: input of anonymous composite types is not implemented > > CONTEXT: whole-row reference to foreign table "remote_tbl" Interesting, thanks for reporting this. This sounds like a regression on 86dc90056's part. > > make_tuple_from_result_row() (called by fetch_more_data()), will try > > to call InputFunctionCall() for ROW(r1.a, r1.b) and will get error in > > record_in(). > > > > Here ROW(r2.a, r2.b) would have attribute type id, corresponding to > > remote_tbl, but ROW(r1.a, r1.b) would have atttypid 2249 (RECORD). > > > > The issue seems to be that add_row_identity_columns() adds RECORD var to > the query. > Adding var with table's relation type fixes this issue, but breaks > update of > partitioned tables, as we add "wholerow" with type of one child relation > and then > try to use it with another child (of different table type). Perhaps, we can get away with adding the wholerow Var with the target relation's reltype when the target foreign table is not a "child" relation, but the root target relation itself. Maybe like the attached? -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
Amit Langote писал 2021-06-01 15:47: > Perhaps, we can get away with adding the wholerow Var with the target > relation's reltype when the target foreign table is not a "child" > relation, but the root target relation itself. Maybe like the > attached? > Hi. I think the patch fixes this issue, but it still preserves chances to get RECORD in fetch_more_data() (at least with combination with asymmetric partition-wise join). What about the following patch? -- Best regards, Alexander Pyhalov, Postgres Professional
Вложения
Alexander Pyhalov <a.pyhalov@postgrespro.ru> writes: > What about the following patch? ISTM that using a specific rowtype rather than RECORD would be quite disastrous from the standpoint of bloating the number of distinct resjunk columns we need for a partition tree with a lot of children. Maybe we'll have to go that way, but it seems like an absolute last resort. I think a preferable fix involves making sure that the correct record-type typmod is propagated to record_in in this context. Alternatively, maybe we could insert the foreign table's rowtype during execution of the input operation, without touching the plan as such. Could we start by creating a test case that doesn't involve uncommittable hacks to the source code? regards, tom lane
Tom Lane писал 2021-06-01 21:19: > Alexander Pyhalov <a.pyhalov@postgrespro.ru> writes: >> What about the following patch? > > ISTM that using a specific rowtype rather than RECORD would be > quite disastrous from the standpoint of bloating the number of > distinct resjunk columns we need for a partition tree with a > lot of children. Maybe we'll have to go that way, but it seems > like an absolute last resort. Why do you think they are distinct? In suggested patch all of them will have type of the common ancestor (root of the partition tree). > > I think a preferable fix involves making sure that the correct > record-type typmod is propagated to record_in in this context. > Alternatively, maybe we could insert the foreign table's rowtype > during execution of the input operation, without touching the > plan as such. > > Could we start by creating a test case that doesn't involve > uncommittable hacks to the source code? Yes, it seems the following works fine to reproduce the issue. -- Best regards, Alexander Pyhalov, Postgres Professional
Вложения
Alexander Pyhalov <a.pyhalov@postgrespro.ru> writes: > Tom Lane писал 2021-06-01 21:19: >> ISTM that using a specific rowtype rather than RECORD would be >> quite disastrous from the standpoint of bloating the number of >> distinct resjunk columns we need for a partition tree with a >> lot of children. Maybe we'll have to go that way, but it seems >> like an absolute last resort. > Why do you think they are distinct? > In suggested patch all of them will have type of the common ancestor > (root of the partition tree). Seems moderately unlikely that that will work in cases where the partition children have rowtypes different from the ancestor (different column order etc). It'll also cause the problem we originally sought to avoid for selects across traditional inheritance trees, where there isn't a common partition ancestor. regards, tom lane
I wrote: > I think a preferable fix involves making sure that the correct > record-type typmod is propagated to record_in in this context. > Alternatively, maybe we could insert the foreign table's rowtype > during execution of the input operation, without touching the > plan as such. Here's a draft-quality patch based on that idea. It resolves the offered test case, but I haven't beat on it beyond that. regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 7df30010f2..79bc08efb4 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -10231,3 +10231,50 @@ DROP TABLE result_tbl; DROP TABLE join_tbl; ALTER SERVER loopback OPTIONS (DROP async_capable); ALTER SERVER loopback2 OPTIONS (DROP async_capable); +CREATE TABLE base_tbl (a int, b int); +CREATE FOREIGN TABLE remote_tbl (a int, b int) + SERVER loopback OPTIONS (table_name 'base_tbl'); +INSERT INTO base_tbl SELECT a, a+1 FROM generate_series(1,10) a; +ANALYZE base_tbl; +ANALYZE remote_tbl; +EXPLAIN (VERBOSE, COSTS OFF) +UPDATE remote_tbl d SET a = CASE WHEN random() >= 0 THEN 5 ELSE 6 END + FROM remote_tbl AS t WHERE d.a = t.a; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ + Update on public.remote_tbl d + Remote SQL: UPDATE public.base_tbl SET a = $2 WHERE ctid = $1 + -> Foreign Scan + Output: CASE WHEN (random() >= '0'::double precision) THEN 5 ELSE 6 END, d.ctid, d.*, t.* + Relations: (public.remote_tbl d) INNER JOIN (public.remote_tbl t) + Remote SQL: SELECT r1.ctid, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.a, r1.b) END, CASE WHEN (r2.*)::textIS NOT NULL THEN ROW(r2.a, r2.b) END FROM (public.base_tbl r1 INNER JOIN public.base_tbl r2 ON (((r1.a = r2.a))))FOR UPDATE OF r1 + -> Hash Join + Output: d.ctid, d.*, t.* + Hash Cond: (d.a = t.a) + -> Foreign Scan on public.remote_tbl d + Output: d.ctid, d.*, d.a + Remote SQL: SELECT a, b, ctid FROM public.base_tbl FOR UPDATE + -> Hash + Output: t.*, t.a + -> Foreign Scan on public.remote_tbl t + Output: t.*, t.a + Remote SQL: SELECT a, b FROM public.base_tbl +(17 rows) + +UPDATE remote_tbl d SET a = CASE WHEN random() >= 0 THEN 5 ELSE 6 END + FROM remote_tbl AS t WHERE d.a = t.a; +SELECT * FROM base_tbl ORDER BY b; + a | b +---+---- + 5 | 2 + 5 | 3 + 5 | 4 + 5 | 5 + 5 | 6 + 5 | 7 + 5 | 8 + 5 | 9 + 5 | 10 + 5 | 11 +(10 rows) + diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index c48a421e88..24ba60e00a 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -1439,6 +1439,57 @@ postgresGetForeignPlan(PlannerInfo *root, outer_plan); } +/* + * Construct a tuple descriptor for the scan tuples handled by a foreign join. + */ +static TupleDesc +get_tupdesc_for_join_scan_tuples(ForeignScanState *node) +{ + ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan; + EState *estate = node->ss.ps.state; + TupleDesc tupdesc; + + /* + * The core code has already set up a scan tuple slot based on + * fsplan->fdw_scan_tlist, and this slot's tupdesc is mostly good enough, + * but there's one case where it isn't. If we have any whole-row row + * identifier Vars, they may have vartype RECORD, and we need to replace + * that with the associated table's actual composite type. This ensures + * that when we read those ROW() expression values from the remote server, + * we can convert them to a composite type the local server knows. + */ + tupdesc = CreateTupleDescCopy(node->ss.ss_ScanTupleSlot->tts_tupleDescriptor); + for (int i = 0; i < tupdesc->natts; i++) + { + Form_pg_attribute att = TupleDescAttr(tupdesc, i); + Var *var; + RangeTblEntry *rte; + Oid reltype; + + /* Nothing to do if it's not a generic RECORD attribute */ + if (att->atttypid != RECORDOID || att->atttypmod >= 0) + continue; + + /* + * If we can't identify the referenced table, do nothing. This'll + * likely lead to failure later, but perhaps we can muddle through. + */ + var = (Var *) list_nth_node(TargetEntry, fsplan->fdw_scan_tlist, + i)->expr; + if (!IsA(var, Var)) + continue; + rte = list_nth(estate->es_range_table, var->varno - 1); + if (rte->rtekind != RTE_RELATION) + continue; + reltype = get_rel_type_id(rte->relid); + if (!OidIsValid(reltype)) + continue; + att->atttypid = reltype; + /* shouldn't need to change anything else */ + } + return tupdesc; +} + /* * postgresBeginForeignScan * Initiate an executor scan of a foreign PostgreSQL table. @@ -1523,7 +1574,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags) else { fsstate->rel = NULL; - fsstate->tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor; + fsstate->tupdesc = get_tupdesc_for_join_scan_tuples(node); } fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->tupdesc); @@ -2631,7 +2682,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags) TupleDesc tupdesc; if (fsplan->scan.scanrelid == 0) - tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor; + tupdesc = get_tupdesc_for_join_scan_tuples(node); else tupdesc = RelationGetDescr(dmstate->rel); diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 78379bdea5..a7070870c7 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -3262,3 +3262,20 @@ DROP TABLE join_tbl; ALTER SERVER loopback OPTIONS (DROP async_capable); ALTER SERVER loopback2 OPTIONS (DROP async_capable); + +CREATE TABLE base_tbl (a int, b int); +CREATE FOREIGN TABLE remote_tbl (a int, b int) + SERVER loopback OPTIONS (table_name 'base_tbl'); + +INSERT INTO base_tbl SELECT a, a+1 FROM generate_series(1,10) a; + +ANALYZE base_tbl; +ANALYZE remote_tbl; + +EXPLAIN (VERBOSE, COSTS OFF) +UPDATE remote_tbl d SET a = CASE WHEN random() >= 0 THEN 5 ELSE 6 END + FROM remote_tbl AS t WHERE d.a = t.a; +UPDATE remote_tbl d SET a = CASE WHEN random() >= 0 THEN 5 ELSE 6 END + FROM remote_tbl AS t WHERE d.a = t.a; + +SELECT * FROM base_tbl ORDER BY b;
On 2/6/21 02:32, Tom Lane wrote: > I wrote: >> I think a preferable fix involves making sure that the correct >> record-type typmod is propagated to record_in in this context. >> Alternatively, maybe we could insert the foreign table's rowtype >> during execution of the input operation, without touching the >> plan as such. > > Here's a draft-quality patch based on that idea. It resolves > the offered test case, but I haven't beat on it beyond that. > > regards, tom lane > I played with your patch and couldn't find any errors. But what if ROW operation were allowed to be pushed to a foreign server? Potentially, I can imagine pushed-down JOIN with arbitrary ROW function in its target list. Amit's approach looks more safe for me. -- regards, Andrey Lepikhov Postgres Professional
On Wed, Jun 2, 2021 at 6:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: > > I think a preferable fix involves making sure that the correct > > record-type typmod is propagated to record_in in this context. > > Alternatively, maybe we could insert the foreign table's rowtype > > during execution of the input operation, without touching the > > plan as such. > > Here's a draft-quality patch based on that idea. This looks good to me. Yeah, I agree that reversing our decision to mark row-id wholerow Vars in as RECORD rather than a specific reltype will have to wait until we hear more complaints than just this one, which seems fixable with a patch like this. > It resolves > the offered test case, but I haven't beat on it beyond that. Given that we don't (no longer) support pushing down the join of child target relations with other relations, I don't think we have other cases that are affected at this point. I have a feeling that your patch will have fixed things enough that the same problem will not occur when we have join pushdown under UPDATE occurring in more cases. -- Amit Langote EDB: http://www.enterprisedb.com
Tom Lane писал 2021-06-02 00:32: > I wrote: >> I think a preferable fix involves making sure that the correct >> record-type typmod is propagated to record_in in this context. >> Alternatively, maybe we could insert the foreign table's rowtype >> during execution of the input operation, without touching the >> plan as such. > > Here's a draft-quality patch based on that idea. It resolves > the offered test case, but I haven't beat on it beyond that. > > regards, tom lane Hi. The patch seems to work fine for mentioned case. For now I'm working on function pushdown. When record-returning function (like unnest()) is pushed down, on this stage we've already lost any type information, so get the issue again. So far I'm not sure how to fix the issue, perhaps just avoid pushing foreign join if we have record, corresponding to function RTE var in joinrel->reltarget? -- Best regards, Alexander Pyhalov, Postgres Professional
On Wed, Jun 2, 2021 at 4:39 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 2/6/21 02:32, Tom Lane wrote: > > I wrote: > >> I think a preferable fix involves making sure that the correct > >> record-type typmod is propagated to record_in in this context. > >> Alternatively, maybe we could insert the foreign table's rowtype > >> during execution of the input operation, without touching the > >> plan as such. > > > > Here's a draft-quality patch based on that idea. It resolves > > the offered test case, but I haven't beat on it beyond that. > > > I played with your patch and couldn't find any errors. But what if ROW > operation were allowed to be pushed to a foreign server? > > Potentially, I can imagine pushed-down JOIN with arbitrary ROW function > in its target list. Are you saying that a pushed down ROW() expression may not correspond with the Var chosen by the following code? + /* + * If we can't identify the referenced table, do nothing. This'll + * likely lead to failure later, but perhaps we can muddle through. + */ + var = (Var *) list_nth_node(TargetEntry, fsplan->fdw_scan_tlist, + i)->expr; + if (!IsA(var, Var)) + continue; + rte = list_nth(estate->es_range_table, var->varno - 1); + if (rte->rtekind != RTE_RELATION) + continue; + reltype = get_rel_type_id(rte->relid); + if (!OidIsValid(reltype)) + continue; + att->atttypid = reltype; That may be a valid concern. I wonder if it would make sense to also check varattno == 0 here somewhere for good measure. -- Amit Langote EDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes: > On Wed, Jun 2, 2021 at 4:39 PM Andrey Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> I played with your patch and couldn't find any errors. But what if ROW >> operation were allowed to be pushed to a foreign server? >> Potentially, I can imagine pushed-down JOIN with arbitrary ROW function >> in its target list. I thought about this for awhile and I don't think it's a real concern. There's nothing stopping us from pushing an expression of the form "func(row(...))" or "row(...) op row(...)", because we're not asking to retrieve the value of the ROW() expression. Whether the remote server can handle that is strictly its concern. (Probably, it's going to do something involving a locally-assigned typmod to keep track of the rowtype, but it's not our problem.) Where things get sticky is if we try to *retrieve the value* of a ROW() expression. And except in this specific context, I don't see why we'd do that. There's no advantage compared to retrieving the component Vars or expressions. > ... I wonder if it would make sense to also > check varattno == 0 here somewhere for good measure. Yeah, I considered doing that but left it off in this version. It's not clear to me how there could be a table column of type RECORD, so it seemed unnecessary. On the other hand, it's also cheap insurance, so I'll put it back. regards, tom lane