Re: Oversight in reparameterize_path_by_child leading to executor crash

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: Oversight in reparameterize_path_by_child leading to executor crash
Дата
Msg-id CAExHW5sNd=EDGmkar5pWoibyZGLZvCFDxEhEjU-JWaGLDVyfrA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Oversight in reparameterize_path_by_child leading to executor crash  (Richard Guo <guofenglinux@gmail.com>)
Ответы Re: Oversight in reparameterize_path_by_child leading to executor crash  (Richard Guo <guofenglinux@gmail.com>)
Список pgsql-hackers


On Fri, Aug 4, 2023 at 6:08 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Thu, Aug 3, 2023 at 7:20 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
However, if reparameterization can *not* happen at the planning time, we have chosen a plan which can not be realised meeting a dead end. So as long as we can ensure that the reparameterization is possible we can delay actual translations. I think it should be possible to decide whether reparameterization is possible based on the type of path alone. So seems doable.

That has been done in v2 patch.  See the new added function
path_is_reparameterizable_by_child().

Thanks. The patch looks good overall. I like it because of its potential to reduce memory consumption in reparameterization. That's cake on top of cream :)

Some comments here.

+ {
+ FLAT_COPY_PATH(new_path, path, Path);
+
+ if (path->pathtype == T_SampleScan)
+ {
+ Index scan_relid = path->parent->relid;
+ RangeTblEntry *rte;
+
+ /* it should be a base rel with a tablesample clause... */
+ Assert(scan_relid > 0);
+ rte = planner_rt_fetch(scan_relid, root);
+ Assert(rte->rtekind == RTE_RELATION);
+ Assert(rte->tablesample != NULL);
+
+ ADJUST_CHILD_EXPRS(rte->tablesample, TableSampleClause *);
+ }
+ }

This change makes this function usable only after the final plan has been
selected. If some code accidently uses it earlier, it would corrupt
rte->tablesample without getting detected easily. I think we should mention
this in the function prologue and move it somewhere else. Other option is to a.
leave rte->tablesample unchanged here with a comment as to why we aren't
changing it b. reparameterize tablesample expression in
create_samplescan_plan() if the path is parameterized by the parent. We call
reparameterize_path_by_child() in create_nestloop_plan() as this patch does
right now. I like that we are delaying reparameterization. It saves a bunch of
memory. I haven't measured it but from my recent experiments I know it will be
a lot.

  * If the inner path is parameterized, it is parameterized by the
- * topmost parent of the outer rel, not the outer rel itself.  Fix
- * that.
+ * topmost parent of the outer rel, not the outer rel itself.  We will

There's something wrong with the original sentence (which was probably written
by me back then :)). I think it should have read "If the inner path is
parameterized by the topmost parent of the outer rel instead of the outer rel
itself, fix that.". If you agree, the new comment should change it accordingly.

@@ -2430,6 +2430,16 @@ create_nestloop_path(PlannerInfo *root,
 {
  NestPath   *pathnode = makeNode(NestPath);
  Relids inner_req_outer = PATH_REQ_OUTER(inner_path);
+ Relids outerrelids;
+
+ /*
+ * Paths are parameterized by top-level parents, so run parameterization
+ * tests on the parent relids.
+ */
+ if (outer_path->parent->top_parent_relids)
+ outerrelids = outer_path->parent->top_parent_relids;
+ else
+ outerrelids = outer_path->parent->relids;

This looks like an existing bug. Did partitionwise join paths ended up with
extra restrict clauses without this fix? I am not able to see why this change
is needed by rest of the changes in the patch?

Anyway, let's rename outerrelids to something else e.g.  outer_param_relids to reflect
its true meaning.

+bool
+path_is_reparameterizable_by_child(Path *path)
+{
+ switch (nodeTag(path))
+ {
+ case T_Path:
... snip ...
+ case T_GatherPath:
+ return true;
+ default:
+
+ /* We don't know how to reparameterize this path. */
+ return false;
+ }
+
+ return false;
+}

How do we make sure that any changes to reparameterize_path_by_child() are
reflected here? One way is to call this function from
reparameterize_path_by_child() the first thing and return if the path can not
be reparameterized.

I haven't gone through each path's translation to make sure that it's possible
to do that during create_nestloop_plan(). I will do that in my next round of
review if time permits. But AFAIR, each of the translations are possible during
create_nestloop_plan() when it happens in this patch.

-#define ADJUST_CHILD_ATTRS(node) \
+#define ADJUST_CHILD_EXPRS(node, fieldtype) \
  ((node) = \
- (List *) adjust_appendrel_attrs_multilevel(root, (Node *) (node), \
- child_rel, \
- child_rel->top_parent))
+ (fieldtype) adjust_appendrel_attrs_multilevel(root, (Node *) (node), \
+   child_rel, \
+   child_rel->top_parent))
+
+#define ADJUST_CHILD_ATTRS(node) ADJUST_CHILD_EXPRS(node, List *)

IIRC, ATTRS here is taken from the function being called. I think we should
just change the macro definition, not its name. If you would like to have a
separate macro for List nodes, create ADJUST_CHILD_ATTRS_IN_LIST or some such.

--
Best Wishes,
Ashutosh Bapat

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: initial pruning in parallel append
Следующее
От: Tom Lane
Дата:
Сообщение: Re: initial pruning in parallel append