Re: WIP: Upper planner pathification

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: WIP: Upper planner pathification
Дата
Msg-id 27536.1457032797@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: WIP: Upper planner pathification  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: WIP: Upper planner pathification  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> Thanks for working on this.  Some review comments:

> - I think all of the new path creation functions should bubble up
> parallel_degree from their subpath.

Ah, thanks, I didn't have any clue how to set that (though in my defense,
the documentation about it is next to nonexistent).  Just to confirm,
I assume the rules are:

* parallel_aware: indicates whether the plan node itself has any
parallelism behavior

* parallel_safe: indicates that the entire plan tree rooted at this
node is safe to execute in a parallel worker

* parallel_degree: indicates number of parallel threads potentially
useful for this plan tree (0 if not parallel-safe)

This leads me to the conclusion that all these new node types should
set parallel_aware to false and copy up the other two fields from the
child, except for LockRows and ModifyTable which should set them all
to false/0.  Correct?  If so, I'll go fix.

> - RollupPath seems like a poor choice of name, if nothing else.  You
> would expect that it would be related to GROUP BY ROLLUP() but of
> course that's really the same thing as GROUP BY GROUPING SETS () or
> GROUP BY CUBE (), and the fundamental operation is actually GROUPING
> SETS, not ROLLUP.

As I noted to David, that thing seems to me to be in need of refactoring,
but I'd just as soon leave untangling the grouping-sets mess for later.
I don't mind substituting a different name if you have a better idea,
but don't really want to do more work than that right now.

> - It's not entirely related to this patch, but I'm starting to wonder
> if we've made the wrong bet about target lists.  It seems to me that
> there's a huge difference between a projection which simply throws
> away columns we don't need and one which actually computes something,
> and maybe those cases ought to be treated differently instead of
> saying "well, it's a target list".  It strikes me that there are
> probably execution-time optimizations that are possible in the former
> case, and maybe a more compact representation of the projection
> operation as well.  I can't shake the feeling that our extensive use
> of lists can't be the best thing ever for performance.

We do already have the "physical tlist" optimization.  I agree that
there's more work to be done here, but again would rather leave that
to a later patch.

> - A related point that is more connected to this patch is that you've
> added 13 (!) new calls to disuse_physical_tlist, and 8 of those are
> marked with /* XXX hack: need original tlist with sortgroupref marking
> */.  I don't quite understand what's going on there.  I think if we're
> going to be stuck with that hack we at least need some comments
> explaining what is going on there.  What has caused us to suddenly
> need these calls when we didn't before, and why these places and not
> some others?

Yeah, that's a hack to get things working.  The problem is that these node
types need to be able to identify sort/group columns in their inputs, but
if the child has switched to a "physical tlist" then the ressortgroupref
marking isn't there, and indeed the needed column might not be there at
all if it's a computed expression not a Var.  So what I did for the moment
was to force the inputs back to their nominal tlists.  In the old code we
didn't have this problem because none of the upper-level plan node types
could see a physical tlist unless make_subplanTargetList had allowed it,
and then we applied locate_grouping_columns() to re-identify the grouping
columns.  That logic probably needs to be transposed into createplan.c,
but I've not taken the time yet to figure out exactly how.  I don't know
if it's reasonable to do that separately from rethinking how the whole
disuse_physical_tlist thing works.

> - For SortPath, you mention that the SortGroupClauses representation
> isn't currently used.  It's not clear to me that it ever would be;
> what would be the case where that might be useful?  At any rate, I'd
> be inclined to rip it out for now; you can always put it back later.

Yeah, I was dithering about that.  It seems like createplan.c now has
a few too many ways to identify sort/group columns, and I was hoping
to consolidate them somehow.  That might lead to wanting to use
SortGroupClauses not PathKeys in some cases.  But until that's worked
out, I agree the extra field is useless and we can just drop it.

> - create_distinct_paths() disables the hash path if it seems like it
> would exceed work_mem, unless the sort path isn't viable.  But there's
> something that feels a bit uncomfortable about this.  Suppose the sort
> path isn't viable but some other kind of future path is viable.  It
> seems like it would be better to restructure this a bit so that the
> decision about whether to add the hash path is based on whether there
> are any other paths in the rel when we reach the bottom of the
> function.  create_grouping_paths() has  a similar issue.

OK, I'll take a look.  Quite a lot of these functions can probably stand
more local rearrangements; I've been mainly concerned about getting the
overall structure right.

(BTW, I am also suspicious that there's now dead code in places, but I've
not gone looking for that either.  There is a lot of rather boring mop-up
to be done, which I left out of the v1 patch mostly to keep it from being
even more unreviewably huge than it had to be.)

> In general, and I'm sure this is not a huge surprise, most of this
> looks very good to me.  I think the design is sound and that, if the
> performance is OK, we ought to move forward with it.

Thanks.  As I told Teodor last night, I can't reproduce a performance
issue here with pgbench-style queries.  Do you have any thoughts about
how we might satisfy ourselves whether there is or isn't a performance
problem?
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: improving GROUP BY estimation
Следующее
От: Vitaly Burovoy
Дата:
Сообщение: Re: [PATCH] Supporting +-Infinity values by to_timestamp(float8)