Re: WIP: Upper planner pathification

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: WIP: Upper planner pathification
Дата
Msg-id CAA4eK1KPmF0hU5yz7yP7ss6uXp99Ba5mJJ5+948xXmV=wt6ubw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP: Upper planner pathification  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: WIP: Upper planner pathification  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Tue, Mar 8, 2016 at 2:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> >> Attached is a version that addresses today's concerns, and also finishes
> >> filling in the loose ends I'd left before, such as documentation and
> >> outfuncs.c support.  I think this is in a committable state now, though
> >> I plan to read through the whole thing again.
>
> The extra read-through located some minor bugs, mainly places where I'd
> forgotten to ensure that Path cost info was transposed into the generated
> Plan.  That would only have the cosmetic effect that EXPLAIN would print
> zeroes for estimated costs, and since we only use EXPLAIN COSTS OFF in
> the regression tests, no test failures ensued :-(.
>
> I've pushed it now; we'll soon see if the buildfarm finds any problems.
>

On latest commit-51c0f63e, I am seeing some issues w.r.t parallel query.  Consider a below case:

create table t1(c1 int, c2 char(1000));
insert into t1 values(generate_series(1,300000),'aaaa');
analyze t1;
set max_parallel_degree=2;
postgres=# explain select c1, count(c1) from t1 where c1 < 1000 group by c1;
ERROR:  ORDER/GROUP BY expression not found in targetlist

Without setting max_parallel_degree, it works fine and generate the appropriate results.  Here the issue seems to be that the code in grouping_planner doesn't apply the required PathTarget to Path below Gather Path due to which when we generate target list for scan plan, it misses the required information which in this case is sortgrouprefs and the same target list is then propagated for upper nodes which eventually leads to the above mentioned failure.  Due to same reason, if the target list contains some expression, it will give wrong results when parallel query is used.  I could see below ways to solve this issue.

Approach-1
-----------------
First way to solve this issue is to jam the PathTarget for partial paths similar to what we are doing for Paths and I have verified that resolves the issue, the patch for same is attached with this mail.  However, it won't work as-is, because this will make target lists pushed to workers as we have applied them below Gather Path which we don't want if the target list has any parallel unsafe functions.  To make this approach work, we need to ensure that we jam the PathTarget for partial paths (or generate a projection) only if they contain parallel-safe expressions.  Now while creating Gather Plan (create_gather_plan()), we need to ensure in some way that if path below doesn't generate the required tlist, then it should generate it's own tlist rather than using it from subplan.

Approach-2
------------------
Always generate a tlist in Gather plan as we do for many other cases.  I think this approach will also resolve the issue but haven't tried yet.


Approach-1 has a benefit that it can allow to push target lists to workers which will result in speed-up of parallel queries especially for cases when target list contain costly expressions, so I am slightly inclined to follow that even though that looks more work.

Thoughts?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Konstantin Knizhnik
Дата:
Сообщение: Re: Optimizer questions
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: More stable query plans via more predictable column statistics