Thread: Improve choose_custom_plan for initial partition prune case


Improve choose_custom_plan for initial partition prune case

From: Andy Fan
Date:
Given the plan example: 

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

prepare s as select * from measurement where logdate = $1;
execute s('2006-02-01');

The generic plan will probably not be chosen because its cost still includes the
partitions that are only removed at initial_prune time, while the custom plan removes
that cost at planning time, which makes the cost comparison unfair.  I'm wondering
whether we can estimate the cost reduction that initial_prune would give the generic
plan, based on the partitions pruned at plan time for the custom plan, and then
subtract that cost from the generic plan.  I have only gone through the related code
so far and haven't written anything yet; I'd like to know whether this is a sensible
direction to take.
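
For concreteness, a minimal reproduction sketch of the behaviour I mean, using the
statement s prepared above (it assumes the default plan_cache_mode = auto, and the
exact outcome of course depends on the data and statistics):

-- The plan cache builds custom plans for the first several executions
-- (five, in current releases) while it accumulates an average custom-plan cost.
execute s('2006-02-01');
execute s('2006-02-01');
execute s('2006-02-01');
execute s('2006-02-01');
execute s('2006-02-01');

-- From here on the generic plan is also considered, but its cost still includes
-- the partition that only initial pruning would remove; if that keeps it above
-- the average custom-plan cost, EXPLAIN keeps showing a single-partition custom plan.
explain execute s('2006-02-01');

-- Forcing the generic plan shows the Append whose extra subplan is only removed
-- at executor startup (reported as "Subplans Removed: 1").
set plan_cache_mode to force_generic_plan;
explain execute s('2006-02-01');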

--
Best Regards
Andy Fan

Re: Improve choose_custom_plan for initial partition prune case

From: Amit Langote
Date:
Hi Andy,

On Fri, Oct 2, 2020 at 1:04 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Given the plan example:
>
> CREATE TABLE measurement (
>     city_id         int not null,
>     logdate         date not null,
>     peaktemp        int,
>     unitsales       int
> ) PARTITION BY RANGE (logdate);
>
> CREATE TABLE measurement_y2006m02 PARTITION OF measurement
>     FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
>
> CREATE TABLE measurement_y2006m03 PARTITION OF measurement
>     FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
>
> prepare s as select * from measurement where logdate = $1;
> execute s('2006-02-01').
>
> The generic plan will probably not be chosen because it doesn't reduce the cost
> which can be reduced at initial_prune while the custom plan reduces such cost
> at  planning time. which makes the cost comparison not fair.

I agree that there is something to be done here.  Actually, I think we
should try to find a solution that will allow us to consider not just
"initial" pruning, but also "execution-time" pruning.   The latter
will allow a nested loop join whose inner side scans a partitioned
table using a parameterized scan on the partition key to be favored
over other join plans, because that parameterized scan can use
execution-time pruning which can make the inner scan very cheap.
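
To make that concrete, here is a hypothetical query shape (the events table is
invented purely for illustration) in which the inner side of a nested loop scans the
partitioned table with a join clause on the partition key, so execution-time pruning
can skip all but one partition for every outer row:

-- Hypothetical driving table, for illustration only.
create table events (event_id int, logdate date not null);

-- If this is planned as a nested loop with measurement on the inner side,
-- m.logdate = e.logdate acts as a parameterized condition on the partition
-- key, and run-time pruning can discard the other partitions on each rescan
-- of the inner side.
select e.event_id, m.peaktemp
  from events e
  join measurement m on m.logdate = e.logdate
 where e.event_id = 42;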

>  I'm thinking if we can
> get an estimated cost reduction of initial_prunne for generic plan based on the
> partition pruned at plan time from custom plan and then reducing
> such costs from the generic plan.  I just went through the related code but
> didn't write anything now.  I'd like to see if this is a correct direction to go.

That's an interesting idea, that is, to try to do this totally outside
the planner.  When I was thinking about this a little while ago, I was
trying to find a way to adjust the cost of the plan in the planner
itself by looking at the runtime pruning info in the nodes that
support it, that is, Append, MergeAppend.  Actually, such an approach
had also come up in the original run-time pruning discussion [1].

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BTgmoZHYoAL4HYwnGO25B8CxCB%2BvNMdf%2B7rbUzYykR4sU9yUA%40mail.gmail.com



Re: Improve choose_custom_plan for initial partition prune case

From: Andy Fan
Date:
Hi Amit:

  Very glad to see your comment!

On Fri, Oct 2, 2020 at 4:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
Hi Andy,

On Fri, Oct 2, 2020 at 1:04 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Given the plan example:
>
> CREATE TABLE measurement (
>     city_id         int not null,
>     logdate         date not null,
>     peaktemp        int,
>     unitsales       int
> ) PARTITION BY RANGE (logdate);
>
> CREATE TABLE measurement_y2006m02 PARTITION OF measurement
>     FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
>
> CREATE TABLE measurement_y2006m03 PARTITION OF measurement
>     FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
>
> prepare s as select * from measurement where logdate = $1;
> execute s('2006-02-01').
>
> The generic plan will probably not be chosen because it doesn't reduce the cost
> which can be reduced at initial_prune while the custom plan reduces such cost
> at  planning time. which makes the cost comparison not fair.

I agree that there is something to be done here.  Actually, I think we
should try to find a solution that will allow us to consider not just
"initial" pruning, but also "execution-time" pruning.   The latter
will allow a nested loop join whose inner side scans a partitioned
table using a parameterized scan on the partition key to be favored
over other join plans, because that parameterized scan can use
execution-time pruning which can make the inner scan very cheap.


That sounds like it would also address another important partition pruning issue,
one that arises entirely at planning time (no generic or custom plan involved):
for example, choosing between a nested-loop plan that can use
run-time partition pruning and a hash join that can't.  I am restating your idea
just to make sure I understand you correctly.
 
>  I'm thinking if we can
> get an estimated cost reduction of initial_prunne for generic plan based on the
> partition pruned at plan time from custom plan and then reducing
> such costs from the generic plan.  I just went through the related code but
> didn't write anything now.  I'd like to see if this is a correct direction to go.

That's an interesting idea, that is, to try to do this totally outside
the planner.  When I was thinking about this a little while ago, I was
trying to find a way to adjust the cost of the plan in the planner
itself by looking at the runtime pruning info in the nodes that
support it, that is, Append, MergeAppend.  Actually, such an approach
had also come up in the original run-time pruning discussion [1].

 
Thank you for your comments.  It looks like your approach could help with
both cases, and I thought about it a bit as well.  However, it looks
complex (to me), and I would prefer to estimate how many partitions can be
pruned from real data, even if that data comes from past executions (I assume
that will not make much difference in practice).

I'm not sure whether I should treat Robert's comments [1] as opposition to this idea,
but I think there are some small differences.  I'd like to implement my idea
soon, and I'm glad to see opposing ideas at any time, of course the sooner
the better :)


--
Best Regards
Andy Fan

Re: Improve choose_custom_plan for initial partition prune case

From: Ashutosh Bapat
Date:
On Thu, Oct 1, 2020 at 9:34 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Given the plan example:
>
> CREATE TABLE measurement (
>     city_id         int not null,
>     logdate         date not null,
>     peaktemp        int,
>     unitsales       int
> ) PARTITION BY RANGE (logdate);
>
> CREATE TABLE measurement_y2006m02 PARTITION OF measurement
>     FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
>
> CREATE TABLE measurement_y2006m03 PARTITION OF measurement
>     FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
>
> prepare s as select * from measurement where logdate = $1;
> execute s('2006-02-01').
>
> The generic plan will probably not be chosen because it doesn't reduce the cost
> which can be reduced at initial_prune while the custom plan reduces such cost
> at  planning time. which makes the cost comparison not fair.  I'm thinking if we can
> get an estimated cost reduction of initial_prunne for generic plan based on the
> partition pruned at plan time from custom plan and then reducing
> such costs from the generic plan.  I just went through the related code but
> didn't write anything now.  I'd like to see if this is a correct direction to go.

I think the end result will depend upon the values passed in the first
few executions of the prepared statement.  If they happen to produce estimates
similar to or higher than the generic plan's, the generic plan will be
chosen later; but if they are much lower than the generic plan's estimates,
a custom plan might keep being chosen.  If we moderate the generic plan's cost
based on the custom plans, what happens when we later execute the statement
with values whose estimates are similar to the generic plan's?

If the partition key is well distributed, which also results in a good
distribution of data across partitions, the generic plan's cost will be
similar to the custom plans' costs; if not, that's something we want to fix.
But if there's large data skew, letting the custom plan always win is
probably better.  [1] talks about the generic plan not being chosen just
because it has a higher cost even though its shape is similar to the custom
plan's.  Leveraging that idea might be a good option: if the custom plans
being produced have shapes similar to the generic plan, stop producing them.

[1] https://www.postgresql.org/message-id/CA%2BTgmoZHYoAL4HYwnGO25B8CxCB%2BvNMdf%2B7rbUzYykR4sU9yUA%40mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat



Re: Improve choose_custom_plan for initial partition prune case

From: Andy Fan
Date:
Hi Ashutosh:

Thanks for coming. 

On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Oct 1, 2020 at 9:34 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Given the plan example:
>
> CREATE TABLE measurement (
>     city_id         int not null,
>     logdate         date not null,
>     peaktemp        int,
>     unitsales       int
> ) PARTITION BY RANGE (logdate);
>
> CREATE TABLE measurement_y2006m02 PARTITION OF measurement
>     FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
>
> CREATE TABLE measurement_y2006m03 PARTITION OF measurement
>     FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
>
> prepare s as select * from measurement where logdate = $1;
> execute s('2006-02-01').
>
> The generic plan will probably not be chosen because it doesn't reduce the cost
> which can be reduced at initial_prune while the custom plan reduces such cost
> at  planning time. which makes the cost comparison not fair.  I'm thinking if we can
> get an estimated cost reduction of initial_prunne for generic plan based on the
> partition pruned at plan time from custom plan and then reducing
> such costs from the generic plan.  I just went through the related code but
> didn't write anything now.  I'd like to see if this is a correct direction to go.

What happens when we
execute plans with values that have estimates similar to the generic
plan later when we moderate generic plan costs based on the custom
plans?


In the example at the beginning of this thread, I used exactly the same value
every time, and the custom plan is chosen every time, which is bad.
The main reason is that the custom plan sees the exact value in the Execute
message, so it does plan-time partition pruning and its total cost is low.  For
the generic plan, the pruning only happens at the run-time initial-pruning
stage, so the cost of the partitions that can be pruned at that stage is still
included in the total cost, and the generic plan can never be chosen.  That is
what I want to fix.

If the table has good distribution of a partition key, which also
results in good distribution of data across partitions, generic plan
cost will be similar to the custom plan costs. If not that's something
we want to fix.
 
[1] talks about generic plan being
not chosen just because it has higher cost even though its shape is
similar to a custom plan. Leveraging that idea might be a good option.
If the custom plans produced have shapes similar to the generic plan,
stop producing those.

If I understand you correctly,  the issue I want to fix here matches this goal. 

--
Best Regards
Andy Fan

Re: Improve choose_custom_plan for initial partition prune case

From: Andy Fan
Date:


On Sat, Oct 3, 2020 at 10:05 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi Amit:

  Very glad to see your comment!

On Fri, Oct 2, 2020 at 4:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
Hi Andy,

On Fri, Oct 2, 2020 at 1:04 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Given the plan example:
>
> CREATE TABLE measurement (
>     city_id         int not null,
>     logdate         date not null,
>     peaktemp        int,
>     unitsales       int
> ) PARTITION BY RANGE (logdate);
>
> CREATE TABLE measurement_y2006m02 PARTITION OF measurement
>     FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
>
> CREATE TABLE measurement_y2006m03 PARTITION OF measurement
>     FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
>
> prepare s as select * from measurement where logdate = $1;
> execute s('2006-02-01').
>
> The generic plan will probably not be chosen because it doesn't reduce the cost
> which can be reduced at initial_prune while the custom plan reduces such cost
> at  planning time. which makes the cost comparison not fair.

I agree that there is something to be done here.  Actually, I think we
should try to find a solution that will allow us to consider not just
"initial" pruning, but also "execution-time" pruning.   The latter
will allow a nested loop join whose inner side scans a partitioned
table using a parameterized scan on the partition key to be favored
over other join plans, because that parameterized scan can use
execution-time pruning which can make the inner scan very cheap.


This looks like to resolve another important issue of partition prune, which
may happen at planning time totally (no generic plan or custom plan involved). 
for example between choosing a Nest Loop plan which can use
some run-time partition prune and hash join which can't.  I "repeat" your idea
just to  make sure I understand you correctly. 
 
>  I'm thinking if we can
> get an estimated cost reduction of initial_prunne for generic plan based on the
> partition pruned at plan time from custom plan and then reducing
> such costs from the generic plan.  I just went through the related code but
> didn't write anything now.  I'd like to see if this is a correct direction to go.

That's an interesting idea, that is, to try to do this totally outside
the planner.  When I was thinking about this a little while ago, I was
trying to find a way to adjust the cost of the plan in the planner
itself by looking at the runtime pruning info in the nodes that
support it, that is, Append, MergeAppend.  Actually, such an approach
had also come up in the original run-time pruning discussion [1].

 
Thank you for your comments. Looks like your approach can be helpful
for the both cases, and I did think a bit for that as well, However, that looks
complex (for me) AND I am prefer to guess how many partitions can be
pruned with real data even it is the real data in the past (I assume that 
will not cause too much difference in practice). 

I'm not sure if I should treat Robert's comments as an opposed idea[1] , 
but I think there are some little differences.  I'd like to implement my idea
soon, and I'm glad to see any opposed idea at any time, of course the sooner 
the better:) 

After a few days of coding in this direction, I have run into an upsetting problem.

The main idea here is that if N partitions survive plan-time partition pruning in the
custom plan, then I assume a similar set of partitions will survive initial partition
pruning in the generic plan, so we should subtract the corresponding costs from the
generic plan.

However, the hard part of the implementation is that we can't associate an Append
node in the generic plan with the surviving-partition information from the custom
plan, even though the Append node has apprelids in it.  Two ways to match them:

1. Associate them by rtindex.  The final rtindex is affected by glob->finalrtable,
but glob->finalrtable includes the child partitions, and the custom plan ends up
with a smaller finalrtable, so the rtindexes can't be matched.

2. Associate them by relation OID, and record that information in the
Append node as well.  The downside is that the same relation OID may appear multiple
times in a query, for example: SELECT .. FROM p p1, p p2 WHERE p1.partkey1 = $1
AND p2.partkey2 = $2;  (a concrete sketch of this case follows)
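
For concreteness, here is a hypothetical instance of that ambiguity using the
measurement table from upthread; the two references expand to two separate Append
nodes whose surviving partitions can differ, so a mapping keyed only on relation OID
cannot tell which pruning result belongs to which Append:

-- Illustrative only: both m1 and m2 expand to Appends over the same
-- measurement_y2006m* children, so their OIDs collide even though each
-- Append may keep a different partition after pruning.
prepare s2 as
  select *
    from measurement m1
    join measurement m2 on m1.city_id = m2.city_id
   where m1.logdate = $1
     and m2.logdate = $2;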

So any hint on this would be appreciated.
 
--
Best Regards
Andy Fan

Re: Improve choose_custom_plan for initial partition prune case

From: Andy Fan
Date:

2. Associate them with RelationOid, and we can record such information in the
 Append node as well. The bad part is the same relation Oid may appear multiple
 times in a query. for example: SELECT .. FROM p p1,  p p2 where p1.partkey1 = $1
 AND p2.partkey2 = $2;


I just came up with a new idea.  Since this situation should be rare, we can simply fall back
to the original method (ignore the cost reduction entirely) or use an average number.  Fixing
99% of the cases would already be a big win, IMO.

So any hint on this will be appreciated..
 
--
Best Regards
Andy Fan


--
Best Regards
Andy Fan

Re: Improve choose_custom_plan for initial partition prune case

From: Andy Fan
Date:


On Wed, Oct 7, 2020 at 2:43 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

2. Associate them with RelationOid, and we can record such information in the
 Append node as well. The bad part is the same relation Oid may appear multiple
 times in a query. for example: SELECT .. FROM p p1,  p p2 where p1.partkey1 = $1
 AND p2.partkey2 = $2;


I just came up with a new idea.  Since this situation should be rare, we can just come back
to our original method (totally ignore the cost reduction) or use the average number.  Fixing
the 99% cases would be a big winner as well IMO.


I just uploaded a runnable patch for this idea, but it looks like the design is wrong
from the start.  For example, given a plan shape like:

Nest Loop
    ->  Append
          ->  p_1
          ->  p_2
    ->  inner

the patch only reduces the cost of the Append node, but since the loop count
of the inner plan is reduced as well, that cost should be reduced too.  And even if
we could also reduce the join cost, that still would not be a smart solution, because
the shape of the generic plan itself might already be wrong.
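
For illustration, here is a hypothetical query of roughly that shape (p is the
partitioned table, t an invented ordinary table): pruning p down to one partition
also reduces how many times the inner side is rescanned, which an Append-only cost
adjustment misses.

-- Hypothetical shape only.
prepare j as
  select *
    from p
    join t on t.id = p.ref_id
   where p.partkey = $1;
-- If initial pruning removes p_2, the outer (Append) row count drops, so the
-- inner side of the nested loop is executed fewer times; reducing only the
-- Append's cost does not account for that saving.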

--
Best Regards
Andy Fan
Attachments

Re: Improve choose_custom_plan for initial partition prune case

From: Ashutosh Bapat
Date:
On Wed, Oct 7, 2020 at 11:20 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Hi Ashutosh:
>
> Thanks for coming.
>
> On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>>
>> On Thu, Oct 1, 2020 at 9:34 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> >
>> > Given the plan example:
>> >
>> > CREATE TABLE measurement (
>> >     city_id         int not null,
>> >     logdate         date not null,
>> >     peaktemp        int,
>> >     unitsales       int
>> > ) PARTITION BY RANGE (logdate);
>> >
>> > CREATE TABLE measurement_y2006m02 PARTITION OF measurement
>> >     FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
>> >
>> > CREATE TABLE measurement_y2006m03 PARTITION OF measurement
>> >     FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
>> >
>> > prepare s as select * from measurement where logdate = $1;
>> > execute s('2006-02-01').
>> >
>> > The generic plan will probably not be chosen because it doesn't reduce the cost
>> > which can be reduced at initial_prune while the custom plan reduces such cost
>> > at  planning time. which makes the cost comparison not fair.  I'm thinking if we can
>> > get an estimated cost reduction of initial_prunne for generic plan based on the
>> > partition pruned at plan time from custom plan and then reducing
>> > such costs from the generic plan.  I just went through the related code but
>> > didn't write anything now.  I'd like to see if this is a correct direction to go.
>>
>> What happens when we
>> execute plans with values that have estimates similar to the generic
>> plan later when we moderate generic plan costs based on the custom
>> plans?
>>
>
> The example at the beginning of this thread,  I used the exact same values
> every time, the custom plan will be chosen all the time, which is bad,
> The main reason is the custom plan knows the exact value in Execute
> message, so it run plan time partition prune, then the total cost is low, however
> for the generic plan the partition prune happens at Runtime initial_partition prune
> stage,  so the cost of the partitions which can be pruned at that stage is still
> included the total cost,  so generic plans can't be chosen. that would be the
> thing I want to fix.

Something is wrong in the generic plan costing then.  IIUC, the
row estimate should come out >= 1 for only a single partition;
for all the other partitions it should be 1, and only because we
clamp the row counts.  So the final costs for generic and custom plans
shouldn't be far off unless there's a large deviation in the selectivity
of the partition key.  I am assuming that there's an equality condition
on the partition key.  That's what I meant by the paragraph below.

>
>> If the table has good distribution of a partition key, which also
>> results in good distribution of data across partitions, generic plan
>> cost will be similar to the custom plan costs. If not that's something
>> we want to fix.

Can you please investigate on these lines?

-- 
Best Wishes,
Ashutosh Bapat



Re: Improve choose_custom_plan for initial partition prune case

From: Amit Langote
Date:
Hi Ashutosh,

On Wed, Oct 7, 2020 at 8:40 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Wed, Oct 7, 2020 at 11:20 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> >> What happens when we
> >> execute plans with values that have estimates similar to the generic
> >> plan later when we moderate generic plan costs based on the custom
> >> plans?
> >>
> >
> > The example at the beginning of this thread,  I used the exact same values
> > every time, the custom plan will be chosen all the time, which is bad,
> > The main reason is the custom plan knows the exact value in Execute
> > message, so it run plan time partition prune, then the total cost is low, however
> > for the generic plan the partition prune happens at Runtime initial_partition prune
> > stage,  so the cost of the partitions which can be pruned at that stage is still
> > included the total cost,  so generic plans can't be chosen. that would be the
> > thing I want to fix.
>
> Something is wrong in the generic plan costing then.

That's right.

> IIUC, the
> selectivity estimate for only a single partition should come out >= 1.
> For all the other partitions, it should be 1 and that too because we
> clamp the row counts. So the final costs for generic and custom plans
> shouldn't be far off unless there's large deviation in the selectivity
> of a partition key. I am assuming that there's an equality condition
> on a partition key. That's what I meant by the paragraph below.
> >
> >> If the table has good distribution of a partition key, which also
> >> results in good distribution of data across partitions, generic plan
> >> cost will be similar to the custom plan costs. If not that's something
> >> we want to fix.
>
> Can you please investigate on these lines?

The planner currently costs a generic plan assuming that *all*
partitions will be scanned, which is not the case, because run-time
pruning will ensure that scan nodes of all but one partition will be
discarded, assuming a simple query like `select * from parted_table
where a = $1`.  With N partitions, a generic plan for such a query is
N times as expensive as a custom plan, so it will never win.

For example:

create table bar (a int, b int, c int) partition by range (a);
create table bar_1 partition of bar for values from (1) to (100001);
create table bar_2 partition of bar for values from (100001) to (200001);
create table bar_3 partition of bar for values from (200001) to (300001);
create table bar_4 partition of bar for values from (300001) to (400001);
create table bar_5 partition of bar for values from (400001) to (500001);
create table bar_6 partition of bar for values from (500001) to (600001);
create table bar_7 partition of bar for values from (600001) to (700001);
create table bar_8 partition of bar for values from (700001) to (800001);
create table bar_9 partition of bar for values from (800001) to (900001);
create table bar_10 partition of bar for values from (900001) to (1000001);
insert into bar select generate_series(1, 1000000);
create index on bar (a);
analyze bar;

set plan_cache_mode to force_custom_plan ;
prepare q as select * from bar where a = $1;
explain execute q (1);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using bar_1_a_idx on bar_1 bar  (cost=0.29..8.31 rows=1 width=12)
   Index Cond: (a = 1)
(2 rows)

deallocate q;
set plan_cache_mode to force_generic_plan ;
prepare q as select * from bar where a = $1;
explain execute q (1);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Append  (cost=0.29..83.15 rows=10 width=12)
   Subplans Removed: 9
   ->  Index Scan using bar_1_a_idx on bar_1  (cost=0.29..8.31 rows=1 width=12)
         Index Cond: (a = $1)
(4 rows)

As you can see, the cost of the generic plan is 83.15, which is roughly 8.31 *
10, even though only one partition is actually scanned, so the actual
cost is about the same as the custom plan's.  The planner fails to consider that
those 9 subplans will be removed during execution.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Improve choose_custom_plan for initial partition prune case

From: Andy Fan
Date:
Thank you Amit and Ashutosh for your replies!

On Wed, Oct 7, 2020 at 8:41 PM Amit Langote <amitlangote09@gmail.com> wrote:
Hi Ashutosh,

On Wed, Oct 7, 2020 at 8:40 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Wed, Oct 7, 2020 at 11:20 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> >> What happens when we
> >> execute plans with values that have estimates similar to the generic
> >> plan later when we moderate generic plan costs based on the custom
> >> plans?
> >>
> >
> > The example at the beginning of this thread,  I used the exact same values
> > every time, the custom plan will be chosen all the time, which is bad,
> > The main reason is the custom plan knows the exact value in Execute
> > message, so it run plan time partition prune, then the total cost is low, however
> > for the generic plan the partition prune happens at Runtime initial_partition prune
> > stage,  so the cost of the partitions which can be pruned at that stage is still
> > included the total cost,  so generic plans can't be chosen. that would be the
> > thing I want to fix.
>
> Something is wrong in the generic plan costing then.

That's right.

> IIUC, the
> selectivity estimate for only a single partition should come out >= 1.
> For all the other partitions, it should be 1 and that too because we
> clamp the row counts.

If the planner knows the exact parameter value, it can estimate the other
partitions at 1 row (actually 0 rows, since they are pruned at plan time).
If it doesn't know the value, the estimate per partition is rows / ndistinct
for the "partkey = $1" case.
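
A quick sketch of that arithmetic against the bar example below (assuming the usual
1/ndistinct selectivity for an equality against an unknown parameter):

-- Each bar_N partition holds about 100000 rows with ~100000 distinct values of a,
-- so the estimate for "a = $1" is 100000 * (1/100000) = 1 row per partition, and
-- the generic Append is costed as ten one-row index scans.
select relname, reltuples from pg_class where relname = 'bar_1';
select tablename, attname, n_distinct
  from pg_stats
 where tablename = 'bar_1' and attname = 'a';   -- -1, i.e. all values distinct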
 
So the final costs for generic and custom plans
> shouldn't be far off unless there's large deviation in the selectivity
> of a partition key. I am assuming that there's an equality condition
> on a partition key. That's what I meant by the paragraph below.
> >
> >> If the table has good distribution of a partition key, which also
> >> results in good distribution of data across partitions, generic plan
> >> cost will be similar to the custom plan costs. If not that's something
> >> we want to fix.
>
> Can you please investigate on these lines?

The planner currently costs a generic plan assuming that *all*
partitions will be scanned, which is not the case, because run-time
pruning will ensure that scan nodes of all but one partition will be
discarded, assuming a simple query like `select * from parted_table
where a = $1`.  With N partitions, a generic plan for such a query is
N times as expensive as a custom plan, so will never win.

For example:

create table bar (a int, b int, c int) partition by range (a);
create table bar_1 partition of bar for values from (1) to (100001);
create table bar_2 partition of bar for values from (100001) to (200001);
create table bar_3 partition of bar for values from (200001) to (300001);
create table bar_4 partition of bar for values from (300001) to (400001);
create table bar_5 partition of bar for values from (400001) to (500001);
create table bar_6 partition of bar for values from (500001) to (600001);
create table bar_7 partition of bar for values from (600001) to (700001);
create table bar_8 partition of bar for values from (700001) to (800001);
create table bar_9 partition of bar for values from (800001) to (900001);
create table bar_10 partition of bar for values from (900001) to (1000001);
insert into bar select generate_series(1, 1000000);
create index on bar (a);
analyze bar;

set plan_cache_mode to force_custom_plan ;
prepare q as select * from bar where a = $1;
explain execute q (1);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using bar_1_a_idx on bar_1 bar  (cost=0.29..8.31 rows=1 width=12)
   Index Cond: (a = 1)
(2 rows)

deallocate q;
set plan_cache_mode to force_generic_plan ;
prepare q as select * from bar where a = $1;
explain execute q (1);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Append  (cost=0.29..83.15 rows=10 width=12)
   Subplans Removed: 9
   ->  Index Scan using bar_1_a_idx on bar_1  (cost=0.29..8.31 rows=1 width=12)
         Index Cond: (a = $1)
(4 rows)

As you can see, the cost of the generic plan is 83.15, which is roughly 8.31 *
10, even though only one partition is actually scanned, so the actual
cost is about the same as the custom plan's.  The planner fails to consider that
those 9 subplans will be removed during execution.

--
Amit Langote
EDB: http://www.enterprisedb.com

I also posted some replies on the original run-time partition prune thread [1].
I am very sorry for splitting this topic across two threads.  I have some new
understanding of this (which might be old ideas for others), so I'd like to
continue the discussion in that thread and add you to the cc list there.

Thanks!