Discussion: Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)
On Mon, 12 May 2025, 05:08 Andrei Lepikhov, <lepihov@gmail.com> wrote:
> Thanks for this puzzle!
> I suppose, in case generic planning is much faster than the custom one,
> there are two candidates:
> 1. Touching the index during planning causes too much overhead - see
> get_actual_variable_range
> 2. You have a massive default_statistics_target for a table involved.
This is just an artifact of the fact that runtime pruning is not factored into the costs. Note the cost of the generic plan. The plan_cache_mode GUC is about the only way to overrule the choice to use the custom plan.
David
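For reference, the override David describes can be exercised like this (a sketch; `parted_table` and the statement name `q` are illustrative, not from the thread):

```sql
-- Force the plan cache to use a generic plan for this session
-- (the same setting can be applied per role with ALTER ROLE ... SET).
SET plan_cache_mode = force_generic_plan;

-- Hypothetical partitioned table and prepared statement:
PREPARE q(int) AS SELECT * FROM parted_table WHERE id = $1;
EXPLAIN (ANALYZE) EXECUTE q(42);  -- generic plan from the first execution on
```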
On Mon, May 12, 2025 at 6:01 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Mon, 12 May 2025, 05:08 Andrei Lepikhov, <lepihov@gmail.com> wrote:
>> Thanks for this puzzle!
>> I suppose, in case generic planning is much faster than custom one,
>> there are two candidates exist:
>> 1. Touching the index during planning causes too much overhead - see
>> get_actual_variable_range
>> 2. You have a massive default_statistics_target for a table involved.
>
> This is just an artifact of the fact that runtime pruning is not factored into the costs. Note the cost of the generic plan. The plan_cache_mode GUC is about the only way to overrule the choice to use the custom plan.
The situation is quite the opposite - I need to force a generic plan because it has the same execution time as a custom plan but performs 20-50x faster overall (because in the custom-plan case, 95-98% of the time is spent in planning, not in execution).
And the problem is that the cost of a custom plan ignores the cost of planning itself (which is about two orders of magnitude worse than the cost of run-time partition pruning in a generic plan). I started thinking of something like a cost_planner GUC to help with similar issues (where the planning cost is calculated as cost_planner * (some heuristic function of the number of tables involved in the query)).
In my case the high cost of planning itself should force the database to use a generic plan.
Maxim Boguk
Senior Postgresql DBA
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
Maxim Boguk <maxim.boguk@gmail.com> writes:
> And the problem is that the cost of a custom plan ignores the cost of
> planning itself (which is like 2x orders of magnitude worse than the cost
> of real time partition pruning of a generic plan).

False. The estimate is evidently pretty wrong, but it's not that there is no consideration at all. See around line 1370 in src/backend/utils/cache/plancache.c.

regards, tom lane
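The comparison Tom points at can be sketched roughly like this (a simplification of `cached_plan_cost()` / `choose_custom_plan()` in plancache.c; the `1000 * cpu_operator_cost * (nrelations + 1)` planning charge is from the PostgreSQL source, while the cost numbers below are made up for illustration):

```python
# Rough sketch of the generic-vs-custom choice in
# src/backend/utils/cache/plancache.c (simplified; not the real code).
CPU_OPERATOR_COST = 0.0025  # default cpu_operator_cost

def cached_plan_cost(total_cost, nrelations, include_planning):
    # cached_plan_cost() charges a crude planning-cost estimate of
    # 1000 * cpu_operator_cost per relation (plus one) on top of the
    # plan's total cost; the generic plan skips this charge because
    # it is planned only once.
    result = total_cost
    if include_planning:
        result += 1000.0 * CPU_OPERATOR_COST * (nrelations + 1)
    return result

def choose_custom_plan(generic_cost, avg_custom_cost):
    # After the first five executions, prefer the generic plan unless
    # the average custom-plan cost (planning charge included) is cheaper.
    return avg_custom_cost < generic_cost

# A 48-relation query: the planning charge is 1000 * 0.0025 * 49 = 122.5,
# tiny next to a generic-plan cost inflated by un-pruned partitions.
custom = cached_plan_cost(50.0, 48, include_planning=True)    # 172.5
generic = cached_plan_cost(5000.0, 48, include_planning=False)
print(choose_custom_plan(generic, custom))  # True: custom still wins
```

This illustrates Maxim's complaint: the planning charge is far too small to outweigh a generic-plan cost that ignores run-time pruning.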
On Mon, May 12, 2025 at 6:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maxim Boguk <maxim.boguk@gmail.com> writes:
> And the problem is that the cost of a custom plan ignores the cost of
> planning itself (which is like 2x orders of magnitude worse than the cost
> of real time partition pruning of a generic plan).
False. The estimate is evidently pretty wrong, but it's not that
there is no consideration at all. See around line 1370 in
src/backend/utils/cache/plancache.c.
regards, tom lane
Thank you.
Reading the code - probably the lowest-hanging fruit is to make
'the current multiplier of 1000 * cpu_operator_cost' configurable in future versions.
PS: it's always nice to see that my ad-hoc idea (about N*nrelations as the planner cost estimate) is already implemented.
Maxim Boguk
Senior Postgresql DBA
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
Maxim Boguk <maxim.boguk@gmail.com> writes:
> Reading the code - probably the lowest hanging fruit is to make
> 'The current multiplier of 1000 * cpu_operator_cost' configurable in the
> future versions.

I'm wondering whether we should try to make the planner not expend the effort in the first place, but leave partition pruning to the executor, at least in cases where it can determine that that will be possible.

regards, tom lane
From: Andrei Lepikhov
On 5/12/25 20:07, Tom Lane wrote:
> Maxim Boguk <maxim.boguk@gmail.com> writes:
>> Reading the code - probably the lowest hanging fruit is to make
>> 'The current multiplier of 1000 * cpu_operator_cost' configurable in the
>> future versions.
>
> I'm wondering whether we should try to make the planner not expend
> the effort in the first place, but leave partition pruning to the
> executor, at least in cases where it can determine that that will be
> possible.

Significant planning time goes into sorting out lots of scan paths, applying partition statistics, etc.; planner-stage partition pruning reduces these efforts drastically.

--
regards, Andrei Lepikhov
On Tue, 13 May 2025 at 03:19, Maxim Boguk <maxim.boguk@gmail.com> wrote:
> On Mon, May 12, 2025 at 6:01 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> This is just an artifact of the fact that runtime pruning is not factored into the costs. Note the cost of the generic plan. The plan_cache_mode GUC is about the only way to overrule the choice to use the custom plan.
>
> Situation quite the opposite - I need to force a generic plan because it has the same execution time as a custom plan but performs 20-50x faster (because in custom plan case - 95-98% time spent in planning not in execution).

You misunderstood. The choice the planner (or choose_custom_plan) made to use the custom plan can be overridden with SET plan_cache_mode = force_generic_plan;, which seems to be what performs better for you, per your example EXPLAIN ANALYZE outputs.

> And the problem is that the cost of a custom plan ignores the cost of planning itself (which is like 2x orders of magnitude worse than the cost of real time partition pruning of a generic plan). I started thinking of something like cost_planner GUC to help with similar issues (where planning cost calculated as cost_planner*(some heuristic function with amount involved in query tables).
>
> In my case the high cost of planning itself should force the database to use generic plan.

Certainly the cost estimate for planning there is quite crude. I doubt you'll find anyone arguing that it's not. It is however designed to be low-overhead. The estimated planning cost isn't the issue here. It's (as I mentioned) related to no cost consideration being given to run-time pruning. We could certainly adjust things so that is accounted for, and we (I think Robert and I) have talked about it in the past. The problem is that doing that is a wild stab in the dark, especially so for your range-partitioned case, where the number of partitions actually pruned during executor startup could range from 0 to all of them.
Unfortunately, when we tag those costs onto the plan, we've no idea what the parameter values are going to be when the plan is executed. I think Robert suggested multiplying the Append cost by DEFAULT_INEQ_SEL for this bounded range type pruning. Whether that will help you or not depends on how many partitions you have and how evenly populated they are. In other words, it's a tricky problem with no one-size-fits-all solution.
David
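Robert's suggested heuristic, as David recounts it, would amount to something like the following (a hypothetical sketch, not implemented in PostgreSQL; the DEFAULT_INEQ_SEL value matches the constant in src/include/utils/selfuncs.h):

```python
# Hypothetical run-time-pruning discount: scale an Append node's cost
# by DEFAULT_INEQ_SEL to guess how much of it will survive
# executor-startup pruning under a bounded range condition.
DEFAULT_INEQ_SEL = 0.3333333333333333  # default inequality selectivity

def discounted_append_cost(append_total_cost):
    return append_total_cost * DEFAULT_INEQ_SEL

# e.g. an Append estimated at 3000 would be charged as roughly 1000,
# regardless of how many partitions actually get pruned at runtime -
# which is exactly the "wild stab in the dark" David describes.
```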
On Mon, May 12, 2025 at 9:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maxim Boguk <maxim.boguk@gmail.com> writes:
> Reading the code - probably the lowest hanging fruit is to make
> 'The current multiplier of 1000 * cpu_operator_cost' configurable in the
> future versions.
Is the ~100x difference in backend memory usage per cached plan between generic and custom plans expected?
Here are sample memory context dumps with:
alter role app_server set plan_cache_mode to force_custom_plan ;
reconnect pgbouncers/wait 5 min/check sample
***=> begin;
BEGIN
****=*> select count(*), count(*) filter (where generic_plans>0) as generic_plans, count(*) filter (where custom_plans>0) as custom_plans from pg_prepared_statements ;
count | generic_plans | custom_plans
-------+---------------+--------------
177 | 3 | 174
(1 row)
***=*> select name,parent,level,count(*), pg_size_pretty(sum(total_bytes)) as bytes, sum(total_nblocks) as nblocks, pg_size_pretty(sum(free_bytes)) as free_bytes, sum(free_chunks) as free_chunks, pg_size_pretty(sum(used_bytes)) as used_bytes from pg_backend_memory_contexts group by 1,2,3 having sum(total_bytes)>128*1024 order by 3, sum(total_bytes) desc;
name | parent | level | count | bytes | nblocks | free_bytes | free_chunks | used_bytes
-------------------------+--------------------+-------+-------+---------+---------+------------+-------------+------------
TopMemoryContext | | 0 | 1 | 769 kB | 15 | 236 kB | 574 | 532 kB
CacheMemoryContext | TopMemoryContext | 1 | 1 | 9856 kB | 125 | 223 kB | 2 | 9633 kB
CachedPlanSource | CacheMemoryContext | 2 | 264 | 5228 kB | 1142 | 2142 kB | 456 | 3086 kB
index info | CacheMemoryContext | 2 | 776 | 1612 kB | 1483 | 575 kB | 908 | 1037 kB
CachedPlan | CacheMemoryContext | 2 | 62 | 154 kB | 137 | 41 kB | 31 | 113 kB
CachedPlanQuery | CachedPlanSource | 3 | 264 | 4777 kB | 1147 | 1628 kB | 133 | 3149 kB
And with:
alter role app_server set plan_cache_mode to force_generic_plan ;
reconnect pgbouncers/wait 5 min/check sample
***=> begin;
BEGIN
***=*> select count(*), count(*) filter (where generic_plans>0) as generic_plans, count(*) filter (where custom_plans>0) as custom_plans from pg_prepared_statements ;
count | generic_plans | custom_plans
-------+---------------+--------------
165 | 165 | 0
(1 row)
***=*> select name,parent,level,count(*), pg_size_pretty(sum(total_bytes)) as bytes, sum(total_nblocks) as nblocks, pg_size_pretty(sum(free_bytes)) as free_bytes, sum(free_chunks) as free_chunks, pg_size_pretty(sum(used_bytes)) as used_bytes from pg_backend_memory_contexts group by 1,2,3 having sum(total_bytes)>128*1024 order by 3, sum(total_bytes) desc;
name | parent | level | count | bytes | nblocks | free_bytes | free_chunks | used_bytes
-------------------------+--------------------+-------+-------+---------+---------+------------+-------------+------------
TopMemoryContext | | 0 | 1 | 809 kB | 16 | 236 kB | 712 | 573 kB
CacheMemoryContext | TopMemoryContext | 1 | 1 | 18 MB | 126 | 8137 kB | 3 | 9910 kB
CachedPlan | CacheMemoryContext | 2 | 252 | 73 MB | 1490 | 29 MB | 127 | 43 MB
CachedPlanSource | CacheMemoryContext | 2 | 252 | 4942 kB | 1095 | 1926 kB | 381 | 3016 kB
index info | CacheMemoryContext | 2 | 794 | 1655 kB | 1516 | 579 kB | 926 | 1076 kB
CachedPlanQuery | CachedPlanSource | 3 | 252 | 4502 kB | 1096 | 1460 kB | 134 | 3041 kB
In the first case: ~2.5 kB per CachedPlan.
In the second case: ~300 kB per CachedPlan.
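The per-plan figures follow from the CachedPlan rows in the dumps above (rough arithmetic):

```python
# CachedPlan context totals divided by cached-plan counts, taken from
# the memory context dumps above.
custom_per_plan = 154 * 1024 / 62          # force_custom_plan: 154 kB / 62 plans
generic_per_plan = 73 * 1024 * 1024 / 252  # force_generic_plan: 73 MB / 252 plans

print(round(custom_per_plan / 1024, 1), "kB")  # ~2.5 kB per plan
print(round(generic_per_plan / 1024), "kB")    # ~297 kB per plan
```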
The problem with force_generic_plan is that backends quickly eat up to 1 GB per backend, exhausting available server memory.
Postgresql version 17.4, and there are no complicated queries in this workload (1-3 tables per query; sometimes two of the tables are partitioned into 24 partitions each, and the third table is always monolithic).
Regards,
Maxim
Maxim Boguk
Senior Postgresql DBA
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678