Re: [PATCH] Add a guc parameter to control limit clause adjust path cost.
| From | Bryan Green |
|---|---|
| Subject | Re: [PATCH] Add a guc parameter to control limit clause adjust path cost. |
| Date | |
| Msg-id | 1c6f27f8-40a3-43cc-a485-09d70d09ae15@gmail.com |
| In reply to | [PATCH] Add a guc parameter to control limit clause adjust path cost. ("Haiyang Li" <mohen.lhy@alibaba-inc.com>) |
| Responses | Re: [PATCH] Add a guc parameter to control limit clause adjust path cost. |
| List | pgsql-hackers |
On 11/2/2025 11:28 AM, Haiyang Li wrote:
> Hi all,
> When a query contains LIMIT/OFFSET clauses, the optimizer uses this
> information to influence path generation. For example, it may consider
> the path with the lowest startup cost and adjust the cost of paths
> accordingly.
> 
> In most cases, these behaviors are beneficial and result in a more optimal
> path. However, there are some scenarios where these behaviors can lead to
> a bad plan. One typical scenario is when a query contains both
> ORDER BY and LIMIT clauses, and the optimizer chooses an incorrect index
> scan.
> PostgreSQL makes very optimistic predictions about the path that uses an
> ordered index to satisfy the row count requirement. When the actual scenario
> differs from the prediction, PostgreSQL may generate a bad plan. One case
> provided by Lukas [1] hits this issue. And another case I met recently is:
> 
> ```
> -- bad plan
>                                                QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
> Limit  (cost=1.00..611.58 rows=200 width=419) (actual
> time=3848.321..3867.253 rows=200 loops=1)
>   Buffers: shared hit=5957080 (main=5957080 vm=0 fsm=0) read=1 (main=1
> vm=0 fsm=0)
>   I/O Timings: shared/local read=0.518
>   Read Timings: total=0.534 ms buffer alloc=0.010 read io=0.519 page
> replay=0.002
>   ->  Nested Loop  (cost=1.00..1556085.69 rows=509704 width=419) (actual
> time=3848.320..3867.231 rows=200 loops=1)
>         Buffers: shared hit=5957080 (main=5957080 vm=0 fsm=0) read=1
> (main=1 vm=0 fsm=0)
>         I/O Timings: shared/local read=0.518
>         Read Timings: total=0.534 ms buffer alloc=0.010 read io=0.519
> page replay=0.002
>         ->  Index Scan using "idx_outDetail_channel_id" on
> pmc_outcome_detail d  (cost=0.43..369798.26 rows=641998 width=411)
> (actual time=0.042..758.725 rows=1172313 loops=1)
>               Index Cond: ((channel_id = ANY
> ('{7000,2000,4000}'::numeric[])) AND (channel_id = '4000'::numeric))
>               Filter: ((voucher_file_name IS NULL) AND (payment_type =
> ANY ('{7,8}'::numeric[])) AND (status = '3'::numeric))
>               Rows Removed by Filter: 586208
>               Buffers: shared hit=95516 (main=95516 vm=0 fsm=0)
>         ->  Index Scan using pmc_outcome_pkey on pmc_outcome o 
> (cost=0.56..1.85 rows=1 width=18) (actual time=0.002..0.002 rows=0
> loops=1172313)
>               Index Cond: (id = d.outcome_id)
>               Filter: (user_type = '1'::numeric)
>               Rows Removed by Filter: 1
>               Buffers: shared hit=5861564 (main=5861564 vm=0 fsm=0)
> read=1 (main=1 vm=0 fsm=0)
>               I/O Timings: shared/local read=0.518
>               Read Timings: total=0.534 ms buffer alloc=0.010 read
> io=0.519 page replay=0.002
> Planning:
>   Buffers: shared hit=20 (main=16 vm=4 fsm=0)
> Planning Time: 0.269 ms
> Execution Time: 3867.322 ms
> (24 rows)
> -- good plan
>                                                QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
> Limit  (cost=15964.44..595103.67 rows=200 width=419) (actual
> time=809.243..818.407 rows=200 loops=1)
>   Buffers: shared hit=5957257 (main=5957257 vm=0 fsm=0) read=5 (main=5
> vm=0 fsm=0)
>   I/O Timings: shared/local read=1.219
>   Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page
> replay=0.004
>   ->  Gather  (cost=15964.44..595103.67 rows=509704 width=419) (actual
> time=809.242..818.381 rows=200 loops=1)
>         Workers Planned: 6
>         Workers Launched: 6
>         Buffers: shared hit=5957257 (main=5957257 vm=0 fsm=0) read=5
> (main=5 vm=0 fsm=0)
>         I/O Timings: shared/local read=1.219
>         Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219
> page replay=0.004
>         ->  Nested Loop  (cost=14964.44..543133.27 rows=84951 width=419)
> (actual time=801.757..803.199 rows=31 loops=7)
>               Buffers: shared hit=5957257 (main=5957257 vm=0 fsm=0)
> read=5 (main=5 vm=0 fsm=0)
>               I/O Timings: shared/local read=1.219
>               Read Timings: total=1.243 ms buffer alloc=0.015 read
> io=1.219 page replay=0.004
>               ->  Parallel Bitmap Heap Scan on pmc_outcome_detail d 
> (cost=14963.87..345418.09 rows=107000 width=411) (actual
> time=107.837..210.957 rows=167477 loops=7)
>                     Recheck Cond: ((channel_id = ANY
> ('{7000,2000,4000}'::numeric[])) AND (channel_id = '4000'::numeric))
>                     Filter: ((voucher_file_name IS NULL) AND
> (payment_type = ANY ('{7,8}'::numeric[])) AND (status = '3'::numeric))
>                     Rows Removed by Filter: 83745
>                     Heap Blocks: exact=13247
>                     Buffers: shared hit=95541 (main=95541 vm=0 fsm=0)
> read=5 (main=5 vm=0 fsm=0)
>                     I/O Timings: shared/local read=1.219
>                     Read Timings: total=1.243 ms buffer alloc=0.015 read
> io=1.219 page replay=0.004
>                     ->  Bitmap Index Scan on "idx_outDetail_channel_id" 
> (cost=0.00..14803.38 rows=1072436 width=0) (actual time=101.764..101.764
> rows=1820498 loops=1)
>                           Index Cond: ((channel_id = ANY
> ('{7000,2000,4000}'::numeric[])) AND (channel_id = '4000'::numeric))
>                           Buffers: shared hit=6016 (main=6016 vm=0
> fsm=0) read=5 (main=5 vm=0 fsm=0)
>                           I/O Timings: shared/local read=1.219
>                           Read Timings: total=1.243 ms buffer
> alloc=0.015 read io=1.219 page replay=0.004
>               ->  Index Scan using pmc_outcome_pkey on pmc_outcome o 
> (cost=0.56..1.85 rows=1 width=18) (actual time=0.003..0.003 rows=0
> loops=1172342)
>                     Index Cond: (id = d.outcome_id)
>                     Filter: (user_type = '1'::numeric)
>                     Rows Removed by Filter: 1
>                     Buffers: shared hit=5861716 (main=5861716 vm=0 fsm=0)
> Planning:
>   Buffers: shared hit=20 (main=16 vm=4 fsm=0)
> Planning Time: 0.299 ms
> Execution Time: 818.499 ms
> (36 rows)
> ```
> 
> In such cases, users do not have a good way to force the optimizer to
> behave differently.
> 
> To address this, I have added a GUC parameter to control the
> optimizer's use of LIMIT clauses for adjusting the cost of paths,
> providing a means to force intervention. We have also considered similar
> scenarios, such as MIN/MAX. Note that we still retain the optimizer's
> use of LIMIT clauses to adjust the number of rows, as this is always
> reasonable. The patch is provided in the attachment (test cases not added yet).
> 
> Any thoughts?
> 
> Regards,
> Haiyang Li
> 
> [1] https://pganalyze.com/blog/5mins-postgres-planner-order-by-limit
> 
> 
I am new to this area, but isn't the problem the filter selectivity
estimation?  The filter

> Filter: ((voucher_file_name IS NULL) AND (payment_type = ANY
> ('{7,8}'::numeric[])) AND (status = '3'::numeric))
>               Rows Removed by Filter: 586208
>               Buffers: shared hit=95516 (main=95516 vm=0 fsm=0)

removes the most rows.  The optimizer shouldn't have selected the index
scan path even with the LIMIT present.  Maybe the benefit of the LIMIT
should be adjusted downward based on the filter selectivity?  Calculate
a new effective limit by dividing the LIMIT by the total filter
selectivity?
Forgive me if I am completely off base with this discussion.
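To make the arithmetic concrete, here is a rough sketch of what I understand
the planner to be doing (I believe this is create_limit_path() in
src/backend/optimizer/util/pathnode.c, though that attribution is my reading
of the source).  The numbers come from the bad plan above; the selectivity in
the second half is invented purely for illustration of the adjustment idea:

```python
# How the planner pro-rates a path's cost under LIMIT.
# Numbers taken from the bad plan above.
startup_cost = 1.00        # nested-loop startup cost
total_cost = 1556085.69    # nested-loop total cost for producing all rows
est_rows = 509704          # planner's row estimate for the join
limit_rows = 200

# The LIMIT node is charged startup plus a linear share of the run cost,
# which is why the bad plan shows "Limit (cost=1.00..611.58 ...)".
limit_cost = startup_cost + limit_rows / est_rows * (total_cost - startup_cost)
print(round(limit_cost, 2))  # 611.58

# The adjustment idea, roughly: if only 'sel' of the scanned rows survive
# the filters, the scan must produce about limit_rows / sel rows to satisfy
# the LIMIT, so the pro-rated share should grow accordingly.
# 'sel' here is a made-up selectivity, not taken from the plans.
sel = 0.01
effective_rows = min(limit_rows / sel, est_rows)
adjusted_cost = startup_cost + effective_rows / est_rows * (total_cost - startup_cost)
print(round(adjusted_cost, 2))
```

With a pessimistic filter selectivity the pro-rated cost grows by orders of
magnitude, which would have steered the planner away from the index-scan path
here.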
Bryan Green
		