Re: planner chooses incremental but not the best one

From: Tomas Vondra
Subject: Re: planner chooses incremental but not the best one
Msg-id: a600b525-49be-2fd2-fec9-a668604740fc@enterprisedb.com
In reply to: Re: planner chooses incremental but not the best one  (Richard Guo <guofenglinux@gmail.com>)
Responses: Re: planner chooses incremental but not the best one  (Richard Guo <guofenglinux@gmail.com>)
List: pgsql-hackers

On 12/14/23 11:02, Richard Guo wrote:
> 
> On Tue, Dec 12, 2023 at 4:40 PM Nicolas Lutic <n.lutic@loxodata.com
> <mailto:n.lutic@loxodata.com>> wrote:
> 
>     I've come across a behaviour of the planner I can't explain.
>     After a migration from 11 to 15 (on RDS) we noticed a degradation in
>     response time on a query, it went from a few seconds to ten minutes.
>     A vacuum(analyze) has been performed to make sure everything is clean.
>     The 'explain analyze' shows us a change of plan. Postgresql 15 chooses
>     `incremental sort` with an index corresponding to the ORDER BY clause
>     (on the created_at column). The previous v11 plan used a more efficient
>     index.
> 
>     By deactivating incremental sort, response times in v15 are equal to
>     the v11 ones.
> 
> 
> I think this issue is caused by under-estimating the startup cost of
> incremental sort, which in turn is caused by over-estimating the number
> of groups with equal presorted keys by estimate_num_groups().
> 
> We can simulate the same issue with the query below.
> 
> create table t (a int, b int, c int, d int) partition by range (a);
> create table tp1 partition of t for values from (0) to (1000);
> create table tp2 partition of t for values from (1000) to (2000);
> 
> insert into t select i%2000, i%1000, i%300, i from
> generate_series(1,1000000)i;
> 
> create index on t(b);
> create index on t(c);
> 
> analyze t;
> 
> -- by default incremental sort is chosen
> explain analyze select * from t where b = 3 order by c, d limit 10;
>                                                                    QUERY
> PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=143.03..570.89 rows=10 width=16) (actual
> time=375.399..375.402 rows=10 loops=1)
>    ->  Incremental Sort  (cost=143.03..42671.85 rows=994 width=16)
> (actual time=375.397..375.399 rows=10 loops=1)
>          Sort Key: t.c, t.d
>          Presorted Key: t.c
>          Full-sort Groups: 1  Sort Method: top-N heapsort  Average
> Memory: 25kB  Peak Memory: 25kB
>          Pre-sorted Groups: 1  Sort Method: top-N heapsort  Average
> Memory: 25kB  Peak Memory: 25kB
>          ->  Merge Append  (cost=0.85..42644.84 rows=994 width=16)
> (actual time=11.415..375.289 rows=335 loops=1)
>                Sort Key: t.c
>                ->  Index Scan using tp1_c_idx on tp1 t_1
>  (cost=0.42..21318.39 rows=498 width=16) (actual time=6.666..189.356
> rows=168 loops=1)
>                      Filter: (b = 3)
>                      Rows Removed by Filter: 171537
>                ->  Index Scan using tp2_c_idx on tp2 t_2
>  (cost=0.42..21316.50 rows=496 width=16) (actual time=4.745..185.870
> rows=168 loops=1)
>                      Filter: (b = 3)
>                      Rows Removed by Filter: 171534
>  Planning Time: 0.501 ms
>  Execution Time: 375.477 ms
> (16 rows)
> 
> -- disable incremental sort
> set enable_incremental_sort to off;
> SET
> 
> explain analyze select * from t where b = 3 order by c, d limit 10;
>                                                                QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=2577.51..2577.53 rows=10 width=16) (actual
> time=2.928..2.930 rows=10 loops=1)
>    ->  Sort  (cost=2577.51..2579.99 rows=994 width=16) (actual
> time=2.925..2.927 rows=10 loops=1)
>          Sort Key: t.c, t.d
>          Sort Method: top-N heapsort  Memory: 25kB
>          ->  Append  (cost=8.28..2556.03 rows=994 width=16) (actual
> time=0.627..2.670 rows=1000 loops=1)
>                ->  Bitmap Heap Scan on tp1 t_1  (cost=8.28..1276.62
> rows=498 width=16) (actual time=0.625..1.688 rows=500 loops=1)
>                      Recheck Cond: (b = 3)
>                      Heap Blocks: exact=500
>                      ->  Bitmap Index Scan on tp1_b_idx
>  (cost=0.00..8.16 rows=498 width=0) (actual time=0.330..0.330 rows=500
> loops=1)
>                            Index Cond: (b = 3)
>                ->  Bitmap Heap Scan on tp2 t_2  (cost=8.27..1274.43
> rows=496 width=16) (actual time=0.178..0.876 rows=500 loops=1)
>                      Recheck Cond: (b = 3)
>                      Heap Blocks: exact=500
>                      ->  Bitmap Index Scan on tp2_b_idx
>  (cost=0.00..8.14 rows=496 width=0) (actual time=0.093..0.093 rows=500
> loops=1)
>                            Index Cond: (b = 3)
>  Planning Time: 0.481 ms
>  Execution Time: 3.031 ms
> (17 rows)
> 
> As we can see the execution time is 375.477 ms by default and 3.031 ms
> if we disable incremental sort.
> 
> When we calculate the cost of incremental sort, we need to estimate the
> number of groups into which the relation is divided by the presorted
> keys, and then calculate the cost of sorting a single group.  If we
> over-estimate the number of groups, the startup cost of incremental sort
> would be under-estimated.  In the first plan above, the number of groups
> with equal 't.c' is estimated to be 300 by estimate_num_groups(), but is
> actually 3 after applying the restriction 'b = 3'.  As a result, the
> startup cost of the incremental sort is estimated to be 143.03, but is
> actually 14222.68.  That's why the planner mistakenly thinks the
> incremental sort path is the cheaper one.
> 
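
For reference, the actual number of groups under the restriction can be
checked directly against the data generated above:

  select count(distinct c) from t where b = 3;  -- returns 3, vs. n_distinct = 300 for "c" overall

Roughly, ignoring the cost of the sort itself, 42644.84 / 300 ~ 142 while
42644.84 / 3 ~ 14215, which lines up with the 143.03 vs. 14222.68 startup
costs mentioned above.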

I haven't done any debugging on this, but this seems plausible. In a
way, it seems like a combination of three issues - the assumption that
"LIMIT n" has cost proportional to n/N, the estimate of the number of
groups for correlated columns, and the assumption of independence.
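
To illustrate the first point with the numbers from the plan above (assuming
the usual fractional LIMIT costing, i.e. total = startup +
(limit_rows / input_rows) * (input_total - input_startup)):

  143.03 + (10 / 994) * (42671.85 - 143.03) ~= 570.89

So with the startup cost estimated at 143.03 instead of ~14222.68, the whole
path looks extremely cheap up to the LIMIT.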

> It seems that we need to improve estimate of distinct values in
> estimate_num_groups() when taking the selectivity of restrictions into
> account.
> 
> In 84f9a35e3 we changed to a new formula to perform such estimation.
> But that does not apply to the case here, because for an appendrel,
> set_append_rel_size() always sets "raw tuples" count equal to "rows",
> and that would make estimate_num_groups() skip the adjustment of the
> estimate using the new formula.
> 
> Any thoughts on how to improve this?
> 

Oh! Now I see what you meant about the new formula in 84f9a35e3 depending
on how we sum the tuples. I agree that seems like the right thing.

I'm not sure it'll actually help with the issue, though - if I apply the
patch, the plan does not actually change (and the cost changes just a
little bit).

I looked at this only very briefly, but I believe it's due to the
assumption of independence I mentioned earlier - we end up using the new
formula introduced in 84f9a35e3, but it assumes the selectivity and the
number of groups are independent. That's not the case here, because the
groups are very clearly correlated (with the condition on "b").
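
For instance, plugging the numbers into the formula from 84f9a35e3 (which, if
I recall correctly, estimates the number of distinct values left after
selecting p of N rows as n * (1 - ((N - p) / N)^(N / n))) barely moves the
estimate:

  select 300 * (1 - power(1 - 994 / 1000000.0, 1000000 / 300.0));  -- roughly 289

We'd still expect ~289 groups instead of the actual 3, simply because the
formula assumes the rows matching "b = 3" pick their "c" values at random.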

If that's the case, I'm not sure how to fix this :-(


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


