Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
From: Maxim Boguk
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date:
Msg-id: CAK-MWwR_G0NLLB9TwRhs+3cHDa9zZnYg7AgpAfeYJn_hgpZBsQ@mail.gmail.com
In reply to: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
List: pgsql-bugs
On Tue, Apr 17, 2018 at 5:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> I'm wondering how planner estimates number of rows in that case:
See eqjoinsel_semi, particularly the change in behavior when it thinks
nd2 is or is not a default estimate.
Given the lack of statistics about the output of the WITH clause,
it's hard to see how we'd ever get trustworthy estimates here.
I think the fact that your first example yields an accurate
estimate is mostly luck.
regards, tom lane
There is a similar issue without a CTE which looks pretty weird:
Good case, with LIMIT 199 and an adequate estimate:
hh=# explain SELECT * FROM resume WHERE resume_id IN (select id from generate_series(1, 1000) gs(id) LIMIT 199);
                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21.53..108.98 rows=199 width=519)
   ->  Unique  (cost=21.42..21.62 rows=199 width=4)
         ->  Sort  (cost=21.42..21.52 rows=199 width=4)
               Sort Key: gs.id
               ->  Limit  (cost=0.00..9.95 rows=199 width=4)
                     ->  Function Scan on generate_series gs  (cost=0.00..50.00 rows=1000 width=4)
   ->  Index Scan using resume_pk on resume  (cost=0.11..0.39 rows=1 width=519)
         Index Cond: (resume_id = gs.id)
Very bad case with an awful estimate (the only difference being LIMIT 200 vs LIMIT 199):
hh=# explain SELECT * FROM resume WHERE resume_id IN (select id from generate_series(1, 1000) gs(id) LIMIT 200);
                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21.64..109.53 rows=45860504 width=519)
   ->  Unique  (cost=21.53..21.73 rows=200 width=4)
         ->  Sort  (cost=21.53..21.63 rows=200 width=4)
               Sort Key: gs.id
               ->  Limit  (cost=0.00..10.00 rows=200 width=4)
                     ->  Function Scan on generate_series gs  (cost=0.00..50.00 rows=1000 width=4)
   ->  Index Scan using resume_pk on resume  (cost=0.11..0.39 rows=1 width=519)
         Index Cond: (resume_id = gs.id)
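If I read the code correctly, the 200 boundary here is DEFAULT_NUM_DISTINCT (defined as 200 in src/include/utils/selfuncs.h): with 199 rows coming out of the subquery the planner can still cap ndistinct at the row count, but at 200 rows get_variable_numdistinct falls back to the default, and eqjoinsel_semi stops trusting nd2. A self-contained repro sketch (a generated stand-in for the resume table, so anyone can run it; exact estimates may vary by version):

-- Stand-in for the resume table above.
CREATE TABLE t AS SELECT g AS id FROM generate_series(1, 100000) g;
ALTER TABLE t ADD PRIMARY KEY (id);
ANALYZE t;

-- 199 rows: ndistinct is capped at the row count, the estimate stays sane.
EXPLAIN SELECT * FROM t
WHERE id IN (SELECT g FROM generate_series(1, 1000) g LIMIT 199);

-- 200 rows: ndistinct falls back to the default, the estimate blows up.
EXPLAIN SELECT * FROM t
WHERE id IN (SELECT g FROM generate_series(1, 1000) g LIMIT 200);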
It's not a problem by itself, but once you start using this query with more joined tables, a lot of bad things happen because of the five-orders-of-magnitude error in the selectivity estimate.
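A possible workaround sketch (the temp-table approach is just an idea, not something tested against this exact schema): materialize the id list and ANALYZE it, so the planner gets real ndistinct statistics instead of the default:

CREATE TEMP TABLE batch_ids AS
    SELECT id FROM generate_series(1, 1000) gs(id) LIMIT 200;
ANALYZE batch_ids;  -- real statistics, so no DEFAULT_NUM_DISTINCT fallback

EXPLAIN SELECT * FROM resume WHERE resume_id IN (SELECT id FROM batch_ids);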
PS: in practice it forces us to use a LIMIT of no more than 199 in complex joins for batch operations, or the database starts generating funny plans.
Regards,
Maxim
--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"