Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
From | David Rowley
Subject | Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
Date |
Msg-id | CAApHDvpD1Vuhd3kSKjV-7Ae_8PMPcPG=kxfJOS+ZWVkutCOXgg@mail.gmail.com
In reply to | Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE (David Rowley <dgrowleyml@gmail.com>)
Responses | Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
List | pgsql-bugs
On Wed, 19 May 2021 at 12:35, David Rowley <dgrowleyml@gmail.com> wrote:
> I'll debug this and see if I can see what's going on.

This is down to the fact that when there are 2 items in the "three"
table we have a histogram in the stats, and we can get the upper bound
by just looking at the final histogram bucket.  However, when there
are 3 items, or more accurately, when there's a duplicate, we don't
build a histogram and just have a most-common-values list instead.

I've not yet checked the logic for what we include in the MCV list,
but it appears that in this case we only store the item that's
duplicated, here the value 1.  We calculate the end bound of the merge
join with that value, which is a bad choice, as it makes the end bound
appear to be very close to, or the same as, the start bound, making
the merge join seem very cheap.

truncate three;
insert into three values(1,1),(2,1),(3,1000000);
analyze three;
select most_common_vals,histogram_bounds from pg_stats where tablename = 'three' and attname = 'million_id';
-[ RECORD 1 ]----+----
most_common_vals | {1}
histogram_bounds |

truncate three;
insert into three values(1,1),(3,1000000);
analyze three;
select most_common_vals,histogram_bounds from pg_stats where tablename = 'three' and attname = 'million_id';
-[ RECORD 1 ]----+------------
most_common_vals |
histogram_bounds | {1,1000000}

I'm not really sure of the best way to make this better.  It seems
like an unfortunate bad case.  I'm not sure if it'd be better to try
to be more inclusive when building MCV lists.  But then, what would
the logic be when we don't have enough buckets to store more items?

There is some code in get_variable_range() that's #ifdef'd out which
would get the actual range, when possible.  There's a comment
explaining why we don't do that:

	/*
	 * XXX It's very tempting to try to use the actual column min and max, if
	 * we can get them relatively-cheaply with an index probe.  However, since
	 * this function is called many times during join planning, that could
	 * have unpleasant effects on planning speed.  Need more investigation
	 * before enabling this.
	 */
#ifdef NOT_USED
	if (get_actual_variable_range(root, vardata, sortop, collation, min, max))
		return true;
#endif

For this case, it might be ok to have done that, since we have
RestrictInfo.scansel_cache, so it does not seem as likely that we'd
end up calling get_actual_variable_range too often.  However, there
are likely other cases that are not as well cached, which we could
make too slow if we did call get_actual_variable_range.

That makes me think the best fix would be to do something better
during ANALYZE and maybe try to include some more upper-bound MCVs.
I'm not yet too sure what drawbacks there might be from doing that.

David
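[Editor's sketch] A minimal setup for reproducing the statistics difference
described above.  The "million" table name and schema are assumptions inferred
from the million_id values in the message, not taken from the original report,
and whether the mis-costed merge join actually wins here depends on the rest
of that report's schema and query:

-- small "three" table joined to a large "million" table on million_id
CREATE TABLE million (id int PRIMARY KEY);
INSERT INTO million SELECT generate_series(1, 1000000);

CREATE TABLE three (id int, million_id int);
INSERT INTO three VALUES (1,1),(2,1),(3,1000000);

VACUUM ANALYZE million;
VACUUM ANALYZE three;

-- With only {1} in most_common_vals for three.million_id and no histogram,
-- the planner computes the merge join end bound from that value, so it
-- expects to stop scanning "million" almost immediately and costs the merge
-- join as very cheap.
EXPLAIN
SELECT *
FROM million m
JOIN three t ON m.id = t.million_id;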