Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes
От | Tomas Vondra |
---|---|
Тема | Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes |
Дата | |
Msg-id | 2f30ba9c-b707-55ae-0afd-68eb6a8a7708@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes (David Geier <geidav.pg@gmail.com>) |
Ответы |
Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes
|
Список | pgsql-hackers |
On 11/14/22 10:19, David Geier wrote: > Hi Tom, >> There won't *be* any MCV stats for a column that ANALYZE perceives to >> be unique, so I'm not quite sure where the claimed savings comes from. > > We save if one join attribute is unique while the other isn't. In that > case stored MCV stats are read for the non-unique attribute but then > never used. This is because MCV stats in join selectivity estimation are > only used if they're present on both columns > Right - if we only have MCV on one side of the join, we currently end up loading the MCV we have only to not use it anyway. The uniqueness is a simple way to detect some of those cases. I'd bet the savings can be quite significant for small joins and/or cases with large MCV. I wonder if we might be yet a bit smarter, though. For example, assume the first attribute is not defined as "unique" but we still don't have a MCV (it may be unique - or close to unique - in practice, or maybe it's just uniform distribution). We end up with have_mcvs1 = false Can't we just skip trying to load the second MCV? So we could do if (have_mcvs1 && HeapTupleIsValid(vardata2.statsTuple)) { ... try loading mcv2 ... } Or perhaps what if we have a function that quickly determines if the attribute has MCV, without loading it? I'd bet the expensive part of get_attstatslot() is the deconstruct_array(). We could have a function that only does the first small loop over slots, and returns true/false if we have a slot of the requested stakind. It might even check the isunique flag first, to make it more convenient. And only if both sides return "true" we'd load the MCV, deconstruct the array and all that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: