Re: Selectivity estimation for intarray with @@
От | Heikki Linnakangas |
---|---|
Тема | Re: Selectivity estimation for intarray with @@ |
Дата | |
Msg-id | 55AE6A03.9060400@iki.fi обсуждение исходный текст |
Ответ на | Re: Selectivity estimation for intarray with @@ (Alexander Korotkov <a.korotkov@postgrespro.ru>) |
Ответы |
Re: Selectivity estimation for intarray with @@
|
Список | pgsql-hackers |
On 07/21/2015 03:44 PM, Alexander Korotkov wrote: > While Uriy is on vacation, I've revised this patch a bit. I whacked this around quite a bit, and I think it's in a committable state now. But if you could run whatever tests you were using before on this, to make sure it still produces the same estimates, that would be great. I didn't change the estimates it should produce, only the code structure. One thing that bothers me slightly with this patch is the way it peeks into the Most-Common-Elements arrays, which is produced by the built-in type analyze function. If we ever change what statistics are collected for arrays, or the way they are stored, this might break. In matchsel, why don't we just call the built-in estimator function for each element that we need to probe, and not look into the statistics ourselves at all? I actually experimented with that, and it did slash much of the code, and it would be more future-proof. However, it was also a lot slower for queries that contain multiple values. That's understandable: the built-in estimator will fetch the statistics tuple, parse the arrays, etc. separately for each value in the query_int, while this patch will do it only once for the whole query, and perform a simple binary search for each value. So overall, I think this is OK as it is. But if we find that we need to use the MCE list in this fashion in more places in the future, it might be worthwhile to add some support code for this in the backend to allow extracting the stats once, and doing multiple "lightweight estimations" using the extracted stats. Some things I fixed/changed: * I didn't like that transformOperator() function, which looked up the function's name. I replaced it with separate wrapper functions for each operator, so that the built-in operator's OID can be hardcoded into each. * I refactored the matchsel function heavily. I think it's more readable now. * I got rid of the Int4Freq array. It didn't seem significantly easier to work with than the separate values/numbers arrays, so I just used those directly. * Also use the matchsel estimator for ~~ (the commutator of @@) * Also use the estimators for the obsolete @ and ~ operators. Not that I care much about those as they are obsolete, but seems strange not to, as it's a trivial matter of setting the right estimator function. * I added an ANALYZE in the regression test. It still won't systematically test all the cost estimation code, and there's nothing to check that the estimates make sense, but at least more of the code will now run. - Heikki
Вложения
В списке pgsql-hackers по дате отправления: