Re: NOT LIKE much faster than LIKE?
От | Tom Lane |
---|---|
Тема | Re: NOT LIKE much faster than LIKE? |
Дата | |
Msg-id | 6398.1136931685@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: NOT LIKE much faster than LIKE? (Simon Riggs <simon@2ndquadrant.com>) |
Ответы |
Re: NOT LIKE much faster than LIKE?
Re: NOT LIKE much faster than LIKE? |
Список | pgsql-performance |
Simon Riggs <simon@2ndquadrant.com> writes: > I think its OK to use the MCV, but I have a problem with the current > heuristics: they only work for randomly generated strings, since the > selectivity goes down geometrically with length. We could certainly use a less aggressive curve for that. You got a specific proposal? >> After finishing that work it occurred to me that we could go a step >> further: if the MCV list accounts for a substantial fraction of the >> population, we could assume that the MCV list is representative of the >> whole population, and extrapolate the pattern's selectivity over the MCV >> list to the whole population instead of using the existing heuristics at >> all. In a situation like Andreas' example this would win big, although >> you can certainly imagine cases where it would lose too. > I don't think that can be inferred with any confidence, unless a large > proportion of the MCV list were itself selected. Otherwise it might > match only a single MCV that just happens to have a high proportion, > then we assume all others have the same proportion. Well, of course it can't be inferred "with confidence". Sometimes you'll win and sometimes you'll lose. The question is, is this a better heuristic than what we use otherwise? The current estimate for non-anchored patterns is really pretty crummy, and even with a less aggressive length-vs-selectivity curve it's not going to be great. Another possibility is to merge the two estimates somehow. > I would favour the idea of dynamic sampling using a block sampling > approach; that was a natural extension of improving ANALYZE also. One thing at a time please. Obtaining better statistics is one issue, but the one at hand here is what to do given particular statistics. regards, tom lane
В списке pgsql-performance по дате отправления: