Re: bytea_ops
From | Tom Lane |
---|---|
Subject | Re: bytea_ops |
Date | |
Msg-id | 21190.997637483@sss.pgh.pa.us |
In response to | Re: bytea_ops ("Joe Conway" <joseph.conway@home.com>) |
List | pgsql-patches |
"Joe Conway" <joseph.conway@home.com> writes:
>> The biggie is that you missed adding support for bytea to scalarltsel,
>> which puts a severe crimp on the optimizer's ability to make any
>> intelligent decisions about using your index.

> Hopefully done correctly ;-)

I'm inclined to think that convert_bytea_to_scalar should just always assume that the appropriate base is 256, ie, all possible byte values can appear in the data. The logic that convert_string_to_scalar uses to guess at a suitable base depends strongly on the assumption that ASCII text is much more likely than any other kind of data. That doesn't seem like the right assumption to make for bytea, I'd think. You've removed the more blatant ASCII dependencies from the code, but I still wonder whether it makes any sense to assume that the byte values seen in the given strings should be used as predictors of the overall distribution of byte values in the column.

On the other hand, the reason that convert_string_to_scalar makes all those difficult-to-justify assumptions is that using a large base leads to overly optimistic selectivity estimates. (Example: suppose the histogram bounds are 'aa' and 'zz', and we are trying to estimate the selectivity of the range 'bb' to 'cc'. If we assume the data range is 'a'..'z' then we get scalar equivalents of aa = 0, zz = 0.9985, bb = 0.03994, cc = 0.079881 leading to a selectivity estimate of 0.03994. If we use a data range of 0..255 then we get aa = 0.380386, bb = 0.384307, cc = 0.388229, zz = 0.478424 leading to selectivity = 0.00392, more than a factor of 10 smaller.) Depending on how you are using bytea, 0..255 might be too large for its data range too. Thoughts?

BTW, I think that convert_bytea_datum is probably unnecessary, and definitely it's a waste of cycles to palloc a copy of the input values. convert_string_datum exists to (a) unify the representations of the different datatypes that we consider strings, and (b) apply strxfrm if necessary. Neither of those motivations will ever apply to bytea AFAICS. So you could just as easily pass the given Datums directly to convert_bytea_to_scalar and let it work directly on them.

regards, tom lane
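
The arithmetic in the 'aa'..'zz' example can be checked with a small sketch. It assumes each byte is treated as one digit in base (range_hi - range_lo + 1), so a string becomes a fraction in [0, 1). The function name `string_to_scalar` and the helper `scalars` are hypothetical, written only for illustration; this is not the PostgreSQL source. The sketch reproduces the scalar equivalents and the cc - bb differences quoted in the message.

```python
def string_to_scalar(value: str, range_lo: int, range_hi: int) -> float:
    """Treat each byte as a digit in base (range_hi - range_lo + 1).

    Hypothetical illustration of the conversion described in the message,
    not the actual convert_string_to_scalar code.
    """
    base = range_hi - range_lo + 1
    num = 0.0
    denom = float(base)
    for ch in value.encode("latin-1"):
        # First byte contributes digit/base, second digit/base^2, and so on.
        num += (ch - range_lo) / denom
        denom *= base
    return num


def scalars(range_lo: int, range_hi: int) -> dict:
    """Scalar equivalents of the four strings used in the example."""
    return {s: string_to_scalar(s, range_lo, range_hi)
            for s in ("aa", "bb", "cc", "zz")}


narrow = scalars(ord("a"), ord("z"))   # assumed data range 'a'..'z', base 26
wide = scalars(0, 255)                 # assumed data range 0..255, base 256

# Difference between the range endpoints, the figure quoted in the message.
narrow_diff = narrow["cc"] - narrow["bb"]
wide_diff = wide["cc"] - wide["bb"]

if __name__ == "__main__":
    for label, vals, diff in (("'a'..'z'", narrow, narrow_diff),
                              ("0..255", wide, wide_diff)):
        print(label, {k: round(v, 6) for k, v in vals.items()},
              "cc - bb =", round(diff, 5))
    print("ratio of differences:", round(narrow_diff / wide_diff, 2))
```

With base 256 the four strings land in a narrow band of the unit interval, which is why the difference between 'bb' and 'cc' shrinks by roughly the ratio of the two bases.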