Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
От | Peter Geoghegan |
---|---|
Тема | Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses |
Дата | |
Msg-id | CAM3SWZTWVC2Md8KFi461dju=NNjVxm8QB3YfkDCR5t84HHbqNg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Generalizing SortSupport for text to work with char(n), bytea,
and alternative opclasses
Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses |
Список | pgsql-hackers |
We lack SortSupport for many character-like-type cases. In full, the cases within the core system are: * char(n) opfamily (bpchar_ops). * text_pattern_ops opfamily (includes text and varchar "pattern" opclasses, which are generally recommended for accelerating LIKE operator queries). * bpchar_pattern_ops -- the operator family/class for char(n), used where "pattern" style indexing is required for the char(n) type. * bytea default opclass. This is a type that, like the others, shares its representation with text (a varlena header and some data bytes -- a string, essentially). Its comparator already behaves identically to that of the text comparator when the "C" collation is used. I've actually seen a specific request for this [1]. These cases do matter. For one thing, even if they're less important than the default text/varchar opclasses, having such large inconsistencies in character type sort performance is a fairly major POLA violation; opclasses like text_pattern_ops are *supposed* to be faster though less capable alternatives to the defaults. For another, char(n) is in the SQL standard, which may be why all TPC benchmarks use char(n) for columns that are sorted on or used for grouping. char(n) sorting can be made about 8x faster with SortSupport/abbreviation, making it the best candidate for abbreviation optimization that I've ever seen. It would be regrettable if we accidentally lost a benchmark due to not having char(n) SortSupport. Attached patch adds SortSupport for all of the cases listed above. I did not bother doing anything with contrib/citext, because I think it's not worth it. I think that we should definitely invest in case insensitive collations, and retire contrib/citext. The fact that the *default* collation (and not the input collation) is passed by citextcmp()'s call to str_tolower() suggests to me that the only way to make citext do the right thing is to basically introduce case insensitive collations to Postgres. Of course, doing so would make contrib/citext obsolete. I also didn't bother extending the name type's SortSupport (something that has existed since 9.2) to perform abbreviation, although that wouldn't be very hard. I saw no point. I think I might also get around to adding abbreviated key support for the network address types during the 9.6 cycle. That seems like something a less experienced contributor could easily pick up, though -- if anyone wants to take it off my hands, please do so. [1] https://lwn.net/Articles/653721/ -- Peter Geoghegan
Вложения
В списке pgsql-hackers по дате отправления: