Re: Inconsistent results with libc sorting on Windows
От | Peter Geoghegan |
---|---|
Тема | Re: Inconsistent results with libc sorting on Windows |
Дата | |
Msg-id | CAH2-WzmtLiQn_4Q5W4GKR8QUiW2MqLEYQtL5EwNDbbPtsSNvsA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Inconsistent results with libc sorting on Windows (Thomas Munro <thomas.munro@gmail.com>) |
Ответы |
Re: Inconsistent results with libc sorting on Windows
|
Список | pgsql-hackers |
On Tue, Jun 13, 2023 at 5:59 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Trying to follow along here... you're doing the moral equivalent of > strxfrm(), so sort keys have the transitive property but direct string > comparisons don't? Or is this because LCIDs reach a different > algorithm somehow (or otherwise why do you need to use LCIDs for this, > when there is a non-LCID version of that function, with a warning not > to use the older LCID version[1]?) I'm reminded of the fact that the abbreviated keys strxfrm() debacle (back when 9.5 was released) was caused by a bug in strcoll() -- not a bug in strxfrm() itself. From our point of view the problem was that strxfrm() failed to be bug compatible with strcoll() due to a buggy strcoll() optimization. I believe that strxfrm() is generally less likely to have bugs than strcoll(). There are far fewer opportunities to dodge unnecessary work in the case of strxfrm()-like algorithms (offering something like ICU's pg_strnxfrm_prefix_icu() prefix optimization is the only one). On the other hand, collation library implementers are likely to heavily optimize strcoll() for typical use-cases such as sorting and binary search. Using strxfrm() for everything is discouraged [1]. There is an important difference between this issue and the various glibc collation related bugs that I've come across, though: to the best of my knowledge there was never a glibc bug that caused strcoll() to violate transitive consistency -- it always agreed with itself. So this is a new one on me. Seems like that might make "always use strxfrm()" (or whatever it's actually called on this platform) acceptable; strxfrm() can't really violate transitive consistency in the same way. (I think -- I'm assuming that it'll always produce a conditioned binary string in a deterministic fashion, since AFAICT even this buggy strcoll()-like function won't ever give an inconsistent answer when it compares the same two strings.) [1] https://unicode-org.github.io/icu/userguide/collation/concepts#sortkeys-vs-comparison -- Peter Geoghegan
В списке pgsql-hackers по дате отправления: