Re: Problem in 'ORDER BY' of a column using a created collation?
От | Robert Haas |
---|---|
Тема | Re: Problem in 'ORDER BY' of a column using a created collation? |
Дата | |
Msg-id | CA+TgmobWgZC-JkS4OKbxQsjuBh7mpKMPw5kxtcHJgN-7+vGgCg@mail.gmail.com обсуждение исходный текст |
Ответ на | Problem in 'ORDER BY' of a column using a created collation? (Nishant Sharma <nishant.sharma@enterprisedb.com>) |
Ответы |
Re: Problem in 'ORDER BY' of a column using a created collation?
|
Список | pgsql-hackers |
On Mon, Aug 25, 2025 at 3:52 AM Nishant Sharma <nishant.sharma@enterprisedb.com> wrote: > We did debug 'Experiment 1' and we find that:- > Whatever is the Column 1 in 'ORDER BY' gets correctly ordered, > because it uses abbreviated sort optimization due to which its data > datum gets converted to abbreviated datum using > "varstr_abbrev_convert()" function, and then the comparator > function selected is > "ssup->comparator = ssup_datum_unsigned_cmp()" > for sorting operation. But in case of column 2 in 'ORDER BY' (which > is showing incorrect result for 'Experiment 1') does not use > abbreviated sort optimization and here comparator function selected > is "ssup->comparator = varlenafastcmp_locale" --> > "strncoll_icu_utf8()", which appears, uses the third party ICU > library function for comparison and does not work as expected. Generally, there are two methods for performing collation-aware string comparisons, and they are expected to produce equivalent results. For libc collations, the first method is to use strcoll() or strcoll_l() or similar to directly compared the strings, and the other is to use strxfrm() or similar to convert the string to a normalize representation which can then be compared using memcmp(). Equivalent facilities exist for ICU; see collate_methods_icu and collate_methods_icu_utf8 in pg_local_icu.c; one or the other of those should be getting used here. Even though these two methods are expected to produce equivalent results, if there's a bug, they might not. That bug could either exist in our code or it could exist in ICU; I suspect the latter is more likely, but the former is certainly possible. What I think you want to do is try to track down two specific strings where transforming them via strnxfrm_icu() produces results that compare in one order using memcmp(), but passing the same strings to strncoll_icu() or strncoll_icu_utf8() -- whichever is appropriate -- produces a different result. If you're able to find such strings, then you can probably rephrase those examples in terms of whatever underlying functions strncoll_icu() and strxfrm_icu() are calling and we can maybe report the problem to the ICU maintainers. If no such strings seem to exist, then there's probably a problem in the PostgreSQL code someplace, and you should try to figure out why we're getting the wrong answer despite ICU apparently doing the right thing. -- Robert Haas EDB: http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: