Re: Problem in 'ORDER BY' of a column using a created collation?

Поиск

Список

Период

Сортировка

От	Nishant Sharma
Тема	Re: Problem in 'ORDER BY' of a column using a created collation?
Дата	1 октября 13:02:33
Msg-id	CADrsxdbBn9NShy41AXUi2NUWUV48arffMcQbpzy8c-=e0Q1HSw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Problem in 'ORDER BY' of a column using a created collation? (Robert Haas <robertmhaas@gmail.com>)
Ответы	Re: Problem in 'ORDER BY' of a column using a created collation?
Список	pgsql-hackers

Дерево обсуждения

On Fri, Sep 26, 2025 at 6:46 PM Robert Haas <robertmhaas@gmail.com> wrote:

Generally, there are two methods for performing collation-aware string
comparisons, and they are expected to produce equivalent results. For
libc collations, the first method is to use strcoll() or strcoll_l()
or similar to directly compared the strings, and the other is to use
strxfrm() or similar to convert the string to a normalize
representation which can then be compared using memcmp(). Equivalent
facilities exist for ICU; see collate_methods_icu and
collate_methods_icu_utf8 in pg_local_icu.c; one or the other of those
should be getting used here.

Even though these two methods are expected to produce equivalent
results, if there's a bug, they might not. That bug could either exist
in our code or it could exist in ICU; I suspect the latter is more
likely, but the former is certainly possible. What I think you want to
do is try to track down two specific strings where transforming them
via strnxfrm_icu() produces results that compare in one order using
memcmp(), but passing the same strings to strncoll_icu() or
strncoll_icu_utf8() -- whichever is appropriate -- produces a
different result. If you're able to find such strings, then you can
probably rephrase those examples in terms of whatever underlying
functions strncoll_icu() and strxfrm_icu() are calling and we can
maybe report the problem to the ICU maintainers. If no such strings
seem to exist, then there's probably a problem in the PostgreSQL code
someplace, and you should try to figure out why we're getting the
wrong answer despite ICU apparently doing the right thing.

--
Robert Haas
EDB: http://www.enterprisedb.com

Thanks Robert for your reply!

I was able to create an independent C src file. With no relation to PG in it.

Did a simple comparison of the two methods mentioned by Robert for '1'

& 'a' as input strings. The ASCENDING SORT ORDER of them is printed.

Here is how I compiled the C src code:-

"gcc icu_issue_repro.c -L/usr/lib -licui18n -licuuc -licudata -o

icu_issue_repro.o"

Here is its output:-

------------------------------------------------------------------------------------------

Testing sort order for '1' & 'a' using ICU library with collation = 'ja-u-kr-latn-digit'

With Method ucol_strcollUTF8():
SORT ORDER ASC : '1', 'a'

With Method ucol_nextSortKeyPart() (i.e transform and memcmp):
SORT ORDER ASC : 'a', '1'

Testing Ends

------------------------------------------------------------------------------------------

We can clearly see that both methods are showing different sort order

for the same input strings.

So, I think this helps confirm that the problem lies in the ICU library

method "ucol_strcollUTF8()".

Robert, if you agree with my shared details, can you please help me

with details on how I can share this issue to ICU maintainers for the

correction?

Thank you!

PFA, the C src code.

Regards,

Nishant Sharma,

EDB, Pune.

Вложения

icu_issue_repro.c

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Problem in 'ORDER BY' of a column using a created collation?

Вложения