Re: BUG #18523: String compare not consistent
От | Peter Eisentraut |
---|---|
Тема | Re: BUG #18523: String compare not consistent |
Дата | |
Msg-id | 30c9e733-9573-4500-b191-5a8a67b3795f@eisentraut.org обсуждение исходный текст |
Ответ на | Re: BUG #18523: String compare not consistent (Laurenz Albe <laurenz.albe@cybertec.at>) |
Список | pgsql-bugs |
On 26.06.24 17:59, Laurenz Albe wrote: > Interestingly, glibc and ICU disagree about that: > > SELECT '|1' < '01' COLLATE "de_DE.utf8" AS glibc, > '|1' < '01' COLLATE "de-AT-x-icu" AS icu; > > glibc │ icu > ═══════╪═════ > f │ t > (1 row) Let's work this out. The collation weights of the characters involved are: 0030 ; [.209F.0020.0002] # DIGIT ZERO 0031 ; [.20A0.0020.0002] # DIGIT ONE 007C ; [*06AF.0020.0002] # VERTICAL LINE So under the ICU default "noignore" behavior, the sort keys are: '|1' => 06AF 20A0 0000 0020 0020 0000 0002 0002 '01' => 209F 20A0 0000 0020 0020 0000 0002 0002 Therefore, '|1' < '01'. Under "shifted", the effective collation weights are: 0030 ; [.209F.0020.0002.FFFF] # DIGIT ZERO 0031 ; [.20A0.0020.0002.FFFF] # DIGIT ONE 007C ; [.0000.0000.0000.06AF] # VERTICAL LINE and the sort keys are: '|1' => 20A0 0000 0020 0000 0002 0000 06AF FFFF '01' => 209F 20A0 0000 0020 0020 0000 0002 0002 0000 FFFF FFFF Therefore, '|1' > '01'. That is what you get in glibc. (Older glibc use "shift-trimmed" behavior, which trims the trailing FFFF's, which doesn't affect the result in this case.) Under ICU (or rather CLDR), the '|' character is not actually a variable collation element by default, so you actually get the same result if you ask for "noignore" or "shifted" behavior. But you can make symbols variable with 'und-u-ka-shifted-kv-symbol', and then you will also get the same result as glibc. Further reading: - https://peter.eisentraut.org/blog/2023/04/12/how-collation-of-punctuation-and-whitespace-works - https://peter.eisentraut.org/blog/2023/05/16/overview-of-icu-collation-settings
В списке pgsql-bugs по дате отправления: