Does UCS_BASIC have the right CTYPE?
From | Jeff Davis |
---|---|
Subject | Does UCS_BASIC have the right CTYPE? |
Date | |
Msg-id | 20d61f835afe7de89df0b038aa7fe799c53cf634.camel@j-davis.com |
Responses |
Re: Does UCS_BASIC have the right CTYPE?
|
List | pgsql-hackers |
UCS_BASIC is defined in the standard as a collation based on comparing the code point values, and in UTF8 that is satisfied with memcmp(), so the collation locale for UCS_BASIC in Postgres is simply "C".

But what should the result of UPPER('á' COLLATE UCS_BASIC) be? In Postgres, the answer is 'á', but intuitively, one could reasonably expect the answer to be 'Á'.

Reading the standard, it seems that LOWER()/UPPER() are defined in terms of the Unicode General Category (Section 4.2, "<fold> is a pair of functions..."). It is somewhat ambiguous about the case mappings, but I could guess that it means the Default Case Algorithm[1].

That seems to suggest the standard answer should be 'Á' regardless of any COLLATE clause (though I could be misreading). I'm a bit confused by that... what's the standard-compatible way to specify the locale for UPPER()/LOWER()? If there is none, then it makes sense that Postgres overloads the COLLATE clause for that purpose so that users can use a different locale if they want.

But given that UCS_BASIC is a collation specified in the standard, shouldn't it have ctype behavior that's as close to the standard as possible?

Regards,
Jeff Davis

[1] https://www.unicode.org/versions/Unicode15.1.0/ch03.pdf#G33992
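
To make the two behaviors in question concrete, here is a rough Python sketch. It is not PostgreSQL code, and the helper names (upper_c_ctype, upper_unicode, sort_by_code_point, sort_by_utf8_bytes) are made up for illustration. It assumes that ctype "C" case-maps only ASCII letters, and it uses Python's str.upper() as a stand-in for Unicode case mapping, which may not match the Default Case Algorithm in every detail. It also checks, on a small sample, that code point order and UTF-8 byte order agree.

```python
# Sketch only: Python stand-ins for the behaviors discussed above,
# not PostgreSQL's actual implementation.

def upper_c_ctype(s: str) -> str:
    """Approximate ctype "C": only ASCII a-z are case-mapped (assumption)."""
    # bytes.upper() touches only ASCII letters, so multi-byte UTF-8
    # sequences such as the encoding of 'á' pass through unchanged.
    return s.encode("utf-8").upper().decode("utf-8")


def upper_unicode(s: str) -> str:
    """Python's str.upper(), used here as a proxy for Unicode case mapping."""
    return s.upper()


def sort_by_code_point(strings):
    # Python compares str values code point by code point.
    return sorted(strings)


def sort_by_utf8_bytes(strings):
    # Comparing the encoded bytes is the memcmp()-style ordering.
    return sorted(strings, key=lambda s: s.encode("utf-8"))


words = ["z", "á", "Á", "a", "€", "😀"]
print(upper_c_ctype("á"), upper_unicode("á"))
print(sort_by_code_point(words) == sort_by_utf8_bytes(words))
```

The first function mirrors what Postgres currently returns for UPPER('á' COLLATE UCS_BASIC); the second gives the result one might expect from reading the standard.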