Re: collate not support Unicode Variation Selector
From | Thomas Munro |
---|---|
Subject | Re: collate not support Unicode Variation Selector |
Date | |
Msg-id | CA+hUKGLnJUososSwJLycKfA0TXRsciKxPJfqVED=aOMYE1knOw@mail.gmail.com |
In reply to | RE: collate not support Unicode Variation Selector (荒井元成 <n2029@ndensan.co.jp>) |
Replies | Re: collate not support Unicode Variation Selector |
List | pgsql-hackers |
On Wed, Aug 3, 2022 at 12:09 PM 荒井元成 <n2029@ndensan.co.jp> wrote:
> D209007=# create table ivstest ( moji text collate "ja-x-icu" CONSTRAINT firstkey PRIMARY KEY );
> D209007=# insert into ivstest (moji) values ( U&'\+003436' || U&'\+0E0101' || U&'\+00304D');
> D209007=# insert into ivstest (moji) values ( U&'\+003436' || U&'\+00304D');
> D209007=# select moji from ivstest where moji like '%' || U&'\+003436' || '%';
> -------------
> 㐶󠄁き
> 㐶き
> (2 行)
>
> expected
> -------------
> 㐶き
> (1 行)

So you want to match only strings that contain U&'\+003436' *not* followed by a variation selector (as we also discussed at [1]). I'm pretty sure that everything in PostgreSQL considers variation selectors to be separate characters. Perhaps it is possible to write a regular expression covering the variation selector range, something like '\U00003436[^\U000E0100-\U000E01EF]'?

Here's an example using Latin characters, which are easier for me but show approximately the same thing, since variation selectors are a bit like "combining" characters:

postgres=# create table t (x text);
CREATE TABLE
postgres=# insert into t values ('e'), ('ef'), ('e' || U&'\0301');
INSERT 0 3
postgres=# select * from t;
 x
----
 e
 ef
 é
(3 rows)

postgres=# select * from t where x ~ 'e([^\u0300-\u036f]|$)';
 x
----
 e
 ef
(2 rows)

[1] https://www.postgresql.org/message-id/flat/013f01d873bb%24ff5f64b0%24fe1e2e10%24%40ndensan.co.jp
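
The same idea can be checked outside the database. What follows is only a rough Python sketch, not something from the thread: it treats the variation selector as a separate code point and matches U+3436 only when the next character is not a selector, or when the string ends there. The names VS, BARE_3436 and has_bare_base are made up for illustration, and including the older U+FE00..U+FE0F selector block is my assumption. Python's re module is not PostgreSQL's regex engine, so this shows the pattern logic rather than server behaviour.

```python
import re

# Variation selector ranges. U+E0100..U+E01EF is the supplement block used
# by ideographic variation sequences; adding U+FE00..U+FE0F is an assumption.
VS = "\U000E0100-\U000E01EF\uFE00-\uFE0F"

# U+3436 followed by a character that is not a variation selector,
# or by the end of the string (the "|$" trick from the Latin example).
BARE_3436 = re.compile("\u3436(?:[^" + VS + "]|$)")


def has_bare_base(s: str) -> bool:
    """True if s contains U+3436 not followed by a variation selector."""
    return BARE_3436.search(s) is not None


# The two rows from the quoted ivstest example, plus a bare U+3436.
with_ivs = "\u3436\U000E0101\u304D"   # U+3436, U+E0101, U+304D
without_ivs = "\u3436\u304D"          # U+3436, U+304D
rows = [with_ivs, without_ivs, "\u3436"]

# Each selector counts as its own code point.
print([len(s) for s in rows])

matches = [s for s in rows if has_bare_base(s)]
print(len(matches))
```

Only the rows without the variation selector are kept, which is the result the original poster expected from the LIKE query.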