Re: UTF8 regexp and char classes still does not work
From | Sergey Burladyan |
---|---|
Subject | Re: UTF8 regexp and char classes still does not work |
Date | |
Msg-id | 8739sta40g.fsf@home.progtech.ru |
In reply to | Re: UTF8 regexp and char classes still does not work (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Hmm, you're right. I only tested that on Latin1 characters, for which
> it does work because those have Unicode points below 256. I'm not
> sure of a reasonable solution for the general case --- we certainly
> don't want this function iterating up to 2^21 or thereabouts.

Yes, I understand this problem. How does Perl do this? Maybe this Unicode table could be precomputed, or linked into the postgres binary from an external source?

> Your test case seems to be using KOI8 encoding, though, which doesn't
> have anything to do with UTF8 behavior.

That was only an example of the expected result. See the first test: it is UTF8, two bytes per character:

> > --- CYRILLIC SMALL LETTER ZHE ~* CYRILLIC CAPITAL LETTER ZHE
> > select E'\320\266' ~* E'\320\226', E'\320\266' ~ '[[:alpha:]]+', 'g' ~ '[[:alpha:]]+';
> >  ?column? | ?column? | ?column?
> > ----------+----------+----------
> >  t        | f        | t

--
Sergey Burladyan
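To make the "precomputed table" idea above concrete, here is a minimal Python sketch. It is not how PostgreSQL or Perl implement this. `ALPHA_RANGES` and `is_alpha` are hypothetical names, and the table is a tiny hand-picked excerpt. A real table would presumably be generated from the Unicode Character Database. The sketch shows that the two-byte sequences from the test decode to code points above 255, and that a sorted range table can answer `[[:alpha:]]` membership by binary search instead of iterating over every code point.

```python
import bisect

# Hypothetical excerpt of a precomputed "alphabetic" table: sorted,
# non-overlapping (first, last) code point ranges. Illustration only.
ALPHA_RANGES = [
    (0x0041, 0x005A),  # LATIN CAPITAL LETTER A..Z
    (0x0061, 0x007A),  # LATIN SMALL LETTER A..Z
    (0x0410, 0x044F),  # CYRILLIC CAPITAL LETTER A..CYRILLIC SMALL LETTER YA
]
_STARTS = [first for first, _ in ALPHA_RANGES]


def is_alpha(cp: int) -> bool:
    """Return True if code point cp falls inside one of the table's ranges."""
    # Find the last range whose start is <= cp, then check its upper bound.
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= ALPHA_RANGES[i][1]


# The same octal byte sequences as in the SQL test, decoded as UTF-8.
zhe_small = b"\320\266".decode("utf-8")  # CYRILLIC SMALL LETTER ZHE
zhe_capital = b"\320\226".decode("utf-8")  # CYRILLIC CAPITAL LETTER ZHE

print(hex(ord(zhe_small)), hex(ord(zhe_capital)))  # both are above 0xFF
print(is_alpha(ord(zhe_small)), is_alpha(ord("g")), is_alpha(ord("1")))
```

With a table like this, each lookup costs O(log n) in the number of ranges, so nothing has to loop up to 2^21 at match time. The cost moves to generating the table and keeping it in step with the Unicode version in use.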