BUG #4200: Regexp character classes not UTF8-compliant
From | Jean-Baptiste Quenot |
---|---|
Subject | BUG #4200: Regexp character classes not UTF8-compliant |
Date | |
Msg-id | 200805261913.m4QJD5gh048059@wwwmaster.postgresql.org |
Responses | Re: BUG #4200: Regexp character classes not UTF8-compliant |
List | pgsql-bugs |
The following bug has been logged online:

Bug reference: 4200
Logged by: Jean-Baptiste Quenot
Email address: jbq@caraldi.com
PostgreSQL version: 8.3.1
Operating system: Linux Ubuntu Hardy
Description: Regexp character classes not UTF8-compliant
Details:

The PostgreSQL documentation at http://www.postgresql.org/docs/8.3/static/functions-matching.html describes the various character classes, which can be used to match or replace strings with regexp support. However, the [:alnum:] and [:alpha:] character classes are not UTF8-compliant, as shown in the examples below:

dockee=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)

dockee=# show lc_ctype;
  lc_ctype
-------------
 en_US.UTF-8
(1 row)

dockee=# select regexp_replace('bébéà u', '[[:alnum:]]', '', 'g');
 regexp_replace
----------------
 ééà
(1 row)

ovhdev=# select regexp_replace('bébéà u', '[[:alpha:]]', '', 'g');
 regexp_replace
----------------
 ééà
(1 row)

dockee=# select regexp_replace('bébéà u', $$\w$$, '', 'g');
 regexp_replace
----------------
 ééà
(1 row)

Only characters in the ASCII range were detected as belonging to the [:alnum:] character class, although the accented letters are valid alphanumeric characters too.
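For comparison, the difference between an ASCII-only and a Unicode-aware word class can be reproduced outside PostgreSQL. The sketch below uses Python's `re` module purely as a stand-in. It is not PostgreSQL's regex engine, and the `SAMPLE` string and the helper `strip_word_chars` are assumptions introduced for illustration, approximating the input in the report.

```python
import re

# Assumed stand-in for the string in the report ("bébéàu"), written with
# escapes so the accented letters are precomposed code points.
SAMPLE = "b\u00e9b\u00e9\u00e0u"


def strip_word_chars(text: str, unicode_aware: bool) -> str:
    """Remove every regex word character from text.

    With unicode_aware=False the re.ASCII flag restricts the word class
    to [a-zA-Z0-9_], which mimics the behaviour described in the report.
    """
    flags = 0 if unicode_aware else re.ASCII
    return re.sub(r"\w", "", text, flags=flags)


if __name__ == "__main__":
    # ASCII-only class: accented letters are left behind.
    print(repr(strip_word_chars(SAMPLE, unicode_aware=False)))
    # Unicode-aware class: every letter is removed.
    print(repr(strip_word_chars(SAMPLE, unicode_aware=True)))
```

With the ASCII-only class the accented letters survive, which matches the output shown in the report. With the Unicode-aware class the whole string is removed, which is presumably the result the reporter expected.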