BUG #7999: Regexp with utf8
From | somloieater@gmail.com |
---|---|
Subject | BUG #7999: Regexp with utf8 |
Date | |
Msg-id | E1UKnf7-0005Sa-L4@wrigleys.postgresql.org |
Responses | Re: BUG #7999: Regexp with utf8 |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference: 7999
Logged by: david
Email address: somloieater@gmail.com
PostgreSQL version: 9.1.8
Operating system: linux
Description:

\y and \Y do not behave correctly next to multibyte UTF-8 characters; they seem to invert their senses:

Proper behaviour with ASCII e:

    'es'~$$\y[eɛ]s$$ => t

Inverted behaviour with epsilon:

    'ɛs'~$$\y[eɛ]s$$ => f
    'ɛs'~$$[eɛ]\ys$$ => t
    'ɛs'~$$[eɛ]\Ys$$ => f

This seems to be a case of UTF-8 characters not being recognised as word-forming:

    'ɛ'~$$\w$$ => f

I've checked with a few other characters which are more than 1 byte in UTF-8. U+00F0 counts as \w, but nothing I've tried above U+00FF matches. I wonder if it's something to do with code points above 255?

In case anyone else hits this bug, replacing \y with (^|$|\s|[[:punct:]]) seems to work for me, although it's ugly.
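For anyone trying to reproduce this, the cases above can be collected into one script. This is only a sketch: it assumes a UTF8-encoded database and a psql session whose client encoding passes ɛ through intact. The results in the comments are the ones reported above for 9.1.8, not independently verified. The two workaround queries at the end are illustrative additions, and no result is claimed for them.

```sql
-- Settings worth quoting alongside any results.
SELECT version();
SHOW server_encoding;
SHOW lc_ctype;

-- Cases from the report; comments give the results reported on 9.1.8.
SELECT 'es' ~ $$\y[eɛ]s$$;   -- reported: t
SELECT 'ɛs' ~ $$\y[eɛ]s$$;   -- reported: f
SELECT 'ɛs' ~ $$[eɛ]\ys$$;   -- reported: t
SELECT 'ɛs' ~ $$[eɛ]\Ys$$;   -- reported: f
SELECT 'ɛ'  ~ $$\w$$;        -- reported: f

-- Workaround from the report: an explicit alternation in place of \y.
-- The second input is a hypothetical test string with a letter before ɛ.
SELECT 'ɛs'  ~ $$(^|$|\s|[[:punct:]])[eɛ]s$$;
SELECT 'xɛs' ~ $$(^|$|\s|[[:punct:]])[eɛ]s$$;
```

Unlike \y, which is a zero-width constraint, the alternation consumes the whitespace or punctuation character it matches. That makes no difference for a plain ~ test, but it does if the matched text is extracted or replaced.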