Re: to_tsvector in 8.2.3
From | Teodor Sigaev |
---|---|
Subject | Re: to_tsvector in 8.2.3 |
Date | |
Msg-id | 460175E3.40601@sigaev.ru |
In response to | Re: to_tsvector in 8.2.3 (Magnus Hagander <magnus@hagander.net>) |
Responses | Re: to_tsvector in 8.2.3 |
List | pgsql-general |
> postgres=# select to_tsvector('test text');
>   to_tsvector
> ---------------
>  'test text':1
> (1 row)

OK, that's related to the commit
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h

Thomas pointed out that the character can be a non-breakable space (0xa0), and that commit assumes that, with the C locale and a multibyte encoding, any character > 0x7f is alpha.

To check the theory, please apply the attached patch. If that is the case, I'm confused: we cannot assume that 0xa0 is a space symbol in any multibyte encoding, even on Windows.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/

*** ./contrib/tsearch2/wordparser/parser.c.orig Wed Mar 21 20:41:23 2007
--- ./contrib/tsearch2/wordparser/parser.c Wed Mar 21 21:10:39 2007
***************
*** 124,130 ****
--- 124,134 ----
       * with C-locale is an alpha character
       */
      if ( c > 0x7f )
+     {
+         if ( c == 0xa0 )
+             return 0;
          return 1;
+     }

      return isalnum(0xff & c);
  }
***************
*** 157,163 ****
--- 161,171 ----
       * with C-locale is an alpha character
       */
      if ( c > 0x7f )
+     {
+         if ( c == 0xa0 )
+             return 0;
          return 1;
+     }

      return isalpha(0xff & c);
  }
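
To illustrate the effect being tested, here is a minimal, self-contained sketch in C. It is not the tsearch2 parser code: the function names `is_alpha_unpatched`, `is_alpha_patched` and `count_words`, and the use of plain `unsigned int` character codes, are assumptions made for this example. It only mirrors the check shown in the patch, where any code above 0x7f is taken as alpha and 0xa0 is carved out as a non-alpha character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the check before the patch: with the C locale
 * and a multibyte encoding, every code above 0x7f is treated as a letter.
 */
int is_alpha_unpatched(unsigned int c)
{
    if (c > 0x7f)
        return 1;

    return isalpha(0xff & c);
}

/*
 * Hypothetical stand-in for the check after the patch: same rule, except
 * that 0xa0 (non-breakable space) is no longer treated as a letter.
 */
int is_alpha_patched(unsigned int c)
{
    if (c > 0x7f)
    {
        if (c == 0xa0)
            return 0;
        return 1;
    }

    return isalpha(0xff & c);
}

/*
 * Count maximal runs of characters accepted by is_alpha. This is a
 * simplified illustration of word splitting, not the real parser logic.
 */
size_t count_words(const unsigned int *text, size_t len,
                   int (*is_alpha)(unsigned int))
{
    size_t words = 0;
    size_t i;
    int in_word = 0;

    assert(text != NULL && is_alpha != NULL);

    for (i = 0; i < len; i++)
    {
        if (is_alpha(text[i]))
        {
            if (!in_word)
                words++;    /* a new run of letters starts here */
            in_word = 1;
        }
        else
            in_word = 0;    /* a non-letter ends the current run */
    }

    return words;
}
```

Given the code sequence for `test`, 0xa0, `text`, `count_words` with `is_alpha_unpatched` sees a single run of alpha characters, while with `is_alpha_patched` the run is split in two. That matches the theory the patch is meant to check: the single lexeme in the quoted output would come from 0xa0 being classified as alpha.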