Re: Extending range of to_tsvector et al
From | Dan Scott |
---|---|
Subject | Re: Extending range of to_tsvector et al |
Date | |
Msg-id | CAAY5AM3pFkc=HNHbpGn_xf3wE+tcRaa4jwyG7mRC89z6mxsxZQ@mail.gmail.com |
In reply to | Extending range of to_tsvector et al (johnkn63 <john.knightley@gmail.com>) |
Responses |
Re: Extending range of to_tsvector et al
|
List | pgsql-hackers |
On Sun, Sep 30, 2012 at 1:56 PM, johnkn63 <john.knightley@gmail.com> wrote:
> When using to_tsvector a number of newer unicode characters and pua
> characters are not included. How do I add the characters which I desire to
> be found?

I've just started digging into this code a bit, but from what I've found
src/backend/tsearch/wparser_def.c defines much of the parser functionality,
and in the area of Unicode includes a number of comments like:

* with multibyte encoding and C-locale isw* function may fail or give wrong result.
* multibyte encoding and C-locale often are used for Asian languages.
* any non-ascii symbol with multibyte encoding with C-locale is an alpha character

... in concert with ifdefs around WIDE_UPPER_LOWER (in effect if WCSTOMBS and
TOWLOWER are available) to complicate testing scenarios :)

Also note that src/test/regress/sql/tsearch.sql and regress/sql/tsdicts.sql
currently focus on English, ASCII-only data.

Perhaps this is a good opportunity for you to describe what your environment
looks like (OS, PostgreSQL version, encoding and locale settings for the
database) and show some sample to_tsquery() @@ to_tsvector() queries that
don't behave the way you think they should behave - and we could start
building some test cases as a first step?

--
Dan Scott
Laurentian University