Re: pg_trgm
From | Robert Haas |
---|---|
Subject | Re: pg_trgm |
Date | |
Msg-id | AANLkTimowXqtPBQl4Qhsj2LlxLhIbvMQUuI4G8cB42Eh@mail.gmail.com |
In reply to | Re: pg_trgm (Peter Eisentraut <peter_e@gmx.net>) |
Responses | Re: pg_trgm |
List | pgsql-hackers |

On Thu, May 27, 2010 at 2:01 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On fre, 2010-05-28 at 00:46 +0900, Tatsuo Ishii wrote:
>> > I don't know about Japanese, but the locale approach works just fine for
>> > other agglutinative languages.  I would rather suspect that it is the
>> > trigram approach that might be rather useless for such languages,
>> > because you are going to get a lot of similarity hits for the affixes.
>>
>> I'm not sure what you mean by "affixes". But I will explain...
>>
>> A Japanese sentence consists of words. Problem is, each word is not
>> separated by space (agglutinative). So most text tools such as text
>> search need preprocess which finds word boundaries by looking up
>> dictionaries (and smart grammar analysis routine). In the process
>> "affixes" can be determined and perhaps removed from the target word
>> group to be used for text search (note that removing affixes is no
>> relevant to locale). Once we get space separated sentence, it can be
>> processed by text search or by pg_trgm just same as English. (Note
>> that these preprocessing are done outside PostgreSQL world). The
>> difference is just the "word" can be consists of non ASCII letters.
>
> I think the problem at hand has nothing at all to do with agglutination
> or CJK-specific issues.  You will get the same problem with other
> languages *if* you set a locale that does not adequately support the
> characters in use.  E.g., Russian with locale C and encoding UTF8:
>
> select similarity(E'\u0441\u043B\u043E\u043D', E'\u0441\u043B\u043E\u043D\u044B');
>  similarity
> ────────────
>         NaN
> (1 row)

What I can't help wondering as I'm reading this discussion is -
Tatsuo-san said upthread that he has a problem with pg_trgm that he
does not have with full text search.  So what is full text search
doing differently than pg_trgm?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
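
One plausible reading of the NaN in the quoted example can be sketched in a few lines of Python. This is an illustration only, not pg_trgm's actual code. The function names and the `ascii_only` flag are hypothetical. The sketch assumes that under locale C only ASCII letters and digits are treated as word characters, so a purely Cyrillic string yields no trigrams at all. The padding (two spaces before each word, one after) and the ratio of shared trigrams to distinct trigrams follow the pg_trgm documentation.

```python
import math
import re


def trigrams(text, ascii_only):
    """Return the set of trigrams of `text`, padding each word pg_trgm-style."""
    # Assumption: with locale C only ASCII letters and digits count as word
    # characters; otherwise any Unicode letter or digit does.
    pattern = r"[A-Za-z0-9]+" if ascii_only else r"[^\W_]+"
    result = set()
    for word in re.findall(pattern, text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b, ascii_only=False):
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a, ascii_only), trigrams(b, ascii_only)
    shared = len(ta & tb)
    total = len(ta) + len(tb) - shared
    # No trigrams on either side means 0/0; return NaN as float division would.
    return shared / total if total else math.nan


print(similarity("слон", "слоны", ascii_only=True))   # no trigrams extracted
print(similarity("слон", "слоны", ascii_only=False))  # Cyrillic seen as letters
```

With `ascii_only=True` both trigram sets are empty and the result is NaN, matching the quoted output. With `ascii_only=False` the two words share 4 of 7 distinct trigrams.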