Re: Patch: add conversion from pg_wchar to multibyte
| From | Robert Haas |
|---|---|
| Subject | Re: Patch: add conversion from pg_wchar to multibyte |
| Date | |
| Msg-id | CA+TgmoYgS-EC4cV5rFw1ebD=uPJYn_vUdz7+XU-N0KXBgqXEYw@mail.gmail.com |
| In reply to | Re: Patch: add conversion from pg_wchar to multibyte (Alexander Korotkov <aekorotkov@gmail.com>) |
| Responses | Re: Patch: add conversion from pg_wchar to multibyte |
| List | pgsql-hackers |
On Tue, May 1, 2012 at 6:02 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Right. When the number of trigrams is big, it is slow to scan the posting
> lists of all of them. The solution in this case is to exclude the most
> frequent trigrams from the index scan. But that requires some kind of
> statistics on trigram frequencies, which we don't have. We could estimate
> frequencies using some hard-coded assumptions about natural languages, or
> we could exclude arbitrary trigrams. But I don't like either of these
> ideas. This problem is also relevant for LIKE/ILIKE search using trigram
> indexes.

I was thinking you could perhaps do it just based on the *number* of
trigrams, not necessarily their frequency.

> Probably you have some comments on the idea of conversion from pg_wchar to
> multibyte? Is it acceptable at all?

Well, I'm not an expert on encodings, but it seems like a logical extension
of what we're doing right now, so I don't really see why not.

I'm confused by the diff hunks in pg_mule2wchar_with_len, though. Presumably
either the old code is right (in which case, don't change it) or the new
code is right (in which case, there's a bug fix needed here that ought to be
discussed and committed separately from the rest of the patch). Maybe I am
missing something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
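For readers unfamiliar with the encoding code, here is a minimal sketch of what "conversion from pg_wchar to multibyte" means in the simplest case, a UTF-8 server encoding, where a pg_wchar is essentially the Unicode code point. The function names and interface below are invented for illustration, loosely modeled on the style of the existing pg_mb2wchar_with_len(); they are not the functions proposed by the patch under review, which has to cover every server encoding, not just UTF-8.

```c
/*
 * Illustrative sketch only: turning an array of code points (roughly what
 * pg_wchar holds under a UTF-8 server encoding) back into UTF-8 bytes.
 * utf8_encode_char and wchar_to_utf8_with_len are hypothetical names,
 * not part of PostgreSQL or of the patch being discussed.
 */
#include <stdint.h>

typedef uint32_t pg_wchar;		/* same underlying type PostgreSQL uses */

/* Encode one code point as UTF-8; returns the number of bytes written. */
static int
utf8_encode_char(pg_wchar c, unsigned char *out)
{
	if (c < 0x80)
	{
		out[0] = (unsigned char) c;
		return 1;
	}
	if (c < 0x800)
	{
		out[0] = (unsigned char) (0xC0 | (c >> 6));
		out[1] = (unsigned char) (0x80 | (c & 0x3F));
		return 2;
	}
	if (c < 0x10000)
	{
		out[0] = (unsigned char) (0xE0 | (c >> 12));
		out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
		out[2] = (unsigned char) (0x80 | (c & 0x3F));
		return 3;
	}
	out[0] = (unsigned char) (0xF0 | (c >> 18));
	out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
	out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
	out[3] = (unsigned char) (0x80 | (c & 0x3F));
	return 4;
}

/* Convert 'len' pg_wchars back to multibyte form; returns bytes written. */
static int
wchar_to_utf8_with_len(const pg_wchar *from, unsigned char *to, int len)
{
	int			total = 0;

	while (len-- > 0)
		total += utf8_encode_char(*from++, to + total);
	return total;
}
```

The point of such a conversion is to give callers that already work on pg_wchar strings (pg_trgm, for example) a way to get back the original multibyte bytes, which is the direction the existing conversion machinery does not provide.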