Re: Patch: add conversion from pg_wchar to multibyte


From	Alexander Korotkov
Subject	Re: Patch: add conversion from pg_wchar to multibyte
Date	1 May 2012 19:03:01
Msg-id	CAPpHfdsfg7vcanUBRPJBzPJ5jETVw2sH5LBwpeac=R_C74QTag@mail.gmail.com
In reply to	Re: Patch: add conversion from pg_wchar to multibyte (Robert Haas <robertmhaas@gmail.com>)
Replies	Re: Patch: add conversion from pg_wchar to multibyte
List	pgsql-hackers


On Mon, Apr 30, 2012 at 10:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Apr 29, 2012 at 8:12 AM, Erik Rijkers <er@xs4all.nl> wrote:
> > Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three
> > instances.  (the patches compiled fine, and make check was without problem).
>
> These test results seem to be more about the pg_trgm changes than the
> patch actually on this thread, unless I'm missing something.  But the
> executive summary seems to be that pg_trgm might need to be a bit
> smarter about costing the trigram-based search, because when the
> number of trigrams is really big, using the index is
> counterproductive.  Hopefully that's not too hard to fix; the basic
> approach seems quite promising.

Right. When the number of trigrams is big, it is slow to scan the posting lists of all of them. The solution in this case is to exclude the most frequent trigrams from the index scan. But that requires some kind of statistics on trigram frequencies, which we don't have. We could estimate frequencies using some hard-coded assumptions about natural languages, or we could exclude arbitrary trigrams, but I don't like either of these ideas. This problem is also relevant for LIKE/ILIKE search using trigram indexes.

Something similar could occur in tsearch when we search for "frequent_term & rare_term". In some situations (depending on term frequencies) it's better to exclude frequent_term from the index scan and do a recheck. We have the relevant statistics to make such a decision, but it doesn't seem feasible to get them through the current GIN interface.

Do you have any comments on the idea of conversion from pg_wchar to multibyte? Is it acceptable at all?

------
With best regards,
Alexander Korotkov.
