Re: Patch: add conversion from pg_wchar to multibyte
From | Alexander Korotkov |
---|---|
Subject | Re: Patch: add conversion from pg_wchar to multibyte |
Date | |
Msg-id | CAPpHfdsfg7vcanUBRPJBzPJ5jETVw2sH5LBwpeac=R_C74QTag@mail.gmail.com |
In response to | Re: Patch: add conversion from pg_wchar to multibyte (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Patch: add conversion from pg_wchar to multibyte |
List | pgsql-hackers |
On Mon, Apr 30, 2012 at 10:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Apr 29, 2012 at 8:12 AM, Erik Rijkers <er@xs4all.nl> wrote:
> > Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three
> > instances. (the patches compiled fine, and make check was without problem).
>
> These tests results seem to be more about the pg_trgm changes than the
> patch actually on this thread, unless I'm missing something. But the
> executive summary seems to be that pg_trgm might need to be a bit
> smarter about costing the trigram-based search, because when the
> number of trigrams is really big, using the index is
> counterproductive. Hopefully that's not too hard to fix; the basic
> approach seems quite promising.

Right. When the number of trigrams is big, scanning the posting lists of all of them is slow. The solution in this case is to exclude the most frequent trigrams from the index scan. But that requires some kind of statistics on trigram frequencies, which we don't have. We could estimate frequencies using some hard-coded assumptions about natural languages. Or we could exclude arbitrary trigrams. But I don't like either of these ideas. This problem is also relevant for LIKE/ILIKE search using trigram indexes.

Something similar could occur in tsearch when we search for "frequent_term & rare_term". In some situations (depending on term frequencies) it's better to exclude frequent_term from the index scan and do a recheck. We have the relevant statistics to make such a decision, but it doesn't seem feasible to get them through the current GIN interface.

Probably you have some comments on the idea of conversion from pg_wchar to multibyte? Is it acceptable at all?

------
With best regards,
Alexander Korotkov.
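
Editorial illustration of the "exclude the most frequent trigrams, then recheck" idea discussed above. This is only a toy sketch in Python, not pg_trgm or GIN code: the function names (`trigrams`, `build_index`, `search`) and the `max_trigrams` cutoff are hypothetical, trigrams are extracted without the padding pg_trgm applies, and posting-list length stands in for the frequency statistics that the message says are not available.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of 3-character substrings of text (no padding)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids containing it (toy posting lists)."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for tg in trigrams(text):
            index[tg].add(doc_id)
    return index


def search(docs, index, query, max_trigrams=2):
    """Substring search that scans only the rarest trigrams, then rechecks."""
    query_tgs = trigrams(query)
    if not query_tgs:
        # Query too short to yield any trigram: fall back to a full scan.
        return [i for i, text in enumerate(docs) if query in text]
    # Assumption: posting-list length is used as the frequency estimate.
    rarest = sorted(query_tgs, key=lambda tg: (len(index.get(tg, ())), tg))
    rarest = rarest[:max_trigrams]
    # Intersect only the short posting lists; frequent trigrams are skipped.
    candidates = set.intersection(*(set(index.get(tg, ())) for tg in rarest))
    # Recheck against the text, since skipped trigrams can admit false positives.
    return sorted(i for i in candidates if query in docs[i])
```

With `docs = ["the quick brown fox", "the lazy dog"]` and `index = build_index(docs)`, calling `search(docs, index, "he q")` scans the short posting list for `"e q"` before the longer one for `"he "`, and the final recheck keeps only documents that really contain the query.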