Re: gsoc, text search selectivity and dllist enhancments
From | Tom Lane |
---|---|
Subject | Re: gsoc, text search selectivity and dllist enhancments |
Date | |
Msg-id | 13892.1215722248@sss.pgh.pa.us |
In response to | Re: gsoc, text search selectivity and dllist enhancments (Alvaro Herrera <alvherre@commandprompt.com>) |
List | pgsql-hackers |
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Jan Urbański wrote:
>> Oh, one important thing. You need to choose a bucket width for the LC
>> algorithm, that is decide after how many elements will you prune your
>> data structure. I chose to prune after every twenty tsvectors.

> Do you prune after X tsvectors regardless of the numbers of lexemes in
> them? I don't think that preserves the algorithm properties; if there's
> a bunch of very short tsvectors and then long tsvectors, the pruning
> would take place too early for the initial lexemes. I think you should
> count lexemes, not tsvectors.

Yeah. I haven't read the Lossy Counting paper in detail yet, but I
suspect that the mathematical proof of limited error doesn't work if the
pruning is done on a variable spacing. I don't see anything very wrong
with pruning intra-tsvector; the effects ought to average out, since the
point where you prune is going to move around with respect to the
tsvector boundaries.

regards, tom lane
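
For readers following the thread, the point about counting lexemes rather than tsvectors can be illustrated with a small sketch. This is not the code from the patch under discussion; it is a minimal Python rendering of the textbook Lossy Counting scheme, assuming a fixed bucket width measured in lexemes. The names `LossyCounter`, `count_lexemes`, and `bucket_width` are hypothetical, and tsvectors are modelled simply as lists of strings. Pruning fires every `bucket_width` lexemes, so it can land in the middle of a tsvector, as suggested above.

```python
class LossyCounter:
    """Textbook Lossy Counting over a stream of lexemes (illustrative only)."""

    def __init__(self, bucket_width):
        self.w = bucket_width   # number of lexemes between prunes (assumed fixed)
        self.n = 0              # lexemes seen so far
        self.entries = {}       # lexeme -> [count, delta]

    def add(self, lexeme):
        self.n += 1
        # Current bucket id is ceil(n / w), computed with integer arithmetic.
        bucket = (self.n + self.w - 1) // self.w
        entry = self.entries.get(lexeme)
        if entry is None:
            # delta bounds how many occurrences could have been missed so far.
            self.entries[lexeme] = [1, bucket - 1]
        else:
            entry[0] += 1
        # Prune at bucket boundaries, regardless of tsvector boundaries.
        if self.n % self.w == 0:
            self._prune(bucket)

    def _prune(self, bucket):
        # Drop entries whose count plus possible undercount is at most the bucket id.
        self.entries = {
            lex: e for lex, e in self.entries.items() if e[0] + e[1] > bucket
        }


def count_lexemes(tsvectors, bucket_width):
    """Feed every lexeme of every tsvector into one counter."""
    counter = LossyCounter(bucket_width)
    for vec in tsvectors:
        for lexeme in vec:
            counter.add(lexeme)
    return counter


# A few short tsvectors followed by a long one, the case raised in the thread.
docs = [
    ["cat"],
    ["cat", "dog"],
    ["cat"] + [f"rare{i}" for i in range(10)],
    ["cat"],
]
result = count_lexemes(docs, bucket_width=5)
print(sorted(result.entries))
```

Because the bucket width is constant in lexemes, the usual bound applies in this sketch: a stored count underestimates the true count by at most `n / bucket_width`. Pruning after a fixed number of tsvectors would instead make the spacing depend on tsvector length, which is the variable spacing the message above is wary of.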