Re: gsoc, text search selectivity and dllist enhancments

From	Tom Lane
Subject	Re: gsoc, text search selectivity and dllist enhancments
Date	10 July 2008, 17:37:37
Msg-id	13892.1215722248@sss.pgh.pa.us
In reply to	Re: gsoc, text search selectivity and dllist enhancments (Alvaro Herrera <alvherre@commandprompt.com>)
List	pgsql-hackers

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Jan Urbański wrote:
>> Oh, one important thing. You need to choose a bucket width for the LC  
>> algorithm, that is decide after how many elements will you prune your  
>> data structure. I chose to prune after every twenty tsvectors.

> Do you prune after X tsvectors regardless of the numbers of lexemes in
> them?  I don't think that preserves the algorithm properties; if there's
> a bunch of very short tsvectors and then long tsvectors, the pruning
> would take place too early for the initial lexemes.  I think you should
> count lexemes, not tsvectors.

Yeah.  I haven't read the Lossy Counting paper in detail yet, but I
suspect that the mathematical proof of limited error doesn't work if the
pruning is done on a variable spacing.  I don't see anything very wrong
with pruning intra-tsvector; the effects ought to average out, since the
point where you prune is going to move around with respect to the
tsvector boundaries.
        regards, tom lane
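
[Editor's illustration, not part of the original message.] A minimal sketch of what the thread is proposing: Lossy Counting where the bucket width is measured in lexemes, so the prune step can fall in the middle of a tsvector. The function name `lossy_count`, the `bucket_width` parameter, and the use of plain lists of strings as stand-ins for tsvectors are all assumptions made for this example; this is not the code from Jan's patch.

```python
def lossy_count(tsvectors, bucket_width):
    """Lossy Counting over a stream of lexemes.

    tsvectors    -- iterable of lexeme sequences (hypothetical stand-in
                    for tsvector values)
    bucket_width -- number of lexemes (not tsvectors) between prunes
    Returns {lexeme: (frequency, delta)}, where delta is the maximum
    number of occurrences that may have been missed before tracking began.
    """
    counts = {}   # lexeme -> [frequency, delta]
    seen = 0      # lexemes processed so far, across all tsvectors
    bucket = 1    # id of the current bucket

    for vec in tsvectors:
        for lexeme in vec:
            seen += 1
            entry = counts.get(lexeme)
            if entry is None:
                # A new entry may have been pruned in earlier buckets.
                counts[lexeme] = [1, bucket - 1]
            else:
                entry[0] += 1

            # Prune at a fixed lexeme spacing; the boundary is free to
            # land inside a tsvector.
            if seen % bucket_width == 0:
                stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
                for k in stale:
                    del counts[k]
                bucket += 1

    return {k: (f, d) for k, (f, d) in counts.items()}


# Short tsvectors followed by a longer one; the first prune happens
# after the 4th lexeme, which is inside the third tsvector.
sample = [["a", "b"], ["a"], ["a", "c", "d", "e"], ["a"]]
result = lossy_count(sample, bucket_width=4)
```

Because `seen` counts lexemes, the spacing between prunes stays constant regardless of how long each tsvector is, which is the condition the error bound in the Lossy Counting analysis relies on.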
