Re: gsoc, text search selectivity and dllist enhancments

From	Jan Urbański
Subject	Re: gsoc, text search selectivity and dllist enhancments
Date	11 July 2008, 03:18:38
Msg-id	4876FB31.8010803@students.mimuw.edu.pl
In response to	Re: gsoc, text search selectivity and dllist enhancments (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tom Lane wrote:
> Jan Urbański <j.urbanski@students.mimuw.edu.pl> writes:
>> Tom Lane wrote:
> Well, (1) the normal measure would be statistics_target *tsvectors*,
> and we'd have to translate that to lexemes somehow; my proposal is just
> to use a fixed constant instead of tsvector width as in your original
> patch.  And (2) storing only statistics_target lexemes would be
> uselessly small and would guarantee that people *have to* set a custom
> target on tsvector columns to get useful results.  Obviously broken
> defaults are not my bag.

Fair enough, I'm fine with a multiplication factor.

>> Also, the existing code decides which elements are worth storing as most 
>> common ones by discarding those that are not frequent enough (that's 
>> where num_mcv can get adjusted downwards). I mimicked that for lexemes 
>> but maybe it just doesn't make sense?
> 
> Well, that's not unreasonable either, if you can come up with a
> reasonable definition of "not frequent enough"; but that adds another
> variable to the discussion.

The current definition was "with more occurrences than 0.001 of the total 
row count, but no less than 2". It is copied straight from 
compute_minimal_stats(), and I have no problem with removing it. I think its 
point is to guard against a situation where all elements are more or 
less unique, and taking the top N would just give you random noise. 
It doesn't hurt, so I'd be for keeping the mechanism, but if people feel 
differently, then I'll just drop it.
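
To make the cutoff concrete, here is a minimal sketch in C of the rule as described above. It is not the code from the patch or from compute_minimal_stats(); the struct LexemeCount and the functions is_frequent_enough() and adjust_num_mcv() are hypothetical names made up for illustration. It assumes the candidates are already sorted by descending count, and reads "no less than 2" as "count >= 2".

```c
#include <assert.h>

/* Hypothetical record: a tracked lexeme and its occurrence count in the sample. */
typedef struct
{
    const char *lexeme;
    int         count;
} LexemeCount;

/*
 * Assumed reading of the rule: a lexeme is worth storing if it occurs in
 * more than 0.001 of the total rows, and at least twice.
 */
static int
is_frequent_enough(int count, int total_rows)
{
    double      mincount;

    assert(total_rows >= 0);
    mincount = 0.001 * total_rows;

    return count >= 2 && count > mincount;
}

/*
 * Given candidates sorted by descending count, adjust num_mcv downwards so
 * that only lexemes passing the cutoff are kept. Returns the adjusted value.
 */
static int
adjust_num_mcv(const LexemeCount *items, int nitems, int num_mcv,
               int total_rows)
{
    int         i;

    if (num_mcv > nitems)
        num_mcv = nitems;

    /* Stop at the first lexeme that is too rare; the rest are rarer still. */
    for (i = 0; i < num_mcv; i++)
    {
        if (!is_frequent_enough(items[i].count, total_rows))
            break;
    }

    return i;
}
```

With nearly unique lexemes (every count equal to 1) this returns 0 no matter how large num_mcv is, which is the "random noise" case the check is meant to guard against.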

-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
