Re: gsoc, text search selectivity and dllist enhancments
From | Jan Urbański |
---|---|
Subject | Re: gsoc, text search selectivity and dllist enhancments |
Date | |
Msg-id | 4876FB31.8010803@students.mimuw.edu.pl |
In reply to | Re: gsoc, text search selectivity and dllist enhancments (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane wrote:
> Jan Urbański <j.urbanski@students.mimuw.edu.pl> writes:
>> Tom Lane wrote:
> Well, (1) the normal measure would be statistics_target *tsvectors*,
> and we'd have to translate that to lexemes somehow; my proposal is just
> to use a fixed constant instead of tsvector width as in your original
> patch. And (2) storing only statistics_target lexemes would be
> uselessly small and would guarantee that people *have to* set a custom
> target on tsvector columns to get useful results. Obviously broken
> defaults are not my bag.

Fair enough, I'm fine with a multiplication factor.

>> Also, the existing code decides which elements are worth storing as most
>> common ones by discarding those that are not frequent enough (that's
>> where num_mcv can get adjusted downwards). I mimicked that for lexemes
>> but maybe it just doesn't make sense?
>
> Well, that's not unreasonable either, if you can come up with a
> reasonable definition of "not frequent enough"; but that adds another
> variable to the discussion.

The current definition was "with more occurrences than 0.001 of the total row count, but no fewer than 2". It was copied straight from compute_minimal_stats(), and I have no problem with removing it. I think its point is to guard against a situation where all elements are more or less unique, in which case taking the top N would just give you random noise. It doesn't hurt, so I'd be for keeping the mechanism, but if people feel differently, I'll just drop it.

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
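The two ideas in the exchange above (capping the number of stored lexemes at the statistics target times a fixed factor, and discarding lexemes that are "not frequent enough") can be illustrated with a minimal C sketch. It is only an assumption-laden illustration: `LexemeCount`, `select_mcelem` and `LEXEME_TARGET_FACTOR` (set to 10 here) are hypothetical names, not PostgreSQL's API, and the cutoff follows the rule as described in the message rather than the actual code of compute_minimal_stats().

```c
#include <stdlib.h>

/* Hypothetical lexeme/frequency pair; not a PostgreSQL data structure. */
typedef struct
{
    const char *lexeme;
    int         count;          /* occurrences of the lexeme in the sample */
} LexemeCount;

/* Hypothetical fixed multiplication factor applied to statistics_target. */
#define LEXEME_TARGET_FACTOR 10

/* qsort comparator: order by descending occurrence count. */
static int
cmp_count_desc(const void *a, const void *b)
{
    const LexemeCount *x = a;
    const LexemeCount *y = b;

    return (y->count > x->count) - (y->count < x->count);
}

/*
 * Sort items in place (most frequent first) and return how many leading
 * entries to keep as most common lexemes.  An entry is kept only while:
 *   - fewer than stats_target * LEXEME_TARGET_FACTOR entries are kept,
 *   - its count is greater than 0.001 * total_rows, and
 *   - its count is at least 2.
 */
static int
select_mcelem(LexemeCount *items, int nitems, int stats_target, int total_rows)
{
    double      min_fraction = 0.001 * total_rows;
    int         max_keep = stats_target * LEXEME_TARGET_FACTOR;
    int         keep = 0;

    qsort(items, nitems, sizeof(LexemeCount), cmp_count_desc);

    while (keep < nitems && keep < max_keep &&
           items[keep].count > min_fraction && items[keep].count >= 2)
        keep++;

    return keep;
}
```

For example, with counts 40, 12, 100, 1 and 3 over 1000 sampled rows and a statistics target of 1, the cutoff is "more than 1 and at least 2", so the four lexemes with counts 100, 40, 12 and 3 are kept and the singleton is dropped. If every lexeme were near-unique, the frequency cutoff rather than the size cap would decide, which is the noise guard the message argues for.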