Re: Stats target increase vs compute_tsvector_stats()
From | Tom Lane |
---|---|
Subject | Re: Stats target increase vs compute_tsvector_stats() |
Date | |
Msg-id | 29737.1229353308@sss.pgh.pa.us |
In reply to | Re: Stats target increase vs compute_tsvector_stats() (Jan Urbański <j.urbanski@students.mimuw.edu.pl>) |
List | pgsql-hackers |

Jan Urbański <j.urbanski@students.mimuw.edu.pl> writes:
> Tom Lane wrote:
>> I came across this bit in ts_typanalyze.c:
>>
>> /* We want statistic_target * 100 lexemes in the MCELEM array */
>> num_mcelem = stats->attr->attstattarget * 100;
>>
>> I wonder whether the multiplier here should be changed?

> The origin of that bit is this post:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00556.php
> and the following few downthread ones.
> If we bump the default statistics target 10 times, then changing the
> multiplier to 10 seems the right thing to do.

OK, will do.

> Only thing that needs
> caution is the frequency of pruning we do in the Lossy Counting
> algorithm, that IIRC is correlated with the desired target length of the
> MCELEM array.

Right below that we have

    /*
     * We set bucket width equal to the target number of result lexemes.
     * This is probably about right but perhaps might need to be scaled
     * up or down a bit?
     */
    bucket_width = num_mcelem;

so it should track automatically. AFAICS the argument in the above
thread that this is an appropriate pruning distance holds good
regardless of just how we obtain the target mcelem count.

> BTW: I've been occupied with other things and might have missed some
> discussions, but at some point it has been considered to use Lossy
> Counting to gather statistics from regular columns, not only tsvectors.
> Wouldn't this help the performance hit ANALYZE takes from upping
> default_stats_target?

Perhaps, but it's not likely to get done for 8.4 ...

regards, tom lane
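
For readers who want to see why the pruning distance "tracks automatically", here is a minimal sketch of Lossy Counting in which the bucket width is set from the MCELEM target, as in the snippet quoted in the message. It is an illustration only, not the code in ts_typanalyze.c: the names `LossyCounter`, `TrackItem`, `mcelem_target`, `lc_init`, `lc_add`, `lc_prune` and the `MAX_TRACKED` cap are all hypothetical, and lexemes are stood in for by plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the PostgreSQL source. */
#define MAX_TRACKED 1024        /* arbitrary cap to keep the sketch simple */

typedef struct
{
    int     item;               /* integer stand-in for a lexeme */
    long    freq;               /* occurrences counted since the entry was created */
    long    delta;              /* maximum number of occurrences we may have missed */
} TrackItem;

typedef struct
{
    TrackItem   items[MAX_TRACKED];
    size_t      nitems;
    long        bucket_width;   /* pruning distance */
    long        seen;           /* elements processed so far */
} LossyCounter;

/* Target array length: statistics target times a multiplier. */
static long
mcelem_target(int stats_target, int multiplier)
{
    return (long) stats_target * multiplier;
}

/* Bucket width is taken directly from the target, so it follows any change to it. */
static void
lc_init(LossyCounter *lc, long num_mcelem)
{
    assert(num_mcelem > 0);
    lc->nitems = 0;
    lc->seen = 0;
    lc->bucket_width = num_mcelem;
}

/* Drop every entry whose freq + delta does not exceed the current bucket number. */
static void
lc_prune(LossyCounter *lc, long b_current)
{
    size_t  keep = 0;

    for (size_t i = 0; i < lc->nitems; i++)
        if (lc->items[i].freq + lc->items[i].delta > b_current)
            lc->items[keep++] = lc->items[i];
    lc->nitems = keep;
}

/* Count one element; returns -1 if the tracking table is full. */
static int
lc_add(LossyCounter *lc, int item)
{
    /* Bucket number of the element being added, counting from 1. */
    long    b_current = lc->seen / lc->bucket_width + 1;
    size_t  i;

    for (i = 0; i < lc->nitems; i++)
        if (lc->items[i].item == item)
            break;

    if (i < lc->nitems)
        lc->items[i].freq++;
    else if (lc->nitems < MAX_TRACKED)
        lc->items[lc->nitems++] = (TrackItem) {item, 1, b_current - 1};
    else
        return -1;

    /* Prune once per completed bucket. */
    lc->seen++;
    if (lc->seen % lc->bucket_width == 0)
        lc_prune(lc, b_current);
    return 0;
}
```

In this sketch, `mcelem_target(10, 100)` and `mcelem_target(100, 10)` both give 1000, so raising the statistics target tenfold while cutting the multiplier to 10 leaves both the target length and the pruning distance unchanged.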