Re: Tuple sampling
From | Tom Lane |
---|---|
Subject | Re: Tuple sampling |
Date | |
Msg-id | 28169.1085347956@sss.pgh.pa.us |
In reply to | Tuple sampling (Manfred Koizar <mkoi-pg@aon.at>) |
Replies | Re: Tuple sampling; Re: Tuple sampling |
List | pgsql-patches |
Manfred Koizar <mkoi-pg@aon.at> writes:
> This patch implements the new tuple sampling method as discussed on
> -hackers and -performance a few weeks ago.

Applied with minor editorializations.

AFAICS get_next_S() needs to be called with the number of tuples already processed, which means you were off-by-one --- this surely makes only a trivial difference in the probabilities, but if we are going to use Vitter's algorithm then we may as well get it right.

Also, I took out the TupleCount typedef and went back to using doubles for the tuple counts; this is more consistent with the coding style used elsewhere, and I really doubt that it's any slower. (The datatype conversions induced inside get_next_S are likely to outweigh any savings from counting by ints, on most modern hardware.) Plus the justification for assuming it couldn't overflow seems weak to me; the current limitation to 300000 requested sample rows is very arbitrary and could change anytime.

I was initially convinced that your implementation of Knuth's algorithm S was all wet, so now there's a bunch of comments explaining why it's actually correct...

			regards, tom lane
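The message turns on two sampling techniques: Knuth's Algorithm S (select n of N items in a single ordered pass) and Vitter's reservoir sampling, where the skip computation must be given the number of records already processed. The following is a minimal C sketch under stated assumptions, not the code that was applied to PostgreSQL: it uses rand() as the random source, uses Vitter's simpler Algorithm X for the skip rather than the faster Algorithm Z, and the names random_fract, algorithm_s, vitter_x_skip and reservoir_sample are hypothetical. Tuple counts in the skip routine are doubles, following the preference stated above.

```c
#include <stdlib.h>

/* Uniform double strictly inside (0, 1).  rand() is an assumption made to
 * keep the sketch self-contained; it is not PostgreSQL's generator. */
static double random_fract(void)
{
    return ((double) rand() + 1.0) / ((double) RAND_MAX + 2.0);
}

/*
 * Knuth's Algorithm S: choose items from 0..N-1 into out[], in increasing
 * order.  Item t is taken with probability (n - m) / (N - t), where m items
 * were chosen among the t already examined.  Once N - t == n - m every
 * remaining item is taken, so exactly min(n, N) items are returned.
 */
static int algorithm_s(int N, int n, int *out)
{
    int m = 0;                              /* items selected so far */

    for (int t = 0; t < N && m < n; t++)    /* t = items already examined */
    {
        if ((double) (N - t) * random_fract() < (double) (n - m))
            out[m++] = t;
    }
    return m;
}

/*
 * Vitter's Algorithm X: with a reservoir of size n and t records ALREADY
 * processed (requires t >= n >= 1), return how many upcoming records to
 * skip before the next one that enters the reservoir.  The next record is
 * kept with probability n / (t + 1); passing a t that is off by one skews
 * every probability slightly.
 */
static double vitter_x_skip(double t, int n)
{
    double V = random_fract();
    double S = 0.0;
    double num, quot;

    t += 1.0;                  /* 1-based number of the next record */
    num = t - n;
    quot = num / t;            /* P(skip >= 1): next record is not kept */
    while (quot > V)
    {
        S += 1.0;
        t += 1.0;
        num += 1.0;
        quot *= num / t;       /* now P(skip >= S + 1) */
    }
    return S;
}

/*
 * Reservoir-sample n of the records 0..N-1 into res[0..n-1].  t always
 * holds the number of records already processed, which is exactly what
 * vitter_x_skip() expects.  If N < n, only res[0..N-1] is filled.
 */
static void reservoir_sample(int N, int n, int *res)
{
    int t;

    if (n <= 0)
        return;
    for (t = 0; t < n && t < N; t++)
        res[t] = t;                          /* fill the reservoir first */
    while (t < N)
    {
        double skip = vitter_x_skip((double) t, n);

        if ((double) t + skip >= (double) N)
            break;                           /* ran off the end of input */
        t += (int) skip;                     /* records passed over */
        res[(int) (n * random_fract())] = t; /* evict a random slot */
        t++;                                 /* record t is now processed */
    }
}
```

In this sketch, calling vitter_x_skip() with the count including the record about to be examined, instead of the count already processed, would reproduce the kind of small probability error described in the message. Algorithm S suits choosing blocks when the total N is known in advance, while the reservoir handles a stream whose length is not known until the end.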