Re: gistchoose vs. bloat
От | Heikki Linnakangas |
---|---|
Тема | Re: gistchoose vs. bloat |
Дата | |
Msg-id | 51019F41.1060204@vmware.com обсуждение исходный текст |
Ответ на | Re: gistchoose vs. bloat (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: gistchoose vs. bloat
|
Список | pgsql-hackers |
On 24.01.2013 22:35, Tom Lane wrote: > Alexander Korotkov<aekorotkov@gmail.com> writes: >> There is another cause of overhead when use randomization in gistchoose: >> extra penalty calls. It could be significant when index fits to cache. In >> order evade it I especially change behaviour of my patch from "look >> sequentially and choose random" to "look in random order". I think we need >> to include comparison of CPU time. > > Hmm ... actually, isn't that an argument in favor of Heikki's method? > The way he's doing it, we can exit without making additional penalty > calls once we've found a zero-penalty tuple and decided not to look > further (which is something with a pretty high probability). No, I think Alexander is right, although I believe the difference is minimal in practice. If we assume that the there are no zero-penalty tuples on the page, with both approaches you have to call penalty on every tuple to know which is best. If there are zero-penalty tuples, then there is a small difference. With Alexander's method, you can stop looking as soon as you find a zero-penalty tuple (the order you check the tuples is random). With my method, you can stop looking as soon as you find a zero-penalty tuple, *and* you decide to not look further. With the 1/2 probability to stop looking further, you give up on average after 2 tuples. So if I'm doing my math right, my patch does on average between 1x - 2x as many penalty calls as Alexander's, depending on how many zero-penalty tuples there are. The 2x case is when the page is full of zero-penalty tuples, in which case the difference isn't big in absolute terms; 2 penalty calls per page versus 1. BTW, one thing that I wondered about this: How expensive is random()? I'm assuming not very, but I don't really know. Alexander's patch called random() for every tuple on the page, while I call it only once for each equal-penalty tuple. If it's really cheap, I think my/Tom's patch could be slightly simplified by always initializing keep_current_best with random() at top of the function, and eliminating the -1 "haven't decided yet" state. OTOH, if random() is expensive, I note that we only need one bit of randomness, so we could avoid calling random() so often if we utilized all the bits from each random() call. - Heikki
В списке pgsql-hackers по дате отправления: