Re: [HACKERS] Performance degradation in TPC-H Q18
От | Andres Freund |
---|---|
Тема | Re: [HACKERS] Performance degradation in TPC-H Q18 |
Дата | |
Msg-id | 20170301040222.3hp5s7rvbuow2mtr@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: [HACKERS] Performance degradation in TPC-H Q18 (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On 2017-03-01 09:21:47 +0530, Robert Haas wrote: > On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <andres@anarazel.de> wrote: > >> BTW, another option to consider is lowering the target fillfactor. > >> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the > >> issue. Obviously, it's better to detect when we need a lower > >> fillfactor than to always use a lower one, but obviously the tighter > >> you pack the hash table, the more likely it is that you're going to > >> have these kinds of problems. > > > > Yea, that'd be one approach, but I feel better dynamically increasing > > the fillfactor like in the patch. Even with a lower fillfactor you could > > see issues, and as you say a higher fillfactor is nice [TM]; after the > > patch I played with *increasing* the fillfactor, without being able to > > measure a downside. > > I am going to bet that increasing the fillfactor would be a mistake. > The expected length of a clump in the table is going to be about > 1/(1-ff), which increases very quickly beyond the current value of > 0.8. For example, if the actual fillfactor is 0.9 and the keys are > perfectly distributed -- nine in a row and then one empty slot, > lather, rinse, repeat -- probing for a key that's not there has a 10% > chance of hitting an empty slot, a 10% chance of hitting the slot just > before an empty slot, and so on. So the expected number of probes to > find that there's no match is 4.5. However, it could easily happen > that you have 3 or 4 empty slots in a row in one place and 27 or 36 > occupied slots in a row in another place, and that could cause all > kinds of grief. Moreover, real-world hash functions on real-world > data aren't always perfect, which increases the chances of things > going off track. I'm not exactly sure how high a fillfactor is too > high, but note that > https://en.wikipedia.org/wiki/Hash_table#Open_addressing claims that > "performance dramatically degrades when the load factor grows beyond > 0.7 or so", which isn't cited but the same text appears in > https://www.unf.edu/~wkloster/3540/wiki_book2.pdf for whatever that's > worth. That's without the "trick" of "robin hood" hashing though: https://en.wikipedia.org/wiki/Hash_table#Robin_Hood_hashing http://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion/ https://www.pvk.ca/Blog/2013/11/26/the-other-robin-hood-hashing/ it probably depends a bit on the scenario how high you want to have the fillfactor - for hashaggs a high fill factor would probably be good (they're often "read" heavy), for bitmapscans less so. But I guess there's not much point in immediately increasing the fillfactor, can do that in a second step. Greetings, Andres Freund
В списке pgsql-hackers по дате отправления: