Re: Avoiding hash join batch explosions with extreme skew and weird stats
From: Thomas Munro
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date:
Msg-id: CA+hUKGK1=Xa+AV3NR4PX0XNPwUbnVG=f-P3Abw17X1wE4Yevmg@mail.gmail.com
In reply to: Re: Avoiding hash join batch explosions with extreme skew and weird stats (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Fri, May 17, 2019 at 11:46 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On Thu, May 16, 2019 at 06:58:43PM -0400, Tom Lane wrote:
> >Thomas Munro <thomas.munro@gmail.com> writes:
> >> On Fri, May 17, 2019 at 4:39 AM Tomas Vondra
> >> <tomas.vondra@2ndquadrant.com> wrote:
> >>> I kinda like the idea with increasing the spaceAllowed value. Essentially,
> >>> if we decide adding batches would be pointless, increasing the memory
> >>> budget is the only thing we can do anyway.
> >
> >> But that's not OK, we need to fix THAT.
> >
> >I don't think it's necessarily a good idea to suppose that we MUST
> >fit in work_mem come what may.  It's likely impossible to guarantee
> >that in all cases.  Even if we can, a query that runs for eons will
> >help nobody.
>
> I kinda agree with Thomas - arbitrarily increasing work_mem is something
> we should not do unless absolutely necessary. If the query is slow, it's
> up to the user to bump the value up, if deemed appropriate.

+1

I think we can guarantee that we fit in work_mem with only one
exception: we have to allow work_mem to be exceeded when we otherwise
couldn't fit a single tuple.

Then the worst possible case with the looping algorithm is that we
degrade to loading just one inner tuple at a time into the hash table,
at which point we effectively have a nested loop join (except (1) it's
flipped around: for each tuple on the inner side, we scan the outer
side; and (2) we can handle full outer joins).  In any reasonable case
you'll have a decent number of tuples at a time, so you won't have to
loop too many times, and it's not really quadratic in the number of
tuples.

The realisation that it's a nested loop join in the extreme case is
probably why the MySQL people called it 'block nested loop join' (and
as far as I can tell from quick googling, it might be their *primary*
strategy for hash joins that don't fit in memory, not just a secondary
strategy after Grace fails, but I might be wrong about that).  Unlike
plain old single-tuple nested loop join, it works in arbitrarily sized
blocks (the hash table).  What we would call a regular hash join, they
call a BNL that just happens to have only one loop.  I think Grace is
probably a better primary strategy, but loops are a good fallback.

The reason I kept mentioning sort-merge in earlier threads is that it
would be better in the worst cases.  Unfortunately it would be worse in
the best case (smallish numbers of loops) and, I suspect, in many real
world cases.  It's hard to decide, so perhaps we should be happy that
sort-merge can't currently be considered, because the join conditions
may not be merge-joinable.

--
Thomas Munro
https://enterprisedb.com
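[Editor's note: to make the looping fallback described above concrete, here is a minimal, self-contained C sketch of a block nested loop join. It is not PostgreSQL code; the arrays, BLOCK_CAPACITY, and the linear probe standing in for a hash table are all illustrative assumptions, not names from the patch or the server.]

/*
 * Block nested loop join sketch.  The inner relation is consumed in
 * blocks of at most BLOCK_CAPACITY tuples (standing in for a
 * work_mem-sized hash table).  For each block, the outer relation is
 * rescanned and probed against the block's contents.
 *
 * With BLOCK_CAPACITY = 1 this degenerates to the single-tuple nested
 * loop worst case described in the message above; with a block big
 * enough to hold the whole inner side it is an ordinary one-pass hash
 * join (a "BNL with only one loop").
 */
#include <stdio.h>

#define BLOCK_CAPACITY 2        /* tuples per inner-side block */

static const int inner_rel[] = {1, 2, 2, 3, 5};
static const int outer_rel[] = {2, 3, 4, 5};

int
main(void)
{
    int n_inner = sizeof(inner_rel) / sizeof(inner_rel[0]);
    int n_outer = sizeof(outer_rel) / sizeof(outer_rel[0]);

    /* One loop iteration per block of inner tuples. */
    for (int start = 0; start < n_inner; start += BLOCK_CAPACITY)
    {
        int end = start + BLOCK_CAPACITY;

        if (end > n_inner)
            end = n_inner;

        /* Rescan the outer side once per inner block. */
        for (int o = 0; o < n_outer; o++)
        {
            /*
             * Probe the current block; a linear scan stands in for the
             * hash table lookup a real implementation would do.
             */
            for (int i = start; i < end; i++)
            {
                if (outer_rel[o] == inner_rel[i])
                    printf("match: outer=%d inner=%d\n",
                           outer_rel[o], inner_rel[i]);
            }
        }
    }
    return 0;
}

The number of outer-side rescans is ceil(n_inner / BLOCK_CAPACITY), which is why a decent number of inner tuples per block keeps the loop count small and the behaviour far from quadratic.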