Re: Avoiding hash join batch explosions with extreme skew and weird stats
From: Thomas Munro
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date:
Msg-id: CA+hUKGK1=Xa+AV3NR4PX0XNPwUbnVG=f-P3Abw17X1wE4Yevmg@mail.gmail.com
In reply to: Re: Avoiding hash join batch explosions with extreme skew and weird stats (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Fri, May 17, 2019 at 11:46 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On Thu, May 16, 2019 at 06:58:43PM -0400, Tom Lane wrote:
> >Thomas Munro <thomas.munro@gmail.com> writes:
> >> On Fri, May 17, 2019 at 4:39 AM Tomas Vondra
> >> <tomas.vondra@2ndquadrant.com> wrote:
> >>> I kinda like the idea with increasing the spaceAllowed value. Essentially,
> >>> if we decide adding batches would be pointless, increasing the memory
> >>> budget is the only thing we can do anyway.
> >
> >> But that's not OK, we need to fix THAT.
> >
> >I don't think it's necessarily a good idea to suppose that we MUST
> >fit in work_mem come what may.  It's likely impossible to guarantee
> >that in all cases.  Even if we can, a query that runs for eons will
> >help nobody.
>
> I kinda agree with Thomas - arbitrarily increasing work_mem is something
> we should not do unless absolutely necessary. If the query is slow, it's
> up to the user to bump the value up, if deemed appropriate.

+1

I think we can guarantee that we fit in work_mem with only one
exception: we have to allow work_mem to be exceeded when we otherwise
couldn't fit a single tuple.

Then the worst possible case with the looping algorithm is that we
degrade to loading just one inner tuple at a time into the hash table,
at which point we effectively have a nested loop join (except (1) it's
flipped around: for each tuple on the inner side, we scan the outer
side; and (2) we can handle full outer joins).  In any reasonable case
you'll have a decent number of tuples at a time, so you won't have to
loop too many times, and it's not really quadratic in the number of
tuples.

The realisation that it's a nested loop join in the extreme case is
probably why the MySQL people called it 'block nested loop join' (and
as far as I can tell from quick googling, it might be their *primary*
strategy for hash joins that don't fit in memory, not just a secondary
strategy after Grace fails, but I might be wrong about that).  Unlike
plain old single-tuple nested loop join, it works in arbitrarily sized
blocks (the hash table).  What we would call a regular hash join, they
call a BNL that just happens to have only one loop.  I think Grace is
probably a better primary strategy, but loops are a good fallback.

The reason I kept mentioning sort-merge in earlier threads is that it
would be better in the worst cases.  Unfortunately it would be worse in
the best case (smallish numbers of loops) and, I suspect, in many real
world cases.  It's hard to decide, so perhaps we should be happy that
sort-merge can't currently be considered, because the join conditions
may not be merge-joinable.

--
Thomas Munro
https://enterprisedb.com
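[Editor's note: to make the looping fallback described above concrete, here is a minimal, self-contained C sketch of a block nested loop join. It is not PostgreSQL code; the arrays, BLOCK_CAPACITY, and the linear probe standing in for a hash table are all illustrative assumptions, not names from the patch or the server.]

/*
 * Block nested loop join sketch.  The inner relation is consumed in
 * blocks of at most BLOCK_CAPACITY tuples (standing in for a
 * work_mem-sized hash table).  For each block, the outer relation is
 * rescanned and probed against the block's contents.
 *
 * With BLOCK_CAPACITY = 1 this degenerates to the single-tuple nested
 * loop worst case described in the message above; with a block big
 * enough to hold the whole inner side it is an ordinary one-pass hash
 * join (a "BNL with only one loop").
 */
#include <stdio.h>

#define BLOCK_CAPACITY 2        /* tuples per inner-side block */

static const int inner_rel[] = {1, 2, 2, 3, 5};
static const int outer_rel[] = {2, 3, 4, 5};

int
main(void)
{
    int n_inner = sizeof(inner_rel) / sizeof(inner_rel[0]);
    int n_outer = sizeof(outer_rel) / sizeof(outer_rel[0]);

    /* One loop iteration per block of inner tuples. */
    for (int start = 0; start < n_inner; start += BLOCK_CAPACITY)
    {
        int end = start + BLOCK_CAPACITY;

        if (end > n_inner)
            end = n_inner;

        /* Rescan the outer side once per inner block. */
        for (int o = 0; o < n_outer; o++)
        {
            /*
             * Probe the current block; a linear scan stands in for the
             * hash table lookup a real implementation would do.
             */
            for (int i = start; i < end; i++)
            {
                if (outer_rel[o] == inner_rel[i])
                    printf("match: outer=%d inner=%d\n",
                           outer_rel[o], inner_rel[i]);
            }
        }
    }
    return 0;
}

The number of outer-side rescans is ceil(n_inner / BLOCK_CAPACITY), which is why a decent number of inner tuples per block keeps the loop count small and the behaviour far from quadratic.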