Re: DSA overflow in hash join
From: Tomas Vondra
Subject: Re: DSA overflow in hash join
Date:
Msg-id: 7e9903e7-c640-429f-a68c-612a2db4b6d1@vondra.me
In reply to: Re: DSA overflow in hash join (Konstantin Knizhnik <knizhnik@garret.ru>)
List: pgsql-hackers
Hi,

I did look at this because of the thread about "nbatch overflow" [1]. The patches I just posted in that thread resolve the issue for me, in the sense that the reproducer [2] no longer fails. But I think that's actually mostly an accident - the balancing reduces nbatch, exchanging it for a larger in-memory hash table. In this case we start with nbatch=2M, but it gets reduced to 64k, which is low enough to fit into the 1GB allocation limit. Which is nice, but I can't guarantee it will always work out like this. It's unlikely we'd need 2M batches, but is it impossible?

So we may still need something like the max_batches protection. I don't think we should apply this to non-parallel hash joins, though. Which is what the last patch would do, I think.

However, why don't we simply allow huge allocations for this?

    /* Allocate space. */
    pstate->batches =
        dsa_allocate_extended(hashtable->area,
                              EstimateParallelHashJoinBatch(hashtable) * nbatch,
                              (DSA_ALLOC_ZERO | DSA_ALLOC_HUGE));

This fixes the issue for me, even with the balancing disabled. Or is there a reason why this would be a bad idea? It seems a bit strange to force parallel hash joins to use fewer batches, when (presumably) parallelism is more useful for larger data sets.
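To illustrate the arithmetic, here's a quick standalone sketch of why nbatch=2M trips the cap while 64k does not. The 512 bytes per batch is just an assumed round figure - the real per-batch size comes from EstimateParallelHashJoinBatch() and grows with the number of participants:

    #include <stddef.h>
    #include <stdio.h>

    /* Assumed per-batch footprint; the real value is computed by
     * EstimateParallelHashJoinBatch() and depends on nparticipants. */
    #define PER_BATCH_BYTES ((size_t) 512)

    /* dsa_allocate() without DSA_ALLOC_HUGE is capped at MaxAllocSize,
     * i.e. 1GB - 1 (see memutils.h). */
    #define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

    int main(void)
    {
        size_t nbatch[] = {2 * 1024 * 1024, 64 * 1024};

        for (int i = 0; i < 2; i++)
        {
            size_t bytes = nbatch[i] * PER_BATCH_BYTES;

            printf("nbatch = %7zu -> %10zu bytes (%s MaxAllocSize)\n",
                   nbatch[i], bytes,
                   bytes > MAX_ALLOC_SIZE ? "exceeds" : "fits in");
        }
        return 0;
    }

At that assumed per-batch size, 2M batches come out to exactly 1GiB, just past the limit, while 64k batches need only 32MB - which matches how the balancing happened to rescue the reproducer.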
regards

[1] https://www.postgresql.org/message-id/244dc6c1-3b3d-4de2-b3de-b1511e6a6d10%40vondra.me

[2] https://www.postgresql.org/message-id/52b94d5b-a135-489d-9833-2991a69ec623%40garret.ru

--
Tomas Vondra