Re: DSA overflow in hash join
From: Tomas Vondra
Subject: Re: DSA overflow in hash join
Date:
Msg-id: 7e9903e7-c640-429f-a68c-612a2db4b6d1@vondra.me
In reply to: Re: DSA overflow in hash join (Konstantin Knizhnik <knizhnik@garret.ru>)
List: pgsql-hackers
Hi,

I did look at this because of the thread about "nbatch overflow" [1]. The patches I just posted in that thread resolve the issue for me, in the sense that the reproducer [2] no longer fails. But I think that's actually mostly an accident - the balancing reduces nbatch, exchanging it for a larger in-memory hash table. In this case we start with nbatch=2M, but it gets reduced to 64k, which is low enough to fit into the 1GB allocation limit. Which is nice, but I can't guarantee it will always work out like this. It's unlikely we'd need 2M batches, but is it impossible?

So we may still need something like the max_batches protection. I don't think we should apply this to non-parallel hash joins, though. Which is what the last patch would do, I think.

However, why don't we simply allow huge allocations for this?

    /* Allocate space. */
    pstate->batches =
        dsa_allocate_extended(hashtable->area,
                              EstimateParallelHashJoinBatch(hashtable) * nbatch,
                              (DSA_ALLOC_ZERO | DSA_ALLOC_HUGE));

This fixes the issue for me, even with the balancing disabled. Or is there a reason why this would be a bad idea? It seems a bit strange to force parallel hash joins to use fewer batches, when (presumably) parallelism is more useful for larger data sets.
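To illustrate the arithmetic, here's a quick standalone sketch of why nbatch=2M trips the cap while 64k does not. The 512 bytes per batch is just an assumed round figure - the real per-batch size comes from EstimateParallelHashJoinBatch() and grows with the number of participants:

    #include <stddef.h>
    #include <stdio.h>

    /* Assumed per-batch footprint; the real value is computed by
     * EstimateParallelHashJoinBatch() and depends on nparticipants. */
    #define PER_BATCH_BYTES ((size_t) 512)

    /* dsa_allocate() without DSA_ALLOC_HUGE is capped at MaxAllocSize,
     * i.e. 1GB - 1 (see memutils.h). */
    #define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

    int main(void)
    {
        size_t nbatch[] = {2 * 1024 * 1024, 64 * 1024};

        for (int i = 0; i < 2; i++)
        {
            size_t bytes = nbatch[i] * PER_BATCH_BYTES;

            printf("nbatch = %7zu -> %10zu bytes (%s MaxAllocSize)\n",
                   nbatch[i], bytes,
                   bytes > MAX_ALLOC_SIZE ? "exceeds" : "fits in");
        }
        return 0;
    }

At that assumed per-batch size, 2M batches come out to exactly 1GiB, just past the limit, while 64k batches need only 32MB - which matches how the balancing happened to rescue the reproducer.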
regards

[1] https://www.postgresql.org/message-id/244dc6c1-3b3d-4de2-b3de-b1511e6a6d10%40vondra.me

[2] https://www.postgresql.org/message-id/52b94d5b-a135-489d-9833-2991a69ec623%40garret.ru

--
Tomas Vondra