Re: Adjusting hash join memory limit to handle batch explosion
From | Tomas Vondra
Subject | Re: Adjusting hash join memory limit to handle batch explosion
Date |
Msg-id | 36af890e-464d-41ae-88a8-d06317f36a98@vondra.me
In reply to | Re: Adjusting hash join memory limit to handle batch explosion (Melanie Plageman <melanieplageman@gmail.com>)
List | pgsql-hackers
On 1/9/25 17:17, Melanie Plageman wrote:
> On Tue, Dec 31, 2024 at 6:07 PM Tomas Vondra <tomas@vondra.me> wrote:
>>
>> So I decided to revisit the three patches from 2019. Attached are
>> rebased and cleaned up versions. A couple of comments on each one:
>>
>>
>> 1) v20241231-adjust-limit-0001-Account-for-batch-files-in-ha.patch
>>
>> I believe this is the way to go, for now. The basic idea is to keep the
>> overall behavior, but "relax" the memory limit as the number of batches
>> increases, to minimize the total memory use.
>>
>> This may seem a bit weird, but as the number of batches grows there's
>> no way not to violate the limit. And the current code simply ignores
>> this and allocates arbitrary amounts of memory.
>
> I'm just catching up on this thread and haven't read all the mails
> yet. I started by looking at the patches in the first email and got
> a bit confused.
>
> In this patch (v20241231-adjust-limit-0001-Account-for-batch-files-in-ha.patch),
> I see that you've started accounting for the spill files in
> hashtable->spaceUsed -- in the same way that is done for the tuples in
> the hashtable. I know the other memory contexts (hashCxt and batchCxt)
> in hashtable aren't appropriate for figuring out spaceUsed, but I was
> wondering if the hashtable->spillCxt accurately reflects how much
> memory is being used for these spill files at one time? Perhaps it
> doesn't make sense to use this, but when we added it in 8c4040edf45, I
> thought we might one day be able to use it for determining peak space
> usage. Or perhaps you are imagining it only being used for
> observability?

Good question. Yes, the patch from 12/31 does look at all the memory,
including the batch files. I thought about using the spillCxt too, but I
don't think it would work, because the context tracks *current* memory
usage, and we're interested in how much memory would be used *after*
doubling the number of batches.

regards

--
Tomas Vondra