Discussion: Re: Adjusting hash join memory limit to handle batch explosion


Re: Adjusting hash join memory limit to handle batch explosion

From:
Melanie Plageman
Date:
On Tue, Dec 31, 2024 at 6:07 PM Tomas Vondra <tomas@vondra.me> wrote:
>
> So I decided to revisit the three patches from 2019. Attached are
> rebased and cleaned up versions. A couple comments on each one:
>
>
> 1) v20241231-adjust-limit-0001-Account-for-batch-files-in-ha.patch
>
> I believe this is the way to go, for now. The basic idea is to keep the
> overall behavior, but "relax" the memory limit as the number of batches
> increases to minimize the total memory use.
>
> This may seem a bit weird, but as the number of batches grows there's no
> way to not violate the limit. And the current code simply ignores this
> and allocates arbitrary amounts of memory.

I'm just catching up on this thread and haven't read all the mails
yet. I started with looking at the patches in the first email and got
a bit confused.

In this patch (v20241231-adjust-limit-0001-Account-for-batch-files-in-ha.patch),
I see that you've started accounting for the spill files in
hashtable->spaceUsed -- in the same way that is done for the tuples in
the hashtable. I know the other memory contexts (hashCxt and batchCxt)
in hashtable aren't appropriate for figuring out spaceUsed, but I was
wondering if the hashtable->spillCxt accurately reflects how much
memory is being used for these spill files at one time? Perhaps it
doesn't make sense to use this, but when we added it in 8c4040edf45, I
thought we might one day be able to use it for determining peak space
usage. Or perhaps you are imagining it only being used for observability?

- Melanie



Re: Adjusting hash join memory limit to handle batch explosion

From:
Tomas Vondra
Date:

On 1/9/25 17:17, Melanie Plageman wrote:
> On Tue, Dec 31, 2024 at 6:07 PM Tomas Vondra <tomas@vondra.me> wrote:
>>
>> So I decided to revisit the three patches from 2019. Attached are
>> rebased and cleaned up versions. A couple comments on each one:
>>
>>
>> 1) v20241231-adjust-limit-0001-Account-for-batch-files-in-ha.patch
>>
>> I believe this is the way to go, for now. The basic idea is to keep the
>> overall behavior, but "relax" the memory limit as the number of batches
>> increases to minimize the total memory use.
>>
>> This may seem a bit weird, but as the number of batches grows there's no
>> way to not violate the limit. And the current code simply ignores this
>> and allocates arbitrary amounts of memory.
> 
> I'm just catching up on this thread and haven't read all the mails
> yet. I started with looking at the patches in the first email and got
> a bit confused.
> 
> In this patch (v20241231-adjust-limit-0001-Account-for-batch-files-in-ha.patch),
> I see that you've started accounting for the spill files in
> hashtable->spaceUsed -- in the same way that is done for the tuples in
> the hashtable. I know the other memory contexts (hashCxt and batchCxt)
> in hashtable aren't appropriate for figuring out spaceUsed, but I was
> wondering if the hashtable->spillCxt accurately reflects how much
> memory is being used for these spill files at one time? Perhaps it
> doesn't make sense to use this, but when we added it in 8c4040edf45, I
> thought we might one day be able to use it for determining peak space
> usage. Or perhaps you are imagining it only being used for observability?
> 

Good question. Yes, the patch from 12/31 does look at all the memory,
including the batch files. I thought about using the spillCxt too, but I
don't think it would work, because the context tracks *current* memory
usage, while we're interested in how much memory would be used *after*
doubling the number of batches.


regards

-- 
Tomas Vondra