Re: Optimize planner memory consumption for huge arrays

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: Optimize planner memory consumption for huge arrays
Дата	25 февраля 2024 г. 13:52:47
Msg-id	4f9f1c8e-e9ad-488d-9c47-d8f0f4e61f75@enterprisedb.com обсуждение исходный текст
Ответ на	Re: Optimize planner memory consumption for huge arrays (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Optimize planner memory consumption for huge arrays
Список	pgsql-hackers

Дерево обсуждения

On 2/25/24 00:07, Tom Lane wrote:
> I wrote:
>> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>>> On 2/19/24 16:45, Tom Lane wrote:
>>>> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>>>>> For example, I don't think we expect selectivity functions to allocate
>>>>> long-lived objects, right? So maybe we could run them in a dedicated
>>>>> memory context, and reset it aggressively (after each call).
> 
>>>> That could eliminate a whole lot of potential leaks.  I'm not sure 
>>>> though how much it moves the needle in terms of overall planner
>>>> memory consumption.
> 
>>> It was an ad hoc thought, inspired by the issue at hand. Maybe it would
>>> be possible to find similar "boundaries" in other parts of the planner.
> 
>> Here's a quick and probably-incomplete implementation of that idea.
>> I've not tried to study its effects on memory consumption, just made
>> sure it passes check-world.
> 
> I spent a bit more time on this patch.  One thing I was concerned
> about was whether it causes any noticeable slowdown, and it seems that
> it does: testing with "pgbench -S" I observe perhaps 1% slowdown.
> However, we don't necessarily need to reset the temp context after
> every single usage.  I experimented with resetting it every tenth
> time, and that got me from 1% slower than HEAD to 1% faster.

Isn't 1% well within the usual noise and/or the differences that can be
caused simply by slightly different alignment of the binary? I'd treat
this as "same performance" ...

> Of course "every tenth time" is very ad hoc.  I wondered if we could
> make it somehow conditional on how much memory had been consumed
> in the temp context, but there doesn't seem to be any cheap way
> to get that.  Applying something like MemoryContextMemConsumed
> would surely be a loser.  I'm not sure if it'd be worth extending
> the mcxt.c API to provide something like "MemoryContextResetIfBig",
> with some internal rule that would be cheap to apply like "reset
> if we have any non-keeper blocks".

Wouldn't it be sufficient to look simply at MemoryContextMemAllocated?
That's certainly way cheaper than MemoryContextStatsInternal, especially
if the context tree is shallow (which I think we certainly expect here).

I think MemoryContextResetIfBig is an interesting idea - I think a good
way to define "big" would be "has multiple blocks", because that's the
only case where we can actually reclaim some memory.

> 
> I also looked into whether it really does reduce overall memory
> consumption noticeably, by collecting stats about planner memory
> consumption during the core regression tests.  The answer is that
> it barely helps.  I see the average used space across all planner
> invocations drop from 23344 bytes to 23220, and the worst-case
> numbers hardly move at all.  So that's a little discouraging.
> But of course the regression tests prefer not to deal in very
> large/expensive test cases, so maybe it's not surprising that
> I don't see much win in this test.
> 

I'm not really surprised by this - I think you're right most of our
selectivity functions either doesn't do memory-expensive stuff, or we
don't have such corner cases in our regression tests. Or at least not to
the extent to move the overall average, so we'd need to look at
individual cases allocating quite a bit of memory.

But I think that's fine - I see this as a safety measure, not something
that'd improve the "good" cases.

> Anyway, 0001 attached is a cleaned-up patch with the every-tenth-
> time rule, and 0002 (not meant for commit) is the quick and
> dirty instrumentation patch I used for collecting usage stats.
> 
> Even though this seems of only edge-case value, I'd much prefer
> to do this than the sort of ad-hoc patching initially proposed
> in this thread.
> 

+1 to that, it seems like a more principled approach.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Optimize planner memory consumption for huge arrays