Re: Shared detoast Datum proposal

From: Andy Fan
Subject: Re: Shared detoast Datum proposal
Date:
Msg-id: 87ttllbd00.fsf@163.com
In reply to: Re: Shared detoast Datum proposal  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: Shared detoast Datum proposal  (Nikita Malakhov <hukutoc@gmail.com>)
List: pgsql-hackers
>
>> 2. more likely to use up all the memory which is allowed. for example:
>> if we set the limit to 1MB, then we kept more data which will be not
>> used and then consuming all of the 1MB. 
>> 
>> My method is resolving this with some helps from other modules (kind of
>> invasive) but can control the eviction well and use the memory as less
>> as possible.
>> 
>
> Is the memory usage really an issue? Sure, it'd be nice to evict entries
> as soon as we can prove they are not needed anymore, but if the cache
> limit is set to 1MB it's not really a problem to use 1MB.

This might be a key point that leads us in different directions, so
I want to explain it in more detail, to see if we can reach some
agreement here.

It is a bit hard to decide which memory limit to set: 1MB, 10MB, 40MB
or 100MB? In my current case it is at least 40MB. In the TOAST cache
design, a low limit makes the cache ineffective, while a high limit
carries the risk of using all of that memory. In my method, even if we
set a higher value, the limit only caps what is really (nearly) needed;
it does not keep caching values until the limit is hit. This makes a
noticeable difference when we want to set a high limit and have many
active sessions, e.g. 100 * 40MB = 4GB.

> On 3/4/24 18:08, Andy Fan wrote:
>> ...
>>>
>>>> I assumed that releasing all of the memory at the end of executor once
>>>> is not an option since it may consumed too many memory. Then, when and
>>>> which entry to release becomes a trouble for me. For example:
>>>>
>>>>           QUERY PLAN
>>>> ------------------------------
>>>>  Nested Loop
>>>>    Join Filter: (t1.a = t2.a)
>>>>    ->  Seq Scan on t1
>>>>    ->  Seq Scan on t2
>>>> (4 rows)
>>>>
>>>> In this case t1.a needs a longer lifespan than t2.a since it is
>>>> in outer relation. Without the help from slot's life-cycle system, I
>>>> can't think out a answer for the above question.
>>>>
>>>
>>> This is true, but how likely such plans are? I mean, surely no one would
>>> do nested loop with sequential scans on reasonably large tables, so how
>>> representative this example is?
>> 
>> Acutally this is a simplest Join case, we still have same problem like
>> Nested Loop + Index Scan which will be pretty common. 
>> 
>
> Yes, I understand there are cases where LRU eviction may not be the best
> choice - like here, where the "t1" should stay in the case. But there
> are also cases where this is the wrong choice, and LRU would be better.
>
> For example a couple paragraphs down you suggest to enforce the memory
> limit by disabling detoasting if the memory limit is reached. That means
> the detoasting can get disabled because there's a single access to the
> attribute somewhere "up the plan tree". But what if the other attributes
> (which now won't be detoasted) are accessed many times until then?

I am not sure I follow here, but I want to explain more about the
disable-detoast-sharing logic when the memory limit is hit. When this
happens, detoast sharing is disabled, but since the detoasted datum is
released as soon as the slot->tts_values[*] is discarded, the
'disabled' state turns back to 'enabled' quickly. So it is not true
that once sharing gets disabled, it can never be enabled again for the
given query.
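To make the toggle behavior concrete, here is a minimal sketch of the
accounting I have in mind; all names (detoast_mem_used, record_detoast,
release_detoast, etc.) are hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global accounting for detoasted datums kept in slots. */
static size_t detoast_mem_used = 0;
static const size_t detoast_mem_limit = 40 * 1024 * 1024;	/* e.g. 40MB */

/* Sharing is allowed only while we are under the limit. */
static bool
detoast_sharing_enabled(void)
{
	return detoast_mem_used < detoast_mem_limit;
}

/* Called when a detoasted datum is stored into slot->tts_values[i]. */
static void
record_detoast(size_t len)
{
	detoast_mem_used += len;
}

/*
 * Called when the slot's values are discarded; freeing the datum brings
 * the counter back down, which re-enables sharing automatically.
 */
static void
release_detoast(size_t len)
{
	assert(detoast_mem_used >= len);
	detoast_mem_used -= len;
}
```

The point is that "disabled" is not sticky: as soon as release_detoast()
runs for the slots going out of scope, detoast_sharing_enabled() becomes
true again.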

> I think LRU is a pretty good "default" algorithm if we don't have a very
> good idea of the exact life span of the values, etc. Which is why other
> nodes (e.g. Memoize) use LRU too.

> But I wonder if there's a way to count how many times an attribute is
> accessed (or is likely to be). That might be used to inform a better
> eviction strategy.

Yes, but for the current issue we can get a better estimation with the
help of the plan shape, and Memoize depends on some planner information
as well. If we bypass the planner information and try to resolve this at
the cache level, the code may become just as complex, and all of that
cost is run-time overhead, whereas the other way it is planning-time
overhead.

> Also, we don't need to evict the whole entry - we could evict just the
> data part (guaranteed to be fairly large), but keep the header, and keep
> the counts, expected number of hits, and other info. And use this to
> e.g. release entries that reached the expected number of hits. But I'd
> probably start with LRU and only do this as an improvement later.

A great lesson learnt here, thanks for sharing this!

As for the current use case, what I want to highlight is that we are
"caching" "user data" "locally".

USER DATA means it might be very large: it is not common to have 1M
tables, but it is quite common to have 1M tuples in one scan, so even
keeping just the headers can cost extra memory, like 10M * 24 bytes
= 240MB. LOCALLY means it is not friendly to many active sessions.
CACHE means it is hard to evict correctly. My method also has the USER
DATA and LOCALLY attributes, but it is better at eviction; poor
eviction is what leads to the memory usage issue described at the
beginning of this mail.

>>> Also, this leads me to the need of having some sort of memory limit. If
>>> we may be keeping entries for extended periods of time, and we don't
>>> have any way to limit the amount of memory, that does not seem great.
>>>
>>> AFAIK if we detoast everything into tts_values[] there's no way to
>>> implement and enforce such limit. What happens if there's a row with
>>> multiple large-ish TOAST values? What happens if those rows are in
>>> different (and distant) parts of the plan?
>> 
>> I think this can be done by tracking the memory usage on EState level
>> or global variable level and disable it if it exceeds the limits and
>> resume it when we free the detoast datum when we don't need it. I think
>> no other changes need to be done.
>> 
>
> That seems like a fair amount of additional complexity. And what if the
> toasted values are accessed in context without EState (I haven't checked
> how common / important that is)?
>
> And I'm not sure just disabling the detoast as a way to enforce a memory
> limit, as explained earlier.
>
>>> It seems far easier to limit the memory with the toast cache.
>> 
>> I think the memory limit and entry eviction is the key issue now. IMO,
>> there are still some difference when both methods can support memory
>> limit. The reason is my patch can grantee the cached memory will be
>> reused, so if we set the limit to 10MB, we know all the 10MB is
>> useful, but the TOAST cache method, probably can't grantee that, then
>> when we want to make it effecitvely, we have to set a higher limit for
>> this.
>> 
>
> Can it actually guarantee that? It can guarantee the slot may be used,
> but I don't see how could it guarantee the detoasted value will be used.
> We may be keeping the slot for other reasons. And even if it could
> guarantee the detoasted value will be used, does that actually prove
> it's better to keep that value? What if it's only used once, but it's
> blocking detoasting of values that will be used 10x that?
>
> If we detoast a 10MB value in the outer side of the Nest Loop, what if
> the inner path has multiple accesses to another 10MB value that now
> can't be detoasted (as a shared value)?

Guarantee may be the wrong word. The differences in my mind are:
1. The plan shape has better potential to know how a datum is used,
since we know the plan tree and the rows passed to a given node.
2. Planning-time effort is cheaper than run-time effort.
3. Eviction in my method is not as important as it is in the TOAST
cache method, since the data is reset per slot, so the limit is usually
not hit in practice. A cache, however, will hit it.
4. The TOAST cache tends to use memory up to whatever limit we set.

>>> In any case, my concern is more about having to do this when creating
>>> the plan at all, the code complexity etc. Not just because it might have
>>> performance impact.
>> 
>> I think the main trade-off is TOAST cache method is pretty non-invasive
>> but can't control the eviction well, the impacts includes:
>> 1. may evicting the datum we want and kept the datum we don't need.
>
> This applies to any eviction algorithm, not just LRU. Ultimately what
> matters is whether we have in the cache the most often used values, i.e.
> the hit ratio (perhaps in combination with how expensive detoasting that
> particular entry was).

Correct, just that I am doubtful about designing a LOCAL CACHE for USER
DATA, for the reasons I described above.

At last, thanks for your attention, really appreciated about it!

-- 
Best Regards
Andy Fan



