Re: Shared detoast Datum proposal

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: Shared detoast Datum proposal
Дата
Msg-id 875xyb1a6q.fsf@163.com
обсуждение исходный текст
Ответ на Re: Shared detoast Datum proposal  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: Shared detoast Datum proposal  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
> On 2/26/24 14:22, Andy Fan wrote:
>> 
>>...
>>
>>> Also, toasted values
>>> are not always being used immediately and as a whole, i.e. jsonb values are fully
>>> detoasted (we're working on this right now) to extract the smallest value from
>>> big json, and these values are not worth keeping in memory. For text values too,
>>> we often do not need the whole value to be detoasted.
>> 
>> I'm not sure how often this is true, espeically you metioned text data
>> type. I can't image why people acquire a piece of text and how can we
>> design a toasting system to fulfill such case without detoast the whole
>> as the first step. But for the jsonb or array data type, I think it is
>> more often. However if we design toasting like that, I'm not sure if it
>> would slow down other user case. for example detoasting the whole piece
>> use case. I am justing feeling both way has its user case, kind of heap
>> store and columna store.  
>> 
>
> Any substr/starts_with call benefits from this optimization, and such
> calls don't seem that uncommon. I certainly can imagine people doing
> this fairly often.

This leads me to pay more attention to pg_detoast_datum_slice user case
which I did overlook and caused some regression in this case. Thanks!

As you said later:

> Is there a way to identify cases that are likely to benefit from this
> slicing, and disable the detoasting for them? We already disable it for
> other cases, so maybe we can do this here too?

I think the answer is yes, if our goal is to disable the whole toast for
some speical calls like substr/starts_with. In the current patch, we have:

/*
 * increase_level_for_pre_detoast
 *    Check if the given Expr could detoast a Var directly, if yes,
 * increase the level and return true. otherwise return false;
 */
static inline void
increase_level_for_pre_detoast(Node *node, intermediate_level_context *ctx)
{
    /* The following nodes is impossible to detoast a Var directly. */
    if (IsA(node, List) || IsA(node, TargetEntry) || IsA(node, NullTest))
    {
        ctx->level_added = false;
    }
    else if (IsA(node, FuncExpr) && castNode(FuncExpr, node)->funcid == F_PG_COLUMN_COMPRESSION)
    {
        /* let's not detoast first so that pg_column_compression works. */
        ctx->level_added = false;
    }

while it works, but the blacklist looks not awesome.

> In any case, it's a long-standing behavior /
> optimization, and I don't think we can just dismiss it this quickly.

I agree. So I said both have their user case. and when I say the *both*, I
mean the other method is "partial detoast prototype", which Nikita has
told me before. while I am sure it has many user case, I'm also feeling
it is complex and have many questions in my mind, but I'd like to see
the design before asking them.

> Or maybe there's a way to do the detoasting lazily, and only keep the
> detoasted value when it's requesting the whole value. Or perhaps even
> better, remember what part we detoasted, and maybe it'll be enough for
> future requests?
>
> I'm not sure how difficult would this be with the proposed approach,
> which eagerly detoasts the value into tts_values. I think it'd be
> easier to implement with the TOAST cache I briefly described ... 

I can understand the benefits of the TOAST cache, but the following
issues looks not good to me IIUC: 

1). we can't put the result to tts_values[*] since without the planner
decision, we don't know if this will break small_tlist logic. But if we
put it into the cache only, which means a cache-lookup as a overhead is
unavoidable.

2). It is hard to decide which entry should be evicted without attaching
it to the TupleTableSlot's life-cycle. then we can't grantee the entry
we keep is the entry we will reuse soon?

-- 
Best Regards
Andy Fan




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Optimize planner memory consumption for huge arrays
Следующее
От: Andy Fan
Дата:
Сообщение: Re: Shared detoast Datum proposal