Обсуждение: Dead code in toast_fetch_datum_slice?

Поиск
Список
Период
Сортировка

Dead code in toast_fetch_datum_slice?

От
Stephen Frost
Дата:
Greetings,

Perhaps I'm missing something, but in toast_fetch_datum_slice() there's:

    Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));

Followed, not long after, by:

    if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
        SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ);
    else
        SET_VARSIZE(result, length + VARHDRSZ);

Further, the only caller of this function today is
heap_tuple_untoast_attr_slice(), which does:

    /* fast path for non-compressed external datums */
    if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
        return toast_fetch_datum_slice(attr, sliceoffset, slicelength);

As such, I'm feeling like that conditional to handle the case where this
function is passed a compressed TOAST value is rather confusing dead
code.

Hence I'm proposing the attached.

Thoughts?

Thanks!

Stephen

Вложения

Re: Dead code in toast_fetch_datum_slice?

От
Paul Ramsey
Дата:


On Fri, Dec 7, 2018 at 3:25 PM Stephen Frost <sfrost@snowman.net> wrote:
>
>
> Perhaps I'm missing something, but in toast_fetch_datum_slice() there's:
>
>         Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
>
> Further, the only caller of this function today is
> heap_tuple_untoast_attr_slice(), which does:
>
>     /* fast path for non-compressed external datums */
>     if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
>         return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
>
> As such, I'm feeling like that conditional to handle the case where this
> function is passed a compressed TOAST value is rather confusing dead
> code.
>
> Hence I'm proposing the attached

Your analysis looks correct to me, I'm pretty sure I had the same reaction, first time I read through. It would be nice to handle partial decompression all the way down at this level, but unfortunately the comment at the Assert() is right: there's no way to know how many of the toasted pieces need to be read in order to have enough compressed input to create the desired amount of decompressed output, so there's no choice except to read the whole compressed thing, even in a slicing context.

P.
 

Re: Dead code in toast_fetch_datum_slice?

От
Stephen Frost
Дата:
Greetings,

* Paul Ramsey (pramsey@cleverelephant.ca) wrote:
> On Fri, Dec 7, 2018 at 3:25 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Perhaps I'm missing something, but in toast_fetch_datum_slice() there's:
> >
> >         Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
> >
> > Further, the only caller of this function today is
> > heap_tuple_untoast_attr_slice(), which does:
> >
> >     /* fast path for non-compressed external datums */
> >     if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
> >         return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
> >
> > As such, I'm feeling like that conditional to handle the case where this
> > function is passed a compressed TOAST value is rather confusing dead
> > code.
> >
> > Hence I'm proposing the attached
>
> Your analysis looks correct to me, I'm pretty sure I had the same reaction,
> first time I read through. It would be nice to handle partial decompression
> all the way down at this level, but unfortunately the comment at the
> Assert() is right: there's no way to know how many of the toasted pieces
> need to be read in order to have enough compressed input to create the
> desired amount of decompressed output, so there's no choice except to read
> the whole compressed thing, even in a slicing context.

Thanks for taking a look.

For the archives sake, this has now been committed.

Thanks again!

Stephen

Вложения

Re: Dead code in toast_fetch_datum_slice?

От
Alvaro Herrera
Дата:
On 2018-Dec-10, Paul Ramsey wrote:

> Your analysis looks correct to me, I'm pretty sure I had the same reaction,
> first time I read through. It would be nice to handle partial decompression
> all the way down at this level, but unfortunately the comment at the
> Assert() is right: there's no way to know how many of the toasted pieces
> need to be read in order to have enough compressed input to create the
> desired amount of decompressed output, so there's no choice except to read
> the whole compressed thing, even in a slicing context.

It'd be useful to have some sort of iterator-style API for detoasting.
If you need more data, just call it again.  It's more wasteful if you
end up retrieving all of the toasted data, but if you just need a
fraction it's obviously a win.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Dead code in toast_fetch_datum_slice?

От
Stephen Frost
Дата:
Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> On 2018-Dec-10, Paul Ramsey wrote:
> > Your analysis looks correct to me, I'm pretty sure I had the same reaction,
> > first time I read through. It would be nice to handle partial decompression
> > all the way down at this level, but unfortunately the comment at the
> > Assert() is right: there's no way to know how many of the toasted pieces
> > need to be read in order to have enough compressed input to create the
> > desired amount of decompressed output, so there's no choice except to read
> > the whole compressed thing, even in a slicing context.
>
> It'd be useful to have some sort of iterator-style API for detoasting.
> If you need more data, just call it again.  It's more wasteful if you
> end up retrieving all of the toasted data, but if you just need a
> fraction it's obviously a win.

I was wondering about that myself.  I was looking this area with the
idea of pushing Paul's patch here:

https://www.postgresql.org/message-id/CACowWR1vBmmje1HDZGdXWX_z5mkypQA1JYW8XxUNyJQ1Mriseg@mail.gmail.com

but that's just "give me all the data from the front to X point."

Paul, what do you think about implementing an iterator for decompressing
PGLZ data, and then using that?  For your use-case, it'd be just one
call since we know how much we want, but for other use-cases (such as
searching a compressed TOAST item for something), it'd be an actual
iteration and we could potentially eliminate a lot of work in those
cases where we just need a boolean yes/no the TOAST'd data matches the
query.

Thanks!

Stephen

Вложения

Re: Dead code in toast_fetch_datum_slice?

От
Paul Ramsey
Дата:
> On Dec 10, 2018, at 10:30 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
>> On 2018-Dec-10, Paul Ramsey wrote:
>>> Your analysis looks correct to me, I'm pretty sure I had the same reaction,
>>> first time I read through. It would be nice to handle partial decompression
>>> all the way down at this level, but unfortunately the comment at the
>>> Assert() is right: there's no way to know how many of the toasted pieces
>>> need to be read in order to have enough compressed input to create the
>>> desired amount of decompressed output, so there's no choice except to read
>>> the whole compressed thing, even in a slicing context.
>>
>> It'd be useful to have some sort of iterator-style API for detoasting.
>> If you need more data, just call it again.  It's more wasteful if you
>> end up retrieving all of the toasted data, but if you just need a
>> fraction it's obviously a win.
>
> I was wondering about that myself.  I was looking this area with the
> idea of pushing Paul's patch here:
>
> https://www.postgresql.org/message-id/CACowWR1vBmmje1HDZGdXWX_z5mkypQA1JYW8XxUNyJQ1Mriseg@mail.gmail.com
>
> but that's just "give me all the data from the front to X point."
>
> Paul, what do you think about implementing an iterator for decompressing
> PGLZ data, and then using that?  For your use-case, it'd be just one
> call since we know how much we want, but for other use-cases (such as
> searching a compressed TOAST item for something), it'd be an actual
> iteration and we could potentially eliminate a lot of work in those
> cases where we just need a boolean yes/no the TOAST'd data matches the
> query.

I think an iterator on detoast is a precondition to an iterator on decompression which is a precondition to a workflow
thatallows functions to iterate through toasted objects instead of being restricted to the fetch/slice paradigm.  

I was surprised how *few* builtin functions benefited from my partial-decompression patch though: just substring() and
left()for text/bytea. So I’m not sure how many functions could get a win from a fancy iterator.  

One common use case I thought *might* get some leverage is LIKE ‘foo%’, but I shied away from the kind of API mucking
thatwould have been necessary to change that over to use slicing while still supporting all the other ways LIKE is
called.

P