Thread: Dead code in toast_fetch_datum_slice?
Greetings,

Perhaps I'm missing something, but in toast_fetch_datum_slice() there's:

    Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));

followed, not long after, by:

    if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
        SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ);
    else
        SET_VARSIZE(result, length + VARHDRSZ);

Further, the only caller of this function today is
heap_tuple_untoast_attr_slice(), which does:

    /* fast path for non-compressed external datums */
    if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
        return toast_fetch_datum_slice(attr, sliceoffset, slicelength);

As such, I'm feeling like that conditional to handle the case where this
function is passed a compressed TOAST value is rather confusing dead
code.

Hence I'm proposing the attached.  Thoughts?

Thanks!

Stephen
Attachments
On Fri, Dec 7, 2018 at 3:25 PM Stephen Frost <sfrost@snowman.net> wrote:
>
>
> Perhaps I'm missing something, but in toast_fetch_datum_slice() there's:
>
> Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
>
> Further, the only caller of this function today is
> heap_tuple_untoast_attr_slice(), which does:
>
> /* fast path for non-compressed external datums */
> if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
> return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
>
> As such, I'm feeling like that conditional to handle the case where this
> function is passed a compressed TOAST value is rather confusing dead
> code.
>
> Hence I'm proposing the attached
Your analysis looks correct to me, I'm pretty sure I had the same reaction, first time I read through. It would be nice to handle partial decompression all the way down at this level, but unfortunately the comment at the Assert() is right: there's no way to know how many of the toasted pieces need to be read in order to have enough compressed input to create the desired amount of decompressed output, so there's no choice except to read the whole compressed thing, even in a slicing context.
P.
Greetings,

* Paul Ramsey (pramsey@cleverelephant.ca) wrote:
> On Fri, Dec 7, 2018 at 3:25 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Perhaps I'm missing something, but in toast_fetch_datum_slice() there's:
> >
> > Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
> >
> > Further, the only caller of this function today is
> > heap_tuple_untoast_attr_slice(), which does:
> >
> > /* fast path for non-compressed external datums */
> > if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
> >     return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
> >
> > As such, I'm feeling like that conditional to handle the case where this
> > function is passed a compressed TOAST value is rather confusing dead
> > code.
> >
> > Hence I'm proposing the attached
>
> Your analysis looks correct to me, I'm pretty sure I had the same reaction,
> first time I read through. It would be nice to handle partial decompression
> all the way down at this level, but unfortunately the comment at the
> Assert() is right: there's no way to know how many of the toasted pieces
> need to be read in order to have enough compressed input to create the
> desired amount of decompressed output, so there's no choice except to read
> the whole compressed thing, even in a slicing context.

Thanks for taking a look.  For the archives' sake, this has now been
committed.

Thanks again!

Stephen
Attachments
On 2018-Dec-10, Paul Ramsey wrote:
> Your analysis looks correct to me, I'm pretty sure I had the same reaction,
> first time I read through. It would be nice to handle partial decompression
> all the way down at this level, but unfortunately the comment at the
> Assert() is right: there's no way to know how many of the toasted pieces
> need to be read in order to have enough compressed input to create the
> desired amount of decompressed output, so there's no choice except to read
> the whole compressed thing, even in a slicing context.

It'd be useful to have some sort of iterator-style API for detoasting.
If you need more data, just call it again.  It's more wasteful if you
end up retrieving all of the toasted data, but if you just need a
fraction it's obviously a win.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> On 2018-Dec-10, Paul Ramsey wrote:
> > Your analysis looks correct to me, I'm pretty sure I had the same reaction,
> > first time I read through. It would be nice to handle partial decompression
> > all the way down at this level, but unfortunately the comment at the
> > Assert() is right: there's no way to know how many of the toasted pieces
> > need to be read in order to have enough compressed input to create the
> > desired amount of decompressed output, so there's no choice except to read
> > the whole compressed thing, even in a slicing context.
>
> It'd be useful to have some sort of iterator-style API for detoasting.
> If you need more data, just call it again.  It's more wasteful if you
> end up retrieving all of the toasted data, but if you just need a
> fraction it's obviously a win.

I was wondering about that myself.  I was looking at this area with the
idea of pushing Paul's patch here:

https://www.postgresql.org/message-id/CACowWR1vBmmje1HDZGdXWX_z5mkypQA1JYW8XxUNyJQ1Mriseg@mail.gmail.com

but that's just "give me all the data from the front to X point."

Paul, what do you think about implementing an iterator for decompressing
PGLZ data, and then using that?  For your use-case, it'd be just one
call since we know how much we want, but for other use-cases (such as
searching a compressed TOAST item for something), it'd be an actual
iteration, and we could potentially eliminate a lot of work in those
cases where we just need a boolean yes/no of whether the TOAST'd data
matches the query.

Thanks!

Stephen
Attachments
> On Dec 10, 2018, at 10:30 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
>> On 2018-Dec-10, Paul Ramsey wrote:
>>> Your analysis looks correct to me, I'm pretty sure I had the same reaction,
>>> first time I read through. It would be nice to handle partial decompression
>>> all the way down at this level, but unfortunately the comment at the
>>> Assert() is right: there's no way to know how many of the toasted pieces
>>> need to be read in order to have enough compressed input to create the
>>> desired amount of decompressed output, so there's no choice except to read
>>> the whole compressed thing, even in a slicing context.
>>
>> It'd be useful to have some sort of iterator-style API for detoasting.
>> If you need more data, just call it again.  It's more wasteful if you
>> end up retrieving all of the toasted data, but if you just need a
>> fraction it's obviously a win.
>
> I was wondering about that myself.  I was looking at this area with the
> idea of pushing Paul's patch here:
>
> https://www.postgresql.org/message-id/CACowWR1vBmmje1HDZGdXWX_z5mkypQA1JYW8XxUNyJQ1Mriseg@mail.gmail.com
>
> but that's just "give me all the data from the front to X point."
>
> Paul, what do you think about implementing an iterator for decompressing
> PGLZ data, and then using that?  For your use-case, it'd be just one
> call since we know how much we want, but for other use-cases (such as
> searching a compressed TOAST item for something), it'd be an actual
> iteration and we could potentially eliminate a lot of work in those
> cases where we just need a boolean yes/no the TOAST'd data matches the
> query.

I think an iterator on detoast is a precondition to an iterator on
decompression, which is a precondition to a workflow that allows
functions to iterate through toasted objects instead of being restricted
to the fetch/slice paradigm.

I was surprised how *few* builtin functions benefited from my
partial-decompression patch though: just substring() and left() for
text/bytea.  So I'm not sure how many functions could get a win from a
fancy iterator.

One common use case I thought *might* get some leverage is LIKE 'foo%',
but I shied away from the kind of API mucking that would have been
necessary to change that over to use slicing while still supporting all
the other ways LIKE is called.

P