Обсуждение: Dead code with short varlenas in toast_save_datum()

Поиск
Список
Период
Сортировка

Dead code with short varlenas in toast_save_datum()

От
Michael Paquier
Дата:
Hi all,

Nikhil (in CC.), has noticed while looking at the 8B-value TOAST patch
that this code block used for short varlenas in toast_save_datum() is
dead based on the current coverage:
    if (VARATT_IS_SHORT(dval))
    {
        data_p = VARDATA_SHORT(dval);
        data_todo = VARSIZE_SHORT(dval) - VARHDRSZ_SHORT;
        toast_pointer.va_rawsize = data_todo + VARHDRSZ;    /* as if not short */
        toast_pointer.va_extinfo = data_todo;
    }

Coverage link:
https://coverage.postgresql.org/src/backend/access/common/toast_internals.c.gcov.html

toast_save_datum() is taken only through toast_tuple_externalize(),
and I was assuming first that an EXTERNAL attribute with a minimal
toast_tuple_target of 128B would have been enough to trigger this,
still even a minimal bound is not enough to trigger
heap_toast_insert_or_update(), which heap_prepare_insert() would call
if:
- the hardcoded check on TOAST_TUPLE_THRESHOLD is reached, which of
course would not happen.
- tuple is marked with HEAP_HASEXTERNAL, something that happens only
if fill_val() finds an external varlena (aka VARATT_IS_EXTERNAL),
and that does not happen for SHORT varlenas.

The insertion of short varlenas in TOAST is old enough to vote and it
assumed as supported when reading external on-disk TOAST datums, still
it's amazing to see that there no in-core coverage for it.

Anyway, There is an action item here.  I don't see a SQL pattern that
would trigger this code path on HEAD (perhaps I lack the imagination
to do so), so the only alternative I can think of is the introduction
of a C function in regress.c to force our way through, calling
directly toast_save_datum() or something equivalent, so as
toast_save_datum() is tested with more varlena patterns.  That would
be something similar to what we do for indirect TOAST tuples.

Another possibility is to assume that this code is dead.  But I doubt
that it is possible to claim that as the TOAST code is assumed as
being usable by out-of-core code, so depending on how fancy things are
being done a TOAST relation could use a short varatt for an insert.

Thoughts?
--
Michael

Вложения

Re: Dead code with short varlenas in toast_save_datum()

От
Nikita Malakhov
Дата:
Hi!

Michael, as far as I understand while externalizing tuple we always check
tuple size with toast_tuple_find_biggest_attribute(), where 
biggest attribute size is always >= MAXALIGN(TOAST_POINTER_SIZE)
so currently I cannot see how it is possible to get into VARATT_IS_SHORT branch.

I've commented this code and done some tests and did not see any unexpected
behavior. It certainly looks like some legacy code left "just because".

On Mon, Aug 4, 2025 at 6:16 AM Michael Paquier <michael@paquier.xyz> wrote:
Hi all,

Nikhil (in CC.), has noticed while looking at the 8B-value TOAST patch
that this code block used for short varlenas in toast_save_datum() is
dead based on the current coverage:
    if (VARATT_IS_SHORT(dval))
    {
        data_p = VARDATA_SHORT(dval);
        data_todo = VARSIZE_SHORT(dval) - VARHDRSZ_SHORT;
        toast_pointer.va_rawsize = data_todo + VARHDRSZ;    /* as if not short */
        toast_pointer.va_extinfo = data_todo;
    }

Coverage link:
https://coverage.postgresql.org/src/backend/access/common/toast_internals.c.gcov.html

toast_save_datum() is taken only through toast_tuple_externalize(),
and I was assuming first that an EXTERNAL attribute with a minimal
toast_tuple_target of 128B would have been enough to trigger this,
still even a minimal bound is not enough to trigger
heap_toast_insert_or_update(), which heap_prepare_insert() would call
if:
- the hardcoded check on TOAST_TUPLE_THRESHOLD is reached, which of
course would not happen.
- tuple is marked with HEAP_HASEXTERNAL, something that happens only
if fill_val() finds an external varlena (aka VARATT_IS_EXTERNAL),
and that does not happen for SHORT varlenas.

The insertion of short varlenas in TOAST is old enough to vote and it
assumed as supported when reading external on-disk TOAST datums, still
it's amazing to see that there no in-core coverage for it.

Anyway, There is an action item here.  I don't see a SQL pattern that
would trigger this code path on HEAD (perhaps I lack the imagination
to do so), so the only alternative I can think of is the introduction
of a C function in regress.c to force our way through, calling
directly toast_save_datum() or something equivalent, so as
toast_save_datum() is tested with more varlena patterns.  That would
be something similar to what we do for indirect TOAST tuples.

Another possibility is to assume that this code is dead.  But I doubt
that it is possible to claim that as the TOAST code is assumed as
being usable by out-of-core code, so depending on how fancy things are
being done a TOAST relation could use a short varatt for an insert.

Thoughts?
--
Michael


--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company

Re: Dead code with short varlenas in toast_save_datum()

От
Nikhil Kumar Veldanda
Дата:
Hi all,

On Sun, Aug 3, 2025 at 8:16 PM Michael Paquier <michael@paquier.xyz> wrote:
>     if (VARATT_IS_SHORT(dval))
>     {
>         data_p = VARDATA_SHORT(dval);
>         data_todo = VARSIZE_SHORT(dval) - VARHDRSZ_SHORT;
>         toast_pointer.va_rawsize = data_todo + VARHDRSZ;    /* as if not short */
>         toast_pointer.va_extinfo = data_todo;
>     }
>
> Coverage link:
> https://coverage.postgresql.org/src/backend/access/common/toast_internals.c.gcov.html
>

This code path is currently not covered by tests. It can be exercised
with the following SQL pattern

CREATE TABLE temp_tbl (a text, b text);
ALTER TABLE temp_tbl SET (toast_tuple_target = 128);
ALTER TABLE temp_tbl ALTER COLUMN a SET STORAGE EXTERNAL;
ALTER TABLE temp_tbl ALTER COLUMN b SET STORAGE EXTERNAL;
INSERT INTO temp_tbl values(repeat('a', 4000), repeat('a', 120));

--
Nikhil Veldanda



Re: Dead code with short varlenas in toast_save_datum()

От
Nikita Malakhov
Дата:
Hi,

Nikhil, thank you! This case should be added to regressions.

On Tue, Aug 5, 2025 at 8:26 PM Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com> wrote:
Hi all,

On Sun, Aug 3, 2025 at 8:16 PM Michael Paquier <michael@paquier.xyz> wrote:
>     if (VARATT_IS_SHORT(dval))
>     {
>         data_p = VARDATA_SHORT(dval);
>         data_todo = VARSIZE_SHORT(dval) - VARHDRSZ_SHORT;
>         toast_pointer.va_rawsize = data_todo + VARHDRSZ;    /* as if not short */
>         toast_pointer.va_extinfo = data_todo;
>     }
>
> Coverage link:
> https://coverage.postgresql.org/src/backend/access/common/toast_internals.c.gcov.html
>

This code path is currently not covered by tests. It can be exercised
with the following SQL pattern

CREATE TABLE temp_tbl (a text, b text);
ALTER TABLE temp_tbl SET (toast_tuple_target = 128);
ALTER TABLE temp_tbl ALTER COLUMN a SET STORAGE EXTERNAL;
ALTER TABLE temp_tbl ALTER COLUMN b SET STORAGE EXTERNAL;
INSERT INTO temp_tbl values(repeat('a', 4000), repeat('a', 120));

--
Nikhil Veldanda




--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company

Re: Dead code with short varlenas in toast_save_datum()

От
Michael Paquier
Дата:
On Tue, Aug 05, 2025 at 10:26:03AM -0700, Nikhil Kumar Veldanda wrote:
> This code path is currently not covered by tests. It can be exercised
> with the following SQL pattern
>
> CREATE TABLE temp_tbl (a text, b text);
> ALTER TABLE temp_tbl SET (toast_tuple_target = 128);
> ALTER TABLE temp_tbl ALTER COLUMN a SET STORAGE EXTERNAL;
> ALTER TABLE temp_tbl ALTER COLUMN b SET STORAGE EXTERNAL;
> INSERT INTO temp_tbl values(repeat('a', 4000), repeat('a', 120));

Ah, thanks, nice one.  I did not consider the trick of using two
attributes to bypass the check when externalizing the tuple.  I'll go
add a test in strings.sql among these lines, with some TOAST slice
scans based on substr().
--
Michael

Вложения

Re: Dead code with short varlenas in toast_save_datum()

От
Michael Paquier
Дата:
On Wed, Aug 06, 2025 at 10:30:54AM +0900, Michael Paquier wrote:
> Ah, thanks, nice one.  I did not consider the trick of using two
> attributes to bypass the check when externalizing the tuple.  I'll go
> add a test in strings.sql among these lines, with some TOAST slice
> scans based on substr().

Done that as of 225ebfe30a1a, with some additions:
- A check on pg_column_compression(), to make sure that the short
varlena path leads to an uncompressed on-disk Datum.
- Some substr() for the values, to cross-check slice reads.
- A query based on the TOAST relation's chunk_seq, making sure that we
have taken toast_save_datum() two times when inserting a tuple with
the two toastable attributes.

The buildfarm is OK with it, as is the latest patch for the 8-byte
TOAST values posted here:
https://www.postgresql.org/message-id/aIyCz4T858Kcm4UU@paquier.xyz

The patch posted there needs a rebase, following the changes in
varatt.h done in e035863c9a04.  Will post a rebase very soon.
--
Michael

Вложения

Re: Dead code with short varlenas in toast_save_datum()

От
Nikhil Kumar Veldanda
Дата:
Hi Michael,

I’m sending a small test-only patch that increases test coverage for
toast_internals.c from 88.5% -> 95.8%.

--
Nikhil Veldanda

Вложения

Re: Dead code with short varlenas in toast_save_datum()

От
Michael Paquier
Дата:
On Sun, Aug 10, 2025 at 03:08:14AM -0700, Nikhil Kumar Veldanda wrote:
> I’m sending a small test-only patch that increases test coverage for
> toast_internals.c from 88.5% -> 95.8%.

(Discussed these ones offline, while looking at the coverage of
toast_internals.c.)

Thanks for compiling a patch to close more the coverage gap.  I'll
look at what you have here.
--
Michael

Вложения

Re: Dead code with short varlenas in toast_save_datum()

От
Michael Paquier
Дата:
On Sun, Aug 10, 2025 at 07:35:13PM +0900, Michael Paquier wrote:
> Thanks for compiling a patch to close more the coverage gap.  I'll
> look at what you have here.

First, thanks for working on that.  That's going to help a lot to make
sure that messing with this area of the code will not break some of
the current assumptions.

So, we have two things here for the rewrite code paths in
toast_save_datum():
1) A SQL test for va_valueid == InvalidOid where we need a new value
that does not conflict with the new and old TOAST tables.
2) An isolation test for the much trickier case where we are going to
reuse a chunk_id.

First, in the SQL test.  The trick where you are using a PLAIN storage
to not allocate a chunk_id on initial storage with a value large
enough to force TOAST on rewrite, while the value is small enough to
fit on a single page, is a nice one.  We could have used a \gset as
well with a toasted value, but that won't change the fact that we
check for a new value allocated.  The location in strings.sql feels
incorrect because this is a rewrite issue, so I have moved the test to
vacuum.sql and applied a slightly-tweaked result.  A second thing I
have added is a test to make sure that the same chunk_id is reused
after the rewrite.  That's also worth tracking and cheap, covering the
non-InvalidOid case.

With the isolation test, the case is different, and it looks like the
test is incomplete: we want to make sure that the new chunk IDs are
the same before and after, but we cannot use \gset in this context.
What I would suggest is to create an intermediate table storing the
contents we want to compare after the CLUSTER, with a CTAS that stores
the primary key of cluster_toast_value_reuse.id and the chunk_id
associated to each row.  Then, after the CLUSTER, we join the pkey
values in the CTAS table and cluster_toast_value_reuse, compare their
chunk IDs and they should match.  The test triggers 29 times the
todo=0 code path, as far as I can see, but we should not need that
many tuples with generate_series(), no?  If the test is written so as
we compare the old and new chunk IDs with the pkey values, the number
of tuples does not matter much, but that would be a bit cheaper to
run.  Could you update the isolation test to do something among these
lines?
--
Michael

Вложения

Re: Dead code with short varlenas in toast_save_datum()

От
Nikhil Kumar Veldanda
Дата:
On Thu, Aug 14, 2025 at 8:32 PM Michael Paquier <michael@paquier.xyz> wrote:
>
>
> First, in the SQL test.  The trick where you are using a PLAIN storage
> to not allocate a chunk_id on initial storage with a value large
> enough to force TOAST on rewrite, while the value is small enough to
> fit on a single page, is a nice one.  We could have used a \gset as
> well with a toasted value, but that won't change the fact that we
> check for a new value allocated.  The location in strings.sql feels
> incorrect because this is a rewrite issue, so I have moved the test to
> vacuum.sql and applied a slightly-tweaked result.  A second thing I
> have added is a test to make sure that the same chunk_id is reused
> after the rewrite.  That's also worth tracking and cheap, covering the
> non-InvalidOid case.
>

Thank you, Michael, for adjusting the change and merging it.

> With the isolation test, the case is different, and it looks like the
> test is incomplete: we want to make sure that the new chunk IDs are
> the same before and after, but we cannot use \gset in this context.
> What I would suggest is to create an intermediate table storing the
> contents we want to compare after the CLUSTER, with a CTAS that stores
> the primary key of cluster_toast_value_reuse.id and the chunk_id
> associated to each row.  Then, after the CLUSTER, we join the pkey
> values in the CTAS table and cluster_toast_value_reuse, compare their
> chunk IDs and they should match.  The test triggers 29 times the
> todo=0 code path, as far as I can see, but we should not need that
> many tuples with generate_series(), no?  If the test is written so as
> we compare the old and new chunk IDs with the pkey values, the number
> of tuples does not matter much, but that would be a bit cheaper to
> run.  Could you update the isolation test to do something among these
> lines?
> --

Thanks for the guidance. I’ve updated the isolation test to use a CTAS
capturing (id, chunk_id) pre-CLUSTER and added a post-CLUSTER join to
verify the chunk IDs match. I also reduced the number of tuples to 1.

Please find the attached patch for your review. Thanks

--
Nikhil Veldanda

Вложения

Re: Dead code with short varlenas in toast_save_datum()

От
Michael Paquier
Дата:
On Sat, Aug 16, 2025 at 02:34:02AM -0700, Nikhil Kumar Veldanda wrote:
> Thanks for the guidance. I’ve updated the isolation test to use a CTAS
> capturing (id, chunk_id) pre-CLUSTER and added a post-CLUSTER join to
> verify the chunk IDs match. I also reduced the number of tuples to 1.
>
> Please find the attached patch for your review. Thanks

Cool, thanks for the new patch.  The structure looks OK.  There were
two things that itched me a bit:
- Instead of reporting the number of tuples where the chunk IDs do not
match across the rewrites, I have rewritten the query to report the
serial IDs, instead.
- Making sure that the CTAS holds some data, matching with the number
of tuples inserted.  A count(*) was enough for that.
--
Michael

Вложения