Обсуждение: Re: Assertion failure in smgr.c when using pg_prewarm with partitioned tables

Поиск
Список
Период
Сортировка
On Thu, May 15, 2025 at 2:22 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> Hi,
>
> I encountered an assertion failure when a partitioned table is specified
> as an argument to pg_prewarm. Below are the steps to reproduce the
> issue:
>
> $ pgbench -i -s 1 --partitions=3
> $ psql <<EOF
> CREATE EXTENSION pg_prewarm;
> SELECT pg_prewarm('pgbench_accounts');
> EOF
>
> The following assertion failure occurs:
>
> TRAP: failed Assert("RelFileNumberIsValid(rlocator.relNumber)"), File:
> "smgr.c", Line: 246, PID: 1246282
> postgres: ikeda postgres [local]
> SELECT(ExceptionalCondition+0xbb)[0x55edd16725c1]
> postgres: ikeda postgres [local] SELECT(smgropen+0x5e)[0x55edd145c1ff]
>
>
> It looks like this may have been overlooked in commit 049ef33.
> What do you think?

Yeah, this should be fixed, don't you think that instead of checking
the relnumber is valid, we shall check whether it's a partitioned rel
and give a separate error that prewarm is not supported for
partitioned tables?  And in fact, we can think of supporting this for
the partitioned tables as well in the future, where we can loop
through all the partitions and do prewarm for each of them.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




On 2025/05/15 18:20, Dilip Kumar wrote:
> On Thu, May 15, 2025 at 2:22 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>>
>> Hi,
>>
>> I encountered an assertion failure when a partitioned table is specified
>> as an argument to pg_prewarm. Below are the steps to reproduce the
>> issue:

Thanks for the report!

This assertion failure can also occur when pg_prewarm() is run on objects like
foreign tables, plain views, or other relations that don't have storage - not just
partitioned tables.


>> $ pgbench -i -s 1 --partitions=3
>> $ psql <<EOF
>> CREATE EXTENSION pg_prewarm;
>> SELECT pg_prewarm('pgbench_accounts');
>> EOF
>>
>> The following assertion failure occurs:
>>
>> TRAP: failed Assert("RelFileNumberIsValid(rlocator.relNumber)"), File:
>> "smgr.c", Line: 246, PID: 1246282
>> postgres: ikeda postgres [local]
>> SELECT(ExceptionalCondition+0xbb)[0x55edd16725c1]
>> postgres: ikeda postgres [local] SELECT(smgropen+0x5e)[0x55edd145c1ff]
>>
>>
>> It looks like this may have been overlooked in commit 049ef33.
>> What do you think?
> 
> Yeah, this should be fixed, don't you think that instead of checking
> the relnumber is valid,

+1

How about adding a check to see whether the target relation has storage,
using something like RELKIND_HAS_STORAGE()?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On Thu, May 15, 2025 at 3:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2025/05/15 18:20, Dilip Kumar wrote:
> > On Thu, May 15, 2025 at 2:22 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
> >>
> >> Hi,
> >>
> >> I encountered an assertion failure when a partitioned table is specified
> >> as an argument to pg_prewarm. Below are the steps to reproduce the
> >> issue:
>
> Thanks for the report!
>
> This assertion failure can also occur when pg_prewarm() is run on objects like
> foreign tables, plain views, or other relations that don't have storage - not just
> partitioned tables.
>
>
> >> $ pgbench -i -s 1 --partitions=3
> >> $ psql <<EOF
> >> CREATE EXTENSION pg_prewarm;
> >> SELECT pg_prewarm('pgbench_accounts');
> >> EOF
> >>
> >> The following assertion failure occurs:
> >>
> >> TRAP: failed Assert("RelFileNumberIsValid(rlocator.relNumber)"), File:
> >> "smgr.c", Line: 246, PID: 1246282
> >> postgres: ikeda postgres [local]
> >> SELECT(ExceptionalCondition+0xbb)[0x55edd16725c1]
> >> postgres: ikeda postgres [local] SELECT(smgropen+0x5e)[0x55edd145c1ff]
> >>
> >>
> >> It looks like this may have been overlooked in commit 049ef33.
> >> What do you think?
> >
> > Yeah, this should be fixed, don't you think that instead of checking
> > the relnumber is valid,
>
> +1
>
> How about adding a check to see whether the target relation has storage,
> using something like RELKIND_HAS_STORAGE()?

Yeah, that makes more sense.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Thu, May 15, 2025 at 6:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Thu, May 15, 2025 at 3:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > How about adding a check to see whether the target relation has storage,
> > using something like RELKIND_HAS_STORAGE()?

> Yeah, that makes more sense.

+1.  FWIW, not long ago we fixed a similar Assert failure in
contrib/pg_freespacemap by verifying RELKIND_HAS_STORAGE() before
trying to access the storage (see 4623d7144).  Wondering if there are
other similar issues elsewhere in contrib ...

Thanks
Richard



On Fri, May 16, 2025 at 11:51 AM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:
>
> Thank you for your comments!
>
> I updated the patch to use RELKIND_HAS_STORAGE() as done in
> commit 4623d7144. Please see the v2-0001 patch for the changes.
>
> On 2025-05-15 19:57, Richard Guo wrote:
> > +1.  FWIW, not long ago we fixed a similar Assert failure in
> > contrib/pg_freespacemap by verifying RELKIND_HAS_STORAGE() before
> > trying to access the storage (see 4623d7144).  Wondering if there are
> > other similar issues elsewhere in contrib ...
>
> I tested all contrib functions that take regclass arguments (see
> attached test.sql and test_result.txt). The result shows that only
> pg_prewarm() can lead to the assertion failure.

Right, even while I was searching, I see this is used in 3 contrib
modules, amcheck, autoprewarm, and pg_prewarm. amcheck is already
checking for AM type, and Autoprewarm is identifying the relation by
block, so there is no chance of getting the relation which do not have
storage.

>
> However, I found that amcheck's error messages can be misleading
> when run on partitioned indexes.
>
> For example, on the master branch, amcheck (e.g., bt_index_check
> function)
> shows this error:
>
>   > ERROR:  expected "btree" index as targets for verification
>   > DETAIL:  Relation "pgbench_accounts_pkey" is a btree index.
>
> This message says it expects a "btree" index, yet also states the
> relation
> is a "btree" index, which can seem contradictory. The actual issue is
> that
> the index is a btree partitioned index, but this detail is missing,
> causing
> possible confusion.

Yes, I found the error message confusing during testing as well, so it
makes sense to improve it.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




On 2025/05/16 15:33, Dilip Kumar wrote:
> On Fri, May 16, 2025 at 11:51 AM Masahiro Ikeda
> <ikedamsh@oss.nttdata.com> wrote:
>>
>> Thank you for your comments!
>>
>> I updated the patch to use RELKIND_HAS_STORAGE() as done in
>> commit 4623d7144. Please see the v2-0001 patch for the changes.

Thanks for updating the patch!

You added a test for pg_prewarm() with a partitioned table in the TAP tests.
But, wouldn't it be simpler and easier to maintain if we added this
as a regular regression test (i.e., using .sql and .out files) instead of
TAP tests? That should be sufficient for this case?

Also, since the issue was introduced in v17, this patch should be
back-patched to v17, right?


>> On 2025-05-15 19:57, Richard Guo wrote:
>>> +1.  FWIW, not long ago we fixed a similar Assert failure in
>>> contrib/pg_freespacemap by verifying RELKIND_HAS_STORAGE() before
>>> trying to access the storage (see 4623d7144).  Wondering if there are
>>> other similar issues elsewhere in contrib ...
>>
>> I tested all contrib functions that take regclass arguments (see
>> attached test.sql and test_result.txt). The result shows that only
>> pg_prewarm() can lead to the assertion failure.
> 
> Right, even while I was searching, I see this is used in 3 contrib
> modules, amcheck, autoprewarm, and pg_prewarm. amcheck is already
> checking for AM type, and Autoprewarm is identifying the relation by
> block, so there is no chance of getting the relation which do not have
> storage.
> 
>>
>> However, I found that amcheck's error messages can be misleading
>> when run on partitioned indexes.
>>
>> For example, on the master branch, amcheck (e.g., bt_index_check
>> function)
>> shows this error:
>>
>>    > ERROR:  expected "btree" index as targets for verification
>>    > DETAIL:  Relation "pgbench_accounts_pkey" is a btree index.
>>
>> This message says it expects a "btree" index, yet also states the
>> relation
>> is a "btree" index, which can seem contradictory. The actual issue is
>> that
>> the index is a btree partitioned index, but this detail is missing,
>> causing
>> possible confusion.
> 
> Yes, I found the error message confusing during testing as well, so it
> makes sense to improve it.

+1

Regarding the patch, how about also adding a regression test for
amcheck with a partitioned index?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On Mon, May 19, 2025 at 5:18 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> Thanks for your work and feedback!
>
> I've updated the patches and added regular regression tests for
> both pg_prewarm and amcheck.

Thanks for updating the patches!

Regarding the 0001 patch:

+CREATE TABLE test_part1 PARTITION OF test FOR VALUES FROM (1) TO (1000);
+INSERT INTO test SELECT generate_series(1, 100);

These lines don't seem necessary for the test. How about removing them?
It would simplify the test and slightly reduce the execution time - though
the time savings are very minimal.

+-- To specify the relation which does not have storage should fail.

This comment could be clearer as:
"pg_prewarm() should fail if the target relation has no storage."

+ /* Check that the storage exists. */

Maybe rephrase to:
"Check that the relation has storage."?

Regarding the 0002 patch:

- errdetail("Relation \"%s\" is a %s index.",
-    RelationGetRelationName(rel), NameStr(((Form_pg_am)
GETSTRUCT(amtuprel))->amname))));
+ errdetail("Relation \"%s\" is a %s %sindex.",
+    RelationGetRelationName(rel), NameStr(((Form_pg_am)
GETSTRUCT(amtuprel))->amname),
+    (rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) ?
"partitioned " : "")));

Instead of manually building the message, how about using
errdetail_relkind_not_supported()?
It would be more consistent with similar error reporting elsewhere.

> I agree with back-patching v3-0001. I was able to reproduce the issue
> on the REL_17_STABLE branch.

Thanks for the confirmation.

> One concern is that this patch changes
> the error message in production:
>
> * v17.5 (without --enable-cassert)
> > ERROR:  fork "main" does not exist for this relation
>
> * REL_17_STABLE with the v3-0001 patch (without --enable-cassert)
> > ERROR:  relation "test" does not have storage
>
> However, I think preventing the assertion failure should take priority.

Yes.

Regards,

--
Fujii Masao