Обсуждение: Failed Assert while pgstat_unlink_relation

Поиск
Список
Период
Сортировка

Failed Assert while pgstat_unlink_relation

От
vignesh C
Дата:
Hi,

While reviewing/testing one of the patches I found the following Assert:
#0  0x000055c6312ba639 in pgstat_unlink_relation (rel=0x7fb73bcbac58)
at pgstat_relation.c:161
#1  0x000055c6312bbb5a in pgstat_relation_delete_pending_cb
(entry_ref=0x55c6335563a8) at pgstat_relation.c:839
#2  0x000055c6312b72bc in pgstat_delete_pending_entry
(entry_ref=0x55c6335563a8) at pgstat.c:1124
#3  0x000055c6312be3f1 in pgstat_release_entry_ref (key=...,
entry_ref=0x55c6335563a8, discard_pending=true) at pgstat_shmem.c:523
#4  0x000055c6312bee9a in pgstat_drop_entry
(kind=PGSTAT_KIND_RELATION, dboid=5, objoid=40960) at
pgstat_shmem.c:867
#5  0x000055c6312c034a in AtEOXact_PgStat_DroppedStats
(xact_state=0x55c6334baac8, isCommit=false) at pgstat_xact.c:97
#6  0x000055c6312c0240 in AtEOXact_PgStat (isCommit=false,
parallel=false) at pgstat_xact.c:55
#7  0x000055c630df8bee in AbortTransaction () at xact.c:2861
#8  0x000055c630df94fd in AbortCurrentTransaction () at xact.c:3352

I could reproduce this issue with the following steps:
create table t1(c1 int);
BEGIN;
CREATE TABLE t (c1 int);
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD  SELECT * FROM t1;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD  SELECT * FROM t1;

Regards,
Vignesh



Re: Failed Assert while pgstat_unlink_relation

От
Kyotaro Horiguchi
Дата:
At Mon, 28 Nov 2022 14:01:30 +0530, vignesh C <vignesh21@gmail.com> wrote in 
> Hi,
> 
> While reviewing/testing one of the patches I found the following Assert:
> #0  0x000055c6312ba639 in pgstat_unlink_relation (rel=0x7fb73bcbac58)
> at pgstat_relation.c:161
> #1  0x000055c6312bbb5a in pgstat_relation_delete_pending_cb
> (entry_ref=0x55c6335563a8) at pgstat_relation.c:839
> #2  0x000055c6312b72bc in pgstat_delete_pending_entry
> (entry_ref=0x55c6335563a8) at pgstat.c:1124
> #3  0x000055c6312be3f1 in pgstat_release_entry_ref (key=...,
> entry_ref=0x55c6335563a8, discard_pending=true) at pgstat_shmem.c:523
> #4  0x000055c6312bee9a in pgstat_drop_entry
> (kind=PGSTAT_KIND_RELATION, dboid=5, objoid=40960) at
> pgstat_shmem.c:867
> #5  0x000055c6312c034a in AtEOXact_PgStat_DroppedStats
> (xact_state=0x55c6334baac8, isCommit=false) at pgstat_xact.c:97
> #6  0x000055c6312c0240 in AtEOXact_PgStat (isCommit=false,
> parallel=false) at pgstat_xact.c:55
> #7  0x000055c630df8bee in AbortTransaction () at xact.c:2861
> #8  0x000055c630df94fd in AbortCurrentTransaction () at xact.c:3352
> 
> I could reproduce this issue with the following steps:
> create table t1(c1 int);
> BEGIN;
> CREATE TABLE t (c1 int);
> CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD  SELECT * FROM t1;
> CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD  SELECT * FROM t1;

Good catch!

AtEOXact_PgStat_DroppedStats() visits a relation that has been dropped
then wiped (when CLOBBER_FREED_MEMORY) by AtEOXact_RelationCache()
called just before.  Since the relcache content directly pointed from
stats module is lost in this case, the stats side cannot defend itself
from this.

Perhaps RelationDestroyRelation() need to do pgstat_drop_entry() or
AtEOXact_PgStat_DroppedStats() should be called before
AtEOXact_RelationCache(). the latter of which is simpler. I think we
need to test this case, too.

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8086b857b9..789ff4cc6a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2838,6 +2838,7 @@ AbortTransaction(void)
                              RESOURCE_RELEASE_BEFORE_LOCKS,
                              false, true);
         AtEOXact_Buffers(false);
+        AtEOXact_PgStat(false, is_parallel_worker); /* reads relcache */
         AtEOXact_RelationCache(false);
         AtEOXact_Inval(false);
         AtEOXact_MultiXact();
@@ -2858,7 +2859,6 @@ AbortTransaction(void)
         AtEOXact_Files(false);
         AtEOXact_ComboCid();
         AtEOXact_HashTables(false);
-        AtEOXact_PgStat(false, is_parallel_worker);
         AtEOXact_ApplyLauncher(false);
         pgstat_report_xact_timestamp(0);
     }

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Failed Assert while pgstat_unlink_relation

От
Andres Freund
Дата:
Hi,

On 2022-11-29 15:42:45 +0900, Kyotaro Horiguchi wrote:
> At Mon, 28 Nov 2022 14:01:30 +0530, vignesh C <vignesh21@gmail.com> wrote in 
> > Hi,
> > 
> > While reviewing/testing one of the patches I found the following Assert:
> > #0  0x000055c6312ba639 in pgstat_unlink_relation (rel=0x7fb73bcbac58)
> > at pgstat_relation.c:161
> > #1  0x000055c6312bbb5a in pgstat_relation_delete_pending_cb
> > (entry_ref=0x55c6335563a8) at pgstat_relation.c:839
> > #2  0x000055c6312b72bc in pgstat_delete_pending_entry
> > (entry_ref=0x55c6335563a8) at pgstat.c:1124
> > #3  0x000055c6312be3f1 in pgstat_release_entry_ref (key=...,
> > entry_ref=0x55c6335563a8, discard_pending=true) at pgstat_shmem.c:523
> > #4  0x000055c6312bee9a in pgstat_drop_entry
> > (kind=PGSTAT_KIND_RELATION, dboid=5, objoid=40960) at
> > pgstat_shmem.c:867
> > #5  0x000055c6312c034a in AtEOXact_PgStat_DroppedStats
> > (xact_state=0x55c6334baac8, isCommit=false) at pgstat_xact.c:97
> > #6  0x000055c6312c0240 in AtEOXact_PgStat (isCommit=false,
> > parallel=false) at pgstat_xact.c:55
> > #7  0x000055c630df8bee in AbortTransaction () at xact.c:2861
> > #8  0x000055c630df94fd in AbortCurrentTransaction () at xact.c:3352
> > 
> > I could reproduce this issue with the following steps:
> > create table t1(c1 int);
> > BEGIN;
> > CREATE TABLE t (c1 int);
> > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD  SELECT * FROM t1;
> > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD  SELECT * FROM t1;
> 
> Good catch!
> 
> AtEOXact_PgStat_DroppedStats() visits a relation that has been dropped
> then wiped (when CLOBBER_FREED_MEMORY) by AtEOXact_RelationCache()
> called just before.  Since the relcache content directly pointed from
> stats module is lost in this case, the stats side cannot defend itself
> from this.
> 
> Perhaps RelationDestroyRelation() need to do pgstat_drop_entry() or
> AtEOXact_PgStat_DroppedStats() should be called before
> AtEOXact_RelationCache(). the latter of which is simpler. I think we
> need to test this case, too.

This doesn't strike me as the right fix. What do you think about my patch at
https://postgr.es/m/20221128210908.hyffmljjylj447nu%40awork3.anarazel.de ,
leaving the quibbles around error handling aside?

Afaict it fixes the issue.

Greetings,

Andres Freund



Re: Failed Assert while pgstat_unlink_relation

От
Kyotaro Horiguchi
Дата:
At Thu, 1 Dec 2022 19:23:28 -0800, Andres Freund <andres@anarazel.de> wrote in 
> > AtEOXact_PgStat_DroppedStats() visits a relation that has been dropped
> > then wiped (when CLOBBER_FREED_MEMORY) by AtEOXact_RelationCache()
> > called just before.  Since the relcache content directly pointed from
> > stats module is lost in this case, the stats side cannot defend itself
> > from this.
> > 
> > Perhaps RelationDestroyRelation() need to do pgstat_drop_entry() or
> > AtEOXact_PgStat_DroppedStats() should be called before
> > AtEOXact_RelationCache(). the latter of which is simpler. I think we
> > need to test this case, too.
> 
> This doesn't strike me as the right fix. What do you think about my patch at
> https://postgr.es/m/20221128210908.hyffmljjylj447nu%40awork3.anarazel.de ,
> leaving the quibbles around error handling aside?

Yeah, I didn't like what my patch does...

> Afaict it fixes the issue.

Hmm. I see it works for this specific case, but I don't understand why
it is generally safe.

The in-xact created relation t1 happened to be scanned during the
CREATE RULE and a stats entry is attached. So the stats entry loses t1
at roll-back, then crashes.  Thus, if I understand it correctly, it
seems to me that just unlinking the stats from t1 (when relkind is
changed) works.

But the fix doesn't change the behavior in relkind-not-changing
cases. If an in-xact-created table gets a stats entry then the
relcache entry for t1 is refreshed to a table relation again then the
transaction rolls back, crash will happen for the same reason. I'm not
sure if there is such a case actually.

When I tried to check that behavior further, I found that that
CREATE ROLE is no longer allowed..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Failed Assert while pgstat_unlink_relation

От
Andres Freund
Дата:
Hi,

On 2022-12-05 15:20:55 +0900, Kyotaro Horiguchi wrote:
> The in-xact created relation t1 happened to be scanned during the
> CREATE RULE and a stats entry is attached. So the stats entry loses t1
> at roll-back, then crashes.  Thus, if I understand it correctly, it
> seems to me that just unlinking the stats from t1 (when relkind is
> changed) works.
> 
> But the fix doesn't change the behavior in relkind-not-changing
> cases. If an in-xact-created table gets a stats entry then the
> relcache entry for t1 is refreshed to a table relation again then the
> transaction rolls back, crash will happen for the same reason. I'm not
> sure if there is such a case actually.

We unlink the stats in that case already. see RelationDestroyRelation().


> When I tried to check that behavior further, I found that that
> CREATE ROLE is no longer allowed..

I assume you mean RULE, not ROLE? It should still work in 15.

Greetings,

Andres Freund