Discussion: Postgresql 8.4.1 segfault, backtrace

Postgresql 8.4.1 segfault, backtrace

From
Richard Neill
Date:
Dear All,

I've just upgraded from 8.4.0 to 8.4.1 because of a segfault in 8.4, and
we've found that this is still happening repeatedly in 8.4.1. We're in a
bit of a bind, as this is a production system, and we get segfaults
every few hours.

[It's a testament to how good the postgres crash recovery is that, with
a reasonably small value of checkpoint_segments = 4, recovery happens in
30 seconds, and the warehouse systems seem to continue OK].


The version I'm using is 8.4.1, in the source package provided for
Ubuntu Karmic, compiled by me on a 64-bit server (running Ubuntu Jaunty).

I'm not sufficiently expert to debug it very far, but I wonder whether
the following info from GDB would help one of the hackers here (I've
trimmed out the uninteresting bits):

------------
$ gdb /usr/lib/postgresql/8.4/bin/postgres core.200909030901
GNU gdb 6.8-debian

This GDB was configured as "x86_64-linux-gnu"...

Core was generated by `postgres: fensys fswcs [local] startup
                              '.
Program terminated with signal 11, Segmentation fault.
[New process 14965]
#0  RelationCacheInitializePhase2 () at relcache.c:2654
2654                    if (relation->rd_rel->relhasrules &&
relation->rd_rules == NULL)
(gdb) bt
#0  RelationCacheInitializePhase2 () at relcache.c:2654
#1  0x00007f61355a1021 in InitPostgres (in_dbname=0x7f613788c610
"fswcs", dboid=0, username=0x7f6137889450 "fensys", out_dbname=0x0) at
postinit.c:576
#2  0x00007f61354dbcc5 in PostgresMain (argc=4, argv=0x7f6137889480,
username=0x7f6137889450 "fensys") at postgres.c:3334
#3  0x00007f61354aefdd in ServerLoop () at postmaster.c:3447
#4  0x00007f61354afecc in PostmasterMain (argc=5, argv=0x7f6137885140)
at postmaster.c:1040
#5  0x00007f61354568ce in main (argc=5, argv=0x7f6137885140) at main.c:188
(gdb) quit
-------------

A few more bits of info:

The backtrace points to line 2654 in relcache.c, in
   RelationCacheInitializePhase2()

There is a NULL dereference of "relation"

  => needNewCacheFile = false
     criticalRelcachesBuilt = true

=> nothing is happening before it enters the failure code block.


I can give you a core dump if anyone would like to see it, but it's 405
MB after bzipping.

One last observation: a dump and restore of the DB seems to prevent it
crashing for about a day.

Thank you for your help,

Richard

Re: Postgresql 8.4.1 segfault, backtrace

From
Tom Lane
Date:
Richard Neill <rn214@cam.ac.uk> writes:
> I've just upgraded from 8.4.0 to 8.4.1 because of a segfault in 8.4, and
> we've found that this is still happening repeatedly in 8.4.1.

Oh dear.  I just got an off-list report that seems to point to the same
kind of thing.

> The backtrace points to line 2654 in relcache.c, in
>    RelationCacheInitializePhase2()

> There is a NULL dereference of "relation"

>   => needNewCacheFile = false
>      criticalRelcachesBuilt = true

> => nothing is happening before it enters the failure code block.

<spock>Fascinating.</spock>

I think this must mean that corrupt data is being read from the relcache
init file.  The reason a restart fixes it is probably that restart
forcibly removes the old init file, which is good for recovery but not
so good for finding out what's wrong.  Could you modify
RelationCacheInitFileRemove (at the bottom of relcache.c) to rename the
file someplace else instead of deleting it?  And then send me a copy
of the bad file once you have one?
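
A minimal sketch of the kind of change being suggested, assuming the
function does little more than unlink() the init file; the helper name,
the MAXPGPATH value and the error handling here are illustrative rather
than copied from 8.4's relcache.c:

    /*
     * Illustrative sketch only: rename the relcache init file aside
     * instead of unlink()ing it, so a (possibly corrupt) copy survives
     * for inspection.  The real RelationCacheInitFileRemove() in 8.4's
     * relcache.c may differ in signature and error handling.
     */
    #include <stdio.h>                  /* snprintf(), rename() */

    #define MAXPGPATH 1024              /* assumed, as in pg_config_manual.h */
    #define RELCACHE_INIT_FILENAME "pg_internal.init"

    void
    RelationCacheInitFileRemove_keep_copy(const char *dbpath)
    {
        char        initfilename[MAXPGPATH];
        char        savedname[MAXPGPATH];

        snprintf(initfilename, sizeof(initfilename), "%s/%s",
                 dbpath, RELCACHE_INIT_FILENAME);
        snprintf(savedname, sizeof(savedname), "%s/%s.bad",
                 dbpath, RELCACHE_INIT_FILENAME);

        /* Preserve the file instead of deleting it. */
        if (rename(initfilename, savedname) != 0)
        {
            /* The file may simply not exist yet; ignore the error, just
             * as the original code ignores a failed unlink(). */
        }
    }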

> I can give you a core dump if anyone would like to see it, but it's 405
> MB after bzipping.

Not going to help anyone else anyway, since it's uninterpretable without
a duplicate system.  (If you have a spare machine with the same OS and
the same postgres executables, maybe you could put the core file on that
and let me ssh in to have a look?)

> One last observation: a dump and restore of the DB seems to prevent it
> crashing for about a day.

Do you have any maintenance operations that touch the system catalogs
(like maybe a forced REINDEX)?  Can you correlate the crashes with any
activity of that sort?

BTW, the other reporter claimed that the problem went away after
building with asserts+debug.  I'm not sure I believe that, especially
seeing that you evidently have debug on.  But if you don't have asserts
enabled, please rebuild with them and see if that changes anything.

            regards, tom lane

Re: Postgresql 8.4.1 segfault, backtrace

From
Michael Brown
Date:
On Thursday 24 September 2009 23:02:15 Michael Brown wrote:
> > I think this must mean that corrupt data is being read from the relcache
> > init file.  The reason a restart fixes it is probably that restart
> > forcibly removes the old init file, which is good for recovery but not
> > so good for finding out what's wrong.  Could you modify
> > RelationCacheInitFileRemove (at the bottom of relcache.c) to rename the
> >  file someplace else instead of deleting it?  And then send me a copy
> > of the bad file once you have one?
>
> I have captured and attached the file as saved-pg_internal.init.bak.

In case it helps, I noticed the following in gdb:

  (gdb) p *(RelIdCacheEnt*)status.curEntry
  $1 = {reloid = 932863600, reldesc = 0x0}

and this reloid is too high to be realistic; we have only just hit the two
million mark for oids in pg_class.  This seems to support your thought that
the relcache init file is corrupt.

Michael

Re: Postgresql 8.4.1 segfault, backtrace

From
Tom Lane
Date:
Michael Brown <mbrown@fensystems.co.uk> writes:
>> ... (If you have a spare machine with the same OS and
>> the same postgres executables, maybe you could put the core file on that
>> and let me ssh in to have a look?)

[ ssh details ]

Thanks for letting me poke around.  What I found out is that the
hash_seq_search loop in RelationCacheInitializePhase2 is crashing
because it's attempting to examine a hashtable entry that is on the
hashtable's freelist!?  Given that information I think the cause of
the bug is fairly clear:

1. RelationCacheInitializePhase2 loads the rules or trigger descriptions
for some system catalog (actually it must be the latter; we haven't got
any catalogs with rules attached).

2. By chance, a shared-cache-inval flush comes through while it's doing
that, causing all non-open, non-nailed relcache entries to be discarded.
Including, in particular, the one that is "next" according to the
hash_seq_search's status.

3. Now the loop iterates into the freelist, and kaboom.  It will
probably fail to fail on entries that are actually discarded, because
they still have valid pointers in them ... but as soon as it gets to
a never-yet-used freelist entry, it'll do a null dereference.

RelationCacheInitializePhase2 is breaking the rules by assuming that it
can continue to iterate the hash_seq_search after doing something that
might cause a hash entry other than the current one to be discarded.
We can probably fix that without too much trouble, eg by restarting the
loop after an update.

But: the question at this point is why we've never seen such a report
before 8.4.  If this theory is correct, it's been broken for a *long*
time.  I can think of a couple of possible explanations:

A: the problem can only manifest if this loop has work to do for
a relcache entry that is not the last one in its bucket chain.
8.4 might have added more preloaded relcache entries than were there
before.  Or the 8.4 changes in the hash functions might have shuffled
the entries' bucket placement around so that the problem can happen
when it couldn't before.

B: the 8.4 changes in the shared-cache-inval mechanism might have
made it more likely that a freshly started backend could get hit with a
relcache flush request.  I should think that those changes would have
made this *less* likely not more so, so maybe there is an additional
bug lurking in that area.

I shall go and do some further investigation, but at least it's now
clear where to look.  Thanks for the report, and for being so helpful
in providing information!

            regards, tom lane

Re: Postgresql 8.4.1 segfault, backtrace

From
Tom Lane
Date:
"Michael Brown" <mbrown@fensystems.co.uk> writes:
> I have put in place a temporary workaround on the production system, which
> is to insert a

>     // Pretend that the cache is always invalid
>     fprintf ( stderr, "*** bypassing cache ***\n" );
>     goto read_failed;

I don't think this will actually help --- if anything it exposes you
to the bug more :-(.  Given my current theory, there is not anything
wrong with the init file.  The problem is a sort of race condition
that would be triggered by very high cache-inval traffic during startup
of a new backend.  I looked at the cache inval array in your coredump,
and it looked like there had been a whole bunch of table deletions
happening concurrently with the startup --- "whole bunch" meaning
hundreds if not thousands.  Is there anything in your application
behavior that might encourage a lot of table drops to happen
concurrently?

I'll get you a real fix as soon as I can, but might not be till
tomorrow.

            regards, tom lane

Re: Postgresql 8.4.1 segfault, backtrace

From
Tom Lane
Date:
"Michael Brown" <mbrown@fensystems.co.uk> writes:
> If temporary table drops count towards this, then yes.

Yeah, they do.

> I could fairly easily change this procedure to truncate rather than drop
> the temporary table, if that would lessen the exposure to the problem.
> Would that be likely to help?

Very probably.  It's not a complete fix but it would probably reduce the
cache inval traffic (and hence the risk) by an order of magnitude.
However, please be prepared to change back after I send you the backend
fix, so you can stress-test it ;-)
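
For illustration only, a sketch of the truncate-instead-of-drop idea;
every name below (run_report, report_snapshot, report_a, report_b,
some_source) is invented, since the real procedure isn't shown in the
thread.  The point is to create the temporary table once per session and
merely empty it on each run, so an invocation no longer issues a
DROP/CREATE pair of catalog updates:

    -- Invented schema, for illustration only.
    CREATE OR REPLACE FUNCTION run_report() RETURNS void
    LANGUAGE plpgsql VOLATILE AS $$
    BEGIN
        BEGIN
            -- First run in this session: create the temp table once.
            CREATE TEMPORARY TABLE report_snapshot (
                item_id    integer,
                item_count bigint
            );
        EXCEPTION WHEN duplicate_table THEN
            -- Later runs: reuse the table instead of dropping and recreating.
            TRUNCATE report_snapshot;
        END;

        -- Capture the data against a single snapshot, as before.
        INSERT INTO report_snapshot
            SELECT item_id, count(*) FROM some_source GROUP BY item_id;

        -- Split it out into the two reporting tables; no DROP TABLE afterward.
        INSERT INTO report_a SELECT item_id, item_count FROM report_snapshot;
        INSERT INTO report_b SELECT item_id, item_count FROM report_snapshot;
    END;
    $$;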

> (Alternatively, given that the temporary table usage here is quite
> inelegant, is there a better way to obtain a consistent database snapshot
> across multiple queries without using SERIALIZABLE when inside a PL/pgSQL
> function that has to be marked VOLATILE?)

Maybe you could accumulate the data you need in a local array instead,
but that would be a big rewrite.  A cursor might be a possibility too.
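
A sketch of the cursor alternative, using the same invented names as
above: a cursor's result set is read against a single snapshot, so both
reporting tables can be fed from the same consistent data without any
temporary table.

    -- Invented names again; illustration of the cursor idea only.
    CREATE OR REPLACE FUNCTION run_report_cursor() RETURNS void
    LANGUAGE plpgsql VOLATILE AS $$
    DECLARE
        c   refcursor;
        r   record;
    BEGIN
        OPEN c FOR SELECT item_id, count(*) AS item_count
                     FROM some_source GROUP BY item_id;
        LOOP
            FETCH c INTO r;
            EXIT WHEN NOT FOUND;
            -- Each fetched row comes from the cursor's snapshot, so the
            -- two reporting tables stay consistent with each other.
            INSERT INTO report_a VALUES (r.item_id, r.item_count);
            INSERT INTO report_b VALUES (r.item_id, r.item_count);
        END LOOP;
        CLOSE c;
    END;
    $$;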

            regards, tom lane

Re: Postgresql 8.4.1 segfault, backtrace

From
Tom Lane
Date:
I wrote:
> But: the question at this point is why we've never seen such a report
> before 8.4.  If this theory is correct, it's been broken for a *long*
> time.  I can think of a couple of possible explanations:

> A: the problem can only manifest if this loop has work to do for
> a relcache entry that is not the last one in its bucket chain.
> 8.4 might have added more preloaded relcache entries than were there
> before.  Or the 8.4 changes in the hash functions might have shuffled
> the entries' bucket placement around so that the problem can happen
> when it couldn't before.

The latter theory appears to be the correct one: in 8.4, pg_database
is at risk (since it has a trigger) and it shares a hash bucket with
pg_ts_dict.  In versions 8.0-8.3 there is, by pure luck, no hash
collision for vulnerable catalogs.  I checked with variants of

select relname, hashoid(oid)%512 as bucket
from pg_class
where (relhasrules or relhastriggers)
  and relkind in ('r','i') and relnamespace = 11
  and hashoid(oid)%512 in
      (select hashoid(oid)%512 from pg_class
       where relkind in ('r','i') and relnamespace = 11
       group by 1 having count(*) > 1);

which is conservative since it looks at all system catalogs/indexes
whether or not they are part of the preloaded set.

7.4 does show a collision, but since we've not heard reports of this
before, I speculate that it might have some other behavior that
protects it.  The relevant code was certainly a lot different back then.

Interestingly, the bug can no longer be reproduced in CVS HEAD, because
pg_database no longer has a trigger.  We had better fix it anyway of
course, since future hash collisions are unpredictable.  I'm wondering
though whether to bother back-patching further than 8.4.  Thoughts?

> B: the 8.4 changes in the shared-cache-inval mechanism might have
> made it more likely that a freshly started backend could get hit with a
> relcache flush request.  I should think that those changes would have
> made this *less* likely not more so, so maybe there is an additional
> bug lurking in that area.

I thought I'd better check this theory too.  I double-checked the SI
code and can't find any evidence of a problem of that sort.  The
nextMsgNum of a new backend is correctly initialized to maxMsgNum, and
the correct lock is held, so it should work correctly.  I think it's
just that Michael's system has sufficiently high load peaks to sometimes
delay an incoming backend long enough for it to get reset.  There might
be kernel scheduling quirks contributing to the behavior too.

            regards, tom lane

Re: Postgresql 8.4.1 segfault, backtrace

From
"Michael Brown"
Date:
Tom Lane said:
> I shall go and do some further investigation, but at least it's now
> clear where to look.  Thanks for the report, and for being so helpful in
> providing information!

Thank you!

I have put in place a temporary workaround on the production system, which
is to insert a

    // Pretend that the cache is always invalid
    fprintf ( stderr, "*** bypassing cache ***\n" );
    goto read_failed;

immediately before

    /* check for correct magic number (compatible version) */
    if (fread(&magic, 1, sizeof(magic), fp) != sizeof(magic))
        goto read_failed;
    if (magic != RELCACHE_INIT_FILEMAGIC)
        goto read_failed;

in load_relcache_init_file().  This, I hope, will cause postgres to always
invalidate and rebuild the relcache init file.  The workaround has been in
place for around an hour so far and does not seem to be significantly
impacting upon performance.  If there is anything dangerous about this
workaround, could you let me know?

If you come up with a patch against 8.4.1, we should be able to test it
under production loads almost straight away.

Thanks again,

Michael

Re: Postgresql 8.4.1 segfault, backtrace

From
"Michael Brown"
Date:
Tom Lane said:
> "Michael Brown" <mbrown@fensystems.co.uk> writes:
>> I have put in place a temporary workaround on the production system,
>> which is to insert a
>
>>     // Pretend that the cache is always invalid
>>     fprintf ( stderr, "*** bypassing cache ***\n" );
>>     goto read_failed;
>
> I don't think this will actually help --- if anything it exposes you to
> the bug more :-(.  Given my current theory, there is not anything wrong
> with the init file.  The problem is a sort of race condition that would
> be triggered by very high cache-inval traffic during startup of a new
> backend.  I looked at the cache inval array in your coredump, and it
> looked like there had been a whole bunch of table deletions happening
> concurrently with the startup --- "whole bunch" meaning
> hundreds if not thousands.  Is there anything in your application
> behavior that might encourage a lot of table drops to happen
> concurrently?

If temporary table drops count towards this, then yes.  We have a
reporting procedure (in PL/pgSQL) that runs every ten seconds.  This
procedure needs to generate entries in two reporting tables.  In order to
obtain a consistent view when running under READ COMMITTED, it creates a
temporary table containing both result sets, then splits the data out into
the two reporting tables.

The temporary table is dropped immediately after use, and it's quite
plausible that this could run into hundreds of temporary table creates and
drops in a single transaction.

I could fairly easily change this procedure to truncate rather than drop
the temporary table, if that would lessen the exposure to the problem.
Would that be likely to help?

(Alternatively, given that the temporary table usage here is quite
inelegant, is there a better way to obtain a consistent database snapshot
across multiple queries without using SERIALIZABLE when inside a PL/pgSQL
function that has to be marked VOLATILE?)

> I'll get you a real fix as soon as I can, but might not be till
> tomorrow.

Thanks!

Michael

Re: Postgresql 8.4.1 segfault, backtrace

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> 2. By chance, a shared-cache-inval flush comes through while it's doing
> that, causing all non-open, non-nailed relcache entries to be discarded.
> Including, in particular, the one that is "next" according to the
> hash_seq_search's status.

I thought we have catchup interrupts disabled at that point. Where does
the flush come from?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: Postgresql 8.4.1 segfault, backtrace

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> 2. By chance, a shared-cache-inval flush comes through while it's doing
>> that, causing all non-open, non-nailed relcache entries to be discarded.
>> Including, in particular, the one that is "next" according to the
>> hash_seq_search's status.

> I thought we have catchup interrupts disabled at that point. Where does
> the flush come from?

Actual overrun.  Disabling the catchup interrupt certainly can't
improve that.

(Michael's core dump showed that the failed backend was about 7000 SI
messages behind, where the overrun limit is 4K...)

            regards, tom lane

Re: Postgresql 8.4.1 segfault, backtrace

From
Tom Lane
Date:
I wrote:
> I'll get you a real fix as soon as I can, but might not be till
> tomorrow.

The attached patch (against 8.4.x) fixes the problem as far as I can
tell.  Please test.

            regards, tom lane

Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.287
diff -c -r1.287 relcache.c
*** src/backend/utils/cache/relcache.c    11 Jun 2009 14:49:05 -0000    1.287
--- src/backend/utils/cache/relcache.c    25 Sep 2009 17:32:02 -0000
***************
*** 1386,1392 ****
       *
       * The data we insert here is pretty incomplete/bogus, but it'll serve to
       * get us launched.  RelationCacheInitializePhase2() will read the real
!      * data from pg_class and replace what we've done here.
       */
      relation->rd_rel = (Form_pg_class) palloc0(CLASS_TUPLE_SIZE);

--- 1386,1394 ----
       *
       * The data we insert here is pretty incomplete/bogus, but it'll serve to
       * get us launched.  RelationCacheInitializePhase2() will read the real
!      * data from pg_class and replace what we've done here.  Note in particular
!      * that relowner is left as zero; this cues RelationCacheInitializePhase2
!      * that the real data isn't there yet.
       */
      relation->rd_rel = (Form_pg_class) palloc0(CLASS_TUPLE_SIZE);

***************
*** 2603,2619 ****
       * rows and replace the fake entries with them. Also, if any of the
       * relcache entries have rules or triggers, load that info the hard way
       * since it isn't recorded in the cache file.
       */
      hash_seq_init(&status, RelationIdCache);

      while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
      {
          Relation    relation = idhentry->reldesc;

          /*
           * If it's a faked-up entry, read the real pg_class tuple.
           */
!         if (needNewCacheFile && relation->rd_isnailed)
          {
              HeapTuple    htup;
              Form_pg_class relp;
--- 2605,2635 ----
       * rows and replace the fake entries with them. Also, if any of the
       * relcache entries have rules or triggers, load that info the hard way
       * since it isn't recorded in the cache file.
+      *
+      * Whenever we access the catalogs to read data, there is a possibility
+      * of a shared-inval cache flush causing relcache entries to be removed.
+      * Since hash_seq_search only guarantees to still work after the *current*
+      * entry is removed, it's unsafe to continue the hashtable scan afterward.
+      * We handle this by restarting the scan from scratch after each access.
+      * This is theoretically O(N^2), but the number of entries that actually
+      * need to be fixed is small enough that it doesn't matter.
       */
      hash_seq_init(&status, RelationIdCache);

      while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
      {
          Relation    relation = idhentry->reldesc;
+         bool        restart = false;
+
+         /*
+          * Make sure *this* entry doesn't get flushed while we work with it.
+          */
+         RelationIncrementReferenceCount(relation);

          /*
           * If it's a faked-up entry, read the real pg_class tuple.
           */
!         if (relation->rd_rel->relowner == InvalidOid)
          {
              HeapTuple    htup;
              Form_pg_class relp;
***************
*** 2630,2636 ****
               * Copy tuple to relation->rd_rel. (See notes in
               * AllocateRelationDesc())
               */
-             Assert(relation->rd_rel != NULL);
              memcpy((char *) relation->rd_rel, (char *) relp, CLASS_TUPLE_SIZE);

              /* Update rd_options while we have the tuple */
--- 2646,2651 ----
***************
*** 2639,2660 ****
              RelationParseRelOptions(relation, htup);

              /*
!              * Also update the derived fields in rd_att.
               */
!             relation->rd_att->tdtypeid = relp->reltype;
!             relation->rd_att->tdtypmod = -1;    /* unnecessary, but... */
!             relation->rd_att->tdhasoid = relp->relhasoids;

              ReleaseSysCache(htup);
          }

          /*
           * Fix data that isn't saved in relcache cache file.
           */
          if (relation->rd_rel->relhasrules && relation->rd_rules == NULL)
              RelationBuildRuleLock(relation);
          if (relation->rd_rel->relhastriggers && relation->trigdesc == NULL)
              RelationBuildTriggers(relation);
      }

      /*
--- 2654,2710 ----
              RelationParseRelOptions(relation, htup);

              /*
!              * Check the values in rd_att were set up correctly.  (We cannot
!              * just copy them over now: formrdesc must have set up the
!              * rd_att data correctly to start with, because it may already
!              * have been copied into one or more catcache entries.)
               */
!             Assert(relation->rd_att->tdtypeid == relp->reltype);
!             Assert(relation->rd_att->tdtypmod == -1);
!             Assert(relation->rd_att->tdhasoid == relp->relhasoids);

              ReleaseSysCache(htup);
+
+             /* relowner had better be OK now, else we'll loop forever */
+             if (relation->rd_rel->relowner == InvalidOid)
+                 elog(ERROR, "invalid relowner in pg_class entry for \"%s\"",
+                      RelationGetRelationName(relation));
+
+             restart = true;
          }

          /*
           * Fix data that isn't saved in relcache cache file.
+          *
+          * relhasrules or relhastriggers could possibly be wrong or out of
+          * date.  If we don't actually find any rules or triggers, clear the
+          * local copy of the flag so that we don't get into an infinite loop
+          * here.  We don't make any attempt to fix the pg_class entry, though.
           */
          if (relation->rd_rel->relhasrules && relation->rd_rules == NULL)
+         {
              RelationBuildRuleLock(relation);
+             if (relation->rd_rules == NULL)
+                 relation->rd_rel->relhasrules = false;
+             restart = true;
+         }
          if (relation->rd_rel->relhastriggers && relation->trigdesc == NULL)
+         {
              RelationBuildTriggers(relation);
+             if (relation->trigdesc == NULL)
+                 relation->rd_rel->relhastriggers = false;
+             restart = true;
+         }
+
+         /* Release hold on the relation */
+         RelationDecrementReferenceCount(relation);
+
+         /* Now, restart the hashtable scan if needed */
+         if (restart)
+         {
+             hash_seq_term(&status);
+             hash_seq_init(&status, RelationIdCache);
+         }
      }

      /*

Re: Postgresql 8.4.1 segfault, backtrace

From
Tom Lane
Date:
I wrote:
> Interestingly, the bug can no longer be reproduced in CVS HEAD, because
> pg_database no longer has a trigger.  We had better fix it anyway of
> course, since future hash collisions are unpredictable.  I'm wondering
> though whether to bother back-patching further than 8.4.  Thoughts?

I have been poking at this some more and have confirmed that there
doesn't seem to be a crash risk before 8.4 with respect to the
next-hashtable-scan-entry problem.  However, I have also confirmed that
it is possible for the *current* relcache entry to get freed by sinval
reset, because the loop in RelationCacheInitializePhase2 doesn't bother
to increment the entry's reference count while working with it.  This is
not a risk for nailed relations of course, but it is a hazard for rels
with triggers.  If this happens, RelationBuildTriggers will build a
TriggerDesc structure and then store its pointer into an already-freed
Relation struct.  At the very least this represents a permanent memory
leak in CacheMemoryContext; but the scary thought is that the Relation
struct's memory might have already been recycled for another purpose,
in which case we have a memory clobber.  So I'm of the opinion that we
need to back-patch all the way.  Working on it now.

            regards, tom lane

Re: Postgresql 8.4.1 segfault, backtrace

From
Tom Lane
Date:
Richard Neill <rn214@hermes.cam.ac.uk> writes:
> The good news is that the patch has now been in place for 5 days, and,
> despite some very high loading, it has survived without a single crash.
> I'd venture to say that this issue is now fixed.

Great, thanks for the followup.

            regards, tom lane

Re: Postgresql 8.4.1 segfault, backtrace

From
Richard Neill
Date:
Dear Tom,

Thanks for this, and sorry for not replying earlier. We finally obtained
a window to deploy this patch on the real (rather busy!) production
system as of last Saturday evening.

The good news is that the patch has now been in place for 5 days, and,
despite some very high loading, it has survived without a single crash.

I'd venture to say that this issue is now fixed.

Best wishes,

Richard




Tom Lane wrote:
> I wrote:
>> I'll get you a real fix as soon as I can, but might not be till
>> tomorrow.
>
> The attached patch (against 8.4.x) fixes the problem as far as I can
> tell.  Please test.
>
>             regards, tom lane