Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
Дата
Msg-id 201209202355.05332.andres@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.  (Marko Tiikkaja <pgmail@joh.to>)
Список pgsql-hackers
On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Btw, I played with this some more on Saturday and I think, while
> > definitely a bad bug, the actual consequences aren't as bad as at least
> > I initially feared.
> > 
> > Fake relcache entries are currently set in 3 scenarios during recovery:
> > 1. removal of ALL_VISIBLE in heapam.c
> > 2. incomplete splits and incomplete deletions in nbtxlog.c
> > 3. incomplete splits in ginxlog.c
> > [ #1 doesn't really hurt in 9.1, and the others are low probability ]
> 
> OK, that explains why we've not seen a blizzard of trouble reports.
> Still seems like a good idea to fix it ASAP, though.
Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the 
IRC Channel that had symptoms matching this bug.

Situation was that he started to get very high IO and xid wraparound shutdown 
warnings due to never finishing and not canceleable autovacuums. After some 
investigation it turned out that btree indexes were processed at that time. We 
found they had cyclic btpo_next pointers leading to an endless loop in 
_bt_pagedel.
We solved the issue by forcing leftsib = P_NONE inside the
while (P_ISDELETED(opaque) || opaque->btpo_next != target)
which let a queue DROP INDEX get the necessary locks.

Unfortuantely this was on a busy production system with a nearing shutdown, so 
not much was kept for further diagnosis.

After this bug was discovered I asked the user and indeed they previously 
shutdown the database twice in quick succession during heavy activity with -m 
immediate which could exactly lead to such a problem due to incompletely 
processed page splits.

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Assigning NULL to a record variable
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Assigning NULL to a record variable