Re: gcc 4.6 and hot standby

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: gcc 4.6 and hot standby
Дата
Msg-id 23049.1307731137@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: gcc 4.6 and hot standby  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Ответы Re: gcc 4.6 and hot standby  (Alex Hunsaker <badalex@gmail.com>)
Список pgsql-hackers
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> On 09/06/11 06:58, Alex Hunsaker wrote:
>> Yeah :-). However ill note it looks like its the default compiler for
>> fedora 15, ubuntu natty and debian sid.

> FWIW Ubuntu natty uses gcc 4.5.2, probably just as as well in the light 
> of your findings :-)

I've been able to reproduce this on released Fedora 15, and sure enough
it is a compiler bug.  The problem comes from these fragments of
ReadRecord():

ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
{   XLogRecPtr    tmpRecPtr = EndRecPtr;
   if (RecPtr == NULL)       RecPtr = &tmpRecPtr;
   targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
   if (targetRecOff == 0)       tmpRecPtr.xrecoff += pageHeaderSize;
   record = (XLogRecord *) ((char *) readBuf + RecPtr->xrecoff % XLOG_BLCKSZ);

gcc 4.6.0 is assuming that the value it first fetches for RecPtr->xrecoff
is still usable for computing the "record" pointer, despite the possible
intervening update of tmpRecPtr.  That of course leads to "record"
pointing to the wrong place, which results in an incorrect conclusion
that we're looking at an invalid record header, which leads to killing
and restarting the walreceiver.  I haven't traced what happens after
that, but apparently in standby mode we'll come back to ReadRecord with
a record pointer that's already advanced over the page header, else it'd
be an infinite loop.

Note that this means that crash recovery, not only standby mode, is
broken with this compiler.

I've filed a bug report for this:
https://bugzilla.redhat.com/show_bug.cgi?id=712480
but I wouldn't care to hold my breath while waiting for a fix to appear
upstream, let alone propagate to all the distros already using 4.6.0.
I think we need a workaround.

The obvious question to ask here is why are we updating
"tmpRecPtr.xrecoff" and not "RecPtr->xrecoff"?  The latter choice would
be a lot more readable, since the immediately surrounding code doesn't
refer to tmpRecPtr.  I think the idea is to guarantee that the caller's
referenced record pointer (if any) isn't modified, but if RecPtr is not
pointing at tmpRecPtr here, we have got big problems anyway.

So I'm tempted to propose this fix:
   if (targetRecOff == 0)   {       /*        * Can only get here in the continuing-from-prev-page case, because
*XRecOffIsValid eliminated the zero-page-offset case otherwise. Need        * to skip over the new page's header.
*/
 
-       tmpRecPtr.xrecoff += pageHeaderSize;
+       Assert(RecPtr == &tmpRecPtr);
+       RecPtr->xrecoff += pageHeaderSize;       targetRecOff = pageHeaderSize;   }

Another possibility, which might be less ugly, is to delete the above
code entirely and handle the page-header case in the RecPtr == NULL
branch a few lines above.

Comments?
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: pg_dump vs malloc
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Small SSI issues