Re: 9.4 checksum error in recovery with btree index
From | Heikki Linnakangas |
---|---|
Subject | Re: 9.4 checksum error in recovery with btree index |
Date | |
Msg-id | 5379DDBE.2010703@vmware.com |
In reply to | Re: 9.4 checksum error in recovery with btree index (Jeff Janes <jeff.janes@gmail.com>) |
Responses | Re: 9.4 checksum error in recovery with btree index |
List | pgsql-hackers |
On 05/18/2014 06:30 AM, Jeff Janes wrote:
> On Saturday, May 17, 2014, Heikki Linnakangas <hlinnakangas@vmware.com>
> wrote:
>
>> On 05/17/2014 12:28 AM, Jeff Janes wrote:
>>
>>> More fun with my torn page injection test program on 9.4.
>>>
>>> 24171 2014-05-16 14:00:44.934 PDT:WARNING: 01000: page verification
>>> failed, calculated checksum 21100 but expected 3356
>>> 24171 2014-05-16 14:00:44.934 PDT:CONTEXT: xlog redo split_l: rel
>>> 1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
>>> 192
>>> 24171 2014-05-16 14:00:44.934 PDT:LOCATION: PageIsVerified,
>>> bufpage.c:145
>>> 24171 2014-05-16 14:00:44.934 PDT:FATAL: XX001: invalid page in block
>>> 34666 of relation base/16384/16405
>>> 24171 2014-05-16 14:00:44.934 PDT:CONTEXT: xlog redo split_l: rel
>>> 1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
>>> 192
>>> 24171 2014-05-16 14:00:44.934 PDT:LOCATION: ReadBuffer_common,
>>> bufmgr.c:483
>>>
>>>
>>> I've seen this twice now, the checksum failure was both times for the
>>> block
>>> labelled "next" in the redo record. Is this another case where the block
>>> needs to be reinitialized upon replay?
>>>
>>
>> Hmm, it looks like I fumbled the numbering of the backup blocks in the
>> b-tree split WAL record (in 9.4). I blame the comments; the comments where
>> the record is generated numbers the backup blocks starting from 1, but
>> XLR_BKP_BLOCK(x) and RestoreBackupBlock(...) used in replay number them
>> starting from 0.
>>
>> Attached is a patch that I think fixes them. In addition to the
>> rnext-reference, clearing the incomplete-split flag in the child page, had
>> a similar numbering mishap.
>>
>
> The seems to have fixed it.

Okay, thanks, committed.

Your torn-page generator seems to be very good at finding bugs - any chance you could publish it? I wonder if it could've caught the similar mishap in the clearing of the incomplete-split flag.
I think you'd need a checkpoint to begin in the very narrow window between splitting a page and inserting the parent pointer.

- Heikki
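
For readers following the thread, the numbering mishap described above can be illustrated with a toy model. This is only a sketch and not PostgreSQL source: `write_record`, `restore`, and the three page labels are hypothetical names chosen for illustration, and the real WAL record layout is more involved. It shows how a block number taken from 1-based comments, when passed to a 0-based lookup, misses the intended full-page image; replay would then presumably fall back to the on-disk page, which may be torn.

```python
# Toy model of the off-by-one; all names here are illustrative assumptions,
# not PostgreSQL functions.

def write_record(images):
    """Writer side: store full-page images in 0-based slots 0, 1, 2, ..."""
    return dict(enumerate(images))


def restore(record, slot):
    """Replay side: fetch the image in a 0-based slot, or None if the slot is empty."""
    return record.get(slot)


# Hypothetical split record carrying three page images, in this order.
record = write_record(["left", "child", "next"])

# Suppose the comments call the "next" page backup block 3, counting from 1.
comment_number = 3

buggy = restore(record, comment_number)      # 1-based number used as a 0-based slot
fixed = restore(record, comment_number - 1)  # converted to the 0-based slot first

print(buggy)  # None: no image found, so replay would read the on-disk page instead
print(fixed)  # next
```

In this model the buggy lookup silently finds nothing rather than raising an error, which is one plausible reason a mistake of this kind only surfaces when the on-disk page is actually damaged, as with a torn-page injector.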