Re: hung backends stuck in spinlock heavy endless loop
From | Heikki Linnakangas |
---|---|
Subject | Re: hung backends stuck in spinlock heavy endless loop |
Date | |
Msg-id | 54B91E68.7030400@vmware.com |
In response to | Re: hung backends stuck in spinlock heavy endless loop (Merlin Moncure <mmoncure@gmail.com>) |
Responses | Re: hung backends stuck in spinlock heavy endless loop |
List | pgsql-hackers |
On 01/16/2015 04:05 PM, Merlin Moncure wrote:
> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> Running this test on another set of hardware to verify -- if this
>>> turns out to be a false alarm which it may very well be, I can only
>>> offer my apologies! I've never had a new drive fail like that, in
>>> that manner. I'll burn the other hardware in overnight and report
>>> back.
>
> huh -- well possibly. not. This is on a virtual machine attached to a
> SAN. It ran clean for several (this is 9.4 vanilla, asserts off,
> checksums on) hours then the starting having issues:
>
> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page
> verification failed, calculated checksum 59143 but expected 59137 at
> character 20

The calculated checksum is suspiciously close to the expected one. It
could be coincidence, but the previous checksum warning you posted was
also quite close:

> [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING: page
> verification failed, calculated checksum 28520 but expected 28541

I believe the checksum algorithm is supposed to mix the bits quite
thoroughly, so that a difference in a single byte in the input will lead
to a completely different checksum. However, we add the block number to
the checksum last:

> /* Mix in the block number to detect transposed pages */
> checksum ^= blkno;
>
> /*
>  * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
>  * one. That avoids checksums of zero, which seems like a good idea.
>  */
> return (checksum % 65535) + 1;

It looks very much like a page has for some reason been moved to a
different block number. And that's exactly what Peter found out in his
investigation too; an index page was mysteriously copied to a different
block with identical content.

- Heikki
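
The reasoning above can be illustrated with a small sketch. It reproduces only the two final steps quoted in the message (the XOR with the block number and the reduction to 1..65535); the function name `finish_checksum`, the `demo` helper, the content hash value, and the block numbers 100 and 102 are all made up for illustration and are not taken from the PostgreSQL source or from the reported pages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: applies only the final steps quoted above.
 * "hash" stands in for the 32-bit value produced by mixing the page
 * contents, which is not reproduced here.
 */
uint16_t
finish_checksum(uint32_t hash, uint32_t blkno)
{
    /* Mix in the block number last, as in the quoted snippet */
    hash ^= blkno;

    /* Reduce to 1..65535 so the result fits in 16 bits and is never zero */
    return (uint16_t) ((hash % 65535) + 1);
}

/*
 * Same page content (same hash), two nearby block numbers.
 * All input values are assumptions chosen for illustration.
 */
void
demo(void)
{
    uint32_t hash = 0x1234E700;     /* made-up content hash */
    uint16_t expected = finish_checksum(hash, 100);   /* block the page was written as */
    uint16_t calculated = finish_checksum(hash, 102); /* block it was later read from */
    int      diff = (int) calculated - (int) expected;

    /* Only a low bit of the XOR input changed, so the results stay close */
    assert(diff > -16 && diff < 16);

    printf("expected %u, calculated %u, difference %d\n",
           (unsigned) expected, (unsigned) calculated, diff);
}
```

With these made-up inputs the two results are 63897 and 63899. Because the block number is XORed in after all the bit mixing, identical content at two nearby block numbers yields checksums that differ only slightly, which matches the pattern in the two warnings (59143 vs 59137, and 28520 vs 28541). A change in the page contents would instead alter the mixed hash itself and would normally produce an unrelated value.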