Re: Enabling Checksums
From | Jeff Davis |
---|---|
Subject | Re: Enabling Checksums |
Date | |
Msg-id | 1355863743.24766.196.camel@sussancws0025 |
In reply to | Re: Enabling Checksums (Simon Riggs <simon@2ndQuadrant.com>) |
List | pgsql-hackers |
On Tue, 2012-12-18 at 08:17 +0000, Simon Riggs wrote:
> I think we should discuss whether we accept my premise? Checksums will
> actually detect more errors than we see now, and people will want to
> do something about that. Returning to backup is one way of handling
> it, but on a busy production system with pressure on, there is
> incentive to implement a workaround, not a fix. It's not an easy call
> to say "we've got 3 corrupt blocks, so I'm going to take the whole
> system offline while I restore from backup".

Up until now, my assumption has generally been that, upon finding the
corruption, the primary course of action is taking that server down
(hopefully you have a good replica), and do some kind of restore or sync
a new replica.

It sounds like you are exploring other possibilities.

> > I suppose we could have a new ReadBufferMaybe function that would only
> > be used by a sequential scan; and then just skip over the page if it's
> > corrupt, depending on a GUC. That would at least allow sequential scans
> > to (partially) work, which might be good enough for some data recovery
> > situations. If a catalog index is corrupted, that could just be rebuilt.
> > Haven't thought about the details, though.
>
> Not sure if you're being facetious here or not.

No. It was an incomplete thought (as I said), but sincere.

> Mild reworking of the
> logic for heap page access could cope with a NULL buffer response and
> subsequent looping, which would allow us to run pg_dump against a
> damaged table to allow data to be saved, keeping file intact for
> further analysis.

Right.

> I'm suggesting we work a little harder than "your block is corrupt"
> and give some thought to what the user will do next. Indexes are a
> good case, because we can/should report the block error, mark the
> index as invalid and then hint that it should be rebuilt.

Agreed; this applies to any derived data.

I don't think it will be very practical to keep a server running in this
state forever, but it might give enough time to reach a suitable
maintenance window.

Regards,
Jeff Davis
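
The ReadBufferMaybe idea discussed in the message could be sketched roughly as follows. This is a toy Python model, not PostgreSQL code: `Page`, `make_page`, `read_buffer_maybe`, `seq_scan`, and the `skip_corrupt_pages` flag (standing in for the GUC) are all hypothetical names, and CRC32 is used only as a convenient stand-in checksum. It shows a page read that returns nothing for a corrupt page, and a sequential scan that either fails or skips that page depending on the setting.

```python
import zlib
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Page:
    """Toy stand-in for an on-disk page: a payload plus its stored checksum."""
    payload: bytes
    checksum: int


class CorruptPageError(Exception):
    """Raised when a corrupt page is hit and skipping is disabled."""


def make_page(payload: bytes) -> Page:
    # CRC32 is an assumption made for illustration only.
    return Page(payload, zlib.crc32(payload))


def read_buffer_maybe(pages: List[Page], blkno: int) -> Optional[Page]:
    """Hypothetical analogue of ReadBufferMaybe: None means 'page is corrupt'."""
    page = pages[blkno]
    if zlib.crc32(page.payload) != page.checksum:
        return None
    return page


def seq_scan(
    pages: List[Page],
    skip_corrupt_pages: bool = False,      # stands in for the proposed GUC
    skipped: Optional[List[int]] = None,   # collects skipped block numbers
) -> Iterator[bytes]:
    """Yield the payload of each valid page, in block order."""
    for blkno in range(len(pages)):
        page = read_buffer_maybe(pages, blkno)
        if page is None:
            if not skip_corrupt_pages:
                raise CorruptPageError(f"checksum failure in block {blkno}")
            if skipped is not None:
                skipped.append(blkno)      # report the block, then move on
            continue
        yield page.payload


if __name__ == "__main__":
    pages = [make_page(b"row-a"), make_page(b"row-b"), make_page(b"row-c")]
    pages[1].payload = b"row-X"            # simulate on-disk corruption
    skipped: List[int] = []
    print(list(seq_scan(pages, skip_corrupt_pages=True, skipped=skipped)))
    print(skipped)
```

With skipping enabled, the scan returns the two intact payloads and records block 1 as skipped; with the default setting it raises on the first corrupt block, which corresponds to the "take the server down and restore" behaviour described earlier in the message.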