On Thu, Dec 07, 2000 at 04:35:00PM -0500, Tom Lane wrote:
> Remember that we are already sitting atop hardware that's really
> pretty reliable, despite the carping that's been going on in this
> thread. All that we have to do is detect the infrequent case where a
> block of data didn't get written due to system failure. It's wildly
> pessimistic to think that we might get called on to do so as much as
> once a day (if you are trying to run a reliable database, and are
> suffering power failures once a day, and haven't bought a UPS, you're
> a lost cause). A 32-bit CRC will fail to detect such an error with a
> probability of about 1 in 2^32. So, a 32-bit CRC will have an MBTF of
> 2^32 days, or 11 million years, on the wildly pessimistic side ---
> real installations probably 100 times better. That's plenty for me,
> and improving the odds to 2^64 or 2^128 is not worth any slowdown
> IMHO.
1. Computing a CRC-64 takes only about twice as long as a CRC-32, for 2^32 times the confidence. That's pretty cheap
confidence.
2. I disagree with the way the above statistics were computed. That eleven-million-year figure gets whittled down pretty
quickly when you factor in all the sources of corruption, even without crashes. (Power failures are only one of
many sources of corruption.) These sources grow with the size and activity of the database. Databases are getting very
large and busy indeed.
3. Many users clearly hope to be able to pull the plug on their hardware and get back up confidently. While we can't
promise they won't have to go to their backups, we should at least be equipped to promise, with confidence, that they
will know whether they need to.
4. For a way to mark the "current final" log entry, you want a lot more confidence, because you read a lot more of
them, and reading beyond the end may cause you to corrupt a currently-valid database, which seems a lot worse than
just using a corrupted database.
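
To put rough numbers on point 2, here is a minimal sketch of the arithmetic. It assumes each corrupted block slips past an n-bit CRC with probability 2^-n, as in the quoted estimate. The function name and the event rates are hypothetical illustrations, not measurements from any real installation.

```python
DAYS_PER_YEAR = 365.25

def years_until_undetected(crc_bits, corrupt_blocks_per_day):
    """Expected years until one corrupted block passes the CRC check.

    Assumes each corrupted block goes undetected with probability
    2**-crc_bits (an idealized model, same as the quoted estimate).
    """
    return 2 ** crc_bits / (corrupt_blocks_per_day * DAYS_PER_YEAR)

if __name__ == "__main__":
    # Illustrative rates only: one bad block per day is the quoted
    # "wildly pessimistic" case; the larger rates are made up to show
    # how the figure shrinks as checked corruption events multiply.
    for rate in (1, 1_000, 1_000_000):
        y32 = years_until_undetected(32, rate)
        y64 = years_until_undetected(64, rate)
        print(f"{rate:>9} bad blocks/day: CRC-32 {y32:.3g} yr, CRC-64 {y64:.3g} yr")
```

At one bad block per day this reproduces the roughly eleven-million-year figure for CRC-32. The figure falls in direct proportion to the rate, while CRC-64 stays a factor of 2^32 ahead at every rate.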
Still, I agree that a 32-bit CRC is better than none at all.
Nathan Myers
ncm@zembu.com