Re: Block-level CRC checks
From | Gregory Stark |
---|---|
Subject | Re: Block-level CRC checks |
Date | |
Msg-id | 87wsgsz0dx.fsf@oxford.xeocode.com |
In reply to | Re: Block-level CRC checks (Paul Schlie <schlie@comcast.net>) |
Responses | Re: Block-level CRC checks |
List | pgsql-hackers |
Paul Schlie <schlie@comcast.net> writes:

> Tom Lane wrote:
>> Paul Schlie writes:
>>> - yes, if you're willing to compute true CRC's as opposed to simpler
>>> checksums, which may be worth the price if in fact many/most data
>>> check failures are truly caused by single bit errors somewhere in the
>>> chain,
>>
>> FWIW, not one of the corrupted-data problems I've investigated has ever
>> looked like a single-bit error. So the theoretical basis for using a
>> CRC here seems pretty weak. I doubt we'd even consider automatic repair
>> attempts anyway.
>
> - although I accept that you may be correct in your assessment that most
> errors are in fact multi-bit;

I've seen bad memory in a SCSI controller cause single-bit errors in storage.
It was quite confusing since the symptom was syntax errors in the C code we
were compiling on the server. The sysadmin actually caught it reliably
corrupting a block of source text written out and read back.

I've also seen single-bit errors caused by bad memory in a network interface.
*Twice*. Particularly nasty since the CRC on TCP/IP packets is only 16-bit so
a large enough ftp transfer would eventually finish despite the packet loss
but with the occasional bits flipped. In these days of SAN/NAS and SCSI over
IP that's pretty scary...

Several cases on list have come down to "filesystem secretly replaces entire
block of data with Folger's Crystals(tm) -- let's see if the database
notices". Any checksum would help in that case but I wouldn't discount single
bit errors either.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
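
To illustrate the 16-bit point above, here is a minimal sketch, not anything from the thread. It assumes the TCP check is modelled as an RFC 1071 style 16-bit ones' complement sum (strictly a checksum rather than a CRC) and uses `zlib.crc32` as a stand-in for a "true CRC". The helper names `internet_checksum` and `flip_bit` and the sample block of C source are hypothetical. A single flipped bit changes both values, but two flips at the same bit position in neighbouring 16-bit words, in opposite directions, cancel out in the simple sum while the CRC still changes.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (hypothetical helper)."""
    buf = bytearray(data)
    buf[byte_index] ^= 1 << bit
    return bytes(buf)


# Hypothetical block of C source text, as in the SCSI anecdote.
block = b"int main(void) { return 0; }\n" * 4

# One flipped bit: both the simple sum and the CRC change.
one_flip = flip_bit(block, 10, 3)

# Two compensating flips: bit 0 of byte 0 goes 1 -> 0 ('i' becomes 'h') and
# bit 0 of byte 2 goes 0 -> 1 ('t' becomes 'u'). Both bytes are the high byte
# of their 16-bit word, so the sum loses 256 and gains 256.
two_flips = flip_bit(flip_bit(block, 0, 0), 2, 0)

for name, damaged in (("one flip", one_flip), ("two flips", two_flips)):
    sum_caught = internet_checksum(damaged) != internet_checksum(block)
    crc_caught = zlib.crc32(damaged) != zlib.crc32(block)
    print(f"{name}: 16-bit sum caught it: {sum_caught}, CRC-32 caught it: {crc_caught}")
```

The second case turns `int` into `hnu`, which is the kind of damage that shows up as a compiler syntax error rather than as a failed transfer.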