Re: [HACKERS] Checksums by default?
From | Tomas Vondra |
---|---|
Subject | Re: [HACKERS] Checksums by default? |
Date | |
Msg-id | 34e15a92-0bde-6809-b7ba-0cc1681635ab@2ndquadrant.com |
In reply to | Re: [HACKERS] Checksums by default? (Jim Nasby <Jim.Nasby@BlueTreble.com>) |
List | pgsql-hackers |
On 02/13/2017 02:29 AM, Jim Nasby wrote:
> On 2/10/17 6:38 PM, Tomas Vondra wrote:
>> And no, backups may not be a suitable solution - the failure happens on
>> a standby, and the page (luckily) is not corrupted on the master. Which
>> means that perhaps the standby got corrupted by a WAL, which would
>> affect the backups too. I can't verify this, though, because the WAL got
>> removed from the archive, already. But it's a possibility.
>
> Possibly related... I've got a customer that periodically has SR replicas
> stop in their tracks due to WAL checksum failure. I don't think there's
> any hardware correlation (they've seen this on multiple machines).
> Studying the code, it occurred to me that if there's any bugs in the
> handling of individual WAL record sizes or pointers during SR then you
> could get CRC failures. So far every one of these occurrences has been
> repairable by replacing the broken WAL file on the replica. I've
> requested that next time this happens they save the bad WAL.

I don't follow. You're talking about WAL checksums, this thread is about
data checksums. I'm not seeing any WAL checksum failure, but when the
standby attempts to apply the WAL (in particular a Btree/DELETE WAL
record), it detects an incorrect data checksum in the underlying table.
So either there's a hardware issue, or the heap got corrupted by some
preceding WAL. Or maybe one of the tiny gnomes in the CPU got tired and
punched the bits wrong.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services