Re: [PATCH] Verify Checksums during Basebackups
От | David Steele |
---|---|
Тема | Re: [PATCH] Verify Checksums during Basebackups |
Дата | |
Msg-id | 1a230b05-6cfc-2557-20cc-09e7a176230c@pgmasters.net обсуждение исходный текст |
Ответ на | Re: [PATCH] Verify Checksums during Basebackups (Michael Banck <michael.banck@credativ.de>) |
Ответы |
Re: [PATCH] Verify Checksums during Basebackups
|
Список | pgsql-hackers |
Hi Michael, On 3/23/18 5:36 AM, Michael Banck wrote: > Am Donnerstag, den 22.03.2018, 12:22 -0400 schrieb David Steele: >> >> + if (phdr->pd_checksum != checksum) >> >> I've attached a patch that adds basic retry functionality. It's not >> terrible efficient since it rereads the entire buffer for any block >> error. A better way is to keep a bitmap for each block in the buffer, >> then on retry compare bitmaps. If the block is still bad, report it. >> If the block was corrected moved on. If a block was good before but is >> bad on retry it can be ignored. > > I have to admit I find it a bit convoluted and non-obvious on first > reading, but I'll try to check it out some more. Yeah, I think I was influenced too much by how pgBackRest does things, which doesn't work as well here. Attached is a simpler version. > I think it would be very useful if we could come up with a testcase > showing this problem, but I guess this will be quite hard to hit > reproducibly, right? This was brought up by Robert in [1] when discussing validating checksums during backup. I don't know of any way to reproduce this issue but it seems perfectly possible, if highly unlikely. >> + ereport(WARNING, >> + (errmsg("checksum verification failed in file " >> >> I'm worried about how verbose this warning could be since there are >> 131,072 blocks per segment. It's unlikely to have that many block >> errors, but users do sometimes put files in PGDATA which look like they >> should be validated. Since these warnings all go to the server log it >> could get pretty bad. > > We only verify checksums of files in tablespaces, and I don't think > dropping random files in those is supported in any way, as opposed to > maybe the top-level PGDATA directory. So I would say that this is not a > real concern. Perhaps, but a very corrupt file is still going to spew lots of warnings into the server log. >> We should either stop warning after the first failure, or aggregate the >> failures for a file into a single message. > > I agree that major corruption could make the whole output blow up but I > would prefer to keep this feature simple for now, which implies possibly > printing out a lot of WARNING or maybe just stopping after the first > one (or first few, dunno). In my experience actual block errors are relatively rare, so there aren't likely to be more than a few in a file. More common are overwritten or transposed files, rogue files, etc. These produce a lot of output. Maybe stop after five? Regards, -- -David david@pgmasters.net [1] https://www.postgresql.org/message-id/CA%2BTgmobHd%2B-yVJHofSWg%3Dg%2B%3DA3EiCN2wsAiEyj7dj1hhevNq9Q%40mail.gmail.com
Вложения
В списке pgsql-hackers по дате отправления: