Re: [patch] Fix checksum verification in base backups for zero page headers
От | Michael Banck |
---|---|
Тема | Re: [patch] Fix checksum verification in base backups for zero page headers |
Дата | |
Msg-id | 4b1c16216dcb60e1dad96142d25fd06fb77485bf.camel@credativ.de обсуждение исходный текст |
Ответ на | Re: [patch] Fix checksum verification in base backups for zero page headers (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>) |
Ответы |
Re: [patch] Fix checksum verification in base backups for zero page headers
|
Список | pgsql-hackers |
Hi, Am Mittwoch, den 02.09.2020, 16:50 +0300 schrieb Anastasia Lubennikova: > On 01.09.2020 13:22, Michael Banck wrote: > > as a continuation of [1], I've split-off the zero page header case from > > the last patch, as this one seems less contentious. > > [1] https://commitfest.postgresql.org/28/2308/ > > I've looked through the previous discussion. As far as I got it, most of > the controversy was about online checksums improvements. > > The warning about pd_upper inconsistency that you've added is a good > addition. The patch is a bit messy, though, because a huge code block > was shifted. > > Will it be different, if you just leave > "if (!PageIsNew(page) && PageGetLSN(page) < startptr)" > block as it was, and add > "else if (PageIsNew(page) && !PageIsZero(page))" ? Thanks, that does indeed look better as a patch and I think it's fine as-is for the code as well, I've attached a v2. > While on it, I also have a few questions about the code around: All fair points, but I think those should be handled in another patch, if any. > 1) Maybe we can use other header sanity checks from PageIsVerified() as > well? Or even the function itself. Yeah, I'll keep that in mind. > 2) > /* Reset loop to validate the block again */ > How many times do we try to reread the block? Is one reread enough? Not sure whether more rereads would help, but I think the general direction was to replace this with something better anyway I think (see below). > Speaking of which, 'reread_cnt' name looks confusing to me. I would > expect that this variable contains a number of attempts, not the number > of bytes read. Well, there are other "cnt" variables that keep the number of bytes read in that file, so it does not seem to be out of place for me. A comment might not hurt though. On second glance, it should maybe also be of size_t and not int type, as is the other cnt variable? > If everyone agrees, that for basebackup purpose it's enough to rely on a > single reread, I'm ok with it. > Another approach is to read the page directly from shared buffers to > ensure that the page is fine. This way is more complicated, but should > be almost bullet-proof. Using it we can also check pages with lsn >= > startptr. Right, that's what Andres also advocated for initially I think, and what should be done going forward. > 3) Judging by warning messages, we count checksum failures per file, not > per page, and don't report after a fifth failure. Why so? Is it a > common case that so many pages of one file are corrupted? I think this was a request during the original review, on the basis that if we have more than a few checksum errors, it's likely there is something fundamentally broken and it does not make sense to spam the log with potentially millions of broken checksums. Cheers, Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Вложения
В списке pgsql-hackers по дате отправления: