Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
From | Thomas Munro |
---|---|
Subject | Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" |
Date | |
Msg-id | CA+hUKGLbK6j-jxf=2odz2kuEEwcRxjJiko=4uMtXzktQ4KwzaA@mail.gmail.com |
In response to | Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" (Heikki Linnakangas <hlinnaka@iki.fi>) |
List | pgsql-bugs |

> We haven't heard of broken control files from the field, so that doesn't
> seem to be a problem in practice, at least not yet. Still, I would sleep
> better if the control file had more redundancy. For example, have two
> copies of it on disk. At startup, read both copies, and if they're both
> valid, ignore the one with older timestamp. When updating it, write over
> the older copy. That way, if you crash in the middle of updating it, the
> old copy is still intact.

Seems like a good idea. I somehow doubt that accessing pmem through old school read()/write() interfaces is the future of databases, but ideally this should work correctly, and the dependency is indeed unnecessary if we are prepared to jump through more hoops in just a couple of places. There may also be other benefits.

In hindsight, it's a bit strange that we don't have explicit documentation of this requirement. There is some related (and rather dated) discussion of sectors in wal.sgml but nothing to say that we need 512 byte atomic sectors for correct operation, unless I've managed to miss it (even though it's well known among people who read the source code).

I experimented with a slightly different approach, attached, and a TAP test to exercise it. Instead of alternating between two copies, I tried writing out both copies every time with a synchronisation barrier in between (the same double-write principle some other database uses to deal with torn data pages). I think it's mostly equivalent to your scheme, though the updates are of course slower.

I was thinking that there may be other benefits to having two copies of the "current" version around, for resilience (though perhaps they should be in separate files, not done here), and maybe it's better to avoid having to invent a timestamp scheme. Or maybe the two ideas should be combined: when both CRC checks pass, you could still be more careful which one you choose than I have been here.
Or maybe trying to be resilient against handwavy unknown forms of corruption is a waste of time. I'm not proposing anything here, I was just trying out ideas, for discussion.