Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
From | Thomas Munro |
---|---|
Subject | Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" |
Date | |
Msg-id | CA+hUKGJg341Rf1zD3Rh3vXUbs_bP+LuOiT-Juj+nOWVr1QUkBg@mail.gmail.com |
In response to | Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses | Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" |
List | pgsql-bugs |
On Wed, Jun 23, 2021 at 7:46 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Let's just add the lock there.

+1, no doubt about that.

> Now, that leaves the question with pg_control. That's a different
> situation. It doesn't rely on read() and write() being atomic across
> processes, but on a 512 sector write not being torn on power failure.
> How strong is that guarantee? It used to be common wisdom with hard
> drives, and it was carried over to SSDs although I'm not sure if it was
> ever strictly speaking guaranteed. ...

Right, it's always been tacit, no standard relevant to userspace
mentions any of this AFAIK.

> ... What about the new kid on the block:
> Persistent Memory? I found this article:
> https://lwn.net/Articles/686150/. So at hardware level, Persistent
> Memory only guarantees atomicity at cache line level (64 bytes). To
> provide the traditional 512 byte sector atomicity, there's a feature in
> Linux called BTT. Perhaps we should add a note to the docs that you
> should enable that.

Right, also called sector mode. I don't know enough about that to
comment really, but... if my google-fu is serving me, you can't
actually use interesting sector sizes like 8KB (you have to choose 512
or 4096 bytes), so you'll have to pay for *two* synthetic atomic page
schemes: BTT and our full page writes. That makes me wonder... if you
need to leave full page writes on anyway, maybe it would be a better
trade-off to do double writes of our special atomic files (relmapper
files and control file) so that we could safely turn BTT off and avoid
double-taxation for relation data. Just a thought. No pmem
experience here, I could be way off.

> We haven't heard of broken control files from the field, so that doesn't
> seem to be a problem in practice, at least not yet. Still, I would sleep
> better if the control file had more redundancy. For example, have two
> copies of it on disk. At startup, read both copies, and if they're both
> valid, ignore the one with older timestamp. When updating it, write over
> the older copy. That way, if you crash in the middle of updating it, the
> old copy is still intact.

+1, with a flush in between so that only one can be borked no matter
how the storage works. It is interesting how few reports there are on
the mailing list of control file CRC check failures though, if I'm
searching for the right thing[1].

[1] https://www.postgresql.org/search/?m=1&q=calculated+CRC+checksum+does+not+match+value+stored+in+file&l=&d=-1&s=r