Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
From | Michael Paquier |
---|---|
Subject | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows |
Date | |
Msg-id | ZR3qvrYULJWaUnBK@paquier.xyz |
In reply to | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows |
List | pgsql-bugs |
On Thu, Oct 05, 2023 at 10:12:27AM +1300, Thomas Munro wrote:
> On Thu, Oct 5, 2023 at 3:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm too lazy to check the commit log right now, but I think
>> we did implement a fix for that (ie, flush dirty pages even
>> if we anticipate them going away due to truncation). But as
>> Laurenz says, v10 is out of support and possibly didn't get
>> that fix. Even if it did, you'd need to be running one of
>> the last minor releases, because this wasn't very long ago.
>
> This thread seems to be saying otherwise:
>
> https://www.postgresql.org/message-id/flat/2348.1544474335%40sss.pgh.pa.us

Yeah, that's not been solved. I spent a few hours on this problem
last May, just before PGCon, and there was an argument that marking
all the pages as dirty is kind of a waste of resources, as it would
create WAL for data that's going to be gone a little bit later as an
effect of the truncate, leading to an extra burst of I/O, especially
for large truncations.

FWIW, I think that I'd be personally OK with using this method. At
least that's safe, simple, backpatchable and it does not require any
new magic. I know that there are voices that argued against this
method, but here we are N years later, so perhaps we should just do
that on HEAD at least.

>> In the end though, the *real* problem here is running on a
>> platform that randomly disallows writes to disk. There's only
>> so much that Postgres can possibly do about unreliability of the
>> underlying platform. I would never run a production database on
>> Windows, because it's just too prone to that sort of BS.
>
> It's surprising that ftruncate() AKA chsize() is able to fail like
> this (I am not a Windows user but AFAIR that sharing stuff obstructs
> stuff like open, unlink, rename, so it surprises me to see it come up
> with ftruncate, since we must already have made it past the open
> stage). Hmm, the documentation is scant, but I know from my attempts
> to use large files that chsize() is probably some kind of wrapper
> around SetEndOfFile() or similar, and that is documented as failing if
> someone has the file mapped. I don't know why someone would have the
> file mapped, though. (shrug)
>
> But as for what we should do about it, PANIC (as suggested by several
> people) seems better than corruption, if we're not going to write some
> kind of resilience? How else are we supposed to deal with "this
> shouldn't happen, and if it does we're hosed?"

A PANIC may be OK for this specific syscall and would be better, but
the problematic area is larger than that, as we'd still end up with
corruption as long as there's an ERROR or a FATAL between the moment
the buffers (potentially dirty, with tuples that are still live on
disk but dead in memory) are discarded and the moment the truncation
fails. Another method discussed is the use of a critical section (I
recall that there were some pallocs in this area, actually, but I have
nothing in my notes about that...).
--
Michael
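
For readers following the thread, the failure window described in the message can be sketched with a toy model. This is not PostgreSQL code: `ToyRelation`, `delete_rows`, `truncate`, and `run` are hypothetical names invented for illustration. The model assumes one "disk" list of pages plus a dictionary of dirty buffers, and it ignores WAL entirely. It compares discarding dirty buffers before a truncation that then fails with writing them out first, which is a simplified reading of the flush-before-truncate idea mentioned in the quoted text.

```python
class ToyRelation:
    """Toy model (hypothetical, not PostgreSQL): on-disk pages plus dirty buffers."""

    def __init__(self, pages):
        self.disk = list(pages)   # page contents as stored on disk
        self.cache = {}           # block number -> dirty in-memory contents

    def delete_rows(self, blkno):
        # Removing rows only dirties the buffer; the disk copy keeps the old rows.
        self.cache[blkno] = "empty"

    def truncate(self, nblocks, *, flush_first, fail):
        # Handle buffers for the blocks that are about to be cut off.
        doomed = [b for b in self.cache if b >= nblocks]
        for b in doomed:
            contents = self.cache.pop(b)
            if flush_first:
                self.disk[b] = contents   # write the dirty page out before dropping it
            # otherwise the dirty contents are simply thrown away
        if fail:
            # Stand-in for the ftruncate()/chsize() failure discussed in the thread.
            raise OSError("simulated truncate failure")
        del self.disk[nblocks:]

    def read(self, blkno):
        # Readers see the buffer if present, else whatever is on disk.
        return self.cache.get(blkno, self.disk[blkno])


def run(flush_first):
    rel = ToyRelation(["rows A", "rows B"])
    rel.delete_rows(1)            # block 1 is now empty in memory only
    try:
        rel.truncate(1, flush_first=flush_first, fail=True)
    except OSError:
        pass                      # treated as an ordinary ERROR; the session continues
    return rel.read(1)
```

In this model, `run(flush_first=False)` returns `"rows B"`: the deleted rows reappear because the only record of their removal was in the discarded buffer. `run(flush_first=True)` returns `"empty"`, at the cost of an extra write for a page that a successful truncation would have removed anyway.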