Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
From | Robert Haas |
---|---|
Subject | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows |
Date | |
Msg-id | CA+TgmobH07rpdxVnXN6NgUjwK0-K9DpW02LHuE-bx6mFoNHn=Q@mail.gmail.com |
In reply to | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |

On Wed, Oct 4, 2023 at 7:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > But as for what we should do about it, PANIC (as suggested by several
> > people) seems better than corruption, if we're not going to write some
> > kind of resilience?
>
> Maybe that's an acceptable answer now ... it's not great, but nobody
> is in love with any of the other options either. And it would definitely
> get DBAs' attention about this misbehavior of their file systems.

I and others, including Andres, have been thinking that a PANIC is the right option for some time. Quoth I in https://www.postgresql.org/message-id/CA%2BTgmobwc_Rdaw%2B6TupT4_g9z55JjL%3DvhwpphsQe%3DYmBN0OPDg%40mail.gmail.com some 2 years ago...

> As you say, this doesn't fix the problem that truncation might fail.
> But as Andres and Sawada-san said, the solution to that is to get rid
> of the comments saying that it's OK for truncation to fail and make it
> a PANIC. However, I don't think that change needs to be part of this
> patch. Even if we do that, we still need to do this. And even if we do
> this, we still need to do that.

I think the only reasons that I didn't do it at the time were (a) shortage of round tuits and (b) fear of being yelled at. But the comment is wrong, and a critical section is right.

I do think that it's nice to be tolerant of bad filesystem behavior when we can. For instance if we try to write() some data to the OS and it fails for some transient reason, it's nice if we can try to write() it again. But there are always going to be cases where that sort of tolerance is not practical. Having PostgreSQL continue to operate when the filesystem isn't operating is a luxury, and we can't afford it in every situation.

shared_buffers provides a layer of insulation between the logical act of modifying a buffer and the need to have a system call succeed -- dirtying the buffer is in effect making a note that the write() needs to be done later, instead of actually doing it in the moment. And since the code that actually writes it is checkpoint-aware and write-outs can be retried, we can avoid panicking. But for operations such as creating, removing, or truncating relations, there is no similar, general layer of insulation -- we have no mechanism that allows us to logically do those things now and have them actually happen at the FS level later. Which, to me, seems to mean that we have little choice but to panic if they fail. Otherwise, the primary diverges from any standbys that it has.

I also think that's OK. Unreliable filesystems lead to unreliable databases, and it's better to find that out before something really bad happens. Maybe in the future we'll develop more general mechanisms for some of this stuff and maybe that will allow us to avoid panics in more cases, and then we can debate the merits of such changes. But right now, the cost of avoiding a panic here is a corrupted database, and I have to believe that the overwhelming majority of users would think that a corrupted database is worse.

--
Robert Haas
EDB: http://www.enterprisedb.com
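
The asymmetry described in the message, between page writes that can be deferred and retried and a truncation that must succeed immediately, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not PostgreSQL code: `BufferPool`, `write_page`, `truncate_or_panic`, and `Panic` are hypothetical names invented for illustration, and a Python exception stands in for a real PANIC.

```python
import os


class Panic(Exception):
    """Hypothetical stand-in for a PANIC: the caller should stop, not carry on."""


def write_page(path, offset, data):
    """Write one page into an existing file at the given byte offset."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


class BufferPool:
    """Toy insulation layer: modifications are noted now and written out later."""

    def __init__(self, path, page_size=8192):
        self.path = path
        self.page_size = page_size
        self.dirty = {}  # page number -> page contents still owed to the file

    def modify(self, page_no, data):
        # Logical change only: no system call has to succeed at this point.
        self.dirty[page_no] = data

    def checkpoint(self, write_fn=write_page):
        # Try to write every dirty page. A failed page simply stays dirty,
        # so a later checkpoint can retry it without losing the change.
        for page_no in sorted(self.dirty):
            try:
                write_fn(self.path, page_no * self.page_size, self.dirty[page_no])
            except OSError:
                continue
            del self.dirty[page_no]
        return not self.dirty  # True once nothing is left to write


def truncate_or_panic(path, new_size, truncate_fn=os.truncate):
    # There is no "note it now, do it later" layer in this model for truncation,
    # so a failure is escalated instead of being tolerated.
    try:
        truncate_fn(path, new_size)
    except OSError as exc:
        raise Panic(f"could not truncate {path}: {exc}") from exc
```

In this model a failed write leaves the page in `dirty`, so the next `checkpoint()` call can retry it, whereas a failed truncation has nowhere to be recorded and is escalated at once.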