Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
От | Thomas Munro |
---|---|
Тема | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows |
Дата | |
Msg-id | CA+hUKGK4omtUqvyP9rE2u=LtDUAH8WvzAp+zq9d2JG99ZajxwA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
|
Список | pgsql-bugs |
On Tue, Oct 24, 2023 at 4:06 AM Robert Haas <robertmhaas@gmail.com> wrote: > But within mdtruncate(), we've got more than one problem, I think. > mdnblocks() is a problem because of the reason that you mention, but > register_dirty_segment() doesn't look totally safe either, because it > can call RegisterSyncRequest() which, in a standalone backend, can > call RememberSyncRequest(). But that is already enkludgified thusly: /* * XXX: The checkpointer needs to add entries to the pending ops table * when absorbing fsync requests. That is done within a critical * section, which isn't usually allowed, but we make an exception. It * means that there's a theoretical possibility that you run out of * memory while absorbing fsync requests, which leads to a PANIC. * Fortunately the hash table is small so that's unlikely to happen in * practice. */ pendingOpsCxt = AllocSetContextCreate(TopMemoryContext, "Pending ops context", ALLOCSET_DEFAULT_SIZES); MemoryContextAllowInCriticalSection(pendingOpsCxt, true); > In general, it seems like it would be a lot nicer if we were doing a > lot less stuff inside the critical section here. So I think you're > right that we need some refactoring. Maybe smgr_prepare_truncate() and > smgr_execute_truncate() or something like that. I wonder if we could > actually register the dirty segment in the "prepare" phase - is it bad > if we register a dirty segment before actually dirtying it? And maybe > even CacheInvalidateSmgr() could be done at that stage? It seems > pretty clear that dropping the dirty buffers and actually truncating > the relation on disk need to happen after we've entered the critical > section, because if we fail after doing the former, we've thrown away > dirty data in anticipation of performing an operation that didn't > happen, and if we fail when attempting the latter, primaries and > standbys diverge and the originally-reported bug on this thread > happens. But we'd like to move as much other stuff as we can out of > that critical section. Hmm, yeah it seems like that direction would be a nice improvement, as long as we are sure that the fsync request can't be processed too soon.
В списке pgsql-bugs по дате отправления: