Re: Refactoring the checkpointer's fsync request queue
От | Thomas Munro |
---|---|
Тема | Re: Refactoring the checkpointer's fsync request queue |
Дата | |
Msg-id | CAEepm=01Qx5XGyE41aME5w7MXAdidfw-+AssfR6A6OT_z=ba8A@mail.gmail.com обсуждение исходный текст |
Ответ на | Refactoring the checkpointer's fsync request queue (Thomas Munro <thomas.munro@enterprisedb.com>) |
Ответы |
Re: Refactoring the checkpointer's fsync request queue
Re: Refactoring the checkpointer's fsync request queue |
Список | pgsql-hackers |
(Adding Dmitry to CC list.) On Tue, Oct 16, 2018 at 12:02 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > There is one major fly in the ointment: fsyncgate[1]. Originally I > planned to propose a patch on top of that one, but it's difficult -- > both patches move a lot of the same stuff around. Personally, I don't > think it would be a very good idea to back-patch that anyway. It'd be > riskier than the problem it aims to solve, in terms of bugs and > hard-to-foresee portability problems IMHO. I think we should consider > back-patching some variant of Craig Ringer's PANIC patch, and consider > this redesigned approach for future releases. > > So, please find attached the WIP patch that I would like to propose > for PostgreSQL 12, under a separate Commitfest entry. It incorporates > the fsyncgate work by Andres Freund (original file descriptor transfer > POC) and me (many bug fixes and improvements), and the refactoring > work as described above. Here is a rebased version of the patch, post pread()/pwrite(). I have also rewritten the commit message to try to explain the rationale concisely, instead of requiring the reader to consult multiple discussions that jump between lengthy email threads to understand the key points. There is one major problem with this patch: BufferSync(), run in the checkpointer, can deadlock against a backend that holds a buffer lock and is blocked in SendFsyncRequest(). To break this deadlock, we need way out of it on either the sending or receiving side. Here are three ideas: 1. Go back to the current pressure-valve strategy: make the sending side perform the fsync(), if it can't immediately write to the pipe. 2. Offload the BufferSync() work to bgwriter, so the checkpointer can keep draining the pipe. Communication between checkpointer and bgwriter can be fairly easily multiplexed with the pipe draining work. 3. Multiplex the checkpointer's work: Use LWLockConditionalAcquire() when locking buffers, and if that fails, try to drain the pipe, and then fall back to a LWLockTimedAcquire(), drain pipe, repeat loop. I can hear you groan already; that doesn't seem particularly elegant, and there are portability problems implementing LWLockTimedAcquire(): semtimedop() and sem_timedwait() are not available on all platforms (eg macOS). Maybe pthread_timed_condwait() could help (!). I'm not actually sure if idea 1 is correct, and I also don't like that behaviour under pressure, and I think pressure is more likely than in current master (since we gave up sender-side queue compaction, and it seems quite easy to fill up the pipe's buffer). Number 2 appeals to me the most right now, but I haven't looked into the details or tried it yet. Number 3 is a straw man that perhaps helps illustrate the problem but involves taking on unnecessary new problems and seems like a non-starter. So, is there any reason the bgwriter process shouldn't do that work on the checkpointer's behalf? Is there another way? -- Thomas Munro http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: