Обсуждение: Wait events for delayed checkpoints
Hi, You can't tell if your checkpointer is spending a lot of time waiting around for flags in delayChkptFlags to clear. Trivial patch to add that. I've managed to see it a few times when checkpointing repeatedly with a heavy pgbench workload. I had to stop and think for a moment about whether these events belong under "WaitEventIPC", "waiting for notification from another process" or under "WaitEventTimeout", "waiting for a timeout to expire". I mean, both? It's using sleep-and-poll instead of (say) a CV due to the economics, we want to make the other side as cheap as possible, so we don't care about making the checkpointer take some micro-naps in this case. I feel like the key point here is that it's waiting for another process to do stuff and unblock it.
Вложения
On Wed, Oct 11, 2023 at 9:13 PM Thomas Munro <thomas.munro@gmail.com> wrote: > You can't tell if your checkpointer is spending a lot of time waiting > around for flags in delayChkptFlags to clear. Trivial patch to add > that. I've managed to see it a few times when checkpointing > repeatedly with a heavy pgbench workload. > > I had to stop and think for a moment about whether these events belong > under "WaitEventIPC", "waiting for notification from another process" > or under "WaitEventTimeout", "waiting for a timeout to expire". I > mean, both? It's using sleep-and-poll instead of (say) a CV due to > the economics, we want to make the other side as cheap as possible, so > we don't care about making the checkpointer take some micro-naps in > this case. I feel like the key point here is that it's waiting for > another process to do stuff and unblock it. IPC seems right to me. Yeah, a timeout is being used, but as you say, that's an implementation detail. +1 for the idea, too. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Oct 12, 2023 at 01:32:29PM -0400, Robert Haas wrote: > IPC seems right to me. Yeah, a timeout is being used, but as you say, > that's an implementation detail. > > +1 for the idea, too. Agreed that timeout makes little sense in this context, and IPC looks correct. + pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START); do { pg_usleep(10000L); /* wait for 10 msec */ } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids, DELAY_CHKPT_START)); + pgstat_report_wait_end(); HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire() which would itself report a wait event for ProcArrayLock, overwriting this new one, no? -- Michael
Вложения
On Thu, Oct 12, 2023 at 7:09 PM Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Oct 12, 2023 at 01:32:29PM -0400, Robert Haas wrote: > > IPC seems right to me. Yeah, a timeout is being used, but as you say, > > that's an implementation detail. > > > > +1 for the idea, too. > > Agreed that timeout makes little sense in this context, and IPC looks > correct. > > + pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START); > do > { > pg_usleep(10000L); /* wait for 10 msec */ > } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids, > DELAY_CHKPT_START)); > + pgstat_report_wait_end(); > > HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire() > which would itself report a wait event for ProcArrayLock, overwriting > this new one, no? Ah, right: the wait event should be set and cleared around pg_usleep, not the whole loop. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Oct 13, 2023 at 2:19 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 12, 2023 at 7:09 PM Michael Paquier <michael@paquier.xyz> wrote: > > HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire() > > which would itself report a wait event for ProcArrayLock, overwriting > > this new one, no? > > Ah, right: the wait event should be set and cleared around pg_usleep, > not the whole loop. Duh. Yeah. Pushed like that. Thanks both.