Re: Load Distributed Checkpoints, revised patch

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Load Distributed Checkpoints, revised patch
Дата
Msg-id 4672ED66.8090606@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Load Distributed Checkpoints, revised patch  (Alvaro Herrera <alvherre@commandprompt.com>)
Ответы Re: Load Distributed Checkpoints, revised patch  (Alvaro Herrera <alvherre@commandprompt.com>)
Список pgsql-patches
Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>
>> - The signaling between RequestCheckpoint and bgwriter is a bit tricky.
>> Bgwriter now needs to deal immediate checkpoint requests, like those
>> coming from explicit CHECKPOINT or CREATE DATABASE commands, differently
>> from those triggered by checkpoint_segments. I'm afraid there might be
>> race conditions when a CHECKPOINT is issued at the same instant as
>> checkpoint_segments triggers one. What might happen then is that the
>> checkpoint is performed lazily, spreading the writes, and the CHECKPOINT
>> command has to wait for that to finish which might take a long time. I
>> have not been able to convince myself neither that the race condition
>> exists or that it doesn't.
>
> Isn't it just a matter of having a flag to tell whether the checkpoint
> should be quick or spread out, and have a command set the flag if a
> checkpoint is already running?

Hmm. Thinking about this some more, the core problem is that when
starting the checkpoint, bgwriter needs to read and clear the flag.
Which is not atomic, as the patch stands.

I think we already have a race condition with ckpt_time_warn. The code
to test and clear the flag does this:

>     if (BgWriterShmem->ckpt_time_warn && elapsed_secs < CheckPointWarning)
>         ereport(LOG,
>                 (errmsg("checkpoints are occurring too frequently (%d seconds apart)",
>                         elapsed_secs),
>                  errhint("Consider increasing the configuration parameter \"checkpoint_segments\".")));
>     BgWriterShmem->ckpt_time_warn = false;

In the extremely unlikely event that RequestCheckpoint sets
ckpt_time_warn right before it's cleared, after the test in the
if-statement, the warning is missed. That's a very harmless and
theoretical event, you'd have to run CHECKPOINT (or another command that
triggers a checkpoint) at the same instant that an xlog switch triggers
one, and all that happens is that you don't get the message in the log
while you should. So this is not something to worry about in this case,
but it would be more severe if we had the same problem in deciding if a
checkpoint should be spread out or not.

I think we just have to protect those signaling flags with a lock. It's
not like it's on a critical path, and though we don't know what locks
the callers to RequestCheckpoint hold, as long as we don't acquire any
other locks while holding the new proposed lock, there's no danger of
deadlocks.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

В списке pgsql-patches по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Load Distributed Checkpoints, revised patch
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Load Distributed Checkpoints, revised patch