Re: checkpointer continuous flushing
From | Amit Kapila |
---|---|
Subject | Re: checkpointer continuous flushing |
Date | |
Msg-id | CAA4eK1LS303x6Fq425Q9guAiyJgrD_r7PFMOb2LZK9+AT+Gg9A@mail.gmail.com |
In reply to | Re: checkpointer continuous flushing (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Tue, Jan 12, 2016 at 7:24 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-01-12 19:17:49 +0530, Amit Kapila wrote:
> > Why can't we do it at larger intervals (relative to total amount of writes)?
> > To explain, what I have in mind, let us assume that checkpoint interval
> > is longer (10 mins) and in the meantime all the writes are being done
> > by bgwriter
>
> But that's not the scenario with the regression here, so I'm not sure
> why you're bringing it up?
>
> And if we're flushing a significant portion of the writes, how does that
> avoid the performance problem pointed out two messages upthread? Where
> sorting leads to flushing highly contended buffers together, leading to
> excessive WAL flushing?
>
I think it will avoid that problem, because what I am proposing is not to
sort the buffers before writing, but rather to sort the flush requests. If I
remember correctly, Fabien's initial patch did not sort at the buffer
level, yet he still saw benefits in many cases.
>
> But more importantly, unless you also want to delay the writes
> themselves, leaving that many dirty buffers in the kernel page cache
> will bring back exactly the type of stalls (where the kernel flushes all
> the pending dirty data in a short amount of time) we're trying to avoid
> with the forced flushing. So doing flushes in large batches is
> something we really fundamentally do *not* want!
>
Could it be because of random I/O?
> > which it registers in shared memory so that later checkpoint
> > can perform corresponding fsync's, now when the request queue
> > reaches a threshold (let us say 1/3rd full), then we can perform
> > sorting and merging and issue flush hints.
>
> Which means that a significant portion of the writes won't be able to be
> collapsed, since only a random 1/3 of the buffers is sorted together.
>
>
> > Basically, I think this can lead to less merging of neighbouring
> > writes, but might not hurt if sync_file_range() API is cheap.
>
> The cost of writing out data does correspond heavily with the number of
> random writes - which is what you get if you reduce the number of
> neighbouring writes.
>
Yeah, that's right, but I am not sure how much difference it would
make if we sort everything in one shot versus doing it in
batches. In any case, I am just trying to think out loud to see if we
can find some solution to the regression you have seen above
without disabling sorting altogether for certain cases.
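
To make the one-shot versus batched trade-off concrete, here is a minimal sketch. It is not taken from any patch in this thread: the `(file_id, block)` request representation and the function names `merge_requests` and `merge_in_batches` are assumptions made purely for illustration. It sorts pending flush requests, collapses adjacent blocks into ranges, and compares the result with sorting each batch independently. Each resulting range stands for one flush hint (for example, one `sync_file_range()` call on Linux).

```python
from typing import Iterable, List, Tuple

# Hypothetical request shape: (file_id, block_number). Not PostgreSQL's real struct.
FlushRequest = Tuple[int, int]
# A merged range: (file_id, first_block, block_count).
FlushRange = Tuple[int, int, int]


def merge_requests(requests: Iterable[FlushRequest]) -> List[FlushRange]:
    """Sort requests by (file, block) and collapse runs of adjacent blocks."""
    ranges: List[FlushRange] = []
    for file_id, block in sorted(set(requests)):
        if ranges:
            last_file, start, count = ranges[-1]
            if last_file == file_id and start + count == block:
                # Block directly follows the previous range: extend it.
                ranges[-1] = (last_file, start, count + 1)
                continue
        ranges.append((file_id, block, 1))
    return ranges


def merge_in_batches(requests: List[FlushRequest], batch_size: int) -> List[FlushRange]:
    """Merge each batch of `batch_size` requests on its own, roughly as a
    queue drained at a fill threshold would (an assumed simplification)."""
    ranges: List[FlushRange] = []
    for i in range(0, len(requests), batch_size):
        ranges.extend(merge_requests(requests[i:i + batch_size]))
    return ranges


# Nine blocks of one file, arriving in an interleaved order.
reqs = [(1, b) for b in (0, 6, 3, 1, 7, 4, 2, 8, 5)]
print(len(merge_requests(reqs)))       # one-shot sort: 1 range
print(len(merge_in_batches(reqs, 3)))  # batches of 3: 9 ranges
```

With this deliberately unfavourable arrival order, sorting everything at once yields a single contiguous range, while sorting in thirds yields nine single-block ranges. How close a real workload comes to either extreme depends on how its writes are distributed, which is the open question above.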