Thread: pgsql-server/src backend/storage/buffer/bufmgr ...

pgsql-server/src backend/storage/buffer/bufmgr ...

From
wieck@svr1.postgresql.org (Jan Wieck)
Date:
CVSROOT:    /cvsroot
Module name:    pgsql-server
Changes by:    wieck@svr1.postgresql.org    04/01/24 16:00:46

Modified files:
    src/backend/storage/buffer: bufmgr.c
    src/backend/utils/misc: guc.c postgresql.conf.sample
    src/include/storage: bufmgr.h

Log message:
    Added GUC variable bgwriter_flush_method controlling the action
    done by the background writer between writing dirty blocks and
    napping.

    none (default)   no action
    sync             bgwriter calls smgrsync() causing a sync(2)

    A global sync() is only good on dedicated database servers, so
    more flush methods should be added in the future.

    Jan


Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Tom Lane
Date:
wieck@svr1.postgresql.org (Jan Wieck) writes:
>     Added GUC variable bgwriter_flush_method controlling the action
>     done by the background writer between writing dirty blocks and
>     napping.

>     none (default)   no action
>     sync             bgwriter calls smgrsync() causing a sync(2)

Why would that be useful at all?  I thought the purpose of the bgwriter
was to avoid I/O storms, not provoke them.

            regards, tom lane

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Jan Wieck
Date:
Tom Lane wrote:

> wieck@svr1.postgresql.org (Jan Wieck) writes:
>>     Added GUC variable bgwriter_flush_method controlling the action
>>     done by the background writer between writing dirty blocks and
>>     napping.
>
>>     none (default)   no action
>>     sync             bgwriter calls smgrsync() causing a sync(2)
>
> Why would that be useful at all?  I thought the purpose of the bgwriter
> was to avoid I/O storms, not provoke them.

Calling sync(2) every time the background writer naps means calling it
every couple hundred milliseconds. That can hardly be called an IO
storm, it's more like a constant breeze.

So far nobody bothered to make any other proposal how to cause the
kernel to actually do some writing at all. A lot of people babble about
fsync(), fdatasync() and fadvise and whatnot. A week ago I posted the
proposal for this and got exactly zero response.

Before this, the bgwriter did only write the dirty blocks, so that the
checkpoint (doing the sync() call) still caused all the physical IO to
happen at once. Sure, with the bgwriter doing the major write IO, he'd
know what files have been written to and can do fsync() and fdatasync()
on them. But even with that, the checkpoint doing sync() will be in
danger to cause a lot of unexpected IO from wherenot, making the time
the checkpoint takes totally unpredictable.

The whole point of the bgwriter is to give response times a better
variance, I never claimed that it will improve performance.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> So far nobody bothered to make any other proposal how to cause the
> kernel to actually do some writing at all. A lot of people babble about
> fsync(), fdatasync() and fadvise and whatnot. A week ago I posted the
> proposal for this and got exactly zero response.

As I've said before, I think we need to find a way to stop using sync()
altogether --- we have to move to fsync or O_SYNC and variants.  sync
has simply got the wrong API.

Let me give an example: you write a bunch of stuff and then call sync().
Suppose the kernel is unable to write some of those blocks --- it gets
a hard I/O error, or doesn't realize it's out of disk space until the
write is attempted, or whatever.  (I think this is what happened to
Chris K-L last night.)  Is the sync call going to tell you about the
problem?  No, it is not.  If you are lucky you will get an error return
from the next operation you try on a file descriptor associated with the
failed blocks.  But by that time you've probably already written a
checkpoint record to WAL claiming that those writes were all done
successfully.  Finding out about the failures after the checkpoint is
completed is too late --- you're screwed, especially if a crash happens
before you can do anything about it.

> The whole point of the bgwriter is to give response times a better
> variance, I never claimed that it will improve performance.

I want to use it to improve reliability, by getting rid of our
dependence on sync().  The bgwriter can afford to wait for writes
to occur, so it should be able to use fsync or even O_SYNC.

            regards, tom lane
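
A minimal sketch of the API difference described above, assuming POSIX semantics: sync() is declared as returning void, while fsync() returns an int and sets errno, so a failed flush can be seen by the caller. The helper name write_and_flush is hypothetical and is not taken from the PostgreSQL source.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a buffer to fd, then wait until the kernel
 * reports the data as flushed.  Returns 0 on success, or -1 with errno
 * set by write() or fsync().
 */
int
write_and_flush(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len)
    {
        ssize_t n = write(fd, buf + done, len - done);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;          /* e.g. ENOSPC or EIO reported here */
        }
        done += (size_t) n;
    }

    /*
     * Unlike sync(), which returns void, fsync() has a return value, so
     * a failure of the deferred write can reach the caller.
     */
    if (fsync(fd) != 0)
        return -1;

    return 0;
}
```

A caller that only records "checkpoint complete" after this function returns 0 for every modified file avoids the late-discovery problem; with sync() there is no return value to test.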

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
"Marc G. Fournier"
Date:
On Sat, 24 Jan 2004, Tom Lane wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:
> > So far nobody bothered to make any other proposal how to cause the
> > kernel to actually do some writing at all. A lot of people babble about
> > fsync(), fdatasync() and fadvise and whatnot. A week ago I posted the
> > proposal for this and got exactly zero response.
>
> As I've said before, I think we need to find a way to stop using sync()
> altogether --- we have to move to fsync or O_SYNC and variants.  sync
> has simply got the wrong API.
>
> Let me give an example: you write a bunch of stuff and then call sync().
> Suppose the kernel is unable to write some of those blocks --- it gets
> a hard I/O error, or doesn't realize it's out of disk space until the
> write is attempted, or whatever.  (I think this is what happened to
> Chris K-L last night.)  Is the sync call going to tell you about the
> problem?  No, it is not.  If you are lucky you will get an error return
> from the next operation you try on a file descriptor associated with the
> failed blocks.  But by that time you've probably already written a
> checkpoint record to WAL claiming that those writes were all done
> successfully.  Finding out about the failures after the checkpoint is
> completed is too late --- you're screwed, especially if a crash happens
> before you can do anything about it.

Stupid question here, and I just checked postgresql.conf to make sure it
wasn't something I overlooked ... why don't we have a 'minfree' setting
for disk space?  It's not like running out of disk space is a rare
occurrence ...

Personally, what I'd expect would be that the postmaster process monitors
this, and if below a certain threshold, send out a 'close connections' to
the postgres processes and refuse future connections with an 'out of
space' warning ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
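
A rough sketch of what a 'minfree' check could look like, assuming the POSIX statvfs() call is available. The function name has_min_free and its threshold parameter are hypothetical; no such setting exists in the postgresql.conf discussed here. The check is only advisory, since space can run out between the check and the next write.

```c
#include <sys/statvfs.h>

/*
 * Hypothetical check: return 1 if the filesystem holding `path` has at
 * least min_free_bytes available to unprivileged processes, 0 if it
 * does not, and -1 if statvfs() itself fails.
 */
int
has_min_free(const char *path, unsigned long long min_free_bytes)
{
    struct statvfs vfs;
    unsigned long long avail;

    if (statvfs(path, &vfs) != 0)
        return -1;

    /* f_bavail counts free blocks in units of f_frsize bytes */
    avail = (unsigned long long) vfs.f_bavail * vfs.f_frsize;

    return avail >= min_free_bytes ? 1 : 0;
}
```

A monitoring process could call this periodically on the data directory and refuse new connections when it returns 0.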

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Jan Wieck
Date:
Tom Lane wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:

>> The whole point of the bgwriter is to give response times a better
>> variance, I never claimed that it will improve performance.
>
> I want to use it to improve reliability, by getting rid of our
> dependence on sync().  The bgwriter can afford to wait for writes
> to occur, so it should be able to use fsync or even O_SYNC.

Agreed, that would be our long term strategy. And chances are that the
63 lines of code I added today for a functionality that is turned off by
default will not completely screw up that plan.

But as I see it, there is not even half of a proposal for all that yet.
And people have response time spike problems caused by the checkpointer
today. At least that is what I heard from the folks who were at our BOF
in New York. Those people will not mind if the option we give them in
7.5 is replaced with something better in 8.0 again. But they mind a lot
if we give them nothing because what we can do now is not optimal.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Bruce Momjian
Date:
Jan Wieck wrote:
> Tom Lane wrote:
>
> > wieck@svr1.postgresql.org (Jan Wieck) writes:
> >>     Added GUC variable bgwriter_flush_method controlling the action
> >>     done by the background writer between writing dirty blocks and
> >>     napping.
> >
> >>     none (default)   no action
> >>     sync             bgwriter calls smgrsync() causing a sync(2)
> >
> > Why would that be useful at all?  I thought the purpose of the bgwriter
> > was to avoid I/O storms, not provoke them.
>
> Calling sync(2) every time the background writer naps means calling it
> every couple hundred milliseconds. That can hardly be called an IO
> storm, it's more like a constant breeze.

Have you tested this option?  It seems like sub-second sync would kill
performance.

> So far nobody bothered to make any other proposal how to cause the
> kernel to actually do some writing at all. A lot of people babble about
> fsync(), fdatasync() and fadvise and whatnot. A week ago I posted the
> proposal for this and got exactly zero response.

I assumed all Unixes flush dirty pages at least every 30 seconds, so if
checkpoints are every 2-3 minutes, most of the dirty pages should
already be flushed.

Perhaps instead of tying sync to the background writer sleeps, we should
have a sync_frequency that could be set to sync every 15 or 30 seconds.
Is there any value in doing it more frequently than that?

> Before this, the bgwriter did only write the dirty blocks, so that the
> checkpoint (doing the sync() call) still caused all the physical IO to
> happen at once. Sure, with the bgwriter doing the major write IO, he'd
> know what files have been written to and can do fsync() and fdatasync()
> on them. But even with that, the checkpoint doing sync() will be in
> danger to cause a lot of unexpected IO from wherenot, making the time
> the checkpoint takes totally unpredictable.
>
> The whole point of the bgwriter is to give response times a better
> variance, I never claimed that it will improve performance.

Uh, our goal is better performance overall.  If this new option causes
dismal performance when enabled, who cares how fast the checkpoints are? :-)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
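
The sync_frequency idea above, decoupling the flush from the background writer's nap interval, could be sketched as a small time-based policy. This is an illustration only: SyncPolicy, sync_is_due, and the sync_frequency field are hypothetical names, not part of the committed patch.

```c
#include <time.h>

/* Hypothetical state for a time-based flush policy. */
typedef struct
{
    time_t last_sync;       /* when the last flush was issued */
    int    sync_frequency;  /* seconds between flushes; 0 disables */
} SyncPolicy;

/*
 * Called once per background-writer nap.  Returns 1 and records `now`
 * if at least sync_frequency seconds have passed since the last flush,
 * otherwise returns 0.
 */
int
sync_is_due(SyncPolicy *policy, time_t now)
{
    if (policy->sync_frequency <= 0)
        return 0;               /* feature turned off */
    if (now - policy->last_sync < policy->sync_frequency)
        return 0;               /* too soon since the last flush */
    policy->last_sync = now;
    return 1;
}
```

The writer loop would keep napping every few hundred milliseconds, but only issue the flush when sync_is_due() returns 1, for example every 15 or 30 seconds.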

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Bruce Momjian
Date:
Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > So far nobody bothered to make any other proposal how to cause the
> > kernel to actually do some writing at all. A lot of people babble about
> > fsync(), fdatasync() and fadvise and whatnot. A week ago I posted the
> > proposal for this and got exactly zero response.
>
> As I've said before, I think we need to find a way to stop using sync()
> altogether --- we have to move to fsync or O_SYNC and variants.  sync
> has simply got the wrong API.
>
> Let me give an example: you write a bunch of stuff and then call sync().
> Suppose the kernel is unable to write some of those blocks --- it gets
> a hard I/O error, or doesn't realize it's out of disk space until the
> write is attempted, or whatever.  (I think this is what happened to
> Chris K-L last night.)  Is the sync call going to tell you about the
> problem?  No, it is not.  If you are lucky you will get an error return
> from the next operation you try on a file descriptor associated with the
> failed blocks.  But by that time you've probably already written a
> checkpoint record to WAL claiming that those writes were all done
> successfully.  Finding out about the failures after the checkpoint is
> completed is too late --- you're screwed, especially if a crash happens
> before you can do anything about it.

If sync fails (kernel to disk write fails) we have a hardware failure,
and we don't pretend to recover from that, though it would be nice to
know sooner so we can exit.  One idea I floated around was to
open/write/fsync/close a temporary file after sync in the hope that it
would happen after the sync completes because the fsync would be at the
end of the disk flush queue.  However, tagged queueing could reorder
those, but hopefully it would catch a disk error before we recycle the
WAL files.


>
> > The whole point of the bgwriter is to give response times a better
> > variance, I never claimed that it will improve performance.
>
> I want to use it to improve reliability, by getting rid of our
> dependence on sync().  The bgwriter can afford to wait for writes
> to occur, so it should be able to use fsync or even O_SYNC.

But I always wonder how to do that while allowing the reordering of
writes done by the kernel and disk drive, and good background writer
performance of moving pages out of the buffer cache.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Bruce Momjian
Date:
Jan Wieck wrote:
> Tom Lane wrote:
>
> > Jan Wieck <JanWieck@Yahoo.com> writes:
>
> >> The whole point of the bgwriter is to give response times a better
> >> variance, I never claimed that it will improve performance.
> >
> > I want to use it to improve reliability, by getting rid of our
> > dependence on sync().  The bgwriter can afford to wait for writes
> > to occur, so it should be able to use fsync or even O_SYNC.
>
> Agreed, that would be our long term strategy. And chances are that the
> 63 lines of code I added today for a functionality that is turned off by
> default will not completely screw up that plan.

We don't give people options that are useless.  Are you sure this option
is useful?  "Hey, it makes the system so slow, checkpoints are now 90%
faster!"  :-)

> But as I see it, there is not even half of a proposal for all that yet.
> And people have response time spike problems caused by the checkpointer
> today. At least that is what I heard from the folks who were at our BOF
> in New York. Those people will not mind if the option we give them in
> 7.5 is replaced with something better in 8.0 again. But they mind a lot
> if we give them nothing because what we can do now is not optimal.

We need more discussion/proof before we add something like this, even if
it is only for one release.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> As I've said before, I think we need to find a way to stop using sync()
>> altogether --- we have to move to fsync or O_SYNC and variants.  sync
>> has simply got the wrong API.

> If sync fails (kernel to disk write fails) we have a hardware failure,
> and we don't pretend to recover from that,

Not necessarily --- it could be out-of-disk-space, on at least some
filesystems.  More to the point, the important thing is not to commit a
checkpoint record to WAL indicating that everything is good, when
everything is not good.  As long as we don't checkpoint we have some
hope of recovering automatically via WAL replay.

> One idea I floated around was to
> open/write/fsync/close a temporary file after sync in the hope that it
> would happen after the sync completes because the fsync would be at the
> end of the disk flush queue.

"In the hope"?  We already have a guess-and-hope approach to this, and
it will never be any better as long as we use sync(), because sync() is
fundamentally the wrong operation.  It doesn't tell you when the I/O is
done, and it doesn't tell you whether the I/O was done successfully, and
there is no possibility of working around that fundamental lack of
information except to stop using it.

            regards, tom lane

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> As I've said before, I think we need to find a way to stop using sync()
> >> altogether --- we have to move to fsync or O_SYNC and variants.  sync
> >> has simply got the wrong API.
>
> > If sync fails (kernel to disk write fails) we have a hardware failure,
> > and we don't pretend to recover from that,
>
> Not necessarily --- it could be out-of-disk-space, on at least some
> filesystems.  More to the point, the important thing is not to commit a

I assume the operating system is already allocating file system space
during the write, and the sync is only forcing it to disk.  If the
operating system doesn't allocate file system space it couldn't properly
work, no?  In fact, it is my understanding that the file system is in
RAM and the disk is just backing store, basically.

> checkpoint record to WAL indicating that everything is good, when
> everything is not good.  As long as we don't checkpoint we have some
> hope of recovering automatically via WAL replay.
>
> > One idea I floated around was to
> > open/write/fsync/close a temporary file after sync in the hope that it
> > would happen after the sync completes because the fsync would be at the
> > end of the disk flush queue.
>
> "In the hope"?  We already have a guess-and-hope approach to this, and
> it will never be any better as long as we use sync(), because sync() is
> fundamentally the wrong operation.  It doesn't tell you when the I/O is
> done, and it doesn't tell you whether the I/O was done successfully, and
> there is no possibility of working around that fundamental lack of
> information except to stop using it.

I assumed this would be a closer guess-and-hope approach, and again, how
could sync fail unless it is a hardware problem?  NFS?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> As I've said before, I think we need to find a way to stop using sync()
> >> altogether --- we have to move to fsync or O_SYNC and variants.  sync
> >> has simply got the wrong API.
>
> > If sync fails (kernel to disk write fails) we have a hardware failure,
> > and we don't pretend to recover from that,
>
> Not necessarily --- it could be out-of-disk-space, on at least some
> filesystems.  More to the point, the important thing is not to commit a
> checkpoint record to WAL indicating that everything is good, when
> everything is not good.  As long as we don't checkpoint we have some
> hope of recovering automatically via WAL replay.
>
> > One idea I floated around was to
> > open/write/fsync/close a temporary file after sync in the hope that it
> > would happen after the sync completes because the fsync would be at the
> > end of the disk flush queue.
>
> "In the hope"?  We already have a guess-and-hope approach to this, and
> it will never be any better as long as we use sync(), because sync() is
> fundamentally the wrong operation.  It doesn't tell you when the I/O is
> done, and it doesn't tell you whether the I/O was done successfully, and
> there is no possibility of working around that fundamental lack of
> information except to stop using it.

I guess my major problem with moving away from sync is similar to the
reason we don't do raw devices --- sync is best done in the kernel and
disk driver that knows more about how to do it efficiently.  I haven't
seen any non-sync solution with performance similar to sync().  However,
we are going to have to write one for win32, so we can test things once
we are done and then decide.

I think the win32 solution will be to record modified files in a central
location, and have the checkpoint open and fsync (_commit) each of them,
perhaps all at the same time in different threads so it isn't serialized.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
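
The "record modified files in a central location" approach could look roughly like the sketch below. It uses POSIX open()/fsync() rather than the win32 _commit call mentioned above, runs serially rather than in threads, and all names (remember_dirty_file, checkpoint_fsync_all, the fixed-size table) are hypothetical. It also assumes that fsync() on a freshly opened descriptor flushes data written earlier through other descriptors; how write errors are reported in that case is platform-dependent.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAX_DIRTY_FILES 64
#define MAX_PATH_LEN    256

/* Hypothetical central list of files written since the last checkpoint. */
static char dirty_files[MAX_DIRTY_FILES][MAX_PATH_LEN];
static int  n_dirty_files = 0;

/* Record a modified file once.  Returns 0 on success, -1 if it cannot be recorded. */
int
remember_dirty_file(const char *path)
{
    int i;

    if (strlen(path) >= MAX_PATH_LEN)
        return -1;
    for (i = 0; i < n_dirty_files; i++)
        if (strcmp(dirty_files[i], path) == 0)
            return 0;           /* already recorded */
    if (n_dirty_files >= MAX_DIRTY_FILES)
        return -1;
    strcpy(dirty_files[n_dirty_files++], path);
    return 0;
}

/*
 * At checkpoint time, open and fsync every recorded file.  Returns the
 * number of files that could not be flushed.  The list is cleared only
 * when that number is zero, so a failed checkpoint can be retried.
 */
int
checkpoint_fsync_all(void)
{
    int i;
    int failures = 0;

    for (i = 0; i < n_dirty_files; i++)
    {
        int fd = open(dirty_files[i], O_RDWR);

        if (fd < 0)
        {
            failures++;
            continue;
        }
        if (fsync(fd) != 0)
            failures++;
        close(fd);
    }
    if (failures == 0)
        n_dirty_files = 0;
    return failures;
}
```

The checkpoint record would be written to WAL only when checkpoint_fsync_all() returns 0, which is the property a global sync() cannot provide.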

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Not necessarily --- it could be out-of-disk-space, on at least some
>> filesystems.  More to the point, the important thing is not to commit a

> I assume the operating system is already allocating file system space
> during the write, and the sync is only forcing it to disk.

Not so --- as was pointed out later in the thread, neither NFS nor AFS
work that way, and there could be other cases.

In any case, I don't subscribe to the idea that we can just abdicate all
responsibility in case of hardware problems.  Yes, we do rely on a disk
to keep storing information once it's accepted it, but that doesn't mean
that it's okay to ignore write-failure reports.  We are failing to hold
up our end of the deal if we do that.

            regards, tom lane

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Not necessarily --- it could be out-of-disk-space, on at least some
> >> filesystems.  More to the point, the important thing is not to commit a
>
> > I assume the operating system is already allocating file system space
> > during the write, and the sync is only forcing it to disk.
>
> Not so --- as was pointed out later in the thread, neither NFS nor AFS
> work that way, and there could be other cases.
>
> In any case, I don't subscribe to the idea that we can just abdicate all
> responsibility in case of hardware problems.  Yes, we do rely on a disk
> to keep storing information once it's accepted it, but that doesn't mean
> that it's okay to ignore write-failure reports.  We are failing to hold
> up our end of the deal if we do that.

Well, in normal usage, applications do the write and expect the data to
be pushed to disk later, so I don't see us ignoring write() failures,
but rather failures of the later push to disk.  Isn't a separate fsync
after sync closer to reliable?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Jan Wieck
Date:
Bruce Momjian wrote:

> I guess my major problem with moving away from sync is similar to the
> reason we don't do raw devices --- sync is best done in the kernel and
> disk driver that knows more about how to do it efficiently.  I haven't
> seen any non-sync solution with performance similar to sync().  However,
> we are going to have to write one for win32, so we can test things once
> we are done and then decide.

We are not doing raw devices because we don't do tablespaces. I mean in
the method where a tablespace for the OS is basically a huge container.
For every little table, PostgreSQL creates a separate file and scatters
the data all over the place because it is too dumb to group allocations
of multiple blocks together. As a consequence, it is short of file
descriptors and needs the kernel at least to reorder its write requests
so that they are not done in the clueless order they are issued.

Now doing fsync() or fdatasync() of possibly dozens of files in a row,
forcing the kernel to do one scattered file after another, letting the
disk heads dance like step-chicken on a hot tin ... that will be an
improvement, oh boy. However safe this will be, nobody will use it
because MySQL is soooo much faster!


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> Now doing fsync() or fdatasync() of possibly dozens of files in a row,
> forcing the kernel to do one scattered file after another, letting the
> disk heads dance like step-chicken on a hot tin ... that will be an
> improvement, oh boy.

I'm not convinced it would be so bad.  Normally you'd only be issuing
those operations at checkpoint time, and if the bgwriter has been doing
its job and pushing out dirty pages to the kernel, the kernel should
have been busily writing pages all along since the last checkpoint.
In theory the fsync would not force all that many new writes (certainly
lots less than a once-per-checkpoint sync does).  Also keep in mind that
fsync is not defined as "write this page NOW".  It is defined as "let me
know when you've written it".  The kernel still has flexibility in
scheduling its writes, and may choose to write other pages along the
way.

Perhaps more to the point: all this is predicated on an assumption no
longer particularly valid, which is that the kernel's ideas about disk
write scheduling matter at all.  A decent SCSI disk drive will pre-empt
the kernel's ideas anyway by absorbing as many pending writes as it can
and then doing its own write scheduling.  fsync won't affect the drive's
choices in the least, only allow us to find out when the drive is done.

            regards, tom lane

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Chris Watson
Date:
On Jan 27, 2004, at 6:25 PM, Tom Lane wrote:
> Perhaps more to the point: all this is predicated on an assumption no
> longer particularly valid, which is that the kernel's ideas about disk
> write scheduling matter at all.  A decent SCSI disk drive will pre-empt
> the kernel's ideas anyway by absorbing as many pending writes as it can
> and then doing its own write scheduling.  fsync won't affect the drive's
> choices in the least, only allow us to find out when the drive is done.
>
>             regards, tom lane

Perhaps totally unrelated, as I've only read the last couple of posts
on this, but what does Postfix (the MTA) do? How does it handle this?
I trust Wietse implicitly to DTRT. If it were me I would ask him how he
handles the writes or at least check the Postfix src. *shrug* Just an
idea.


Chris Watson
M.M.
Bestor G. Brown #433
Wichita, KS
AIM: BSDUNIX44

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Bruce Momjian
Date:
Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > Now doing fsync() or fdatasync() of possibly dozens of files in a row,
> > forcing the kernel to do one scattered file after another, letting the
> > disk heads dance like step-chicken on a hot tin ... that will be an
> > improvement, oh boy.
>
> I'm not convinced it would be so bad.  Normally you'd only be issuing
> those operations at checkpoint time, and if the bgwriter has been doing

Agreed, I don't have a problem with fsync() during checkpoint instead of
sync.  I had problems with fsync from the background writer and
performance.  Let me post ideas to hackers & win32 list.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pgsql-server/src backend/storage/buffer/bufmgr ...

From
Bruce Momjian
Date:
Jan Wieck wrote:
> Tom Lane wrote:
>
> > wieck@svr1.postgresql.org (Jan Wieck) writes:
> >>     Added GUC variable bgwriter_flush_method controlling the action
> >>     done by the background writer between writing dirty blocks and
> >>     napping.
> >
> >>     none (default)   no action
> >>     sync             bgwriter calls smgrsync() causing a sync(2)
> >
> > Why would that be useful at all?  I thought the purpose of the bgwriter
> > was to avoid I/O storms, not provoke them.
>
> Calling sync(2) every time the background writer naps means calling it
> every couple hundred milliseconds. That can hardly be called an IO
> storm, it's more like a constant breeze.

I talked to Jan about the idea of sync on every background writer sleep.
He is going to study the issue and report back.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073