Discussion: We really ought to do something about O_DIRECT and data=journalled on ext4

We really ought to do something about O_DIRECT and data=journalled on ext4

From
Josh Berkus
Date:
Hackers,

Some of you might already be aware that this combination produces a
fatal startup crash in PostgreSQL:

1. Create an Ext3 or Ext4 partition and mount it with data=journal on a
server with linux kernel 2.6.30 or later.
2. Initdb a PGDATA on that partition
3. Start PostgreSQL with the default config from that PGDATA

This was reported a ways back:
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=567113

To explain: opening a file with O_DIRECT on an ext3 or ext4 partition
mounted with data=journal fails, and PostgreSQL treats that failure as
fatal.  However, recent Linux kernels now report support for O_DIRECT
when we compile PostgreSQL, so we use it by default.  This results in a
"crash by default" situation with new Linuxes if anyone sets data=journal.

We just encountered this again with another user.  With RHEL6 out now,
this seems likely to become a fairly common crash report.

Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?

We should wait for the outcome of the discussion about whether to change
the default wal_sync_method before worrying about this.
        regards, tom lane


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Josh Berkus
Date:
On 11/30/10 7:09 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?
> 
> We should wait for the outcome of the discussion about whether to change
> the default wal_sync_method before worrying about this.

Are we considering backporting that change?

If so, this would be another argument in favor of changing the default.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Andrew Dunstan
Date:

On 11/30/2010 10:09 PM, Tom Lane wrote:
> Josh Berkus<josh@agliodbs.com>  writes:
>> Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?
> We should wait for the outcome of the discussion about whether to change
> the default wal_sync_method before worrying about this.
>
>             

Tom,

we've just had a significant PGX customer encounter this with the latest 
Postgres on Redhat's freshly released flagship product. Presumably the 
default wal_sync_method will only change prospectively. But this will 
feel to every user out there who encounters it like a bug in our code, 
and it needs attention. It was darn difficult to diagnose, and many 
people will just give up in disgust if they encounter it.

cheers

andrew


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 11/30/2010 10:09 PM, Tom Lane wrote:
>> We should wait for the outcome of the discussion about whether to change
>> the default wal_sync_method before worrying about this.

> we've just had a significant PGX customer encounter this with the latest 
> Postgres on Redhat's freshly released flagship product. Presumably the 
> default wal_sync_method will only change prospectively.

I don't think so.  The fact that Linux is changing underneath us is a
compelling reason for back-patching a change here.  Our older branches
still have to be able to run on modern OS versions.  I'm also fairly
unclear on what you think a fix would look like if it's not effectively
a change in the default.

(Hint: this *will* be changing, one way or another, in Red Hat's version
of 8.4, since that's what RH is shipping in RHEL6.)
        regards, tom lane


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 11/30/10 7:09 PM, Tom Lane wrote:
>> Josh Berkus <josh@agliodbs.com> writes:
>>> Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?
>> 
>> We should wait for the outcome of the discussion about whether to change
>> the default wal_sync_method before worrying about this.

> Are we considering backporting that change?

> If so, this would be another argument in favor of changing the default.

Well, no, actually it's the same (only) argument.  We'd never consider
back-patching such a change if our hand weren't being forced by kernel
changes :-(

As things stand, though, I think the only thing that's really open for
discussion is how wide to make the scope of the default-change: should
we just do it across the board, or try to limit it to some subset of the
platforms where open_datasync is currently the default.  And that's a
decision that ought to be informed by some performance testing.
        regards, tom lane


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> As things stand, though, I think the only thing that's really open for
> discussion is how wide to make the scope of the default-change: should
> we just do it across the board, or try to limit it to some subset of the
> platforms where open_datasync is currently the default.  And that's a
> decision that ought to be informed by some performance testing.

Maybe I have a distorted view of the situation, having hit the
problem with an Ubuntu upgrade, but it really does not look like a
performance item to me.

PANIC:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): Invalid argument

It took me quite some time to be able to start my development cluster
again and validate some new patch to send to the list.

Now I understand that you want to test the other alternatives before
choosing among those which work, but my opinion is that it should be fixed
in HEAD before next alpha, or even ASAP. It could be that a HINT here
would be enough for contributors not to lose too much time. It would be:

HINT: if you're running linux, please try to change wal_sync_method,
open_datasync is not reliable anymore in recent kernels. An example of
trustworthy setting is fdatasync.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Marti Raudsepp
Date:
On Wed, Dec 1, 2010 at 12:35, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> PANIC:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): Invalid argument

+1 I got the same error when trying to get PostgreSQL working on tmpfs
and gave up.

> Now I understand that you want to test the other alternatives before to
> choose among those which work, but my opinion is that it should be fixed
> in HEAD before next alpha, or even ASAP.

It's queued for this month's commitfest, so things are moving.

https://commitfest.postgresql.org/action/patch_view?id=432

Regards,
Marti


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Robert Haas
Date:
On Wed, Dec 1, 2010 at 12:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 11/30/10 7:09 PM, Tom Lane wrote:
>>> Josh Berkus <josh@agliodbs.com> writes:
>>>> Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?
>>>
>>> We should wait for the outcome of the discussion about whether to change
>>> the default wal_sync_method before worrying about this.
>
>> Are we considering backporting that change?
>
>> If so, this would be another argument in favor of changing the default.
>
> Well, no, actually it's the same (only) argument.  We'd never consider
> back-patching such a change if our hand weren't being forced by kernel
> changes :-(
>
> As things stand, though, I think the only thing that's really open for
> discussion is how wide to make the scope of the default-change: should
> we just do it across the board, or try to limit it to some subset of the
> platforms where open_datasync is currently the default.  And that's a
> decision that ought to be informed by some performance testing.

If we could get a clear idea of what performance testing needs to be
done, I suspect we could find some people willing to do it.  What do
you think would be useful?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Andrew Dunstan
Date:

On 11/30/2010 11:17 PM, Tom Lane wrote:
> Andrew Dunstan<andrew@dunslane.net>  writes:
>> On 11/30/2010 10:09 PM, Tom Lane wrote:
>>> We should wait for the outcome of the discussion about whether to change
>>> the default wal_sync_method before worrying about this.
>> we've just had a significant PGX customer encounter this with the latest
>> Postgres on Redhat's freshly released flagship product. Presumably the
>> default wal_sync_method will only change prospectively.
> I don't think so.  The fact that Linux is changing underneath us is a
> compelling reason for back-patching a change here.  Our older branches
> still have to be able to run on modern OS versions.  I'm also fairly
> unclear on what you think a fix would look like if it's not effectively
> a change in the default.
>
> (Hint: this *will* be changing, one way or another, in Red Hat's version
> of 8.4, since that's what RH is shipping in RHEL6.)
>
>             

Well, my initial idea was that if PG_O_DIRECT is non-zero, we should 
test at startup time if we can use it on the WAL file system and inhibit 
its use if not.

Incidentally, I notice it's not used at all in test_fsync.c - should it 
not be?

cheers

andrew
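
A startup-time check of the kind Andrew describes could look roughly like the following Python sketch. It is an illustration only, not PostgreSQL code: the function name `o_direct_usable` and the scratch-file approach are assumptions, and it treats EINVAL (the "Invalid argument" in the PANIC reported in this thread) as the sign that the filesystem rejects O_DIRECT.

```python
import errno
import os
import tempfile

# O_DIRECT is not defined on every platform; use 0 where it is missing.
O_DIRECT = getattr(os, "O_DIRECT", 0)


def o_direct_usable(directory):
    """Return True if a file in `directory` can be opened with O_DIRECT.

    Hypothetical helper for illustration; not PostgreSQL's actual code.
    """
    if O_DIRECT == 0:
        return False  # the platform does not define the flag at all
    # Create a scratch file on the same filesystem as the WAL directory.
    fd, path = tempfile.mkstemp(prefix="odirect_probe_", dir=directory)
    os.close(fd)
    try:
        try:
            probe = os.open(path, os.O_RDWR | O_DIRECT)
        except OSError as exc:
            if exc.errno == errno.EINVAL:
                return False  # filesystem or mount options reject O_DIRECT
            raise  # any other error is unexpected; do not hide it
        os.close(probe)
        return True
    finally:
        os.unlink(path)  # always remove the scratch file
```

A caller would pass the directory that holds the WAL files and fall back to a sync method that does not need O_DIRECT when the result is False. The answer only reflects the mount options in effect at the moment the probe runs.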




Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Josh Berkus
Date:
Tom,

> Well, no, actually it's the same (only) argument.  We'd never consider
> back-patching such a change if our hand weren't being forced by kernel
> changes :-(

I think we have to back-patch the change.  The way it is now, a DBA who
thinks they are doing normal sensible configuration can cause PostgreSQL
to fail to restart.  Imagine this scenario, for example:

1) DBA, using PostgreSQL 8.3, gets worried about possible disk issues
2) DBA changes their single Ext3/4 partition to "data=journal"
3) DBA restarts system
4) PostgreSQL won't start
5) DBA thrashes around for a few hours while the site is down
6) DBA gets fired and the new DBA migrates to some other DBMS.

I simply can't think of *anywhere* we could put the information about
opensync and Linux/Ext which would be prominent enough to avoid the
above scenario.  And per replies, a lot of people have hit this issue
already.

It's a bug and it's our bug.  Back when we added O_DIRECT, we assumed
that support for O_DIRECT/opensync could be determined on an OS/kernel
basis, because that was the information we had.   Now it turns out that
support can vary *by filesystem* and *between remounts*.  We didn't have
any way of knowing different back in 2004, but that doesn't mean we
don't need to fix our mistaken assumption now.

Ideally, we would change our code to test support for O_DIRECT on
startup, rather than at compile time, and backport *that*.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> It's a bug and it's our bug.

No, it's a filesystem bug that this particular filesystem doesn't
support a perfectly reasonable combination of options, and doesn't
even fail gracefully as it could easily do.  But assigning blame
doesn't help much.

> Back when we added O_DIRECT, we assumed
> that support for O_DIRECT/opensync could be determined on an OS/kernel
> basis, because that was the information we had.   Now it turns out that
> support can vary *by filesystem* and *between remounts*.  We didn't have
> any way of knowing different back in 2004, but that doesn't mean we
> don't need to fix our mistaken assumption now.

> Ideally, we would change our code to test support for O_DIRECT on
> startup, rather than at compile time, and backport *that*.

I'm not convinced that a startup-time test would be enough either,
since as you note a remount might be enough to change the situation.

I think the best answer is to get out of the business of using
O_DIRECT by default, especially seeing that available evidence
suggests it might not be a performance win anyway.
        regards, tom lane
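
The remount concern could in principle be handled by checking on every open rather than once at startup: try O_DIRECT, and retry without it if the kernel answers EINVAL. The Python sketch below illustrates that idea under stated assumptions. Nothing in this thread says PostgreSQL does this, and `open_wal_file` is a hypothetical name.

```python
import errno
import os

# O_DIRECT is not defined on every platform; use 0 where it is missing.
O_DIRECT = getattr(os, "O_DIRECT", 0)


def open_wal_file(path, base_flags=os.O_RDWR, try_direct=True):
    """Open `path`, retrying without O_DIRECT if the kernel rejects it.

    Returns (fd, used_direct). Hypothetical helper, not PostgreSQL code.
    """
    if try_direct and O_DIRECT:
        try:
            return os.open(path, base_flags | O_DIRECT), True
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise  # a different failure; do not mask it
    # Either O_DIRECT was not requested, is not defined, or was rejected.
    return os.open(path, base_flags), False
```

The trade-off is that the caller no longer knows in advance which I/O path it will get, which matters because O_DIRECT writes need aligned buffers. Dropping O_DIRECT from the default, as proposed above, avoids the question altogether.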


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Josh Berkus
Date:
> I think the best answer is to get out of the business of using
> O_DIRECT by default, especially seeing that available evidence
> suggests it might not be a performance win anyway.

Well, we don't have any performance evidence ... there's an issue with
the fsync-test script which causes it not to use O_DIRECT.

However, we haven't seen any evidence for benefits on any production
filesystem, either.  So given the lack of evidence of performance
benefit, combined with the definite evidence of related failures, I
agree that simply disabling O_DIRECT by default would be a good way to
solve this.

It might be nice to add new sync_method options, "osync_odirect" and
"odatasync_odirect" for DBAs who think they know enough to tune with
non-defaults.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Andres Freund
Date:
On Wednesday 01 December 2010 19:09:05 Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > It's a bug and it's our bug.
> 
> No, it's a filesystem bug that this particular filesystem doesn't
> support a perfectly reasonable combination of options, and doesn't
> even fail gracefully as it could easily do.  But assigning blame
> doesn't help much.
I wouldn't call it a reasonable combination - promising fs-level
data-journaling (data=journal) and O_DIRECT are not really compatible with
each other...

Andres



Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> It might be nice to add new sync_method options, "osync_odirect" and
> "odatasync_odirect" for DBAs who think they know enough to tune with
> non-defaults.

That would have the benefit that we'd not have to argue with people
who liked the current behavior (assuming there are any).  I'm not
sure there's much technical advantage, but from a political standpoint
it might be the easiest sort of change to push through.

However, this doesn't really address the question of what a sensible
choice of default is.  If there's little evidence about whether the
current flavor of open_datasync is really the fastest way, there's
none whatsoever that establishes open_datasync_without_o_direct
being a sane choice of default.
        regards, tom lane
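
To make the proposal concrete, the Python sketch below maps sync method names to the extra open() flags each would imply if O_DIRECT became opt-in. The table is an assumption drawn from the discussion above: "osync_odirect" and "odatasync_odirect" are only names floated in this thread, and `open_flags_for` is a hypothetical helper.

```python
import os

# Flags that a platform lacks are treated as 0 (no effect).
O_SYNC = getattr(os, "O_SYNC", 0)
O_DSYNC = getattr(os, "O_DSYNC", 0)
O_DIRECT = getattr(os, "O_DIRECT", 0)

# Hypothetical table: method name -> extra flags passed to open().
# fsync and fdatasync sync with a separate call, so they add no flags.
SYNC_METHOD_OPEN_FLAGS = {
    "fsync": 0,
    "fdatasync": 0,
    "open_sync": O_SYNC,
    "open_datasync": O_DSYNC,
    # Names proposed in this thread; O_DIRECT only when asked for.
    "osync_odirect": O_SYNC | O_DIRECT,
    "odatasync_odirect": O_DSYNC | O_DIRECT,
}


def open_flags_for(method):
    """Return the extra open() flags for a sync method name."""
    try:
        return SYNC_METHOD_OPEN_FLAGS[method]
    except KeyError:
        raise ValueError("unknown sync method: %s" % method)
```

With a table like this, no default setting would involve O_DIRECT, and anyone who wants it has to choose one of the explicitly named variants.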


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Josh Berkus
Date:
> However, this doesn't really address the question of what a sensible
> choice of default is.  If there's little evidence about whether the
> current flavor of open_datasync is really the fastest way, there's
> none whatsoever that establishes open_datasync_without_o_direct
> being a sane choice of default.

No, I'd switch to fdatasync.  That's the performance that most people
are familiar with anyway, since it was all Linux supported before.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Andrew Dunstan
Date:

On 12/01/2010 01:41 PM, Andres Freund wrote:
> On Wednesday 01 December 2010 19:09:05 Tom Lane wrote:
>> Josh Berkus<josh@agliodbs.com>  writes:
>>> It's a bug and it's our bug.
>> No, it's a filesystem bug that this particular filesystem doesn't
>> support a perfectly reasonable combination of options, and doesn't
>> even fail gracefully as it could easily do.  But assigning blame
>> doesn't help much.
> I wouldnt call it a reasonable combination - promising fs-level data-
> journaling (data=journal) and O_DIRECT are not really compatible with each
> other...
>
>

OK, but how is an application supposed to know that data journaling is 
set. Postgres doesn't even look at the FS type, let alone the mount 
options. From the app's POV it's perfectly reasonable. If the OS is 
going to provide the API, it should expect people to use it.

cheers

andrew
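
For reference, an application that did want to look at the mount options could parse a mount table in the Linux /proc/mounts layout. The Python sketch below is an illustration only. It assumes data=journal is listed in that table the same way it appears in the `mount` output quoted elsewhere in this thread, and the helper names are hypothetical.

```python
import os


def mount_options_for(path, mounts_text):
    """Return the option list of the longest mount point containing `path`.

    `mounts_text` is expected in /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    path = os.path.abspath(path)
    best_point, best_opts = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        point, opts = fields[1], fields[3].split(",")
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            # Prefer the deepest mount point; later entries win ties.
            if len(point) >= len(best_point):
                best_point, best_opts = point, opts
    return best_opts


def uses_data_journal(path, mounts_text):
    """True if the filesystem holding `path` is mounted with data=journal."""
    return "data=journal" in mount_options_for(path, mounts_text)
```

On Linux the text would come from reading /proc/mounts. With the two sample lines `/dev/sda7 / ext4 rw,data=journal 0 0` and `/dev/sdb1 /srv/pgdata ext4 rw,relatime 0 0`, a path under /var is reported as journaled and a path under /srv/pgdata is not. A check like this covers only one filesystem family and one option, and it inherits the remount problem already raised in this thread.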


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Greg Smith
Date:
Tom Lane wrote:
> I think the best answer is to get out of the business of using
> O_DIRECT by default, especially seeing that available evidence
> suggests it might not be a performance win anyway.
>   

I was concerned that open_datasync might be doing a better job of 
forcing data out of drive write caches.  But the tests I've done on 
RHEL6 so far suggest that's not true; the write guarantees seem to be 
the same as when using fdatasync.  And there's certainly one performance 
regression possible going from fdatasync to open_datasync, the case 
where you're overflowing wal_buffers before you actually commit.

Below is a test of the troublesome behavior on the same RHEL6 system I 
gave test_fsync performance test results from at 
http://archives.postgresql.org/message-id/4CE2EBF8.4040602@2ndquadrant.com

This confirms that the kernel now defining O_DSYNC behavior as being 
available, but not actually supporting it when running the filesystem in 
journaled mode, is the problem here.  That's clearly a kernel bug and no 
fault of PostgreSQL, it's just never been exposed in a default 
configuration before.  The RedHat bugzilla report seems a bit unclear 
about what's going on here, may be worth updating that to note the 
underlying cause.

Regardless, I'm now leaning heavily toward the idea of avoiding 
open_datasync by default given this bug, and backpatching that change to 
at least 8.4.  I'll do some more database-level performance tests here 
just as a final sanity check on that.  My gut feel is now that we'll 
eventually be taking something like Marti's patch, adding some more 
documentation around it, and applying that to HEAD as well as some 
number of back branches.

$ mount | head -n 1
/dev/sda7 on / type ext4 (rw)
$ cat $PGDATA/postgresql.conf | grep wal_sync_method
#wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 17:20:16 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
$ psql -c "show wal_sync_method"
 wal_sync_method
-----------------
 open_datasync

[Edit /etc/fstab, change mount options to be "data=journal" and reboot]

$ mount | grep journal
/dev/sda7 on / type ext4 (rw,data=journal)
$ cat postgresql.conf | grep wal_sync_method
#wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 12:14:50 EST
PANIC:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): Invalid argument
LOG:  startup process (PID 2690) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
$ pg_ctl stop

$ vi $PGDATA/postgresql.conf
$ cat $PGDATA/postgresql.conf | grep wal_sync_method
wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 12:14:40 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books



Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> 
> 
> On 11/30/2010 11:17 PM, Tom Lane wrote:
> > Andrew Dunstan<andrew@dunslane.net>  writes:
> >> On 11/30/2010 10:09 PM, Tom Lane wrote:
> >>> We should wait for the outcome of the discussion about whether to change
> >>> the default wal_sync_method before worrying about this.
> >> we've just had a significant PGX customer encounter this with the latest
> >> Postgres on Redhat's freshly released flagship product. Presumably the
> >> default wal_sync_method will only change prospectively.
> > I don't think so.  The fact that Linux is changing underneath us is a
> > compelling reason for back-patching a change here.  Our older branches
> > still have to be able to run on modern OS versions.  I'm also fairly
> > unclear on what you think a fix would look like if it's not effectively
> > a change in the default.
> >
> > (Hint: this *will* be changing, one way or another, in Red Hat's version
> > of 8.4, since that's what RH is shipping in RHEL6.)
> >
> >             
> 
> Well, my initial idea was that if PG_O_DIRECT is non-zero, we should 
> test at startup time if we can use it on the WAL file system and inhibit 
> its use if not.
> 
> Incidentally, I notice it's not used at all in test_fsync.c - should it 
> not be?

test_fsync certainly should be using PG_O_DIRECT in the same places the
backend does.  Once we decide how to handle PG_O_DIRECT, I will modify
test_fsync to match.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Josh Berkus
Date:
All,

So, I've been doing some reading about this issue, and I think
regardless of what other changes we make we should never enable O_DIRECT
automatically on Linux, and it was a mistake for us to do so in the
first place.

First, in the Linux docs for open():

=========

In summary, O_DIRECT is a potentially powerful tool that should be used
with caution.  It is recommended that applications treat use of O_DIRECT
as a performance option which is disabled by default.

=========

Second, Linus has a quote about O_DIRECT that I think should serve as an
indicator to us that directIO will never be beneficial-by-default on
Linux, and might even someday be desupported:

============

The right way to do it is to just not use O_DIRECT.

The whole notion of "direct IO" is totally braindamaged. Just say no.
This is your brain: O
This is your brain on O_DIRECT: .
Any questions?

I should have fought back harder. There really is no valid reason for EVER
using O_DIRECT. You need a buffer whatever IO you do, and it might as well
be the page cache. There are better ways to control the page cache than
play games and think that a page cache isn't necessary.

So don't use O_DIRECT. Use things like madvise() and posix_fadvise()
instead.
    Linus
=============



-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Heikki Linnakangas
Date:
On 03.12.2010 21:55, Josh Berkus wrote:
> All,
>
> So, I've been doing some reading about this issue, and I think
> regardless of what other changes we make we should never enable O_DIRECT
> automatically on Linux, and it was a mistake for us to do so in the
> first place.
>
> First, in the Linux docs for open():

The quote on that man page is hilarious:

"The thing that has always disturbed me about O_DIRECT is that the whole
interface is just stupid, and was probably designed by a deranged monkey
on some serious mind-controlling substances."  -- Linus
 

I agree we should not enable it by default. If it's faster on some 
circumstances, the admin is free to do the research and enable it, but 
defaults need to be safe above all.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Tom Lane
Date:
Greg Smith <greg@2ndquadrant.com> writes:
> Regardless, I'm now leaning heavily toward the idea of avoiding 
> open_datasync by default given this bug, and backpatching that change to 
> at least 8.4.  I'll do some more database-level performance tests here 
> just as a final sanity check on that.  My gut feel is now that we'll 
> eventually be taking something like Marti's patch, adding some more 
> documentation around it, and applying that to HEAD as well as some 
> number of back branches.

I think we have got consensus that (1) open_datasync should not be the
default on Linux, and (2) this change needs to be back-patched.  What
is not clear to me is whether we have consensus to change the option
preference order globally, or restrict the change to just be effective
on Linux.  The various testing that's been reported so far is all for
Linux and thus doesn't directly address the question of whether other
kernels will have similar performance properties.  However, it seems
reasonable to me to suppose that open_datasync could only be a win in
very restricted scenarios and thus shouldn't be a preferred default.
Also, I dread trying to document the behavior if the preference order
becomes platform-dependent.

With the holidays fast approaching, our window to do something about
this in a timely fashion grows short.  If we don't schedule update
releases to be made this week, I think we're looking at not getting the
updates out till after New Year's.  Do we want to wait that long?  Is
anyone actually planning to do performance testing that would prove
anything about non-Linux platforms?
        regards, tom lane


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Greg Smith
Date:
Tom Lane wrote:
> The various testing that's been reported so far is all for
> Linux and thus doesn't directly address the question of whether other
> kernels will have similar performance properties.

Survey of some popular platforms:

Linux:  don't want O_DIRECT by default for reliability reasons, and
there's no clear performance win in the default config with small
wal_buffers

Solaris:  O_DIRECT doesn't work, there's another API support has never
been added for; see
http://blogs.sun.com/jkshah/entry/postgresql_wal_sync_method_and

Windows:  Small reported gains for O_DIRECT, i.e. 10% at
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01615.php

FreeBSD:  It probably works there, but I've never seen good performance
tests of it on this platform.

Mac OS X:  Like Solaris, there's a similar mechanism but it's not
O_DIRECT; see
http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag
for notes about the F_NOCACHE feature used.  Same basic situation as
Solaris; there's an API, but PostgreSQL doesn't use it yet.

So my guess is that some small percentage of Windows users might notice
a change here, and some testing on FreeBSD would be useful too.  That's
about it for platforms that I think anybody needs to worry about.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Steve Singer
Date:
On 10-12-06 06:56 PM, Greg Smith wrote:
> Tom Lane wrote:
>> The various testing that's been reported so far is all for
>> Linux and thus doesn't directly address the question of whether other
>> kernels will have similar performance properties.
>
> Survey of some popular platforms:
>

<snip>

> So my guess is that some small percentage of Windows users might notice
> a change here, and some testing on FreeBSD would be useful too. That's
> about it for platforms that I think anybody needs to worry about.

If you tell me which options to pgbench and which .conf file settings 
you'd like to see I can probably arrange to run some tests on AIX.

>
> --
> Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
> PostgreSQL Training, Services and Support        www.2ndQuadrant.us
> "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
>



Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Tom Lane
Date:
Greg Smith <greg@2ndquadrant.com> writes:
> So my guess is that some small percentage of Windows users might notice 
> a change here, and some testing on FreeBSD would be useful too.  That's 
> about it for platforms that I think anybody needs to worry about.

To my mind, O_DIRECT is not really the key issue here, it's whether to
prefer O_DSYNC or fdatasync.  I looked back in the archives, and I think
that the main reason we prefer O_DSYNC when available is the results
I got here:

http://archives.postgresql.org/pgsql-hackers/2001-03/msg00381.php

which demonstrated a performance benefit on HPUX 10.20, though with a
test tool much more primitive than test_fsync.  I still have that
machine, although the disk that was in it at the time died awhile back.
What's in there now is a Seagate ST336607LW spinning at 10000 RPM (166
rev/sec) and today I get numbers like this from test_fsync:

Simple write:
        8k write                      28331.020/second

Compare file sync methods using one write:
        open_datasync 8k write          161.190/second
        open_sync 8k write              156.478/second
        8k write, fdatasync              54.302/second
        8k write, fsync                  51.810/second

Compare file sync methods using two writes:
        2 open_datasync 8k writes        81.702/second
        2 open_sync 8k writes            80.172/second
        8k write, 8k write, fdatasync    40.829/second
        8k write, 8k write, fsync        39.836/second

Compare open_sync with different sizes:
        open_sync 16k write              80.192/second
        2 open_sync 8k writes            78.018/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close           52.527/second
        8k write, close, fsync           54.092/second

So *on that rather ancient platform* there's a measurable performance
benefit to O_DSYNC, but this seems to be largely because fdatasync is
stubbed to fsync in userspace rather than because fdatasync wouldn't
be a better idea in the abstract.  Also, a lot of the argument against
fsync at the time was that it forced the kernel to iterate through all
the buffers for the WAL file to see if any were dirty.  I would imagine
that modern kernels are a tad smarter about that; and even if they
aren't, the CPU speed versus disk speed tradeoff has changed enough
since 2001 that iterating through 16MB of buffers isn't as interesting
as it was then.

So to my mind, switching to the preference order fdatasync,
fsync_writethrough, fsync seems like the thing to do.  Since we assume
fsync is always available, that means that O_DSYNC/O_SYNC will not be
the defaults on any platform.
        regards, tom lane
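
The preference order proposed in this message can be sketched as a small selection function. This is only an illustration: the enum and function names are hypothetical and are not PostgreSQL identifiers, and the two booleans stand in for whatever configure would report about the platform.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not PostgreSQL identifiers. */
typedef enum
{
    SKETCH_SYNC_FDATASYNC,
    SKETCH_SYNC_FSYNC_WRITETHROUGH,
    SKETCH_SYNC_FSYNC
} SketchSyncMethod;

/*
 * Pick a default in the proposed order: fdatasync first, then
 * fsync_writethrough, then fsync.  fsync is assumed to be always
 * available, so open_datasync/open_sync are never chosen as a default.
 */
static SketchSyncMethod
sketch_default_sync_method(bool have_fdatasync, bool have_fsync_writethrough)
{
    if (have_fdatasync)
        return SKETCH_SYNC_FDATASYNC;
    if (have_fsync_writethrough)
        return SKETCH_SYNC_FSYNC_WRITETHROUGH;
    return SKETCH_SYNC_FSYNC;
}

/* Map the choice to the corresponding wal_sync_method setting name. */
static const char *
sketch_sync_method_name(SketchSyncMethod method)
{
    switch (method)
    {
        case SKETCH_SYNC_FDATASYNC:
            return "fdatasync";
        case SKETCH_SYNC_FSYNC_WRITETHROUGH:
            return "fsync_writethrough";
        case SKETCH_SYNC_FSYNC:
            break;
    }
    return "fsync";
}
```

With this ordering, a platform that reports neither fdatasync nor fsync_writethrough ends up on plain fsync, whether or not it offers O_DSYNC or O_SYNC.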


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Josh Berkus
Date:
Steve,

> If you tell me which options to pgbench and which .conf file settings
> you'd like to see I can probably arrange to run some tests on AIX.

Compile and run test_fsync in PGSRC/src/tools/fsync.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Josh Berkus
Date:
> Mac OS X:  Like Solaris, there's a similar mechanism but it's not
> O_DIRECT; see
> http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag
> for notes about the F_NOCACHE  feature used.  Same basic situation as
> Solaris; there's an API, but PostgreSQL doesn't use it yet.

Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available.
From my run, it looks like even so regular fsync might be better than
open_sync.  Results from a MacBook:

Sidney-Stratton:fsync josh$ ./test_fsync
Loops = 10000

Simple write:
        8k write                       2121.004/second

Compare file sync methods using one write:
        (open_datasync unavailable)
        open_sync 8k write             1993.833/second
        (fdatasync unavailable)
        8k write, fsync                1878.154/second

Compare file sync methods using two writes:
        (open_datasync unavailable)
        2 open_sync 8k writes          1005.009/second
        (fdatasync unavailable)
        8k write, 8k write, fsync      1709.862/second

Compare open_sync with different sizes:
        open_sync 16k write            1728.803/second
        2 open_sync 8k writes           969.416/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close         1772.572/second
        8k write, close, fsync         1939.897/second


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Robert Haas
Date:
On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> Mac OS X:  Like Solaris, there's a similar mechanism but it's not
>> O_DIRECT; see
>> http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag
>> for notes about the F_NOCACHE  feature used.  Same basic situation as
>> Solaris; there's an API, but PostgreSQL doesn't use it yet.
>
> Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available.
> From my run, it looks like even so regular fsync might be better than
> open_sync.

But I think you need to use fsync_writethrough if you actually want durability.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available.
>> From my run, it looks like even so regular fsync might be better than
>> open_sync.

> But I think you need to use fsync_writethrough if you actually want durability.

Yeah.  Unless your laptop contains an SSD, those numbers are garbage on
their face.  So that's another problem with test_fsync: it omits
fsync_writethrough.
        regards, tom lane
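
The reasoning behind "garbage on their face" can be made concrete with a rough rotational bound: a rotating disk that really flushes each rewrite of the same block can complete at most about one such write per platter revolution. The sketch below encodes that rule of thumb. The function names are hypothetical, and the 5400 RPM figure used in the usage note is an assumption about a typical laptop drive, not something reported in the thread.

```c
#include <stdbool.h>

/*
 * Rough upper bound on durable rewrites of one block per second for a
 * single rotating disk with no write cache in play: at best one per
 * platter revolution.  This is a rule of thumb, not a precise model.
 */
static double
sketch_max_synced_ops_per_sec(double rpm)
{
    return rpm / 60.0;
}

/* True if a measured sync rate does not exceed the rotational bound. */
static bool
sketch_sync_rate_plausible(double measured_ops_per_sec, double rpm)
{
    return measured_ops_per_sec <= sketch_max_synced_ops_per_sec(rpm);
}
```

For the 10000 RPM Seagate mentioned earlier, the bound is about 166 per second, so 161.190/second for open_datasync is believable. For the MacBook, 1878.154 fsyncs per second is far above the roughly 90 per second that an assumed 5400 RPM drive could manage, which suggests the writes are not reaching the platter.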


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Josh Berkus
Date:
On 12/6/10 6:10 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available.
>>> From my run, it looks like even so regular fsync might be better than
>>> open_sync.
> 
>> But I think you need to use fsync_writethrough if you actually want durability.
> 
> Yeah.  Unless your laptop contains an SSD, those numbers are garbage on
> their face.  So that's another problem with test_fsync: it omits
> fsync_writethrough.

Yeah, the issue with test_fsync appears to be that it's designed to work
without os-specific switches no matter what, not to accurately reflect
how we access wal.

I'll see if I can do better.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Josh Berkus
Date:
All,

Geirth's results from his FreeBSD 7.1 server using 8.4's test_fsync:

Simple write timing:
        write                    0.007081

Compare fsync times on write() and non-write() descriptor:
If the times are similar, fsync() can sync data written
on a different descriptor.
        write, fsync, close      5.937933
        write, close, fsync      8.056394

Compare one o_sync write to two:
        one 16k o_sync write     7.366927
        two 8k o_sync writes    15.299300

Compare file sync methods with one 8k write:
        (o_dsync unavailable)
        open o_sync, write       7.512682
        (fdatasync unavailable)
        write, fsync             5.856480

Compare file sync methods with two 8k writes:
        (o_dsync unavailable)
        open o_sync, write      15.472910
        (fdatasync unavailable)
        write, fsync             5.880319
 


... again, open_sync does not look very impressive.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Steve Singer
Date:
On 10-12-06 09:00 PM, Josh Berkus wrote:
> Steve,
>
>> If you tell me which options to pgbench and which .conf file settings
>> you'd like to see I can probably arrange to run some tests on AIX.
>
> Compile and run test_fsync in PGSRC/src/tools/fsync.
>

Attached are runs against two different disk sub-systems from a server 
running AIX 5.3.

The first one is against the local disks


Loops = 10000

Simple write:
        8k write                      60812.454/second

Compare file sync methods using one write:
        open_datasync 8k write          162.160/second
        open_sync 8k write              158.472/second
        8k write, fdatasync             158.157/second
        8k write, fsync                  45.382/second

Compare file sync methods using two writes:
        2 open_datasync 8k writes        79.472/second
        2 open_sync 8k writes            80.095/second
        8k write, 8k write, fdatasync   159.268/second
        8k write, 8k write, fsync        44.725/second

Compare open_sync with different sizes:
        open_sync 16k write             162.017/second
        2 open_sync 8k writes            79.709/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close           45.361/second
        8k write, close, fsync           36.311/second
 



================================

The below profile is from the same machine using an IBM DS 6800 SAN for 
storage.


Loops = 10000

Simple write:
        8k write                      75933.027/second

Compare file sync methods using one write:
        open_datasync 8k write         2762.801/second
        open_sync 8k write             2453.822/second
        8k write, fdatasync            2867.331/second
        8k write, fsync                1094.048/second

Compare file sync methods using two writes:
        2 open_datasync 8k writes      1287.845/second
        2 open_sync 8k writes          1332.084/second
        8k write, 8k write, fdatasync  1966.411/second
        8k write, 8k write, fsync      1048.354/second

Compare open_sync with different sizes:
        open_sync 16k write            2281.425/second
        2 open_sync 8k writes          1401.561/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close         1298.404/second
        8k write, close, fsync         1188.582/second





Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Marti Raudsepp
Date:
On Tue, Dec 7, 2010 at 03:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> To my mind, O_DIRECT is not really the key issue here, it's whether to
> prefer O_DSYNC or fdatasync.

Since different platforms implement these primitives differently, and
it's not always clear from the header file definitions which options
are actually implemented, how about simply hard-coding a default value
for each platform?

1. This would be quite straightforward to code and document (a table
of platforms and their default wal_sync_method setting)

2. The best performing (or safest) method can be chosen on every
platform. From the above discussion it seems that Windows and OSX
should default to fsync_writethrough even if other methods are
available

3. It would pre-empt similar surprises if other platforms change their
header files, like what happened on Linux now.

Sounds like the simple and foolproof solution.

Regards,
Marti


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Tom Lane
Date:
Marti Raudsepp <marti@juffo.org> writes:
> On Tue, Dec 7, 2010 at 03:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> To my mind, O_DIRECT is not really the key issue here, it's whether to
>> prefer O_DSYNC or fdatasync.

> Since different platforms implement these primitives differently, and
> it's not always clear from the header file definitions which options
> are actually implemented, how about simply hard-coding a default value
> for each platform?

There's not a fixed finite list of "platforms we support".  In general
we prefer to avoid designing things that way at all.  If we have to have
specific exceptions for specific platforms, we grin and bear it, but for
the most part behavioral differences ought to be driven by configure's
probes for platform features.
        regards, tom lane


Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From
Bruce Momjian
Date:
Josh Berkus wrote:
> On 12/6/10 6:10 PM, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
> >>> Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available.
> >>> From my run, it looks like even so regular fsync might be better than
> >>> open_sync.
> >
> >> But I think you need to use fsync_writethrough if you actually want durability.
> >
> > Yeah.  Unless your laptop contains an SSD, those numbers are garbage on
> > their face.  So that's another problem with test_fsync: it omits
> > fsync_writethrough.
>
> Yeah, the issue with test_fsync appears to be that it's designed to work
> without os-specific switches no matter what, not to accurately reflect
> how we access wal.

I have now modified pg_test_fsync to use O_DIRECT for O_SYNC/O_FSYNC,
and O_DSYNC, if supported, so it now matches how we use WAL (except we
don't use O_DIRECT when in 'archive' and 'hot standby' mode).  Applied
patch attached.
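
The run-time check that the patch relies on (try the open with O_DIRECT and treat failure as "not supported here") can be sketched on its own as below. This is a minimal sketch, not what PostgreSQL or pg_test_fsync actually does: the function name and the SKETCH_O_DIRECT macro are hypothetical, and retrying without the flag on EINVAL is an assumed policy for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

#ifdef O_DIRECT
#define SKETCH_O_DIRECT O_DIRECT
#else
#define SKETCH_O_DIRECT 0       /* platform has no O_DIRECT flag */
#endif

/*
 * Hypothetical helper: open "path" with direct I/O if the file system
 * accepts it, otherwise fall back to a normal open.  On return,
 * *used_direct reports which of the two happened.  Returns the file
 * descriptor, or -1 with errno set.
 */
static int
sketch_open_try_direct(const char *path, int flags, mode_t mode,
                       bool *used_direct)
{
    int         fd;

    *used_direct = false;

    if (SKETCH_O_DIRECT != 0)
    {
        fd = open(path, flags | SKETCH_O_DIRECT, mode);
        if (fd >= 0)
        {
            *used_direct = true;
            return fd;
        }
        /* Assumed policy: only EINVAL means "direct I/O refused here". */
        if (errno != EINVAL)
            return -1;
    }

    return open(path, flags, mode);
}
```

The point of the sketch is that a compile-time test for the flag only says the header defines it; whether a particular mount accepts it is known only after the open is attempted.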

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_test_fsync/pg_test_fsync.c b/contrib/pg_test_fsync/pg_test_fsync.c
new file mode 100644
index d075483..49a7b3c
*** a/contrib/pg_test_fsync/pg_test_fsync.c
--- b/contrib/pg_test_fsync/pg_test_fsync.c
***************
*** 23,29 ****
  #define XLOG_BLCKSZ_K    (XLOG_BLCKSZ / 1024)

  #define LABEL_FORMAT        "        %-32s"
! #define NA_FORMAT            LABEL_FORMAT "%18s"
  #define OPS_FORMAT            "%9.3f ops/sec"

  static const char *progname;
--- 23,29 ----
  #define XLOG_BLCKSZ_K    (XLOG_BLCKSZ / 1024)

  #define LABEL_FORMAT        "        %-32s"
! #define NA_FORMAT            "%18s"
  #define OPS_FORMAT            "%9.3f ops/sec"

  static const char *progname;
*************** handle_args(int argc, char *argv[])
*** 134,139 ****
--- 134,144 ----
      }

      printf("%d operations per test\n", ops_per_test);
+ #if PG_O_DIRECT != 0
+     printf("O_DIRECT supported on this platform for open_datasync and open_sync.\n");
+ #else
+     printf("Direct I/O is not supported on this platform.\n");
+ #endif
  }

  static void
*************** test_sync(int writes_per_op)
*** 184,226 ****
      /*
       * Test open_datasync if available
       */
! #ifdef OPEN_DATASYNC_FLAG
!     printf(LABEL_FORMAT, "open_datasync"
! #if PG_O_DIRECT != 0
!         " (non-direct I/O)*"
! #endif
!         );
      fflush(stdout);

!     if ((tmpfile = open(filename, O_RDWR | O_DSYNC, 0)) == -1)
!         die("could not open output file");
!     gettimeofday(&start_t, NULL);
!     for (ops = 0; ops < ops_per_test; ops++)
!     {
!         for (writes = 0; writes < writes_per_op; writes++)
!             if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
!                 die("write failed");
!         if (lseek(tmpfile, 0, SEEK_SET) == -1)
!             die("seek failed");
!     }
!     gettimeofday(&stop_t, NULL);
!     close(tmpfile);
!     print_elapse(start_t, stop_t);
!
!     /*
!      * If O_DIRECT is enabled, test that with open_datasync
!      */
! #if PG_O_DIRECT != 0
      if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_O_DIRECT, 0)) == -1)
      {
!         printf(NA_FORMAT, "o_direct", "n/a**\n");
          fs_warning = true;
      }
      else
      {
!         printf(LABEL_FORMAT, "open_datasync (direct I/O)");
!         fflush(stdout);
!
          gettimeofday(&start_t, NULL);
          for (ops = 0; ops < ops_per_test; ops++)
          {
--- 189,207 ----
      /*
       * Test open_datasync if available
       */
!     printf(LABEL_FORMAT, "open_datasync");
      fflush(stdout);

! #ifdef OPEN_DATASYNC_FLAG
      if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_O_DIRECT, 0)) == -1)
      {
!         printf(NA_FORMAT, "n/a*\n");
          fs_warning = true;
      }
      else
      {
!         if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_O_DIRECT, 0)) == -1)
!             die("could not open output file");
          gettimeofday(&start_t, NULL);
          for (ops = 0; ops < ops_per_test; ops++)
          {
*************** test_sync(int writes_per_op)
*** 234,252 ****
          close(tmpfile);
          print_elapse(start_t, stop_t);
      }
- #endif
-
  #else
!     printf(NA_FORMAT, "open_datasync", "n/a\n");
  #endif

  /*
   * Test fdatasync if available
   */
- #ifdef HAVE_FDATASYNC
      printf(LABEL_FORMAT, "fdatasync");
      fflush(stdout);

      if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
          die("could not open output file");
      gettimeofday(&start_t, NULL);
--- 215,231 ----
          close(tmpfile);
          print_elapse(start_t, stop_t);
      }
  #else
!     printf(NA_FORMAT, "n/a\n");
  #endif

  /*
   * Test fdatasync if available
   */
      printf(LABEL_FORMAT, "fdatasync");
      fflush(stdout);

+ #ifdef HAVE_FDATASYNC
      if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
          die("could not open output file");
      gettimeofday(&start_t, NULL);
*************** test_sync(int writes_per_op)
*** 263,269 ****
      close(tmpfile);
      print_elapse(start_t, stop_t);
  #else
!     printf(NA_FORMAT, "fdatasync", "n/a\n");
  #endif

  /*
--- 242,248 ----
      close(tmpfile);
      print_elapse(start_t, stop_t);
  #else
!     printf(NA_FORMAT, "n/a\n");
  #endif

  /*
*************** test_sync(int writes_per_op)
*** 292,301 ****
  /*
   * If fsync_writethrough is available, test as well
   */
- #ifdef HAVE_FSYNC_WRITETHROUGH
      printf(LABEL_FORMAT, "fsync_writethrough");
      fflush(stdout);

      if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
          die("could not open output file");
      gettimeofday(&start_t, NULL);
--- 271,280 ----
  /*
   * If fsync_writethrough is available, test as well
   */
      printf(LABEL_FORMAT, "fsync_writethrough");
      fflush(stdout);

+ #ifdef HAVE_FSYNC_WRITETHROUGH
      if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
          die("could not open output file");
      gettimeofday(&start_t, NULL);
*************** test_sync(int writes_per_op)
*** 313,361 ****
      close(tmpfile);
      print_elapse(start_t, stop_t);
  #else
!     printf(NA_FORMAT, "fsync_writethrough", "n/a\n");
  #endif

  /*
   * Test open_sync if available
   */
! #ifdef OPEN_SYNC_FLAG
!     printf(LABEL_FORMAT, "open_sync"
! #if PG_O_DIRECT != 0
!         " (non-direct I/O)*"
! #endif
!         );
      fflush(stdout);

!     if ((tmpfile = open(filename, O_RDWR | OPEN_SYNC_FLAG, 0)) == -1)
!         die("could not open output file");
!     gettimeofday(&start_t, NULL);
!     for (ops = 0; ops < ops_per_test; ops++)
!     {
!         for (writes = 0; writes < writes_per_op; writes++)
!             if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
!                 die("write failed");
!         if (lseek(tmpfile, 0, SEEK_SET) == -1)
!             die("seek failed");
!     }
!     gettimeofday(&stop_t, NULL);
!     close(tmpfile);
!     print_elapse(start_t, stop_t);
!
!     /*
!      * If O_DIRECT is enabled, test that with open_sync
!      */
! #if PG_O_DIRECT != 0
      if ((tmpfile = open(filename, O_RDWR | OPEN_SYNC_FLAG | PG_O_DIRECT, 0)) == -1)
      {
!         printf(NA_FORMAT, "o_direct", "n/a**\n");
          fs_warning = true;
      }
      else
      {
-         printf(LABEL_FORMAT, "open_sync (direct I/O)");
-         fflush(stdout);
-
          gettimeofday(&start_t, NULL);
          for (ops = 0; ops < ops_per_test; ops++)
          {
--- 292,314 ----
      close(tmpfile);
      print_elapse(start_t, stop_t);
  #else
!     printf(NA_FORMAT, "n/a\n");
  #endif

  /*
   * Test open_sync if available
   */
!     printf(LABEL_FORMAT, "open_sync");
      fflush(stdout);

! #ifdef OPEN_SYNC_FLAG
      if ((tmpfile = open(filename, O_RDWR | OPEN_SYNC_FLAG | PG_O_DIRECT, 0)) == -1)
      {
!         printf(NA_FORMAT, "n/a*\n");
          fs_warning = true;
      }
      else
      {
          gettimeofday(&start_t, NULL);
          for (ops = 0; ops < ops_per_test; ops++)
          {
*************** test_sync(int writes_per_op)
*** 369,388 ****
          close(tmpfile);
          print_elapse(start_t, stop_t);
      }
- #endif
-
  #else
!     printf(NA_FORMAT, "open_sync", "n/a\n");
! #endif
!
! #if defined(OPEN_DATASYNC_FLAG) || defined(OPEN_SYNC_FLAG)
!     if (PG_O_DIRECT != 0)
!         printf("* This non-direct I/O mode is not used by Postgres.\n");
  #endif

      if (fs_warning)
      {
!         printf("** This file system and its mount options do not support direct\n");
          printf("I/O, e.g. ext4 in journaled mode.\n");
      }
  }
--- 322,334 ----
          close(tmpfile);
          print_elapse(start_t, stop_t);
      }
  #else
!     printf(NA_FORMAT, "n/a\n");
  #endif

      if (fs_warning)
      {
!         printf("* This file system and its mount options do not support direct\n");
          printf("I/O, e.g. ext4 in journaled mode.\n");
      }
  }
*************** test_open_syncs(void)
*** 407,422 ****
  static void
  test_open_sync(const char *msg, int writes_size)
  {
- #ifdef OPEN_SYNC_FLAG
      int        tmpfile, ops, writes;

      if ((tmpfile = open(filename, O_RDWR | OPEN_SYNC_FLAG | PG_O_DIRECT, 0)) == -1)
!         printf(NA_FORMAT, "o_direct", "n/a**\n");
      else
      {
-         printf(LABEL_FORMAT, msg);
-         fflush(stdout);
-
          gettimeofday(&start_t, NULL);
          for (ops = 0; ops < ops_per_test; ops++)
          {
--- 353,368 ----
  static void
  test_open_sync(const char *msg, int writes_size)
  {
      int        tmpfile, ops, writes;

+     printf(LABEL_FORMAT, msg);
+     fflush(stdout);
+
+ #ifdef OPEN_SYNC_FLAG
      if ((tmpfile = open(filename, O_RDWR | OPEN_SYNC_FLAG | PG_O_DIRECT, 0)) == -1)
!         printf(NA_FORMAT, "n/a*\n");
      else
      {
          gettimeofday(&start_t, NULL);
          for (ops = 0; ops < ops_per_test; ops++)
          {
*************** test_open_sync(const char *msg, int writ
*** 433,439 ****
      }

  #else
!     printf(NA_FORMAT, "open_sync", "n/a\n");
  #endif
  }

--- 379,385 ----
      }

  #else
!     printf(NA_FORMAT, "n/a\n");
  #endif
  }