Thread: Fwd: Is the fsync() fake on FreeBSD6.1?

Fwd: Is the fsync() fake on FreeBSD6.1?

From
Jim Nasby
Date:
I thought folks might be interested in this... note in particular the  
comment about linux.

Begin forwarded message:

> From: Greg 'groggy' Lehey <grog@FreeBSD.org>
> Date: June 26, 2006 11:34:12 PM EDT
> To: leo huang <leo.huang.list@gmail.com>
> Cc: freebsd-performance@freebsd.org
> Subject: Re: Is the fsync() fake on FreeBSD6.1?
>
> On Tuesday, 27 June 2006 at 10:18:47 +0800, leo huang wrote:
>> Hi,
>>
>> I benchmarked MySQL 4.1.18 on FreeBSD 6.1 and Debian 3.1 using Super Smack
>> 1.3 some days ago.
>>
>> ...
>>
>> The result surprised me. MySQL performance on FreeBSD 6.1 is about
>> 10 times that on Debian 3.1, and the output of iostat also shows it.
>>
>> I know that MySQL uses fsync() to flush both the data and log files
>> by default when using the InnoDB engine
>> (http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html). Our
>> evaluating computer only has a 10000RPM SCSI hard disk. I think it
>> can do about 200 sequential fsync() calls per second if the fsync()
>> is real.
>>
>> Is the fsync() on FreeBSD6.1 fake?
>
> My understanding from the last time I looked at the code was that
> fsync does the right thing:
>
>      The fsync() system call causes all modified data and attributes of
>      fd to be moved to a permanent storage device.  This normally results
>      in all in-core modified copies of buffers for the associated file to
>      be written to a disk.
>
> This is not the case for Linux, where fsync syncs the entire file
> system.  That could explain some of the performance difference, but
> not all of it.  I suppose it's worth noting that, in general, people
> report much better performance with MySQL on Linux than on FreeBSD.
>
>> I mean that the data is only written to the drive's memory and so can
>> be lost if power goes down.
>
> I don't believe that fsync is required to flush the drive buffers.  It
> would be nice to have a function that did, though.
>
>> And how can I confirm this?
>
> Trial and error?
>
> Greg
> --
> See complete headers for address and phone numbers.

--
Jim Nasby                                    jimn@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
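
The original poster's sanity check can be tried directly: if every fsync() has to wait for the platter, a 10000 RPM disk completes at most about one commit per rotation, roughly 167 per second, which is in line with the "about 200" estimate above. The following is only a minimal sketch of such a probe. The function names, the 512-byte payload and the factor-of-two threshold are assumptions made for illustration, and the probe file must live on the disk being tested (a temporary directory on a RAM-backed file system would give meaningless numbers).

```python
import os
import tempfile
import time


def rotational_limit(rpm):
    """Rough upper bound on commits/second if each fsync waits one rotation."""
    return rpm / 60.0


def measure_fsync_rate(path, count=200, payload=b"x" * 512):
    """Append `payload` and call fsync() `count` times; return fsyncs/second."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # asks the OS to push the data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Assumption: the default temporary directory is on the disk of interest.
    with tempfile.TemporaryDirectory() as d:
        rate = measure_fsync_rate(os.path.join(d, "fsync_probe.dat"))
    limit = rotational_limit(10000)
    print(f"measured {rate:.0f} fsync/s; one-rotation bound is {limit:.0f}/s")
    if rate > 2 * limit:
        print("faster than a single spindle can commit: "
              "a write cache is probably absorbing the writes")
```

A rate far above the rotational bound does not say which layer is caching (drive, controller or file system), only that the writes are not waiting for the platter.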




Re: Fwd: Is the fsync() fake on FreeBSD6.1?

From
mark@mark.mielke.cc
Date:
On Fri, Sep 22, 2006 at 01:52:02PM -0400, Jim Nasby wrote:
> I thought folks might be interested in this... note in particular the  
> comment about linux.
...
> >From: Greg 'groggy' Lehey <grog@FreeBSD.org>
> >Date: June 26, 2006 11:34:12 PM EDT
> >To: leo huang <leo.huang.list@gmail.com>
> >Cc: freebsd-performance@freebsd.org
> >Subject: Re: Is the fsync() fake on FreeBSD6.1?
> >...
> >My understanding from the last time I looked at the code was that
> >fsync does the right thing:
> >
> >     The fsync() system call causes all modified data and attributes of
> >     fd to be moved to a permanent storage device.  This normally results
> >     in all in-core modified copies of buffers for the associated file to
> >     be written to a disk.
> >
> >This is not the case for Linux, where fsync syncs the entire file
> >system.  That could explain some of the performance difference, but
> >not all of it.  I suppose it's worth noting that, in general, people
> >report much better performance with MySQL on Linux than on FreeBSD.

I see Greg's comment as contradictory. People see better performance with
MySQL on Linux than on FreeBSD, yet fsync() on Linux syncs the whole file
system?

I don't believe that fsync() on Linux syncs the whole file system
either.  This sounds made up, or a confusion with 'sync'. Perhaps
people @FreeBSD.org are not as familiar with Linux.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...
 
                          http://mark.mielke.cc/



Re: Fwd: Is the fsync() fake on FreeBSD6.1?

From
Tom Lane
Date:
mark@mark.mielke.cc writes:
> I don't believe that fsync() on Linux syncs the whole file system
> either.

Indeed.  I'd disregard this as coming from someone who knows much
less than he thinks.

(The most likely explanation for his results, I expect, is that FreeBSD
is trying to fsync and the disk drive is lying to it, whereas on his
comparison Linux machine the drive is not configured to lie about
write-complete.)
        regards, tom lane


Re: Fwd: Is the fsync() fake on FreeBSD6.1?

From
AgentM
Date:
On Sep 22, 2006, at 15:00 , mark@mark.mielke.cc wrote:

> On Fri, Sep 22, 2006 at 01:52:02PM -0400, Jim Nasby wrote:
>> I thought folks might be interested in this... note in particular the
>> comment about linux.
> ...
>>> From: Greg 'groggy' Lehey <grog@FreeBSD.org>
>>> Date: June 26, 2006 11:34:12 PM EDT
>>> To: leo huang <leo.huang.list@gmail.com>
>>> Cc: freebsd-performance@freebsd.org
>>> Subject: Re: Is the fsync() fake on FreeBSD6.1?
>>> ...
>>> My understanding from the last time I looked at the code was that
>>> fsync does the right thing:
>>>
>>>     The fsync() system call causes all modified data and attributes of
>>>     fd to be moved to a permanent storage device.  This normally results
>>>     in all in-core modified copies of buffers for the associated file to
>>>     be written to a disk.

This is probably the same issue that the hackers encountered on  
Darwin- namely fsync() flushes the kernel cache, but a further  
function call was needed to flush the hard drive buffers. This meets  
the standard's definition of fsync because the data is indeed moved  
to the device, but it happens to just be the device's buffer instead  
of non-volatile storage.

-M
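
On Darwin the "further function call" is, as far as I know, the F_FULLFSYNC fcntl, which asks the drive to flush its own buffers in addition to what fsync() does. The message above does not name it, so treat that identification as an assumption. A minimal sketch of the usual fallback pattern follows. The helper name durable_sync is made up for illustration, and on systems where the fcntl module does not define F_FULLFSYNC the helper simply falls back to plain fsync().

```python
import fcntl
import os
import tempfile


def durable_sync(fd):
    """Flush fd to the device; use F_FULLFSYNC where the platform offers it.

    Returns the name of the mechanism that was used.
    """
    full = getattr(fcntl, "F_FULLFSYNC", None)  # defined on macOS/Darwin only
    if full is not None:
        try:
            fcntl.fcntl(fd, full)
            return "F_FULLFSYNC"
        except OSError:
            pass  # not supported on this file system; fall back to fsync()
    os.fsync(fd)
    return "fsync"


# Usage: write something, then force it out with the strongest call available.
with tempfile.TemporaryFile() as f:
    f.write(b"commit record")
    f.flush()  # push Python's userspace buffer to the kernel first
    used = durable_sync(f.fileno())
    print("synced with", used)
```

Whether the drive honours the flush request is still up to the drive, which is the point made elsewhere in this thread.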


Re: Fwd: Is the fsync() fake on FreeBSD6.1?

From
Andrew - Supernews
Date:
On 2006-09-22, Jim Nasby <jim@nasby.net> wrote:
> I thought folks might be interested in this... note in particular the  
> comment about linux.

I don't believe that either person in that discussion knows what they are
really talking about.

fsync() on FreeBSD does, as is required, force any modified data for the
file, plus any metadata, plus any modifications to any parent directories,
to the underlying disk device and waits for that device to report the
write as complete.

Whether the underlying device lies about the write completion is another
matter. All current SCSI disks have WCE enabled by default, which means
that they will lie about write completion if FUA was not set in the
request, which FreeBSD never sets. (It's not possible to get correct
results by having fsync() somehow selectively set FUA, because that would
leave previously-completed requests in the cache.)

WCE can be disabled on either a temporary or permanent basis by changing
the appropriate modepage. It's possible that Linux does this automatically,
or sets FUA on all writes, though that would surprise me considerably;
however I disclaim any knowledge of Linux internals.

On FreeBSD, this command will disable WCE permanently on a SCSI drive:

echo 'WCE: 0' | camcontrol modepage daXX -m 8 -P3 -e

(use -P0 to disable it only temporarily, or run just the camcontrol
command, without the echo, to edit the mode page interactively)

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
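
The parenthetical point above, that fsync() cannot be fixed by selectively setting FUA, is easier to see with a toy model. The sketch below is purely illustrative: ToyDrive is a made-up class, not a driver or camcontrol API, and it models only three facts taken from the message (with WCE on, a non-FUA write is reported complete while still in volatile cache; FUA forces that one write to media; disabling WCE sends every write to media).

```python
class ToyDrive:
    """Toy model of a disk with a volatile write cache (hypothetical)."""

    def __init__(self, wce=True):
        self.wce = wce    # write cache enabled
        self.cache = {}   # volatile: contents vanish on power loss
        self.media = {}   # non-volatile platter contents

    def write(self, block, data, fua=False):
        """Store data; it reaches media only if FUA is set or WCE is off."""
        if fua or not self.wce:
            self.cache.pop(block, None)  # drop any stale cached copy
            self.media[block] = data
        else:
            self.cache[block] = data
        return True  # the drive reports "write complete" either way

    def synchronize_cache(self):
        """Model of a cache flush command: move cached blocks to media."""
        self.media.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        self.cache.clear()


drive = ToyDrive(wce=True)
drive.write(1, "data page")                 # completed earlier, still cached
drive.write(2, "commit record", fua=True)   # fsync that sets FUA on its write
drive.power_loss()
print(sorted(drive.media))  # [2]
```

After the simulated power loss only block 2 is on media: the commit record survived but the data page it depends on did not, which is why the earlier, already "completed" writes are the problem. In the same model, calling synchronize_cache() before the power loss, or constructing the drive with wce=False, keeps both blocks.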


Re: Fwd: Is the fsync() fake on FreeBSD6.1?

From
Tom Lane
Date:
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> Whether the underlying device lies about the write completion is another
> matter. All current SCSI disks have WCE enabled by default, which means
> that they will lie about write completion if FUA was not set in the
> request, which FreeBSD never sets.

Huh?  The entire point of the SCSI command set is that it's not
necessary to lie about write completion for performance reasons, because
the architecture has always supported the concept of multiple requests
in-flight concurrently.  Has the disk drive industry gotten a whole lot
stupider in the fifteen years since I last wrote a SCSI driver?
        regards, tom lane


Re: Fwd: Is the fsync() fake on FreeBSD6.1?

From
Andrew - Supernews
Date:
On 2006-09-23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew - Supernews <andrew+nonews@supernews.com> writes:
>> Whether the underlying device lies about the write completion is another
>> matter. All current SCSI disks have WCE enabled by default, which means
>> that they will lie about write completion if FUA was not set in the
>> request, which FreeBSD never sets.
>
> Huh?  The entire point of the SCSI command set is that it's not
> necessary to lie about write completion for performance reasons, because
> the architecture has always supported the concept of multiple requests
> in-flight concurrently.

I seem to recall we've had this conversation previously.

> Has the disk drive industry gotten a whole lot
> stupider in the fifteen years since I last wrote a SCSI driver?

Quite possibly, yes.

I certainly would never claim that WCE is a good idea, or that having it
enabled by default is a good idea, I merely report the _fact_ that it is
indeed enabled by default on every SCSI drive that I have recently
encountered (over several different vendors).

On my database machines I am careful to disable it (and check that this
does indeed take effect). I would recommend that others do likewise. The
performance impact of disabling WCE is not serious (other than removing
the unsafe speed gains of course).

Since posting the previous response I've been directed to a document that
seems to imply that Linux drivers now attempt to handle write-order
guarantees by introducing the concept of a "write barrier", i.e. a write
request which must complete after all previous writes and before all
subsequent ones.  Achieving this requires different strategies depending
on whether the underlying device allows command-queueing and/or exposes a
useful cache flush command; the implication of this is that for SCSI disks
with WCE, the Linux driver will actually send SYNCHRONIZE CACHE when doing
a write barrier (which could be expensive of course). If (and I have no
idea if this is true) fsync() is implemented by means of such a barrier,
then this implies that an fsync()-heavy workload will perform much worse
on Linux when WCE is enabled than when it is disabled, since in the latter
case the driver will not issue SYNCHRONIZE CACHE and will simply ensure
that the relevant writes are all completed.

It would be interesting to see benchmarks of this.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: Fwd: Is the fsync() fake on FreeBSD6.1?

From
Ron Mayer
Date:
Andrew - Supernews wrote:
> 
> Whether the underlying device lies about the write completion is another
> matter. All current SCSI disks have WCE enabled by default, which means
> that they will lie about write completion if FUA was not set in the
> request, which FreeBSD never sets. (It's not possible to get correct
> results by having fsync() somehow selectively set FUA, because that would
> leave previously-completed requests in the cache.)
> 
> WCE can be disabled on either a temporary or permanent basis by changing
> the appropriate modepage. It's possible that Linux does this automatically,
> or sets FUA on all writes, though that would surprise me considerably;
> however I disclaim any knowledge of Linux internals.


The Linux SATA driver author Jeff Garzik suggests [note 1] that
"The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE
command to be generated has only been present in the most recent [as of
mid 2005] 2.6.x kernels.  See the "write barrier" stuff that people have
been discussing.

"Furthermore, read-after-write implies nothing at all.  The only way you
can be assured that your data has "hit the platter" is
  (1) issuing [FLUSH|SYNC] CACHE, or
  (2) using FUA-style disk commands

"It sounds like your test (or reasoning) is invalid."


Before those mid-2005 2.6.x kernels, fsync on Linux apparently didn't
really try to flush drive caches even when drives supported it (which
apparently most do, if the requests are actually sent).

[note 1] http://lkml.org/lkml/2005/5/15/82