AW: WAL and raw devices (was: volume management)

From: Zeugswetter Andreas SB
Subject: AW: WAL and raw devices (was: volume management)
Date:
Msg-id: 11C1E6749A55D411A9670001FA6879633682C5@sdexcsrv1.f000.d0188.sd.spardat.at
List: pgsql-hackers

> > As an aside, I do however think, that optimizing the O_SYNC path of
> > the WAL code to block writes to larger blocks (doesn't need to be
> > more than 256k) would lead to nearly the same performance as a raw
> > device on most filesystems. (Maybe also add code to reuse backed up
> > logfiles to avoid the need to preallocate space) Imho this is the part
> > of the code where the brainwork should first be put into. It is also a
> > prerequisite to make raw devices fast, since if you write 8k blocks to
> > a raw device, that is slow (not faster than a fs).
> 
> You cannot block writes to the WAL without blocking transactions waiting 
> on the write, because completion of that write is necessary for the 
> transaction to complete.

Yes, this is obvious, but:

You *can* block writes into larger blocks as long as no commit comes
in between. This substantially increases performance, e.g. for bulk loads,
where a single transaction generates more than 8k of WAL. A typical example
can even be found in the regression tests: the "copy ... from" statements.
They really suffer from the O_SYNC mode, and this mode is essentially what
you would get today for a WAL on a raw device.
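
A minimal sketch of that idea, assuming an 8k WAL page and a 256k upper bound per write() call (both figures from this thread). WalBuffer, wal_open, wal_append_page and wal_commit are hypothetical names, not PostgreSQL's actual WAL code: pages are coalesced in memory and only written when the buffer is full or when a commit forces everything pending out to the O_SYNC file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define WAL_PAGE_SIZE  8192           /* assumed WAL page size (8k) */
#define WAL_MAX_WRITE  (256 * 1024)   /* assumed upper bound per write() call */

/* Hypothetical write-coalescing buffer in front of an O_SYNC WAL file. */
typedef struct {
    int    fd;                        /* WAL file opened with O_SYNC */
    size_t used;                      /* bytes currently buffered */
    size_t nwrites;                   /* write() calls issued so far */
    char   buf[WAL_MAX_WRITE];
} WalBuffer;

/* Open (or create) a WAL file so that every write() is synchronous. */
static int wal_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
}

/* Push all buffered bytes to the file with as few write() calls as possible. */
static int wal_flush(WalBuffer *wb)
{
    size_t off = 0;
    while (off < wb->used) {
        ssize_t n = write(wb->fd, wb->buf + off, wb->used - off);
        if (n < 0 && errno == EINTR)
            continue;                 /* interrupted before writing: retry */
        if (n <= 0)
            return -1;
        wb->nwrites++;
        off += (size_t) n;
    }
    wb->used = 0;
    return 0;
}

/* Append one WAL page; only touch the disk when the 256k buffer is full. */
static int wal_append_page(WalBuffer *wb, const char *page)
{
    assert(wb->used % WAL_PAGE_SIZE == 0);
    if (wb->used + WAL_PAGE_SIZE > WAL_MAX_WRITE && wal_flush(wb) != 0)
        return -1;
    memcpy(wb->buf + wb->used, page, WAL_PAGE_SIZE);
    wb->used += WAL_PAGE_SIZE;
    return 0;
}

/* A commit must not return before its WAL is on disk, so flush right away. */
static int wal_commit(WalBuffer *wb)
{
    return wal_flush(wb);
}
```

For a transaction that appends 100 pages (800k) before committing, this issues three full 256k writes while the pages arrive and one 32k write at commit, instead of 100 separate 8k O_SYNC writes.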

> Moving the WAL volume's disk head into position is the major investment 
> you are amortizing with your large blocks.   If the head is already in 
> position, it is about as efficient to write a little as to write a lot.

This is only half of the story for large transactions. For large transactions
you need to write more than the current 8k in one call (only with a raw device
or in O_SYNC mode, of course). Writing in large blocks also helps the filesystem
reduce head movement: after every write call the OS suspends the current
process and makes room for another backend to, e.g., read a block on the same
drive, thus forcing head movement.
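
To put numbers on the per-call argument: in O_SYNC mode every write() call is a point where the backend blocks and another process may get the drive. The hypothetical helper below just counts those calls for a given block size; the 4 MB transaction used in the note is an arbitrary illustration, not a measured workload.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of write() calls needed to hand `total` bytes of WAL to the kernel
 * when each call carries at most `block` bytes (ceiling division).
 */
static size_t write_calls(size_t total, size_t block)
{
    assert(block > 0);
    return (total + block - 1) / block;
}
```

For a transaction that generates 4 MB of WAL, write_calls(4 MB, 8k) is 512 while write_calls(4 MB, 256k) is 16, i.e. 32 times fewer chances for another backend to pull the head away between writes.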

I suggest you do some tests with raw devices, as I already have, to see what happens
if you write only 8k blocks (you get only 50-60% of the performance you get with 256k blocks).
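
One way to run such a test is sketched below. time_sync_writes is a hypothetical helper, not part of PostgreSQL: it writes the same amount of data with O_SYNC at a chosen block size and reports the elapsed time, so 8k and 256k runs can be compared. It targets an ordinary file; pointing it at a raw device would overwrite that device, and raw devices may additionally require aligned buffers, which this sketch does not handle. Timings depend entirely on the hardware and OS, so no figures are implied here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Write `total` bytes to `path` with O_SYNC, `block` bytes per write() call,
 * and return the elapsed wall-clock seconds (or -1.0 on error).
 * The file is truncated first so that successive runs are comparable.
 */
static double time_sync_writes(const char *path, size_t total, size_t block)
{
    assert(block > 0);
    char *buf = malloc(block);
    if (buf == NULL)
        return -1.0;
    memset(buf, 'w', block);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    size_t done = 0;
    while (done < total) {
        size_t len = (total - done < block) ? total - done : block;
        /* With O_SYNC, write() returns only after the data is synchronized. */
        ssize_t n = write(fd, buf, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            close(fd);
            free(buf);
            return -1.0;
        }
        done += (size_t) n;
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);
    free(buf);
    return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it twice with the same `total`, once with an 8k and once with a 256k `block`, and comparing the two results gives the ratio to check against the 50-60% figure above.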

The I/O performance gain you can achieve on a raw device compared to a
preallocated filesystem file is imho negligible. On AIX, e.g., the gain is due to a global
kernel parameter that defaults to a maximum block size of 32k for read-ahead and write-behind.
I noted the advantages in a previous thread about why Oracle wants raw devices,
and I don't think they are worth it at the current state of PostgreSQL.
Andreas

