Re: Separate BLCKSZ for data and logging
From | Mark Wong |
---|---|
Subject | Re: Separate BLCKSZ for data and logging |
Date | |
Msg-id | 200603162327.k2GNRRDZ014683@smtp.osdl.org |
In response to | Re: Separate BLCKSZ for data and logging (Simon Riggs <simon@2ndquadrant.com>) |
List | pgsql-hackers |
On Thu, 16 Mar 2006 20:51:54 +0000
Simon Riggs <simon@2ndquadrant.com> wrote:

> On Thu, 2006-03-16 at 12:22 -0800, Mark Wong wrote:
>
> > I was hoping that in the case where 2 or more data blocks are written to
> > the log that they could be written once within a single larger log block.
> > The log block size must be larger than the data block size, of course.
>
> I think Tom's right... the OS blocksize is smaller than BLCKSZ, so
> reducing the size might help with a very high transaction load when
> commits are required very frequently. At checkpoint it sounds like we
> might benefit from a large WAL blocksize because of all the additional
> blocks written, but we often write more than one block at a time anyway,
> and that still translates to multiple OS blocks whichever way you cut
> it, so I'm not convinced yet.
>
> On Thu, 2006-03-16 at 15:21 -0500, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > Overall, the two things are fairly separate, apart from the fact that we
> > > do currently log whole data blocks straight to the log. Usually just
> > > one, but possibly 2 or three. So I have a feeling that things would
> > > become less efficient if you did this, not more.
> >
> > > But it's a good line of thought and I'll have a look at that.
> >
> > I too think reducing the size of WAL blocks might be a win, because
> > we currently always write whole blocks, and so a series of small
> > transactions will be rewriting the same 8K block multiple times.
> > If the filesystem's native block size is less than 8K, matching that
> > size should theoretically make things faster.
>
> Might it be possible to do this: When committing, if the current WAL
> page is less than half-full wait for a single spin-lock cycle and then
> do the write? (With the spin-lock, I mean on a single CPU we wait zero,
> on a multi-CPU we wait a while). This is effectively a modification of
> the group commit idea, but not to wait every time - only when it is
> write-efficient to do so. (And we'd make that optional, too). We could
> then ditch the remnant of the group-commit code.

Sounds like there is some agreement that this could be an interesting
exercise. I'll see what I can do.

Thanks,
Mark
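
For readers following the thread, here is a rough back-of-the-envelope sketch of the two ideas quoted above. It is not PostgreSQL code. The function names, the 200-byte record size, and the simplified model (every commit flushes each whole WAL block it touched) are assumptions made purely for illustration.

```python
def bytes_flushed(record_sizes, wal_block_size):
    """Bytes written if every commit flushes whole WAL blocks (toy model).

    Each commit appends one record of positive size, then writes out every
    block that record touched, including a partially filled block that an
    earlier commit already wrote.
    """
    total = 0
    pos = 0  # current insert position in the WAL stream, in bytes
    for size in record_sizes:
        first_block = pos // wal_block_size
        pos += size
        last_block = (pos - 1) // wal_block_size
        total += (last_block - first_block + 1) * wal_block_size
    return total


def should_delay_flush(page_used, page_size, n_cpus):
    """Hypothetical form of the rule Simon proposes.

    Wait briefly before writing only when the current WAL page is less
    than half full, and never wait on a single-CPU machine.
    """
    return n_cpus > 1 and page_used < page_size / 2


if __name__ == "__main__":
    commits = [200] * 10  # ten small transactions, 200 bytes of WAL each
    for block_size in (8192, 4096, 512):
        print(block_size, bytes_flushed(commits, block_size))
    print(should_delay_flush(page_used=1000, page_size=8192, n_cpus=4))
```

In this toy model, ten small commits rewrite the same 8K block ten times, which is the effect Tom describes. A smaller block shrinks each rewrite, but a record that straddles a block boundary then costs an extra block. The model ignores OS and filesystem behaviour entirely, so it says nothing about real throughput.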