Performance lossage in checkpoint dumping
From | Tom Lane |
---|---|
Subject | Performance lossage in checkpoint dumping |
Date | |
Msg-id | 23621.982377108@sss.pgh.pa.us |
Responses | Re: Performance lossage in checkpoint dumping |
List | pgsql-hackers |
While poking at Peter Schmidt's comments about pgbench showing worse performance than for 7.0 (using -F in both cases), I noticed that given enough buffer space, FileWrite never seemed to get called at all. A little bit of sleuthing revealed the following:

1. Under WAL, we don't write dirty buffers out of the shared memory at every transaction commit. Instead, as long as a dirty buffer's slot isn't needed for something else, it just sits there until the next checkpoint or shutdown. CreateCheckpoint calls FlushBufferPool which writes out all the dirty buffers in one go. This is a Good Thing; it lets us consolidate multiple updates of a single datafile page by successive transactions into one disk write. We need this to buy back some of the extra I/O required to write the WAL logfile.

2. However, this means that a lot of the dirty-buffer writes get done by the periodic checkpoint process, not by the backends that originally dirtied the buffers. And that means that every last one gets done by blind write, because the checkpoint process isn't going to have opened any relation cache entries --- maybe a couple of system catalog relations, but for sure it won't have any for user relations. If you look at BufferSync, any page that the current process doesn't have an already-open relcache entry for is sent to smgrblindwrt not smgrwrite.

3. Blind write is gratuitously inefficient: it does separate open, seek, write, close kernel calls for every request. This was the right thing in 7.0.*, because backends relatively seldom did blind writes and even less often needed to blindwrite multiple pages of a single relation in succession. But the typical usage has changed a lot.

I am thinking it'd be a good idea if blind write went through fd.c and thus was able to re-use open file descriptors, just like normal writes. This should improve the efficiency of dumping dirty buffers during checkpoint by a noticeable amount.

Comments?

regards, tom lane
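For illustration, the difference described in point 3 can be sketched in plain POSIX C. This is not the PostgreSQL code: `blind_write_page`, `cached_write_page`, `CachedFile`, the `open_calls` counter, and the 8192-byte `PAGE_SIZE` are all hypothetical names and values chosen for the example, and the one-entry cache is only a stand-in for the idea of re-using open descriptors, not a description of how fd.c manages them.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SIZE 8192          /* assumed page size, for illustration only */

static int open_calls = 0;      /* counts open() calls so the two paths can be compared */

/* Seek to the page's offset in an already-open file and write it. */
static int write_page_at(int fd, long pageno, const char *page)
{
    if (lseek(fd, (off_t) pageno * PAGE_SIZE, SEEK_SET) < 0)
        return -1;
    return write(fd, page, PAGE_SIZE) == PAGE_SIZE ? 0 : -1;
}

/* Blind-write style: open, seek, write, close for every single page. */
static int blind_write_page(const char *path, long pageno, const char *page)
{
    int fd, rc;

    assert(path != NULL && page != NULL);
    fd = open(path, O_WRONLY | O_CREAT, 0600);
    open_calls++;
    if (fd < 0)
        return -1;
    rc = write_page_at(fd, pageno, page);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Hypothetical one-entry descriptor cache: remembers the last file opened. */
typedef struct
{
    char path[256];
    int  fd;                    /* -1 when nothing is cached */
} CachedFile;

/* Cached style: reopen only when the target file changes. */
static int cached_write_page(CachedFile *cf, const char *path,
                             long pageno, const char *page)
{
    assert(cf != NULL && path != NULL && page != NULL);
    if (cf->fd < 0 || strcmp(cf->path, path) != 0)
    {
        if (cf->fd >= 0)
            close(cf->fd);
        cf->fd = open(path, O_WRONLY | O_CREAT, 0600);
        open_calls++;
        if (cf->fd < 0)
            return -1;
        snprintf(cf->path, sizeof cf->path, "%s", path);
    }
    return write_page_at(cf->fd, pageno, page);
}

/* Release the cached descriptor, if any. */
static void cached_close(CachedFile *cf)
{
    if (cf->fd >= 0)
    {
        close(cf->fd);
        cf->fd = -1;
    }
}
```

Writing four consecutive pages of one file costs four open/close pairs through `blind_write_page`, but a single open through `cached_write_page` with a `CachedFile` initialized to `{ "", -1 }`. That is the pattern a checkpoint produces when it dumps many dirty pages of the same relation in succession.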