Re: Improvement of checkpoint IO scheduler for stable transaction responses
From | KONDO Mitsumasa |
---|---|
Subject | Re: Improvement of checkpoint IO scheduler for stable transaction responses |
Date | |
Msg-id | 51CD873A.5070705@lab.ntt.co.jp |
In response to | Re: Improvement of checkpoint IO scheduler for stable transaction responses (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Improvement of checkpoint IO scheduler for stable transaction responses |
List | pgsql-hackers |
(2013/06/28 0:08), Robert Haas wrote:
> On Tue, Jun 25, 2013 at 4:28 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
> I'm pretty sure Greg Smith tried it the fixed-sleep thing before and
> it didn't work that well. I have also tried it and the resulting
> behavior was unimpressive. It makes checkpoints take a long time to
> complete even when there's very little data to flush out to the OS,
> which is annoying; and when things actually do get ugly, the sleeps
> aren't long enough to matter. See the timings Kondo-san posted
> downthread: 100ms delays aren't going let the system recover in any
> useful way when the fsync can take 13 s for one file. On a system
> that's badly weighed down by I/O, the fsync times are often
> *extremely* long - 13 s is far from the worst you can see. You have
> to give the system a meaningful time to recover from that, allowing
> other processes to make meaningful progress before you hit it again,
> or system performance just goes down the tubes. Greg's test, IIRC,
> used 3 s sleeps rather than your proposal of 100 ms, but it still
> wasn't enough.

Yes. In the write phase, the checkpointer writes a large number of 8KB
dirty pages, one per SyncOneBuffer() call, so a tiny (100ms) sleep can
work well there. In the fsync phase, however, the checkpointer flushes
scores of relation files, one per fsync() call, so a tiny sleep does not
work well. It needs a longer sleep so that I/O performance can recover.
If we want to find the best sleep time, we had better base it on the
previous fsync time. And if we want to prevent very long fsync times in
the first place, we had better make the relation file size, which is
1GB at most by default, smaller.

Back to the subject. Here are the test results for our patches. The
fsync + write patch did not give a good result in the earlier test, so
I reran the benchmark under the same conditions. It seems to perform
better than in the earlier result.

* Performance result in DBT-2 (WH340)
               | TPS      90%tile    Average  Maximum
---------------+---------------------------------------
original_0.7   | 3474.62  18.348328  5.739    36.977713
original_1.0   | 3469.03  18.637865  5.842    41.754421
fsync          | 3525.03  13.872711  5.382    28.062947
write          | 3465.96  19.653667  5.804    40.664066
fsync + write  | 3586.85  14.459486  4.960    27.266958
Heikki's patch | 3504.3   19.731743  5.761    38.33814

* HTML result in DBT-2
http://pgstatsinfo.projects.pgfoundry.org/RESULT/

The attached text also lists the time taken by each checkpoint.
Checkpoints with the fsync patch seem to take longer than those without
it. However, the checkpoints still finish on schedule, within
checkpoint_timeout and the allowable time. I think the most important
thing in the fsync phase is not that the checkpoint finishes fast, but
that the pages are definitely and reliably written by the end of the
checkpoint. So I do not think my fsync patch is working incorrectly.

My write patch still seems to have a lot of open questions, so I will
try to investigate it with objective results and a theory for its
effect.

Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
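
The message suggests basing the sleep in the fsync phase on how long the previous fsync took. A minimal sketch of that idea is given here. It is not the patch under discussion and not PostgreSQL code: the function names (elapsed_ms, next_sleep_ms, timed_fsync, sync_files_with_adaptive_sleep) and the min_ms/max_ms bounds are hypothetical, chosen only for illustration. It assumes a POSIX system and uses only fsync(), clock_gettime() and nanosleep().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: milliseconds between two monotonic timestamps. */
double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_nsec - start->tv_nsec) / 1000000.0;
}

/*
 * Hypothetical policy: sleep for as long as the previous fsync took,
 * clamped to [min_ms, max_ms]. The bounds are assumptions, not values
 * taken from the thread.
 */
double next_sleep_ms(double last_fsync_ms, double min_ms, double max_ms)
{
    assert(min_ms >= 0.0 && min_ms <= max_ms);
    if (last_fsync_ms < min_ms)
        return min_ms;
    if (last_fsync_ms > max_ms)
        return max_ms;
    return last_fsync_ms;
}

/* fsync one descriptor; return the elapsed time in ms, or -1.0 on error. */
double timed_fsync(int fd)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (fsync(fd) != 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ms(&start, &end);
}

/*
 * fsync each descriptor in turn. Between two files, pause for a time
 * derived from the fsync that just finished, so that a slow fsync is
 * followed by a proportionally long recovery window.
 * Returns 0 on success, -1 if any fsync fails.
 */
int sync_files_with_adaptive_sleep(const int *fds, int nfds,
                                   double min_ms, double max_ms)
{
    for (int i = 0; i < nfds; i++)
    {
        double took = timed_fsync(fds[i]);

        if (took < 0.0)
            return -1;

        if (i + 1 < nfds)       /* no pause after the last file */
        {
            double ms = next_sleep_ms(took, min_ms, max_ms);
            struct timespec pause;

            pause.tv_sec = (time_t) (ms / 1000.0);
            pause.tv_nsec = (long) ((ms - pause.tv_sec * 1000.0) * 1000000.0);
            nanosleep(&pause, NULL);
        }
    }
    return 0;
}
```

With these hypothetical bounds, a call such as sync_files_with_adaptive_sleep(fds, nfds, 100.0, 3000.0) would pause 100 ms after a fast fsync and 3 s after one that took 13 s. Whether any particular upper bound is long enough is exactly the open question in the quoted discussion, so the clamp here should be read as a placeholder rather than a recommendation.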