Re: Improvement of checkpoint IO scheduler for stable transaction responses
From | KONDO Mitsumasa |
---|---|
Subject | Re: Improvement of checkpoint IO scheduler for stable transaction responses |
Date | |
Msg-id | 51F0F9C1.9080207@lab.ntt.co.jp |
In reply to | Re: Improvement of checkpoint IO scheduler for stable transaction responses (Greg Smith <greg@2ndQuadrant.com>) |
List | pgsql-hackers |
Hi,

By running Heikki's patch, I now understand why my patch is faster than the original. His patch executes write() and fsync() on each relation file during the write phase of a checkpoint. I therefore expected the write phase to become slower and the fsync phase to become faster, because the disk writes would already have happened in the write phase. However, the fsync time in PostgreSQL with his patch is almost the same as in the original. It's very mysterious!

I checked /proc/meminfo and other resources while running the benchmark. It turns out this is caused by the separation of the checkpointer process from the writer process. In 9.1 and older, checkpoints and background writes are executed serially in the writer process. In 9.2 and later, they run in parallel, regardless of whether a checkpoint is in progress. Therefore, methods that issue fewer fsyncs and schedule them over a longer period, like my patch, are much faster, because they reduce wasted disk writes. In the worst case with his patch, the same pages are written to disk twice in one checkpoint, and those writes may even be random.

By the way, when the dirty buffers always stay under dirty_background_ratio * physical memory / 100, the write phase does not write to disk at all. Therefore, all dirty buffers are written to disk in the fsync phase. In this case, the write schedule makes no sense. It's very heavy and wasteful, but it might not be avoidable by tuning OS or postgres parameters. I set a small dirty_background_ratio, but the result was very miserable...

Now, I am confirming my theory with the dbt-2 benchmark with lru_max_pages = 0. Next week, my colleague, who is a kernel hacker, will explain the OS background-writing mechanism to me.

What do you think?

Best regards,

--
Mitsumasa KONDO
NTT Open Source Software Center
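The threshold argument above can be checked with a little arithmetic. Below is a minimal Python sketch, not part of the original post, that parses /proc/meminfo text and compares the Dirty counter against dirty_background_ratio * physical memory / 100, the approximation used in the message. The function names are hypothetical. It assumes MemTotal stands in for "physical memory", whereas the Linux kernel actually computes the background threshold against dirtyable (free plus reclaimable) memory, and it ignores age-based writeback controlled by dirty_expire_centisecs.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; values are kB where a unit is given."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def background_threshold_kb(mem_total_kb, dirty_background_ratio):
    # Approximation from the message: dirty_background_ratio * physical memory / 100.
    # Assumption: MemTotal is used; the kernel really uses dirtyable memory.
    return mem_total_kb * dirty_background_ratio // 100


def write_phase_triggers_writeback(meminfo_text, dirty_background_ratio):
    """Return (over_threshold, threshold_kb) for one meminfo snapshot."""
    fields = parse_meminfo(meminfo_text)
    threshold = background_threshold_kb(fields["MemTotal"], dirty_background_ratio)
    # Below the threshold, background writeback is not started by volume alone,
    # so the later fsync() calls end up flushing the dirty pages themselves.
    return fields["Dirty"] > threshold, threshold


if __name__ == "__main__":
    # Live check on Linux. If dirty_background_bytes is set instead, the ratio
    # file reads 0 and this sketch does not account for that case.
    ratio_path = "/proc/sys/vm/dirty_background_ratio"
    if os.path.exists("/proc/meminfo") and os.path.exists(ratio_path):
        with open("/proc/meminfo") as f:
            meminfo = f.read()
        with open(ratio_path) as f:
            ratio = int(f.read())
        over, threshold = write_phase_triggers_writeback(meminfo, ratio)
        print(f"Dirty over background threshold ({threshold} kB): {over}")
```

For example, with 16 GB of RAM and a ratio of 10, the threshold is about 1.6 GB, so 512 MB of dirty data stays below it; lowering the ratio to 1 (about 164 MB) pushes the same dirty data over the threshold, which is the kind of tuning the message reports trying.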