Re: Improvement of checkpoint IO scheduler for stable transaction responses
From | Robert Haas |
---|---|
Subject | Re: Improvement of checkpoint IO scheduler for stable transaction responses |
Date | |
Msg-id | CA+TgmoZsh0zRdLoPh+PaGswMKqHRLZcAb89O+XRQLhSsjYOaYg@mail.gmail.com |
In reply to | Re: Improvement of checkpoint IO scheduler for stable transaction responses (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>) |
Responses | Re: Improvement of checkpoint IO scheduler for stable transaction responses; Re: Improvement of checkpoint IO scheduler for stable transaction responses |
List | pgsql-hackers |
On Wed, Jul 3, 2013 at 4:18 AM, KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp> wrote:
> I tested with segsize=0.25GB, the maximum size of each segment file of a table
> (the default is 1GB; it is set with the configure option, here ./configure --with-segsize=0.25),
> because I thought that a small segsize would help the fsync phase and the OS's background
> disk writes during a checkpoint. I got significant improvements in the DBT-2 results!

This is interesting. Unfortunately, it has a significant downside: there could be many more files in the data directory, since at 0.25GB per segment a large relation needs four times as many segment files as at 1GB. As it is, the number of files that exist there today has already caused performance problems for some of our customers. I'm not sure offhand to what degree those problems have been related to overall inode consumption vs. the number of files in the same directory.

If the problem is mainly the number of files in the same directory, we could consider revising our directory layout. Instead of:

base/${DBOID}/${RELFILENODE}_${FORK}

We could have:

base/${DBOID}/${FORK}/${RELFILENODE}

That would move all the vm and fsm forks to separate directories, which would cut down the number of files in the main-fork directory significantly. That might be worth doing independently of the issue you're raising here.

For large clusters, you'd even want one more level to keep the directories from getting too big:

base/${DBOID}/${FORK}/${X}/${RELFILENODE}

...where ${X} is two hex digits, maybe just the low 8 bits of the relfilenode number. But this would not be as good for small clusters, where you'd end up with oodles of tiny directories, and I'm not sure it'd be practical to move smoothly from one layout to the other.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
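To make the trade-off in the message above concrete, here is a minimal sketch that maps a relation file to its path under today's layout and the two proposed layouts, and counts how crowded the busiest directory gets. It is illustrative only, not PostgreSQL source: the function names are hypothetical, "main" as the directory name for the main fork is an assumption, ${X} is assumed to be the low 8 bits of the relfilenode printed as two hex digits, and segment suffixes (".1", ".2", ...) are assumed to stay as they are today.

```python
from collections import Counter

GIB = 1024 ** 3

# Hypothetical helpers for illustration only; this is not PostgreSQL source.
# Segment 0 has no suffix; later segments get ".1", ".2", ... as in PostgreSQL today.


def _seg_suffix(segno: int) -> str:
    return f".{segno}" if segno > 0 else ""


def current_path(db_oid: int, relfilenode: int, fork: str, segno: int = 0) -> str:
    """Current layout: base/DBOID/RELFILENODE[_FORK][.SEGNO]; the main fork has no suffix."""
    name = str(relfilenode) if fork == "main" else f"{relfilenode}_{fork}"
    return f"base/{db_oid}/{name}{_seg_suffix(segno)}"


def per_fork_path(db_oid: int, relfilenode: int, fork: str, segno: int = 0) -> str:
    """Proposed layout: base/DBOID/FORK/RELFILENODE[.SEGNO]."""
    return f"base/{db_oid}/{fork}/{relfilenode}{_seg_suffix(segno)}"


def bucketed_path(db_oid: int, relfilenode: int, fork: str, segno: int = 0) -> str:
    """Proposed large-cluster layout: base/DBOID/FORK/X/RELFILENODE[.SEGNO].

    Assumption: X is the low 8 bits of the relfilenode as two hex digits.
    """
    bucket = f"{relfilenode & 0xFF:02x}"
    return f"base/{db_oid}/{fork}/{bucket}/{relfilenode}{_seg_suffix(segno)}"


def segment_count(rel_bytes: int, segsize_bytes: int) -> int:
    """Segment files needed for a relation (integer ceiling, at least one file)."""
    return max(1, (rel_bytes + segsize_bytes - 1) // segsize_bytes)


def largest_directory(paths) -> int:
    """Number of files in the most crowded directory among the given paths."""
    return max(Counter(p.rsplit("/", 1)[0] for p in paths).values())


if __name__ == "__main__":
    # A 10 GiB relation at the default 1 GiB segsize vs. a 0.25 GiB segsize.
    print(segment_count(10 * GIB, GIB), segment_count(10 * GIB, GIB // 4))

    # 1000 single-segment relations in one database, each with main, fsm and vm forks.
    rels = range(16384, 17384)
    for layout in (current_path, per_fork_path, bucketed_path):
        paths = [layout(16384, r, f) for r in rels for f in ("main", "fsm", "vm")]
        print(layout.__name__, largest_directory(paths))
```

Run as a script, this prints `10 40`, then a busiest directory of 3000 files for the current layout, 1000 for the per-fork layout, and 4 for the bucketed layout. That illustrates both points in the message: splitting by fork only removes the fsm and vm files from the main-fork directory, while the extra ${X} level spreads files over 256 buckets per fork, which pays off for large clusters but leaves small ones with many near-empty directories.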