Discussion: disk I/O problems and solutions
Hey folks,

CentOS / PostgreSQL shop over here.

I'm hitting 3 of my favorite lists with this, so here's hoping that the BCC trick is the right way to do it :-)

We've just discovered, thanks to a new Munin plugin
http://blogs.amd.co.at/robe/2008/12/graphing-linux-disk-io-statistics-with-munin.html
that our production DB is completely maxing out in I/O for about a 3-hour stretch from 6am til 9am. This is "device utilization" as per the last graph at the above link. Load went down for a while but is now between 70% and 95% sustained. We've only had this plugin going for less than a day, so I don't have any more data going back further, but we've suspected a disk issue for some time; we just have not been able to prove it.

Our system:
IBM 3650 - quad 2GHz E5405 Xeon
8K SAS RAID Controller
6 x 300G 15K RPM SAS drives
/dev/sda - 2 drives configured as a RAID 1 for 300G for the OS
/dev/sdb - 3 drives configured as RAID5 for 600G for the DB
1 drive as a global hot spare

/dev/sdb is the one that is maxing out.

We need to have a very serious look at fixing this situation, but we don't have the money to experiment with solutions that won't solve our problem, and our budget is fairly limited.

Is there a public library somewhere of disk subsystems and their performance figures, done with some semblance of a standard benchmark? One benchmark I am partial to is this one:
http://wiki.postgresql.org/wiki/PgCon_2009/Greg_Smith_Hardware_Benchmarking_notes#dd_test

One thing I am thinking of in the immediate term is taking the RAID5 + hot spare and converting it to RAID10 with the same amount of storage. Will that perform much better? In general we are planning to move away from RAID5 toward RAID10.

We also have on order an external IBM array (I don't have the exact name on hand, but the model number was 3000) with 12 drive bays. We ordered it with just 4 x SATAII drives, and were going to put it on a different system as a RAID10.
These are just 7200 RPM drives; the goal was cheaper storage, because the SAS drives are about twice as much per drive, and it is only a 300G drive versus the 1T SATA2 drives. IIRC the SATA2 drives are about $200 each and the 300G SAS drives about $500 each.

So I have 2 thoughts with this 12-disk array. One is to fill it up with 12 x cheap SATA2 drives and hope that even though the spin rate is a lot slower, the fact that it has more drives will make it perform better. But somehow I am doubtful about that. The other thought is to bite the bullet and fill it up with 300G SAS drives.

Any thoughts here? Recommendations on what to do with a tight budget? It could be that the answer is I just have to go back to the bean counters and tell them we have no choice but to start spending some real money. But on what? And how do I prove that this is the only choice?

--
"Don't eat anything you've ever seen advertised on TV"
   - Michael Pollan, author of "In Defense of Food"
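For what it's worth, the dd test from that wiki page boils down to something like this (a sketch; the file size should be about twice your RAM so the page cache can't hide the disks, and the path is an assumption for a CentOS PostgreSQL install):

```shell
# Size in MB; 16384 (16GB) assumes the box has about 8GB of RAM.
SIZE_MB=16384
TESTFILE=/var/lib/pgsql/ddtest   # put this on the array being measured

# Sequential write throughput; conv=fdatasync makes dd wait for the
# data to actually reach the disks before reporting a rate.
dd if=/dev/zero of=$TESTFILE bs=1M count=$SIZE_MB conv=fdatasync

# Drop the page cache so the read test hits the disks, not RAM.
sync
echo 3 > /proc/sys/vm/drop_caches

# Sequential read throughput.
dd if=$TESTFILE of=/dev/null bs=1M
rm -f $TESTFILE
```

Run it against both sda and sdb and you at least have comparable baseline numbers for the hardware you already own.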
----- "Alan McKay" <alan.mckay@gmail.com> wrote:
> CentOS / PostgreSQL shop over here.
>
> Our system
> IBM 3650 - quad 2Ghz e5405 Xeon
> 8K SAS RAID Controller
> 6 x 300G 15K/RPM SAS Drives
> /dev/sda - 2 drives configured as a RAID 1 for 300G for the OS
> /dev/sdb - 3 drives configured as RAID5 for 600G for the DB
> 1 drive as a global hot spare
>
> /dev/sdb is the one that is maxing out.

What are you calling "maxing out"? Excess IOPS, MB/s, or high response times? Each of these takes a different approach when you are trying to find a solution.

> Is there a public library somewhere of disk subsystems and their
> performance figures? Done with some semblance of a standard
> benchmark?

You should try the iostat or sar utilities. Both can give you complete reports of your online disk activity, and they are probably the backend tools your Munin plugin uses as its frontend. It's very important to understand that the utilization percentage is the share of time the device was busy servicing I/O operations. If you see 100% you should worry, but not too desperately. What matters most to me is the disk operation response time and the queue size; if those numbers are increasing, then your database performance will suffer. Always check the iostat man page to understand exactly what those numbers mean.

> One thing I am thinking of in the immediate term is taking the RAID5 +
> hot spare and converting it to RAID10 with the same amount of storage.
> Will that perform much better?

Usually yes for write operations, because the RAID controller doesn't have to calculate parity. You'll also gain some disk seek time, and your database will be snappier if you have an OLTP application. RAID5 can handle more read IOPS, on the other hand. It can be good for your pg_xlog directory, but WAL only needs a small amount of disk space.

> In general we are planning to move away from RAID5 toward RAID10.
> We also have on order an external IBM array (don't have the exact name
> on hand but model number was 3000) with 12 drive bays. We ordered it
> with just 4 x SATAII drives, and were going to put it on a different
> system as a RAID10. These are just 7200 RPM drives - the goal was
> cheaper storage because the SAS drives are about twice as much per
> drive, and it is only a 300G drive versus the 1T SATA2 drives. IIRC
> the SATA2 drives are about $200 each and the SAS 300G drives about
> $500 each.

I think it's a good choice.

> So I have 2 thoughts with this 12 disk array. 1 is to fill it up
> with 12 x cheap SATA2 drives and hope that even though the spin-rate
> is a lot slower, that the fact that it has more drives will make it
> perform better. But somehow I am doubtful about that. The other
> thought is to bite the bullet and fill it up with 300G SAS drives.
>
> any thoughts here? recommendations on what to do with a tight
> budget?

Take your new storage system when it arrives, make it RAID10, and administer it using LVM in Linux. If you need greater performance later, you will be able to stripe across RAID arrays.

Regards,
Flavio Henrique A. Gurgel
Consultor -- 4Linux
tel. 55-11-2125.4765 fax. 55-11-2125.4777
www.4linux.com.br
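For reference, the iostat / sar invocations Flavio is describing look roughly like this (the device name is from the original post; both tools come from the sysstat package, and the sar log path assumes a stock CentOS layout):

```shell
# Extended per-device statistics every 5 seconds for the DB array.
# Watch await (response time in ms), avgqu-sz (queue size) and %util.
iostat -x 5 /dev/sdb

# sar can replay historical disk activity if the sysstat collector
# is running; -d gives per-device numbers for today's log file.
sar -d -f /var/log/sa/sa$(date +%d)
```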
On Fri, Oct 9, 2009 at 10:45 AM, Alan McKay <alan.mckay@gmail.com> wrote:
> Hey folks,
>
> CentOS / PostgreSQL shop over here.
>
> I'm hitting 3 of my favorite lists with this, so here's hoping that
> the BCC trick is the right way to do it :-)

I added pgsql-performance back in on my reply so we can share with the rest of the class.

> We've just discovered thanks to a new Munin plugin
> http://blogs.amd.co.at/robe/2008/12/graphing-linux-disk-io-statistics-with-munin.html
> that our production DB is completely maxing out in I/O for about a 3
> hour stretch from 6am til 9am
> This is "device utilization" as per the last graph at the above link.

What do vmstat, sar, or top have to say about it? If you're at 100% I/O wait, then yeah, your disk subsystem is your bottleneck.

> Our system
> IBM 3650 - quad 2Ghz e5405 Xeon
> 8K SAS RAID Controller

Does this RAID controller have a battery-backed cache on it?

> 6 x 300G 15K/RPM SAS Drives
> /dev/sda - 2 drives configured as a RAID 1 for 300G for the OS
> /dev/sdb - 3 drives configured as RAID5 for 600G for the DB
> 1 drive as a global hot spare
>
> /dev/sdb is the one that is maxing out.

Yeah, with RAID-5 that's not surprising. Especially if you've got even a decent / small percentage of writes in the mix, RAID-5 is gonna be pretty slow.

> We need to have a very serious look at fixing this situation. But we
> don't have the money to be experimenting with solutions that won't
> solve our problem. And our budget is fairly limited.
>
> Is there a public library somewhere of disk subsystems and their
> performance figures? Done with some semblance of a standard
> benchmark?

Not that I know of, and if there is, I'm as eager as you to find it. This mailing list's archives are as close as I've come to finding it.
> One benchmark I am partial to is this one :
> http://wiki.postgresql.org/wiki/PgCon_2009/Greg_Smith_Hardware_Benchmarking_notes#dd_test
>
> One thing I am thinking of in the immediate term is taking the RAID5 +
> hot spare and converting it to RAID10 with the same amount of storage.
> Will that perform much better?

Almost certainly.

> In general we are planning to move away from RAID5 toward RAID10.
>
> We also have on order an external IBM array (don't have the exact name
> on hand but model number was 3000) with 12 drive bays. We ordered it
> with just 4 x SATAII drives, and were going to put it on a different
> system as a RAID10. These are just 7200 RPM drives - the goal was
> cheaper storage because the SAS drives are about twice as much per
> drive, and it is only a 300G drive versus the 1T SATA2 drives. IIRC
> the SATA2 drives are about $200 each and the SAS 300G drives about
> $500 each.
>
> So I have 2 thoughts with this 12 disk array. 1 is to fill it up
> with 12 x cheap SATA2 drives and hope that even though the spin-rate
> is a lot slower, that the fact that it has more drives will make it
> perform better. But somehow I am doubtful about that. The other
> thought is to bite the bullet and fill it up with 300G SAS drives.

I'd give the SATA drives a try. If they aren't fast enough, then everybody in the office gets a free / cheap drive upgrade in their desktop machine. More drives == faster RAID-10, up to the point where you saturate your controller / IO bus on your machine.
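A quick way to answer the "what do vmstat, sar, or top say" question above (sar history needs the sysstat collector enabled):

```shell
# Print stats every 5 seconds; the 'wa' column under 'cpu' is the
# percentage of time spent waiting on I/O. Sustained high 'wa' while
# the CPU is otherwise idle means the disks are the bottleneck.
vmstat 5

# CPU / IO-wait history for this morning's problem window.
sar -u -s 06:00:00 -e 09:00:00
```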
On Fri, Oct 9, 2009 at 9:45 AM, Alan McKay <alan.mckay@gmail.com> wrote:
> We've just discovered thanks to a new Munin plugin
> http://blogs.amd.co.at/robe/2008/12/graphing-linux-disk-io-statistics-with-munin.html
> that our production DB is completely maxing out in I/O for about a 3
> hour stretch from 6am til 9am
> This is "device utilization" as per the last graph at the above link.

As Flavio mentioned, we really need to know if it's seek limited or bandwidth limited, but I suspect it's seek limited. Actual data from vmstat or sar would be helpful. Also knowing what kind of RAID controller is being used, and whether or not it has a BBU, would be useful. And finally, you didn't mention what versions of CentOS and PostgreSQL you're running.

> One thing I am thinking of in the immediate term is taking the RAID5 +
> hot spare and converting it to RAID10 with the same amount of storage.
> Will that perform much better?

Depends on how the array is IO limited, but in general RAID10 > RAID5 in terms of performance.

> So I have 2 thoughts with this 12 disk array. 1 is to fill it up
> with 12 x cheap SATA2 drives and hope that even though the spin-rate
> is a lot slower, that the fact that it has more drives will make it
> perform better. But somehow I am doubtful about that. The other
> thought is to bite the bullet and fill it up with 300G SAS drives.

Not a bad idea. Keep in mind that your 15k drives can seek about twice as fast as 7200 RPM drives, so you'll probably need close to twice as many to match performance with the same configuration. If you're random-IO limited, though, a RAID5 will only write about as fast as a single disk (and sometimes a LOT slower!), while a 12-disk RAID10 will write about 6 times faster than a single disk. So overall, the 12-disk 7.2k RAID10 array should be significantly faster than the 3-disk 15k RAID5 array.

> any thoughts here? recommendations on what to do with a tight budget?
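That "6 times faster" estimate can be sanity-checked with back-of-envelope numbers (the per-drive IOPS figures below are rule-of-thumb assumptions, not measurements of your hardware):

```shell
# Rough random-IOPS rules of thumb: ~150 for a 15k SAS drive,
# ~75 for a 7200 RPM SATA drive.
SAS_IOPS=150
SATA_IOPS=75

# RAID5 random writes: roughly a single drive, because of the
# read-modify-write parity penalty.
RAID5_WRITE=$SAS_IOPS

# RAID10 random writes: one logical write per mirrored pair,
# so roughly spindles/2 drives' worth of IOPS.
RAID10_WRITE=$((12 / 2 * SATA_IOPS))

echo "3 x 15k SAS RAID5     ~ ${RAID5_WRITE} random write IOPS"
echo "12 x 7.2k SATA RAID10 ~ ${RAID10_WRITE} random write IOPS"
```

Even with the slower spindles, the 12-disk RAID10 comes out around 3x the RAID5 on random writes by this estimate.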
> It could be the answer is that I just have to go back to the bean
> counters and tell them we have no choice but to start spending some
> real money. But on what? And how do I prove that this is the only
> choice?

It's hard to say without knowing all the information. One free possibility would be to move the log data onto the RAID1 from the RAID5, thus splitting up your database load over all of your disks. You can do this by moving the pg_xlog folder to the RAID1 array and symlinking it back into your data folder. You should be able to try this with just a few seconds of downtime.

-Dave
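Sketched out, the pg_xlog move looks like this (the paths and service name are assumptions for a stock CentOS install; adjust for yours, and do the move with the server stopped):

```shell
# Stop PostgreSQL briefly, relocate the WAL directory onto the RAID1,
# and leave a symlink behind so the server finds it again on startup.
service postgresql stop
mv /var/lib/pgsql/data/pg_xlog /raid1/pg_xlog
ln -s /raid1/pg_xlog /var/lib/pgsql/data/pg_xlog
chown -h postgres:postgres /var/lib/pgsql/data/pg_xlog
service postgresql start
```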
>> any thoughts here? recommendations on what to do with a tight budget?
>> It could be the answer is that I just have to go back to the bean
>> counters and tell them we have no choice but to start spending some
>> real money. But on what? And how do I prove that this is the only
>> choice?
>
> It's hard to say without knowing all the information. One free
> possibility would be to move the log data onto the RAID1 from the
> RAID5, thus splitting up your database load over all of your disks.
> You can do this by moving the pg_xlog folder to the RAID1 array and
> symlink it back to your data folder. Should be able to try this with
> just a few seconds of downtime.

Do the above first. Then, on your sdb, set the I/O scheduler to 'deadline'. If it is ext3, mount sdb with 'data=writeback,noatime'.

If you have your pg_xlog on your RAID5, using ext3 in 'ordered' mode, then you are going to be continuously throwing small writes at it. If this is the case, then the above configuration changes will most likely easily double your performance.

> -Dave
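A sketch of those two changes (the mount point is an assumption; note that ext3's journal mode cannot be changed by a live remount, so data=writeback needs an unmount/remount or a reboot with the new fstab entry):

```shell
# Switch /dev/sdb to the deadline elevator. Takes effect immediately
# but does not survive a reboot; add elevator=deadline to the kernel
# line in grub.conf to make it permanent.
echo deadline > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/scheduler   # active choice shown in [brackets]

# Put the new ext3 options in /etc/fstab, e.g.:
#   /dev/sdb1  /var/lib/pgsql  ext3  data=writeback,noatime  1 2
# then (with PostgreSQL stopped) cycle the mount to pick them up:
umount /var/lib/pgsql
mount /var/lib/pgsql
```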