Discussion: New boxes available for QA
-hackers,

As I announced a couple of months ago, apart from the boxes donated to PostgreSQLFr (assigned to the web team, IIRC), Continuent also donated 7 servers and a Gb/s switch to us for QA testing. It took some time to set them up, but they're now up and running and available. These servers are available 24/7 for PostgreSQL QA and won't be used for other purposes.

The servers are mostly P4 2.8-3 GHz with 512 MB to 3 GB of RAM and SATA disk(s), and they are running CentOS 5.

The purposes I had in mind when I asked Robert Hodges for these servers were:

- running buildfarm animals with unusual options: perhaps another box with -DCLOBBER_CACHE_ALWAYS, or the recent --disable-integer-datetimes option, and virtually any future options we'd like to test (Tom, any thoughts?). Feel free to ask, and I can give people access if they want to set up the animals themselves (Andrew?);
- running benchfarm clients, once we have a benchfarm;
- giving (well-known) members of the community who don't have access to several servers the ability to run tests on this platform (depending on how many servers we dedicate to the two points above).

I'm open to any suggestions, as these boxes are really here to serve the community and I'd like to use them for any sort of QA possible.

Concerning the second point, I wonder whether it would be worth having something very simple already reporting results, as the development cycle for 8.4 has already started (perhaps several pgbench unit tests exercising various types of queries against a daily tree). Thoughts? (A rough sketch of such a run follows this mail.)

The good news is that we will add a couple of new boxes to this platform soon. These "new" servers are dual Xeon boxes with more than 2 GB of RAM (from 2 to 4) and SCSI/SAS disks. We also have a quad Xeon MP 2.2 GHz box and a quad Xeon MP 700 MHz box, which could be assigned to the project if we really need them (I know people are sometimes looking for slow multi-processor boxes, so the quad Xeon 700 box may be a good choice). They are huge 6U boxes, though, so unless we need them for specific purposes, I'd rather assign 1U boxes to the community. If we do need them, now is the time to ask. The new boxes are donated by Cityvox.

All these boxes are hosted in Villeurbanne, France by Open Wide, the company I work for.

I'm looking forward to your comments and ideas.

Regards,

--
Guillaume
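A minimal sketch of what such a daily pgbench run could look like. The database name ("bench"), the run count, the client/transaction counts, and the CSV layout are all arbitrary illustrative choices, not anything decided in this thread; only standard pgbench flags (-c, -t, -S) and its "tps = ..." output line are relied on:

    # Run pgbench several times per query type against a daily build,
    # keep the median tps, and append one row per day to a CSV.
    # Assumes a database named "bench" already initialized with "pgbench -i".
    import csv, datetime, re, statistics, subprocess

    RUNS = 5

    def one_run(extra_args):
        out = subprocess.run(
            ["pgbench", "-c", "4", "-t", "1000"] + extra_args + ["bench"],
            capture_output=True, text=True, check=True).stdout
        # pgbench prints e.g. "tps = 1234.56 (including connections establishing)"
        return float(re.search(r"tps = ([0-9.]+)", out).group(1))

    tests = {"tpc-b": [], "select-only": ["-S"]}
    medians = {label: statistics.median(one_run(args) for _ in range(RUNS))
               for label, args in tests.items()}

    with open("daily_results.csv", "a", newline="") as f:
        csv.writer(f).writerow([datetime.date.today().isoformat(),
                                medians["tpc-b"], medians["select-only"]])

Run from cron against each day's tree, this would produce exactly the kind of simple daily report described above.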
Guillaume,

* Guillaume Smet (guillaume.smet@gmail.com) wrote:
> These servers are available 24/7 for PostgreSQL QA and won't be used
> for other purposes.

Awesome.

> Concerning the second point, I wonder whether it would be worth having
> something very simple already reporting results, as the development
> cycle for 8.4 has already started (perhaps several pgbench unit tests
> exercising various types of queries against a daily tree). Thoughts?

It didn't occur to me before, but if you've got a decent amount of disk space and server time... I'm almost done scripting up everything to load the TIGER/Line shapefiles from the US Census into PostgreSQL/PostGIS. Once it's done and working I'd be happy to provide it to whoever asks, and it might be an interesting data set to load/query and benchmark with. There's a lot of GiST index creation, as well as other indexes such as soundex(), and I'm planning to use partitioning of some sort for the geocoder. We could, for example, come up with some set of arbitrary addresses to geocode and see what the performance of that is.

It's just a thought, and it's a large, "real" data set to play with. The data set is 22 GB of compressed shapefiles/dbf files. Based on my initial numbers I think it'll grow to around 50 GB loaded into PostgreSQL (I'll have better numbers later today).

You can get the files from here:

http://ftp2.census.gov/geo/tiger/TIGER2007FE/

Or, if you run into a problem with that, I can provide a pretty fast site to pull them from as well (15 Mb/s).

Thanks,

Stephen
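For illustration, a rough sketch of that kind of load: shp2pgsql converts each shapefile to SQL, psql loads it, and the GiST and soundex() indexes are built afterwards. The shp2pgsql -c/-a flags, "USING gist", and the contrib fuzzystrmatch soundex() function are real pieces; the directory, the table name ("edges"), and the column name ("fullname") are placeholders, not Stephen's actual schema:

    # Load every shapefile in a directory into one PostGIS table, creating
    # the table from the first file (-c) and appending the rest (-a), then
    # add a GiST index on the geometry and a soundex() index on the name.
    import glob, subprocess

    DB = "tiger"
    shapefiles = sorted(glob.glob("tiger2007/*.shp"))

    for i, shp in enumerate(shapefiles):
        mode = "-c" if i == 0 else "-a"  # create the table once, then append
        load = subprocess.Popen(["shp2pgsql", mode, shp, "edges"],
                                stdout=subprocess.PIPE)
        subprocess.run(["psql", "-q", DB], stdin=load.stdout, check=True)
        load.stdout.close()
        load.wait()

    index_sql = """
    -- soundex() requires the contrib fuzzystrmatch module
    CREATE INDEX edges_geom_idx ON edges USING gist (the_geom);
    CREATE INDEX edges_name_sdx ON edges (soundex(fullname));
    """
    subprocess.run(["psql", "-q", DB], input=index_sql, text=True, check=True)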
On Tue, 1 Apr 2008, Guillaume Smet wrote:

> I wonder whether it would be worth having something very simple already
> reporting results, as the development cycle for 8.4 has already started
> (perhaps several pgbench unit tests exercising various types of queries
> against a daily tree)

The pgbench-tools utilities I was working on at one point anticipated this sort of test starting one day. You can't really get useful results out of pgbench without running it enough times to get average or median values. I dump everything into a results database, which can be kept separate from the databases used for running the tests, and then it's easy to compare day-to-day aggregate results across different query types.

I haven't had a reason to work on that recently, but if you've got a semi-public box ready for benchmarks now, I do. You won't be able to run any serious benchmarks on the systems you described, but they should be great for detecting basic regressions and testing less popular compile-time options, as you describe.

As far as the other, more powerful machines you mentioned go, I'd need to know a bit more about the disks and disk controllers in them to say whether they're worth the trouble to integrate. The big missing piece of community hardware that remains elusive is a system with >=4 cores, >=8 GB RAM, and >=8 disks with a usable write-caching controller in it.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
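To make the day-to-day comparison concrete, a minimal sketch that reads the CSV layout from the earlier sketch (date, tpc-b tps, select-only tps) and flags any query type whose latest median dropped more than 5% below the median of earlier days. The threshold and file name are arbitrary placeholders, and this is far simpler than what pgbench-tools' results database actually does:

    # Flag query types whose latest daily median tps regressed more than
    # 5% against the median of all previous days.
    import csv, statistics

    LABELS = ["tpc-b", "select-only"]

    with open("daily_results.csv") as f:
        rows = [r for r in csv.reader(f) if r]

    if len(rows) < 2:
        raise SystemExit("need at least two days of results")

    *history, latest = rows
    for col, label in enumerate(LABELS, start=1):
        baseline = statistics.median(float(r[col]) for r in history)
        today = float(latest[col])
        if today < 0.95 * baseline:
            print(f"possible regression in {label}: "
                  f"{today:.0f} tps vs baseline {baseline:.0f} tps")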
* Greg Smith (gsmith@gregsmith.com) wrote:
> >=4 cores, >=8 GB RAM, and >=8 disks with a usable write-caching
> controller in it.

hrmmm. So a DL385 G2, dual-proc/dual-core with 16 GB of RAM and 8 SAS disks with a Smart Array P800 w/ 512 MB of write cache would be helpful?

I've got quite a few such machines, along with larger DL585s. I can't make one externally available immediately, but I could set one up to do benchmark runs and dump the results to a public site. What I don't have atm is a lot of time, though, of course. Are there scripts and whatnot to get such a setup going quickly?

I'll also investigate actually making one available to the community.

Thanks,

Stephen
On Wed, Apr 2, 2008 at 1:53 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> The pgbench-tools utilities I was working on at one point anticipated
> this sort of test starting one day. You can't really get useful results
> out of pgbench without running it enough times to get average or median
> values. I dump everything into a results database, which can be kept
> separate from the databases used for running the tests, and then it's
> easy to compare day-to-day aggregate results across different query
> types.

I've already used your pgbench-tools, but only the ability to draw graphs with gnuplot; I didn't try the results database part.

> I haven't had a reason to work on that recently, but if you've got a
> semi-public box ready for benchmarks now, I do. You won't be able to
> run any serious benchmarks on the systems you described, but they
> should be great for detecting basic regressions and testing less
> popular compile-time options, as you describe.

Yeah, that's exactly what they are for.

> As far as the other, more powerful machines you mentioned go, I'd need
> to know a bit more about the disks and disk controllers in them to say
> whether they're worth the trouble to integrate. The big missing piece
> of community hardware that remains elusive is a system with >=4 cores,
> >=8 GB RAM, and >=8 disks with a usable write-caching controller in it.

All the other boxes are Dell boxes (1750/1850/2950/6850) with PERC 4 or 5 controllers, depending on the server. Two of them have external attachments to a disk array, but it's an old one with two separate arrays (4 disks + 5 disks, IIRC). They aren't big beasts, but I think they can be useful to hackers who don't have any hardware fully available, and they can also run more serious continuous tests than the other boxes.

I'll post the specs of the servers that may be fully available for community purposes tomorrow.

--
Guillaume
On Tue, Apr 1, 2008 at 3:29 PM, Stephen Frost <sfrost@snowman.net> wrote:
> I'm almost done scripting up everything to load the TIGER/Line
> shapefiles from the US Census into PostgreSQL/PostGIS. Once it's done
> and working I'd be happy to provide it to whoever asks, and it might be
> an interesting data set to load/query and benchmark with. There's a lot
> of GiST index creation, as well as other indexes such as soundex(), and
> I'm planning to use partitioning of some sort for the geocoder. We
> could, for example, come up with some set of arbitrary addresses to
> geocode and see what the performance of that is.
>
> It's just a thought, and it's a large, "real" data set to play with.

I must admit the first step I want to see achieved is having the simplest regression tests running on a daily basis. A real database with advanced features could be very interesting later on. I'm not sure loading the full database will produce useful results on this hardware, but we can always work on a subset of it.

--
Guillaume
On Wed, Apr 2, 2008 at 1:53 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> As far as the other, more powerful machines you mentioned go, I'd need
> to know a bit more about the disks and disk controllers in them to say
> whether they're worth the trouble to integrate.

Here we go:

- a couple of 1750 servers: dual Xeon 2.8 boxes with a PERC 4/DI, 2 internal disks, and from 2 to 3 GB of RAM (we can probably get one of them up to 4 GB if needed);
- a PowerVault 220S disk array with 4 x 36 GB + 5 x 73 GB disks. I think I can get 8 identical disks into the box by swapping the 73 GB disks with the 36 GB ones from the other boxes, but I'm not sure we can build a single RAID array from the two halves of the PV 220S;
- one of the above boxes also has a PERC 4/DC and is connected to the disk array;
- a 6650 box: quad Xeon MP 2.2 with 4 GB of RAM; it has 2 internal disks and an external attachment to the disk array.

All the disks are 10k rpm.

What I was thinking is that it could be useful to have several boxes connected together to validate features too, not only performance (read access to a warm standby, anyone?). Note that if we don't find any good use for them, it won't be a problem to assign them to our internal test platform.

If everything goes well, we plan to buy a big box for internal PostgreSQL benchmarking and testing. It's obvious we won't use it night and day, so I may be able to provide windows of time when the community can use it. This one is hypothetical, though; the other ones are real and dedicated to community use (yeah, it wasn't an April Fools' joke).

--
Guillaume
On Apr 1, 2008, at 7:20 PM, Stephen Frost wrote:
> * Greg Smith (gsmith@gregsmith.com) wrote:
>> >=4 cores, >=8 GB RAM, and >=8 disks with a usable write-caching
>> controller in it.
>
> hrmmm. So a DL385 G2, dual-proc/dual-core with 16 GB of RAM and 8 SAS
> disks with a Smart Array P800 w/ 512 MB of write cache would be
> helpful?
>
> I've got quite a few such machines, along with larger DL585s. I can't
> make one externally available immediately, but I could set one up to
> do benchmark runs and dump the results to a public site. What I don't
> have atm is a lot of time, though, of course. Are there scripts and
> whatnot to get such a setup going quickly?

Ditto here; I could possibly find one for running benchmarks for the community. We're also working towards building our own performance lab and running our own benchmarks (ones that reflect our application workload); once that's up, I could run benchmarks against other versions if that would be useful.

--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
FYI, we (Stefan and I) started a wiki page to organize this effort:

http://wiki.postgresql.org/wiki/Performances_QA_testing

Ideas and participation are very welcome.

I also described the platform we have here and the usage of each server:

http://wiki.postgresql.org/wiki/QA_Platform_hosted_at_Open_Wide_%28France%29

I started working on it this weekend. I'll update this page as servers are booked/used and when we add more boxes.

--
Guillaume