Re: ok you all win what is best opteron (I dont want a hosed system
From | William Yu |
---|---|
Subject | Re: ok you all win what is best opteron (I dont want a hosed system |
Date | |
Msg-id | d66601$30l6$1@news.hub.org |
In reply to | Re: ok you all win what is best opteron (I dont want a hosed system again) ("Joel Fradkin" <jfradkin@wazagua.com>) |
Responses | Re: ok you all win what is best opteron (I dont want a hosed system |
List | pgsql-performance |
A 4-way SMP Opteron system is actually pretty damn cheap if you get 2xDual Core versus 4xSingle. I just ordered a 2x265 (4x1.8ghz) system and the price was about $1300 more than a 2x244 (2x1.8ghz).

Now you might ask, is a 2xDC comparable to 4x1? Here are some benchmarks I've found showing DC versus Single @ the same clock rates/same # cores.

Benchmark | Single core | Dual core |
---|---|---|
SpecIntRate Windows | 4x846 = 56.7 | 2x270 = 62.6 |
SpecFPRate Windows | 4x846 = 52.5 | 2x270 = 55.3 |
SpecWeb99SSL | 4x846 = 3399 | 2x270 = 4100 (2 870s were used) |
Specjbb2000 IBM JVM | 4x848 = 146385 | 4x275 = 157432 |

What it looks like is that a DC system is about one clock step faster than a corresponding single core SMP system. E.g. if you have a 2xDC @ 1.8ghz, you need a 4x1 @ 2ghz to match the speed. (In some benchmarks, the difference is 2 clock steps up.)

On the surface, it looks pretty amazing that a 4x1 Opteron with twice the memory bandwidth is slower than a corresponding 2xDC. (DC Opterons use the same socket as plain jane Opterons so they use the same 2xDDR memory setup.) It turns out the latency in a 2xDC setup is just so much lower, and most apps like lower latency more than higher bandwidth.

Look at the diagram of the following Tyan 4-processor MB:

ftp://ftp.tyan.com/datasheets/d_s4882_100.pdf

Take particular note of the lack of diagonal lines connecting CPUs. What this means is that if a process running on CPU0 needs memory attached to CPU3, it must ask either CPU1 or CPU2 to forward the request for it. Without NUMA support, we're looking at 25% of memory accesses running @ 50ns, 50% @ 110ns and 25% @ 170ns. (Rough numbers, I'd have to do a lot of googling to find the exact latencies but I'm just too lazy now.)

Now consider a 2xDC system. The 2 cores inside a single package are connected by an immensely fast internal SRQ connection. As long as there's no bandwidth limitation, both cores have fullspeed access to memory, while core-to-core snooping on each respective cache is roughly 10ns. So memory access speeds look like so: 50% 50ns, 50% 110ns.
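To put one number on each of the two main-memory distributions above, here is a minimal Python sketch that computes the weighted average latency. It assumes the rough figures quoted in this message (they are not measurements), and the names `expected_latency_ns`, `FOUR_SINGLE` and `TWO_DUAL` are made up for this illustration.

```python
def expected_latency_ns(distribution):
    """Weighted mean latency for a list of (fraction, latency_ns) pairs."""
    total = sum(fraction for fraction, _ in distribution)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return sum(fraction * latency for fraction, latency in distribution)


# Rough main-memory figures from the message, without NUMA support.
# 4 x single core: 25% local, 50% one hop away, 25% two hops away.
FOUR_SINGLE = [(0.25, 50), (0.50, 110), (0.25, 170)]
# 2 x dual core: 50% local to the package, 50% one hop away.
TWO_DUAL = [(0.50, 50), (0.50, 110)]

if __name__ == "__main__":
    print("4x1  average:", expected_latency_ns(FOUR_SINGLE), "ns")
    print("2xDC average:", expected_latency_ns(TWO_DUAL), "ns")
```

With these rough inputs the averages come out to 110ns for the 4x1 layout and 80ns for the 2xDC layout, which is the latency gap the argument rests on.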
If the memory locations you need to access happen to be contained in the L1/L2 cache, this makes the difference even more pronounced. You then get memory access patterns for 4x1: 25% 5ns, 50% 65ns, 25% 125ns versus 2xDC: 25% 5ns, 25% 15ns, 50% 65ns.

Joel Fradkin wrote:
> Thank you much for the info.
> I will take a look. I think the prices I have been seeing may exclude us
> getting another 4 proc box this soon. My boss asked me to get something in
> the 15K range (I spent 30 on the Dell).
> The HP seemed to run around 30 but it had a lot more drives then the dell
> (speced it with 14 10k drives).