Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
От | Pietro Pugni |
---|---|
Тема | Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL |
Дата | |
Msg-id | 0E6681E8-7412-4A8F-8AEC-FD6A29820B66@gmail.com обсуждение исходный текст |
Ответ на | Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL (didier <did447@gmail.com>) |
Список | pgsql-performance |
Hi didier,
thank you for your time.
I forgot to display before the output of free. I’ve looked into it before and I found difficult to fully understand if there was something wrong.
Before starting Postgres:
total used free shared buffers cached
Mem: 125G 9G 115G 15M 362M 8.1G
-/+ buffers/cache: 1.5G 124G
Swap: 127G 0B 127G
Here’s an example of free output when queries B_1 and B_2 are running (they’re part of the same transaction). Generally values remains the same. For what I can understand, RAM isn’t used at all (there’s a lot of unused RAM).
total used free shared buffers cached
Mem: 125G 13G 112G 3.1G 362M 11G
-/+ buffers/cache: 1.9G 123G
Swap: 127G 0B 127G
With Postgres running after transaction has been executed:
total used free shared buffers cached
Mem: 125G 13G 112G 3.1G 362M 11G
-/+ buffers/cache: 1.5G 124G
Swap: 127G 0B 127G
there's also huge page
/sys/kernel/mm/transparent_hugepage/enabled
can you try to disable it?
It was enabled and after disabling it nothing changed: time execution is practically the same (131s for the same transaction tested in previous emails, which is composed by queries B_1 and B_2).
Also test on the dell:
select tmp.cf, tmp.dt from grep_studi.tmp;
and
select tmp.cf, tmp.dt from grep_studi.tmp order by tmp.cf;
in Query B_2
the sort is 9 time slower on the dell, you have to find why…
Here’s the output for the two queries:
select tmp.cf, tmp.dt from grep_studi.tmp;
"Seq Scan on grep_studi.tmp (cost=0.00..11007.74 rows=1346868 width=72) (actual time=0.082..618.709 rows=2951191 loops=1)"
" Output: cf, dt"
" Buffers: shared hit=512 read=7802 dirtied=8314"
"Planning time: 0.087 ms"
"Execution time: 745.505 ms"
select tmp.cf, tmp.dt from grep_studi.tmp;
"Sort (cost=38431.55..39104.99 rows=1346868 width=72) (actual time=3146.548..3306.179 rows=2951191 loops=1)"
" Output: cf, dt"
" Sort Key: tmp.cf"
" Sort Method: quicksort Memory: 328866kB"
" Buffers: shared hit=8317"
" -> Seq Scan on grep_studi.tmp (cost=0.00..11007.74 rows=1346868 width=72) (actual time=0.012..373.346 rows=2951191 loops=1)"
" Output: cf, dt"
" Buffers: shared hit=8314"
"Planning time: 0.034 ms"
"Execution time: 3459.065 ms"
32 GB for buffers is too high for the queries in your test but it
doesn't matter.
I’ve set shared_buffers to be 1/4 of the total RAM. I’ve changed kernel values to accomodate this value. Lowering to smaller values doesn’t improve the transaction results. Here’s the results with 1 run for each level of shared_buffers:
32GB: 131s
16GB: 132s
8GB: 133s
4GB: 132s
2GB: 143s
1GB: 148s
512MB: 183s
256MB: 192s
Probably I can keep 4GB but I make use of several partitions with tens of millions of records each. This is why I keep shared_buffers high. My applications is also similar to a DWH solution with one user. Like you said, big values of shared_buffers shouldn’t be a issue.
I’ve done some tests with sysbench on Dell T420 and MacMini.
T420 - RAM READ - 16GB / 1MB
sh-4.3# sysbench --test=memory --memory-oper=read --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing memory operations speed test
Memory block size: 1024K
Memory transfer size: 16384M
Memory operations type: read
Memory scope type: global
Threads started!
Done.
Operations performed: 16384 (3643025.32 ops/sec)
16384.00 MB transferred (3643025.32 MB/sec)
Test execution summary:
total time: 0.0045s
total number of events: 16384
total time taken by event execution: 0.0031
per-request statistics:
min: 0.00ms
avg: 0.00ms
max: 0.02ms
approx. 95 percentile: 0.00ms
Threads fairness:
events (avg/stddev): 16384.0000/0.00
execution time (avg/stddev): 0.0031/0.00
MacMini - RAM READ - 16GB / 1MB
server:sysbench Pietro$ ./sysbench --test=memory --memory-oper=read --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.5: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored
Threads started!
Operations performed: 16384 ( 5484.50 ops/sec)
16384.00 MB transferred (5484.50 MB/sec)
General statistics:
total time: 2.9873s
total number of events: 16384
total time taken by event execution: 2.9836s
response time:
min: 0.18ms
avg: 0.18ms
max: 0.24ms
approx. 95 percentile: 0.19ms
Threads fairness:
events (avg/stddev): 16384.0000/0.00
execution time (avg/stddev): 2.9836/0.00
T420 - RAM WRITE - 16GB / 1MB
sh-4.3# sysbench --test=memory --memory-oper=write --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing memory operations speed test
Memory block size: 1024K
Memory transfer size: 16384M
Memory operations type: write
Memory scope type: global
Threads started!
Done.
Operations performed: 16384 ( 8298.97 ops/sec)
16384.00 MB transferred (8298.97 MB/sec)
Test execution summary:
total time: 1.9742s
total number of events: 16384
total time taken by event execution: 1.9723
per-request statistics:
min: 0.12ms
avg: 0.12ms
max: 0.25ms
approx. 95 percentile: 0.12ms
Threads fairness:
events (avg/stddev): 16384.0000/0.00
execution time (avg/stddev): 1.9723/0.00
MacMini - RAM WRITE - 16GB / 1MB
server:sysbench Pietro$ ./sysbench --test=memory --memory-oper=write --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.5: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored
Threads started!
Operations performed: 16384 ( 5472.90 ops/sec)
16384.00 MB transferred (5472.90 MB/sec)
General statistics:
total time: 2.9937s
total number of events: 16384
total time taken by event execution: 2.9890s
response time:
min: 0.18ms
avg: 0.18ms
max: 0.32ms
approx. 95 percentile: 0.19ms
Threads fairness:
events (avg/stddev): 16384.0000/0.00
execution time (avg/stddev): 2.9890/0.00
T420 - CPU
sh-4.3# sysbench --test=cpu run
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing CPU performance benchmark
Threads started!
Done.
Maximum prime number checked in CPU test: 10000
Test execution summary:
total time: 13.0683s
total number of events: 10000
total time taken by event execution: 13.0674
per-request statistics:
min: 1.30ms
avg: 1.31ms
max: 1.44ms
approx. 95 percentile: 1.35ms
Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 13.0674/0.00
MacMini - CPU
server:sysbench Pietro$ ./sysbench --test=cpu run
sysbench 0.5: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored
Primer numbers limit: 10000
Threads started!
General statistics:
total time: 11.5728s
total number of events: 10000
total time taken by event execution: 11.5703s
response time:
min: 1.15ms
avg: 1.16ms
max: 2.17ms
approx. 95 percentile: 1.17ms
Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 11.5703/0.00
I’ve done these tests because someone else on this discussion asked me to investigate on memory bandwidth and because I found this interesting article about Intel Xeon vs Intel i5 with different Postgres versions: http://blog.pgaddict.com/posts/performance-since-postgresql-7-4-to-9-4-pgbench
Hope this helps to better understand the problem.
Thank you very much.
Best regards,
Pietro
В списке pgsql-performance по дате отправления:
Предыдущее
От: didierДата:
Сообщение: Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Следующее
От: Pietro PugniДата:
Сообщение: Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL