Queries runs slow on GPU with PG-Strom
From | YANG |
---|---|
Subject | Queries runs slow on GPU with PG-Strom |
Date | |
Msg-id | BLU436-SMTP200807E5D5EABD07576C20C1830@phx.gbl |
Responses |
Re: Queries runs slow on GPU with PG-Strom
(Kouhei Kaigai <kaigai@ak.jp.nec.com>)
|
List | pgsql-hackers |
Hello,

I've performed some tests on pg_strom according to the wiki, but queries seem to run slower on the GPU than on the CPU. Can someone shed some light on what's wrong with my settings?

My setup was Quadro K620 + CUDA 7.0 (for Ubuntu 14.10) + Ubuntu 15.04, and the results were:

with pg_strom
=============

explain SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=190993.70..190993.71 rows=1 width=0) (actual time=18792.236..18792.236 rows=1 loops=1)
   ->  Custom Scan (GpuPreAgg)  (cost=7933.07..184161.18 rows=86 width=108) (actual time=4249.656..18792.074 rows=77 loops=1)
         Bulkload: On (density: 100.00%)
         Reduction: NoGroup
         Device Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
         ->  Custom Scan (BulkScan) on t0  (cost=6933.07..182660.32 rows=10000060 width=0) (actual time=139.399..18499.246 rows=10000000 loops=1)
 Planning time: 0.262 ms
 Execution time: 19268.650 ms
(8 rows)

explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 HashAggregate  (cost=298541.48..298541.81 rows=26 width=12) (actual time=11311.568..11311.572 rows=26 loops=1)
   Group Key: t0.cat
   ->  Custom Scan (GpuPreAgg)  (cost=5178.82..250302.07 rows=1088 width=52) (actual time=3304.727..11310.021 rows=2307 loops=1)
         Bulkload: On (density: 100.00%)
         Reduction: Local + Global
         ->  Custom Scan (GpuJoin)  (cost=4178.82..248541.18 rows=10000060 width=12) (actual time=923.417..2661.113 rows=10000000 loops=1)
               Bulkload: On (density: 100.00%)
               Depth 1: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid = aid), nrows_ratio: 1.00000000
               ->  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60 rows=10000060 width=16) (actual time=6.980..871.431 rows=10000000 loops=1)
               ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=4) (actual time=0.204..7.309 rows=40000 loops=1)
 Planning time: 47.834 ms
 Execution time: 11355.103 ms
(12 rows)

without pg_strom
================

test=# explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=426193.03..426193.04 rows=1 width=0) (actual time=3880.379..3880.379 rows=1 loops=1)
   ->  Seq Scan on t0  (cost=0.00..417859.65 rows=3333353 width=0) (actual time=0.075..3859.200 rows=314063 loops=1)
         Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
         Rows Removed by Filter: 9685937
 Planning time: 0.411 ms
 Execution time: 3880.445 ms
(6 rows)

t=# explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 HashAggregate  (cost=431593.73..431594.05 rows=26 width=12) (actual time=4960.810..4960.812 rows=26 loops=1)
   Group Key: t0.cat
   ->  Hash Join  (cost=1234.00..381593.43 rows=10000060 width=12) (actual time=20.859..3367.510 rows=10000000 loops=1)
         Hash Cond: (t0.aid = t1.aid)
         ->  Seq Scan on t0  (cost=0.00..242858.60 rows=10000060 width=16) (actual time=0.021..895.908 rows=10000000 loops=1)
         ->  Hash  (cost=734.00..734.00 rows=40000 width=4) (actual time=20.567..20.567 rows=40000 loops=1)
               Buckets: 65536  Batches: 1  Memory Usage: 1919kB
               ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=4) (actual time=0.017..11.013 rows=40000 loops=1)
 Planning time: 0.567 ms
 Execution time: 4961.029 ms
(10 rows)

Here are the details of how I installed pg_strom:

1. download postgresql 9.5alpha1 and compile it with
   ,----
   | ./configure --prefix=/export/pg-9.5 --enable-debug --enable-cassert
   | make -j8 all
   | make install
   `----

2. install cuda-7.0 (ubuntu 14.10 package from nvidia website)

3. download and compile pg_strom with pg_config in /export/pg-9.5/bin
   ,----
   | make
   | make install
   `----

4. create a db with --no-local
   ,----
   | initdb --no-local 9.5
   `----

5. change postgresql.conf
   ,----
   | shared_buffers=1GB
   | shared_preload_libraries='pg_strom.so'
   | logging_collector = on
   | log_filename='postgresql-%d.log'
   | pg_strom.enabled=on
   `----

6. start postgres
   ,----
   | pg_ctl -D 9.5 start
   `----
   and got the following outputs
   ,----
   | LOG: CUDA Runtime version: 7.0.0
   | LOG: NVIDIA driver version: 346.59
   | LOG: GPU0 Quadro K620 (384 CUDA cores, 1124MHz), L2 2048KB, RAM 2047MB (128bits, 900KHz), capability 5.0
   | LOG: NVRTC - CUDA Runtime Compilation vertion 7.0
   | LOG: redirecting log output to logging collector process
   | HINT: Future log output will appear in directory "pg_log".
   `----

7. import testdb
   ,----
   | createdb test
   | psql test < ~/devel/pg_strom/test/testdb.sql
   | psql test -c 'create extension pg_strom'
   `----
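
To put the timings above side by side, here is a minimal sketch that computes how much slower each pg_strom run was than the plain CPU run. The numbers are the "Execution time" values copied from the plans above; the dictionary keys and the `slowdown` helper are illustrative names only and are not part of PG-Strom or PostgreSQL.

```python
# Execution times in milliseconds, copied from the EXPLAIN output above.
# The labels are informal descriptions of the two test queries.
timings_ms = {
    "count(*) with sqrt filter": {"pg_strom": 19268.650, "cpu": 3880.445},
    "AVG(x) join, GROUP BY cat": {"pg_strom": 11355.103, "cpu": 4961.029},
}


def slowdown(gpu_ms: float, cpu_ms: float) -> float:
    """Return the ratio of GPU-run time to CPU-run time (>1 means GPU was slower)."""
    return gpu_ms / cpu_ms


for query, t in timings_ms.items():
    ratio = slowdown(t["pg_strom"], t["cpu"])
    print(f"{query}: pg_strom {t['pg_strom']:.1f} ms, "
          f"cpu {t['cpu']:.1f} ms, ratio {ratio:.2f}x")
```

On these figures the pg_strom runs come out roughly 5x slower for the filter query and roughly 2.3x slower for the join query.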