Re: Support tid range scan in parallel?

Поиск

Список

Период

Сортировка

От	Cary Huang
Тема	Re: Support tid range scan in parallel?
Дата	30 апреля 2024 г. 19:10:12
Msg-id	18f306a1ffd.c3e46e455499033.265730539542922404@highgo.ca обсуждение исходный текст
Ответ на	Re: Support tid range scan in parallel? (David Rowley <dgrowleyml@gmail.com>)
Ответы	Re: Support tid range scan in parallel?
Список	pgsql-hackers

Дерево обсуждения

Hi David

Thank you for your reply.

> From a CPU point of view, I'd hard to imagine that a SELECT * query
> without any other items in the WHERE clause other than the TID range
> quals would run faster with multiple workers than with 1.  The problem
> is the overhead of pushing tuples to the main process often outweighs
> the benefits of the parallelism.  However, from an I/O point of view
> on a server with slow enough disks, I can imagine there'd be a
> speedup.

yeah, this is generally true. With everything set to default, the planner would not choose parallel sequential scan if
thescan range covers mostly all tuples of a table (to reduce the overhead of pushing tuples to main proc as you
mentioned).It is preferred when the target data is small but the table is huge. In my case, it is also the same, the
plannerby default uses normal tid range scan, so I had to alter cost parameters to influence the planner's decision.
Thisis where I found that with WHERE clause only containing TID ranges that cover the entire table would result faster
withparallel workers, at least in my environment.
 

> Of course, it may be beneficial to have parallel TID Range for other
> cases when more row filtering or aggregation is being done as that
> requires pushing fewer tuples over from the parallel worker to the
> main process. It just would be good to get to the bottom of if there's
> still any advantage to parallelism when no filtering other than the
> ctid quals is being done now that we've less chance of having to wait
> for I/O coming from disk with the read streams code.

I believe so too. I shared my test procedure below with ctid being the only quals. 

>> below is the timing to complete a select query covering all the records in a simple 2-column table with 40 million
records,
>>
>> - tid range scan takes 10216ms
>> - tid range scan with 2 workers takes 7109ms
>> - sequential scan with 2 workers takes 8499ms
>
> Can you share more details about this test? i.e. the query, what the
> times are that you've measured (EXPLAIN ANALYZE, or SELECT, COPY?).
> Also, which version/commit did you patch against? I was wondering if
> the read stream code added in v17 would result in the serial case
> running faster because the parallelism just resulted in more I/O
> concurrency.

Yes of course. These numbers were obtained earlier this year on master with the patch applied most likely without the
readstream code you mentioned. The patch attached here is rebased to commit dd0183469bb779247c96e86c2272dca7ff4ec9e7 on
master,which is quite recent and should have the read stream code for v17 as I can immediately tell that the serial
scansrun much faster now in my setup. I increased the records on the test table from 40 to 100 million because serial
scansare much faster now. Below is the summary and details of my test. Note that I only include the EXPLAIN ANALYZE
detailsof round1 test. Round2 is the same except for different execution times. 
 

[env]
- OS: Ubuntu 18.04
- CPU: 4 cores @ 3.40 GHz
- MEM: 16 GB

[test table setup]
initdb with all default values
CREATE TABLE test (a INT, b TEXT);
INSERT INTO test VALUES(generate_series(1,100000000), 'testing');
SELECT min(ctid), max(ctid) from test;
  min  |     max
-------+--------------
 (0,1) | (540540,100)
(1 row)

[summary]
round 1:
tid range scan: 14915ms
tid range scan 2 workers: 12265ms
seq scan with 2 workers: 12675ms

round2:
tid range scan: 12112ms
tid range scan 2 workers: 10649ms
seq scan with 2 workers: 11206ms

[details of EXPLAIN ANALYZE below]

[default tid range scan]
EXPLAIN ANALYZE SELECT a FROM test WHERE ctid >= '(1,0)' AND ctid <= '(540540,100)';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Tid Range Scan on test  (cost=0.01..1227029.81 rows=68648581 width=4) (actual time=0.188..12280.791 rows=99999815
loops=1)
   TID Cond: ((ctid >= '(1,0)'::tid) AND (ctid <= '(540540,100)'::tid))
 Planning Time: 0.817 ms
 Execution Time: 14915.035 ms
(4 rows)

[parallel tid range scan with 2 workers]
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=2;

EXPLAIN ANALYZE SELECT a FROM test WHERE ctid >= '(1,0)' AND ctid <= '(540540,100)';
                                                               QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.01..511262.43 rows=68648581 width=4) (actual time=1.322..9249.197 rows=99999815 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Tid Range Scan on test  (cost=0.01..511262.43 rows=28603575 width=4) (actual time=0.332..4906.262
rows=33333272loops=3)
 
         TID Cond: ((ctid >= '(1,0)'::tid) AND (ctid <= '(540540,100)'::tid))
 Planning Time: 0.213 ms
 Execution Time: 12265.873 ms
(7 rows)

[parallel seq scan with 2 workers]
set enable_tidscan = 'off';

EXPLAIN ANALYZE SELECT a FROM test WHERE ctid >= '(1,0)' AND ctid <= '(540540,100)';
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..969595.42 rows=68648581 width=4) (actual time=4.489..9713.299 rows=99999815 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test  (cost=0.00..969595.42 rows=28603575 width=4) (actual time=0.995..5541.178
rows=33333272loops=3)
 
         Filter: ((ctid >= '(1,0)'::tid) AND (ctid <= '(540540,100)'::tid))
         Rows Removed by Filter: 62
 Planning Time: 0.129 ms
 Execution Time: 12675.681 ms
(8 rows)


Best regards

Cary Huang
-------------
HighGo Software Inc. (Canada)
cary.huang@highgo.ca
www.highgo.ca

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Support tid range scan in parallel?