Discussion: Asynchronous MergeAppend
Hello.
I'd like to make the MergeAppend node async-capable, like the Append node.
Currently, when the planner chooses a MergeAppend plan, asynchronous
execution is not possible. With the attached patches you can see plans like
EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Merge Append
   Sort Key: async_pt.b, async_pt.a
   ->  Async Foreign Scan on public.async_p1 async_pt_1
         Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
         Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
   ->  Async Foreign Scan on public.async_p2 async_pt_2
         Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
         Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
This can be quite profitable: in our test cases, async MergeAppend
execution on remote servers was up to two times faster. The code for
asynchronous execution in MergeAppend was mostly borrowed from the
Append node.
What differs significantly: ExecMergeAppendAsyncGetNext() must return a
tuple from the specified slot, as the subplan number determines the
tuple slot that data should be retrieved into. When a subplan is ready
to provide data, its result is cached in ms_asyncresults. Once we get a
tuple for the subplan specified in ExecMergeAppendAsyncGetNext(),
ExecMergeAppendAsyncRequest() returns true and the loop in
ExecMergeAppendAsyncGetNext() ends. We can fetch data only for subplans
which either don't have a cached result ready or have already returned
it to the upper node; this flag is stored in ms_has_asyncresults. As
data for a subplan can arrive either before or after the loop in
ExecMergeAppendAsyncRequest(), we check this flag twice in that
function.
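
To make this concrete, here's a rough sketch of how that loop could
look. This is not the patch's actual code: the helper names come from
the description above, the loop body is my assumption, and ms_slots is
the existing per-subplan slot array in MergeAppendState.

#include "postgres.h"

#include "nodes/execnodes.h"

/*
 * Sketch: fetch the next tuple for subplan 'mplan', which the merge
 * heap above MergeAppend asked for.
 */
static TupleTableSlot *
ExecMergeAppendAsyncGetNext(MergeAppendState *node, int mplan)
{
	/*
	 * Keep firing async requests and waiting for I/O until a result for
	 * the requested subplan is cached in ms_asyncresults;
	 * ExecMergeAppendAsyncRequest() returns true once that happens.
	 */
	while (!ExecMergeAppendAsyncRequest(node, mplan))
		ExecMergeAppendAsyncEventWait(node);

	/* The caller expects the tuple in this subplan's own slot. */
	return node->ms_slots[mplan];
}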
Unlike ExecAppendAsyncEventWait(), it seems that
ExecMergeAppendAsyncEventWait() doesn't need a timeout, as there's no
need to get a result from a synchronous subplan when a tuple from an
async one was explicitly requested.
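
In other words, the wait can simply block until a socket becomes ready.
A minimal sketch, assuming an ms_eventset field on MergeAppendState;
WaitEventSetWait(), EVENT_BUFFER_SIZE and WAIT_EVENT_APPEND_READY are
what the existing Append code uses:

	/*
	 * Inside ExecMergeAppendAsyncEventWait(): with no synchronous
	 * subplans to fall back to, timeout -1 ("wait forever") is fine --
	 * the caller explicitly asked for a tuple from an async subplan and
	 * can do nothing until one arrives.
	 */
	noccurred = WaitEventSetWait(node->ms_eventset,
								 -1,	/* no timeout */
								 occurred_event,
								 EVENT_BUFFER_SIZE,
								 WAIT_EVENT_APPEND_READY);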
We also had to fix postgres_fdw to avoid looking directly at Append
fields. Perhaps the accessors to Append fields look strange, but they
allow us to avoid some code duplication. I suppose duplication could be
reduced even further if we reworked the async Append implementation,
but so far I haven't tried to do this, to avoid a big diff from master.
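
For illustration only, such an accessor could dispatch on the
requestor's node type, as sketched below. The function name and the
ms_needrequest field are hypothetical; as_needrequest is the existing
AppendState field:

#include "postgres.h"

#include "nodes/execnodes.h"

/*
 * Hypothetical accessor: lets postgres_fdw ask "which subplans still
 * need an async request?" without knowing whether its requestor is an
 * Append or a MergeAppend.
 */
static Bitmapset *
ExecAsyncNeedRequest(PlanState *requestor)
{
	if (IsA(requestor, AppendState))
		return ((AppendState *) requestor)->as_needrequest;

	Assert(IsA(requestor, MergeAppendState));
	return ((MergeAppendState *) requestor)->ms_needrequest;	/* assumed field */
}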
Also, mark_async_capable() believes that the path corresponds to the
plan. This may not be true when create_[merge_]append_plan() inserts a
Sort node; in that case mark_async_capable() can treat the Sort plan
node as some other node type and crash, so there's a small fix for this.
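
A minimal sketch of the kind of guard that helps here, in the style of
the existing T_ForeignPath case in createplan.c; the exact shape of the
patch's fix may differ:

		case T_ForeignPath:
			{
				FdwRoutine *fdwroutine = path->parent->fdwroutine;

				/*
				 * create_[merge_]append_plan() may have inserted a Sort
				 * above the subplan, in which case 'plan' is not the
				 * ForeignScan this path describes; casting blindly
				 * would crash, so bail out instead.
				 */
				if (!IsA(plan, ForeignScan))
					return false;

				Assert(fdwroutine != NULL);
				if (fdwroutine->IsForeignPathAsyncCapable != NULL &&
					fdwroutine->IsForeignPathAsyncCapable((ForeignPath *) path))
					break;
				return false;
			}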
--
Best regards,
Alexander Pyhalov,
Postgres Professional
Hi! Thank you for your work on this subject! I think this is a very
useful optimization.

While looking through your code, I noticed some points that I think
should be taken into account. Firstly, I noticed only two tests
verifying the functionality of this feature, and I don't think that's
enough. Are you thinking about adding tests with queries involving, for
example, joins with different tables and unusual operators? In
addition, I have a question about testing your feature on a benchmark.
Are you going to do this?

On 17.07.2024 16:24, Alexander Pyhalov wrote:
> [...]

I think you should add the explanation above, of how the async
MergeAppend machinery works, to the commit message, because without it
it's hard to understand the full picture of how your code works.
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi.
I've updated the patches for asynchronous MergeAppend. They allowed us
to significantly improve performance in practice. Earlier, a select
from a partitioned (and distributed) table could switch from an
asynchronous Append plan to a synchronous MergeAppend plan. Given that
the table could have 20+ partitions, the latter was cheaper, but much
less efficient, because the remote parts executed synchronously.
This version contains a couple of small fixes: earlier,
ExecMergeAppend() scanned all asyncplans, but it should do this only
for valid asyncplans. I've also incorporated the logic from
commit af717317a04f5217728ce296edf4a581eb7e6ea0
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed Mar 12 20:53:09 2025 +0200
Handle interrupts while waiting on Append's async subplans
into ExecMergeAppendAsyncEventWait().
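
For reference, the pattern from that commit, as it would presumably
look when carried over here; this is a sketch, and the event-set
handling details on the MergeAppend side are assumptions:

	/*
	 * When building the wait event set, also add the process latch so
	 * an interrupt can wake us up while blocked on async subplans.
	 */
	AddWaitEventToSet(set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);

	/* ... WaitEventSetWait() as in the earlier sketch ... */

	for (i = 0; i < noccurred; i++)
	{
		WaitEvent  *w = &occurred_event[i];

		if (w->events & WL_LATCH_SET)
		{
			/* Service the pending interrupt, then keep processing. */
			ResetLatch(MyLatch);
			CHECK_FOR_INTERRUPTS();
			continue;
		}

		/* A socket event: notify the waiting async subplan. */
		ExecAsyncNotify((AsyncRequest *) w->user_data);
	}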
--
Best regards,
Alexander Pyhalov,
Postgres Professional
I noticed that this patch has gone largely unreviewed, but it needs a
rebase due to the GUC changes, so here it is again.

Thanks

--
Álvaro Herrera
PostgreSQL Developer — https://www.EnterpriseDB.com/
Hi, thanks for working on this!

On Tue Aug 20, 2024 at 6:14 AM -03, Alexander Pyhalov wrote:
>> In addition, I have a question about testing your feature on a
>> benchmark. Are you going to do this?
>
> The main reason for this work is a dramatic performance degradation
> when Append plans with async foreign scan nodes are switched to
> MergeAppend plans with synchronous foreign scans.
>
> I've performed some synthetic tests to prove the benefits of async
> Merge Append. So far tests are performed on one physical host.
>
> For tests I've deployed 3 PostgreSQL instances on ports 5432-5434.
>
> The first instance:
> create server s2 foreign data wrapper postgres_fdw OPTIONS (port
> '5433', dbname 'postgres', async_capable 'on');
> create server s3 foreign data wrapper postgres_fdw OPTIONS (port
> '5434', dbname 'postgres', async_capable 'on');
>
> create foreign table players_p1 partition of players for values with
> (modulus 4, remainder 0) server s2;
> create foreign table players_p2 partition of players for values with
> (modulus 4, remainder 1) server s2;
> create foreign table players_p3 partition of players for values with
> (modulus 4, remainder 2) server s3;
> create foreign table players_p4 partition of players for values with
> (modulus 4, remainder 3) server s3;
>
> s2 instance:
> create table players_p1 (id int, name text, score int);
> create table players_p2 (id int, name text, score int);
> create index on players_p1(score);
> create index on players_p2(score);
>
> s3 instance:
> create table players_p3 (id int, name text, score int);
> create table players_p4 (id int, name text, score int);
> create index on players_p3(score);
> create index on players_p4(score);
>
> s1 instance:
> insert into players select i, 'player_' || i, random() * 100 from
> generate_series(1,100000) i;
>
> pgbench script:
> \set rnd_offset random(0,200)
> \set rnd_limit random(10,20)
>
> select * from players order by score desc offset :rnd_offset limit
> :rnd_limit;
>
> pgbench was run as:
> pgbench -n -f 1.sql postgres -T 100 -c 16 -j 16
>
> CPU idle was about 5-10%.
>
> pgbench results:
>
> [...]
>
> However, if we set the number of threads to 1, so that CPU has idle
> cores, we'll see more evident improvements:
>
> Patched, async_capable on:
> pgbench (14.13, server 18devel)
> transaction type: 1.sql
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of threads: 1
> duration: 100 s
> number of transactions actually processed: 20221
> latency average = 4.945 ms
> initial connection time = 7.035 ms
> tps = 202.221816 (without initial connection time)
>
> Patched, async_capable off:
> transaction type: 1.sql
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of threads: 1
> duration: 100 s
> number of transactions actually processed: 14941
> latency average = 6.693 ms
> initial connection time = 7.037 ms
> tps = 149.415688 (without initial connection time)

I ran some benchmarks based on v4 attached by Alvaro in [1], using a
smaller number of threads so that some CPU cores would be idle, and I
also obtained better results:

Patched, async_capable on: tps = 4301.567405
Master, async_capable on: tps = 3847.084545

So I'm +1 for the idea. I know it's been a while since the last patch,
and unfortunately it hasn't received reviews since then. Do you still
plan to work on it? I still need to take a look at the code to see if I
can help with some comments.

During the tests I got compiler errors due to fce7c73fba4, so I'm
attaching a v5 with guc_parameters.dat correctly sorted. The
postgres_fdw regress tests were also failing due to some whitespace
problems; v5 also fixes this.

[1] https://www.postgresql.org/message-id/202510251154.isknefznk566%40alvherre.pgsql

--
Matheus Alcantara