Re: An improvement on parallel DISTINCT

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: An improvement on parallel DISTINCT
Дата
Msg-id CAApHDvrcDvMpAraQ15TzANgKed9pREBCZ+qXLiuKrfHPf_jhaA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: An improvement on parallel DISTINCT  (Richard Guo <guofenglinux@gmail.com>)
Ответы Re: An improvement on parallel DISTINCT  (David Rowley <dgrowleyml@gmail.com>)
Re: An improvement on parallel DISTINCT  (Richard Guo <guofenglinux@gmail.com>)
Список pgsql-hackers
On Fri, 2 Feb 2024 at 20:47, Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> On Fri, Feb 2, 2024 at 11:26 AM David Rowley <dgrowleyml@gmail.com> wrote:
>>
>> In light of this, do you still think it's worthwhile making this change?
>>
>> For me, I think all it's going to result in is extra planner work
>> without any performance gains.
>
>
> Hmm, with the query below, I can see that the new plan is cheaper than
> the old plan, and the cost difference exceeds STD_FUZZ_FACTOR.
>
> create table t (a int, b int);
> insert into t select i%100000, i from generate_series(1,10000000)i;
> analyze t;
>
> explain (costs on) select distinct a from t order by a limit 1;

OK, a LIMIT clause... I didn't think of that.  Given the test results
below, I'm pretty convinced we should make the change.

Performance testing on an AMD 3990x with work_mem=4MB and hash_mem_multiplier=2.

$ cat bench.sql
select distinct a from t order by a limit 1;
$ pgbench -n -T 60 -f bench.sql postgres

-- Master

max_parallel_workers_per_gather=2;
latency average = 470.310 ms
latency average = 468.673 ms
latency average = 469.463 ms

max_parallel_workers_per_gather=4;
latency average = 346.012 ms
latency average = 346.662 ms
latency average = 347.591 ms

max_parallel_workers_per_gather=8; + alter table t set (parallel_workers=8);
latency average = 300.298 ms
latency average = 300.029 ms
latency average = 300.314 ms

-- Patched

max_parallel_workers_per_gather=2;
latency average = 424.176 ms
latency average = 431.870 ms
latency average = 431.870 ms (9.36% faster than master)

max_parallel_workers_per_gather=4;
latency average = 279.837 ms
latency average = 280.893 ms
latency average = 281.518 ms (23.51% faster than master)

max_parallel_workers_per_gather=8; + alter table t set (parallel_workers=8);
latency average = 178.585 ms
latency average = 178.780 ms
latency average = 179.768 ms (67.68% faster than master)

So the gains increase with more parallel workers due to pushing more
work to the worker. Amdahl's law approves of this.

I'll push the patch shortly.

David



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: InstallXLogFileSegment() vs concurrent WAL flush
Следующее
От: Ranier Vilela
Дата:
Сообщение: Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)