Re: Use merge-based matching for MCVs in eqjoinsel

Поиск

Список

Период

Сортировка

От	David Geier
Тема	Re: Use merge-based matching for MCVs in eqjoinsel
Дата	17 ноября 18:28:23
Msg-id	3a4211f6-c6d6-4ff3-a5c5-231179e4e519@gmail.com обсуждение исходный текст
Ответ на	Re: Use merge-based matching for MCVs in eqjoinsel (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>)
Ответы	Re: Use merge-based matching for MCVs in eqjoinsel
Список	pgsql-hackers

Дерево обсуждения

Hi Ilia!

On 16.09.2025 17:52, Ilia Evdokimov wrote:
> Hi hackers,
> 
> On 10.09.2025 16:56, Ilia Evdokimov wrote:
>> Unfortunately, the JOB benchmark does not contain semi join nodes.
>> However, TPC-DS does. I'll look for the queries with slowest planner
>> times there and check them.
>>
>> I'll need some time to check both join and semi join cases with small
>> and large default_statistics_target. I'll share the results later.
> 
> JOIN
> ==============================
> 
> I’ve benchmarked the new implementation of eqjoinsel() with different
> values of default_statistics_target. On small targets (1, 5, 10, 25, 50,
> 75, 100) the results are all within statistical noise, and I did not
> observe any regressions. In my view, it’s reasonable to keep the current
> condition that the hash table is not used for default_statistics_target
> = 1. Raising that threshold does not seem useful.
> 
> Here are the results for JOB queries (where the effect of semi join is
> not visible due to different data distributions):
> 
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) |
> Planner After (ms)
> ------------------------------------------------------------------------------------------
> 1                         | 1.00                | 1846.643            |
> 1847.409
> 5                         | 1.00                | 1836.391            |
> 1828.318
> 10                        | 0.95                | 1841.750            |
> 1929.722
> 25                        | 0.99                | 1873.172            |
> 1890.741
> 50                        | 0.98                | 1869.897            |
> 1898.470
> 75                        | 1.02                | 1969.368            |
> 1929.521
> 100                       | 0.97                | 1857.890            |
> 1921.207
> 1000                      | 1.14                | 2279.700            |
> 1997.102
> 2500                      | 1.78                | 4682.658            |
> 2636.202
> 5000                      | 6.45                | 15943.696           |
> 2471.242
> 7500                      | 12.45               | 34350.855           |
> 2758.565
> 10000                     | 20.52               | 62519.342           |
> 3046.819
> 

Good that we've confirmed that.

> SEMI JOIN
> ==============================
> 
> Unfortunately, in TPC-DS it is not possible to clearly see improvements
> for semi joins. To address this, I designed a synthetic example where
> the data distribution forces the loop to run fully, without exiting
> early, which makes the effect on semi joins more visible. In this setup,
> I also ensured that the length of the MCV array is equal to the chosen
> default_statistics_target.
> 
> CREATE TABLE t1 AS
> SELECT CASE
>          WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 1
>          ELSE (g % 1000000) + 10000
>        END AS id
> FROM generate_series(1, 3000000) g;
> 
> CREATE TABLE t2 AS
> SELECT CASE
>          WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 10001
>          ELSE (g % 1000000) + 20000
>        END AS id
> FROM generate_series(1, 3000000) g;
> 
> ANALYZE t1, t2;
> 
> The results of the query are:
> 
> SELECT * FROM t1
> WHERE id IN (SELECT id FROM t2);
> 
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) |
> Planner After (ms)
> ------------------------------------------------------------------------------------------
> 1                         | 1.12                | 1.191               |
> 1.062
> 5                         | 1.02                | 0.493               |
> 0.481
> 10                        | 0.92                | 0.431               |
> 0.471
> 25                        | 1.27                | 0.393               |
> 0.309
> 50                        | 1.04                | 0.432               |
> 0.416
> 75                        | 0.96                | 0.398               |
> 0.415
> 100                       | 0.95                | 0.450               |
> 0.473
> 1000                      | 9.42                | 6.742               |
> 0.716
> 2500                      | 19.15               | 21.621              |
> 1.129
> 5000                      | 46.74               | 85.667              |
> 1.833
> 7500                      | 73.26               | 194.806             |
> 2.659
> 10000                     | 107.95              | 349.981             |
> 3.242
> 

That's some decent speedups, considering that it's planning time.

Thanks for testing the code!

--
David Geier

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Use merge-based matching for MCVs in eqjoinsel