
Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:

Hi hackers,

While analyzing planner performance on JOB with default_statistics_target = 1000, I noticed that a significant portion of planning time is spent inside the eqjoinsel() function. According to perf, in most JOB queries at default_statistics_target = 1000, eqjoinsel() is the most expensive function during planning, accounting for approximately 8% of total CPU time. At default_statistics_target = 10000, the planner spends up to 75% of its time inside eqjoinsel(), making it one of the primary bottlenecks.

This overhead is caused by the O(N^2) nested-loop comparison of MCVs in var1 = var2 clauses.

I propose an optimization: when the column datatype supports ordering (i.e., has < and >), we can sort both MCV lists and apply a merge-style algorithm to detect matches. This reduces runtime from O(N^2) to O(N log N), where N is the number of MCV entries. The patch also applies the same optimization to semi-join clauses, which show similar performance behavior.
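
To make the idea concrete, here is a minimal self-contained sketch of the merge matching (plain C, with int64 keys standing in for Datums; the actual patch goes through the type's comparison support function rather than plain integer comparison):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Comparator for qsort(): orders int64 keys ascending. */
static int
cmp_int64(const void *a, const void *b)
{
    int64_t x = *(const int64_t *) a;
    int64_t y = *(const int64_t *) b;

    return (x > y) - (x < y);
}

/*
 * Count matching entries between two MCV key arrays using sorting plus
 * a merge-style scan: O(N log N) for the sorts, O(N) for the scan,
 * instead of the O(N^2) nested loop.
 */
static int
count_mcv_matches(int64_t *mcv1, int n1, int64_t *mcv2, int n2)
{
    int i = 0, j = 0, nmatches = 0;

    qsort(mcv1, n1, sizeof(int64_t), cmp_int64);
    qsort(mcv2, n2, sizeof(int64_t), cmp_int64);

    while (i < n1 && j < n2)
    {
        if (mcv1[i] < mcv2[j])
            i++;
        else if (mcv1[i] > mcv2[j])
            j++;
        else
        {
            /* MCV lists contain no duplicate values, so advance both sides. */
            nmatches++;
            i++;
            j++;
        }
    }
    return nmatches;
}

int
main(void)
{
    int64_t a[] = {5, 1, 9, 3, 7};
    int64_t b[] = {3, 8, 7, 2};

    printf("matches: %d\n", count_mcv_matches(a, 5, b, 4));  /* prints 2 */
    return 0;
}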

On JOB, these changes reduce planner time in most queries with complex joins and large MCV lists, with no observable effect on plan quality. I’ve also attached bar charts showing per-query planner time before and after the patch for default_statistics_target = 100, 1000, 10000, along with query numbers for reference.

Any feedback or suggestions are welcome!

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments

Re: Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:


On 21.07.2025 16:55, Ilia Evdokimov wrote:

> While analyzing planner performance on JOB with default_statistics_target = 1000, I noticed that a significant portion of planning time is spent inside the eqjoinsel() function. According to perf, in most JOB queries at default_statistics_target = 1000, eqjoinsel() is the most expensive function during planning, accounting for approximately 8% of total CPU time. At default_statistics_target = 10000, the planner spends up to 75% of its time inside eqjoinsel(), making it one of the primary bottlenecks.
>
> This overhead is caused by the O(N^2) nested-loop comparison of MCVs in var1 = var2 clauses.
>
> I propose an optimization: when the column datatype supports ordering (i.e., has < and >), we can sort both MCV lists and apply a merge-style algorithm to detect matches. This reduces runtime from O(N^2) to O(N log N), where N is the number of MCV entries. The patch also applies the same optimization to semi-join clauses, which show similar performance behavior.


Following up on my previous message about optimizing eqjoinsel() for Var1 = Var2 and semi-join clauses, I’d like to share more detailed performance results across different values of default_statistics_target on the JOB benchmark.

The performance improvement grows as the number of MCV entries increases (i.e., with higher default_statistics_target). The table below shows total planner time summed over all 113 queries in JOB for each setting of default_statistics_target, before and after applying the patch:

Total planner time across all JOB queries
=========================================
default_statistics_target | Before Patch (ms) | After Patch (ms)
--------------------------+-------------------+------------------
                      100 |          1828.433 |         1820.556
                     1000 |          2194.282 |         1963.110
                     2500 |          4606.705 |         2140.126
                     5000 |         16661.581 |         2616.109
                     7500 |         35988.569 |         3061.161
                    10000 |         66616.620 |         3504.144

At default_statistics_target = 100, planning time is essentially unchanged before and after the patch. The optimization reduces planner time substantially - by roughly 19x at default_statistics_target = 10000 - while the generated plans and selectivity calculations remain unchanged. To illustrate this, the table below shows the 10 slowest JOB queries (by planning time), along with their planning times before and after applying the patch.

Top 10 slowest queries at default_statistics_target = 10000
===========================================================
Query | Before Patch (ms) | After Patch (ms)
------+--------------------+-------------------
  29c |           1939.282 |           111.219
  29b |           1939.237 |           100.109
  29a |           1931.870 |           100.224
  31c |           1622.255 |            67.609
  30c |           1602.156 |            70.942
  28b |           1521.026 |            84.058
  30b |           1519.972 |            68.777
  30a |           1518.014 |            70.529
  28a |           1514.908 |            86.662
  31a |           1507.303 |            63.579

As shown, the total planner time for these top 10 queries drops substantially with the optimization.


I’ve identified and fixed two issues in the original v1 patch: in eqjoinsel_semi(), the second MCV array was allocated with an incorrect size, and the initialization of FunctionCallInfoData has been moved outside the comparator compare_mcv_items() to avoid repeated setup during sorting. I've attached the updated v2 patch with these changes.
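
For illustration, the comparator change has roughly this shape (a sketch with hypothetical names, not the patch code; the point is that per-call setup is hoisted out of the comparator, which the sort invokes O(N log N) times):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a prepared call context; in the patch this
 * role is played by the FunctionCallInfoData set up once before sorting. */
typedef struct CompareContext
{
    int (*cmp_fn) (long a, long b);
} CompareContext;

/* Plain qsort() takes no user argument, so hand the context to the
 * comparator via a file-scope pointer; PostgreSQL code would use
 * qsort_arg() with an explicit void *arg instead. */
static CompareContext *sort_cxt;

static int
compare_mcv_items(const void *a, const void *b)
{
    /* Per-comparison work is now just passing the two arguments; all
     * other call setup happened once, before the sort started. */
    return sort_cxt->cmp_fn(*(const long *) a, *(const long *) b);
}

static int
long_cmp(long a, long b)
{
    return (a > b) - (a < b);
}

int
main(void)
{
    long keys[] = {42, 7, 19, 3};
    CompareContext cxt = {long_cmp};

    sort_cxt = &cxt;            /* one-time setup, outside the comparator */
    qsort(keys, 4, sizeof(long), compare_mcv_items);

    for (int i = 0; i < 4; i++)
        printf("%ld\n", keys[i]);
    return 0;
}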

Any suggestions?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments

Re: Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:

Following up on my previous messages about optimizing eqjoinsel() and eqjoinsel_semi() for Var1 = Var2 clauses, I’d like to share detailed profiling results showing the effect of the patch on JOB for different values of default_statistics_target.

The first table shows the total planner time (summed over all 113 queries) before and after applying the patch, along with the speedup achieved:

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+--------------------
                     100  |         1.00x       |        1828.433     |        1820.556
                    1000  |         1.12x       |        2194.282     |        1963.110
                    2500  |         2.15x       |        4606.705     |        2140.126
                    5000  |         6.37x       |       16661.581     |        2616.109
                    7500  |        11.76x       |       35988.569     |        3061.161
                   10000  |        19.01x       |       66616.620     |        3504.144


The second table shows the profiling of eqjoinsel() using perf, demonstrating that the function, which dominates planning at high statistics targets, becomes essentially negligible after the patch:

default_statistics_target | eqjoinsel() Before (perf) | eqjoinsel() After (perf)
--------------------------+---------------------------+--------------------------
                     100  |                     0.01% |                     0.04%
                    1000  |                     6.23% |                     0.06%
                    2500  |                    35.45% |                     0.23%
                    5000  |                    66.14% |                     0.53%
                    7500  |                    72.70% |                     0.97%
                   10000  |                    75.42% |                     1.25%

I’ve attached v3 of the patch. This version adds a check for NULL values when comparing MCV entries, ensuring correctness in edge cases.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

Attachments

Re: Use merge-based matching for MCVs in eqjoinsel

From: Tom Lane
Date:
Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
> I’ve attached v3 of the patch. This version adds a check for NULL values 
> when comparing MCV entries, ensuring correctness in edge cases.

Um ... what edge cases would those be?  We do not put NULL into
MCV arrays.

            regards, tom lane



Re: Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:
On 03.09.2025 23:26, Tom Lane wrote:
> Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
>> I’ve attached v3 of the patch. This version adds a check for NULL values
>> when comparing MCV entries, ensuring correctness in edge cases.
> Um ... what edge cases would those be?  We do not put NULL into
> MCV arrays.


You're right - MCV arrays never contain NULLs. However, comparing two
MCV values could theoretically return NULL, even though this is very
unlikely. This check existed even before my changes, and similar checks
are used in other selectivity-estimation functions in 'selfuncs.c'.

...
fcinfo->isnull = false;
fresult = FunctionCallInvoke(fcinfo);
if (!fcinfo->isnull && DatumGetBool(fresult))
...

By "edge cases" I was referring to this situation; I probably did not
choose the best wording.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com




Re: Use merge-based matching for MCVs in eqjoinsel

From: David Geier
Date:
Hi Ilia!

On 29.07.2025 16:07, Ilia Evdokimov wrote:
> 
> On 21.07.2025 16:55, Ilia Evdokimov wrote:
>>
>> While analyzing planner performance on JOB with
>> default_statistics_target = 1000, I noticed that a significant portion
>> of planning time is spent inside the eqjoinsel() function. According
>> to perf, in most JOB queries at default_statistics_target = 1000,
>> eqjoinsel() is the most expensive function during planning, accounting
>> for approximately 8% of total CPU time. At default_statistics_target =
>> 10000, the planner spends up to 75% of its time inside eqjoinsel(),
>> making it one of the primary bottlenecks.
>>
>> This overhead is caused by the O(N^2) nested-loop comparison of MCVs
>> in var1 = var2 clauses.

Thanks for working on this. I've wanted to submit a patch for the very
same issue for a while. I've come across this issue multiple times in
the field.

>> I propose an optimization: when the column datatype supports
>> ordering (i.e., has < and >), we can sort both MCV lists and apply a
>> merge-style algorithm to detect matches. This reduces runtime from
>> O(N^2) to O(N log N), where N is the number of MCV entries. The patch
>> also applies the same optimization to semi-join clauses, which show
>> similar performance behavior.

Why do you sort both lists and then merge instead of putting the smaller
list into a hash map and then doing hash lookups (if the type is hashable)?
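
For illustration, a minimal self-contained sketch of that hash-based matching (plain C over extracted 64-bit keys with a toy open-addressing set; a stand-in for a real implementation that would use the type's hash function on Datums):

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>

/* Build an open-addressing hash set over the smaller MCV key array,
 * then probe it with the larger one: O(N1 + N2) expected, no sorting
 * and no ordering operator required, only hash support. */
static int
count_mcv_matches_hash(const int64_t *small, int nsmall,
                       const int64_t *big, int nbig)
{
    /* Power-of-two capacity with load factor <= 0.5. */
    int cap = 1;
    while (cap < nsmall * 2)
        cap <<= 1;

    int64_t *slots = malloc(cap * sizeof(int64_t));
    bool    *used = calloc(cap, sizeof(bool));
    int      nmatches = 0;

    for (int i = 0; i < nsmall; i++)
    {
        uint64_t h = (uint64_t) small[i] * 0x9E3779B97F4A7C15ULL;
        int pos = (int) (h & (cap - 1));

        while (used[pos])                       /* linear probing */
            pos = (pos + 1) & (cap - 1);
        slots[pos] = small[i];
        used[pos] = true;
    }

    for (int j = 0; j < nbig; j++)
    {
        uint64_t h = (uint64_t) big[j] * 0x9E3779B97F4A7C15ULL;
        int pos = (int) (h & (cap - 1));

        while (used[pos])
        {
            if (slots[pos] == big[j])
            {
                nmatches++;     /* MCV entries are distinct: at most one hit */
                break;
            }
            pos = (pos + 1) & (cap - 1);
        }
    }

    free(slots);
    free(used);
    return nmatches;
}

int
main(void)
{
    int64_t a[] = {5, 1, 9, 3, 7};
    int64_t b[] = {3, 8, 7, 2};

    printf("matches: %d\n", count_mcv_matches_hash(b, 4, a, 5));  /* prints 2 */
    return 0;
}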

There are more problems like this in the planner. For example col IN
(many values) is also quadratic because for every value in the IN list
all MCVs are checked. It would be great to fix this as well.

--
David Geier



Re: Use merge-based matching for MCVs in eqjoinsel

From: David Geier
Date:
Hi!

On 03.09.2025 18:53, Ilia Evdokimov wrote:
> Following up on my previous messages about optimizing eqjoinsel() and
> eqjoinsel_semi() for Var1 = Var2 clauses, I’d like to share detailed
> profiling results showing the effect of the patch on JOB for different
> values of default_statistics_target.
>
> The first table shows the total planner time (summed over all 113
> queries) before and after applying the patch, along with the speedup
> achieved:
>
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) |
> Planner After (ms)
> --------------------------+---------------------+---------------------
> +--------------------
>                      100  | *1.00x*       | 1828.433     |        1820.556
>                     1000  | *1.12x*       | 2194.282     |        1963.110
>                     2500  | *2.15x*       | 4606.705     |        2140.126
>                     5000  | *6.37x*       | 16661.581     |
2616.109
>                     7500  | *11.76x*       | 35988.569     |
> 3061.161
>                    10000  | *19.01x*       | 66616.620     |
> 3504.144
>

It looks to me like these results are with optimizations disabled?

Can you share the SQL script you used for testing?

--
David Geier



Re: Use merge-based matching for MCVs in eqjoinsel

From: David Geier
Date:
Hi Ilia!

On 05.09.2025 16:03, David Geier wrote:
>>> I propose an optimization: when the column datatype supports
>>> ordering (i.e., has < and >), we can sort both MCV lists and apply a
>>> merge-style algorithm to detect matches. This reduces runtime from
>>> O(N^2) to O(N log N), where N is the number of MCV entries. The patch
>>> also applies the same optimization to semi-join clauses, which show
>>> similar performance behavior.
> 
> Why do you sort both lists and then merge instead of putting the smaller
> list into a hash map and then doing hash lookups (if the type is hashable)?

I've tested your and my code with the following script:

CREATE TABLE foo(col TEXT);
CREATE TABLE bar(col TEXT);
SET default_statistics_target = 10000;

-- Generate MCV values. PostgreSQL doesn't store MCVs if the table has
-- only a single value or every value has exactly the same cardinality.
DO $$
BEGIN
  FOR i IN 1..10000 LOOP
    FOR j IN 1..least(i, 50) LOOP
      INSERT INTO foo VALUES ('aaaaaaaaaaaaaaaaaaaa' || i::TEXT);
      INSERT INTO bar VALUES ('aaaaaaaaaaaaaaaaaaaa' || (i + 100000)::TEXT);
    END LOOP;
  END LOOP;
END;
$$;

ANALYZE foo, bar;
\timing on
EXPLAIN SELECT * FROM foo f, bar b WHERE f.col = b.col;

Results are:

- master: 433 ms
- Order+Merge: 11 ms
- Hash map: 4 ms

I've attached my draft patch.

--
David Geier
Вложения

Re: Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:
On 08.09.2025 13:08, David Geier wrote:
> Hi Ilia!
>
> On 05.09.2025 16:03, David Geier wrote:
>>>> I propose an optimization: when the column datatype supports
>>>> ordering (i.e., has < and >), we can sort both MCV lists and apply a
>>>> merge-style algorithm to detect matches. This reduces runtime from
>>>> O(N^2) to O(N log N), where N is the number of MCV entries. The patch
>>>> also applies the same optimization to semi-join clauses, which show
>>>> similar performance behavior.
>> Why do you sort both lists and then merge instead of putting the smaller
>> list into a hash map and then doing hash lookups (if the type is hashable)?
> I've tested your and my code with the following script:
>
> CREATE TABLE foo(col TEXT);
> CREATE TABLE bar(col TEXT);
> SET default_statistics_target = 10000;
>
> -- Generate MCV values. PostgreSQL doesn't store MCVs if the table has
> -- only a single value or every value has exactly the same cardinality.
> DO $$
> BEGIN
>    FOR i IN 1..10000 LOOP
>      FOR j IN 1..least(i, 50) LOOP
>        INSERT INTO foo VALUES ('aaaaaaaaaaaaaaaaaaaa' || i::TEXT);
>        INSERT INTO bar VALUES ('aaaaaaaaaaaaaaaaaaaa' || (i + 100000)::TEXT);
>      END LOOP;
>    END LOOP;
> END;
> $$;
>
> ANALYZE foo, bar;
> \timing on
> EXPLAIN SELECT * FROM foo f, bar b WHERE f.col = b.col;
>
> Results are:
>
> - master: 433 ms
> - Order+Merge: 11 ms
> - Hash map: 4 ms
>
> I've attached my draft patch.
>
> --
> David Geier


Hi David,

Thank you for reviewing.

I have read all the previous messages - and yes, you are right. I don’t
know why I didn’t consider using a hash table approach initially. Your
idea makes a lot of sense.

To evaluate it, I ran benchmarks on JOB with three variants:

$ ./benchmark.sh master
$ ./benchmark.sh merge
$ ./benchmark.sh hash

I compared total planning time across all 113 queries.

$ python3 planning_time.py master hash
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
100                       | 1.00                | 1892.627            | 1886.969
1000                      | 1.09                | 2286.922            | 2100.099
2500                      | 1.94                | 4647.167            | 2400.711
5000                      | 6.15                | 17964.779           | 2919.914
7500                      | 10.58               | 38622.443           | 3650.375
10000                     | 16.33               | 69538.085           | 4257.864

$ python3 planning_time.py master merge
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
100                       | 1.00                | 1892.627            | 1898.622
1000                      | 1.12                | 2286.922            | 2033.553
2500                      | 1.92                | 4647.167            | 2423.552
5000                      | 5.94                | 17964.779           | 3025.739
7500                      | 10.48               | 38622.443           | 3684.262
10000                     | 16.72               | 69538.085           | 4159.418

Based on these results, I’d prefer the hash lookup implementation, so I
think it makes sense to improve your patch further and bring it into
good shape. Shall I take care of that, or would you prefer to do it
yourself?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

Attachments

Re: Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:
On 08.09.2025 13:35, Ilia Evdokimov wrote:
> Based on these results, I’d prefer the hash lookup implementation, so
> I think it makes sense to improve your patch further and bring it into
> good shape. Shall I take care of that, or would you prefer to do it
> yourself?



I realized I mistakenly copied the wrong results for the hash-map
version in my previous message. Sorry about that. Here are the correct
benchmark results:

Merge

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
100                       | 1.00                | 1892.627            | 1898.622
1000                      | 1.12                | 2286.922            | 2033.553
2500                      | 1.92                | 4647.167            | 2423.552
5000                      | 5.94                | 17964.779           | 3025.739
7500                      | 10.48               | 38622.443           | 3684.262
10000                     | 16.72               | 69538.085           | 4159.418


Hash-Map

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
100                       | 1.00                | 1892.627            | 1886.969
1000                      | 1.09                | 2286.922            | 2100.099
2500                      | 1.94                | 4647.167            | 2400.711
5000                      | 6.15                | 17964.779           | 2919.914
7500                      | 10.58               | 38622.443           | 3650.375
10000                     | 16.33               | 69538.085           | 4257.864

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com




Re: Use merge-based matching for MCVs in eqjoinsel

From: David Geier
Date:
Hi Ilia!

> I have read all the previous messages - and yes, you are right. I don’t
> know why I didn’t consider using a hash table approach initially. Your
> idea makes a lot of sense.

Your solution would be beneficial on top, for cases where the data type
is not hashable. But I think that's overkill for a v1. I would start
with the hash-based version.

> To evaluate it, I ran benchmarks on JOB with three variants:
> 
> $ ./benchmark.sh master
> $ ./benchmark.sh merge
> $ ./benchmark.sh hash
> 
> I compared total planning time across all 113 queries.

Was this running with optimizations? How did you extract the planning time?

> 
> $ python3 planning_time.py master hash
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------+---------------------+---------------------+-------------------
> 100                       | 1.00                | 1892.627            | 1886.969
> 1000                      | 1.09                | 2286.922            | 2100.099
> 2500                      | 1.94                | 4647.167            | 2400.711
> 5000                      | 6.15                | 17964.779           | 2919.914
> 7500                      | 10.58               | 38622.443           | 3650.375
> 10000                     | 16.33               | 69538.085           | 4257.864
> 
> $ python3 planning_time.py master merge
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------+---------------------+---------------------+-------------------
> 100                       | 1.00                | 1892.627            | 1898.622
> 1000                      | 1.12                | 2286.922            | 2033.553
> 2500                      | 1.92                | 4647.167            | 2423.552
> 5000                      | 5.94                | 17964.779           | 3025.739
> 7500                      | 10.48               | 38622.443           | 3684.262
> 10000                     | 16.72               | 69538.085           | 4159.418

I would have expected the delta between the "merge" and "hash" variant
to be bigger, especially for default_statistics_target=10000. My small
test also showed that. Any idea why this is not showing in your results?

> Based on these results, I’d prefer the hash lookup implementation, so I
> think it makes sense to improve your patch further and bring it into
> good shape. Shall I take care of that, or would you prefer to do it
> yourself?

I think process-wise it's best if you review my code and I do the changes.

Could you, as part of your review, test tables with just a few MCVs to
make sure we're not regressing "small" cases? For now I'm only bailing
out if one of the two MCV lists has just a single value. I expect the
gains from fine-tuning this threshold to be unmeasurable, but let's
double-check.

--
David Geier



Re: Use merge-based matching for MCVs in eqjoinsel

From: David Geier
Date:
On 08.09.2025 12:45, Ilia Evdokimov wrote:
> 
> I realized I mistakenly copied the wrong results for the hash-map
> version in my previous message. Sorry about that. Here are the correct
> benchmark results:
> 
> Merge
> 
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------+---------------------+---------------------+-------------------
> 100                       | 1.00                | 1892.627            | 1898.622
> 1000                      | 1.12                | 2286.922            | 2033.553
> 2500                      | 1.92                | 4647.167            | 2423.552
> 5000                      | 5.94                | 17964.779           | 3025.739
> 7500                      | 10.48               | 38622.443           | 3684.262
> 10000                     | 16.72               | 69538.085           | 4159.418
> 
> 
> Hash-Map
> 
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------+---------------------+---------------------+-------------------
> 100                       | 1.00                | 1892.627            | 1886.969
> 1000                      | 1.09                | 2286.922            | 2100.099
> 2500                      | 1.94                | 4647.167            | 2400.711
> 5000                      | 6.15                | 17964.779           | 2919.914
> 7500                      | 10.58               | 38622.443           | 3650.375
> 10000                     | 16.33               | 69538.085           | 4257.864
> 

It still seems to me like something is fishy with the numbers, or
something in the benchmark adds a lot of overhead so that small
differences in eqjoinsel_inner() don't show here.

The delta between "hash" and "merge" for default_statistics_target=10000
should be the biggest, but hash is actually slower there.

--
David Geier



Re: Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:


On 08.09.2025 13:56, David Geier wrote:
>> To evaluate it, I ran benchmarks on JOB with three variants:
>>
>> $ ./benchmark.sh master
>> $ ./benchmark.sh merge
>> $ ./benchmark.sh hash
>>
>> I compared total planning time across all 113 queries.
> Was this running with optimizations? How did you extract the planning time?


I save all query plans using EXPLAIN (SUMMARY), then go through all the plans, read the 'Planning Time' for each, and sum them up.

> I would have expected the delta between the "merge" and "hash" variant
> to be bigger, especially for default_statistics_target=10000. My small
> test also showed that. Any idea why this is not showing in your results?


So would I. With default_statistics_target = 10000 and the selectivity in the JOB queries being close to zero, the difference should be noticeable. I can only explain the previous results by cache-related effects on my machine.

I reran the benchmark on a clean cluster and collected the top slowest JOB queries — now the effect is clearly visible.

Merge (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
100                       | 1.00                | 1888.105            | 1879.431
1000                      | 1.14                | 2282.239            | 2009.114
2500                      | 2.10                | 5595.030            | 2668.530
5000                      | 5.56                | 18544.933           | 3333.252
7500                      | 9.17                | 37390.956           | 4076.390
10000                     | 16.10               | 69319.479           | 4306.417

HashMap (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
100                       | 1.03                | 1888.105            | 1828.088
1000                      | 1.18                | 2282.239            | 1939.884
2500                      | 2.64                | 5595.030            | 2117.872
5000                      | 7.80                | 18544.933           | 2377.206
7500                      | 13.80               | 37390.956           | 2709.973
10000                     | 23.32               | 69319.479           | 2973.073

Top 10 slowest JOB queries (default_statistics_target = 10000)
Query | master (ms) | merge (ms) | Hash (ms)
------+-------------+------------+-----------
29c   | 1904.586    | 144.135    | 100.473
29b   | 1881.392    | 117.891    | 89.028
29a   | 1868.805    | 112.242    | 83.913
31c   | 1867.234    | 76.498     | 56.140
30c   | 1646.630    | 88.494     | 62.549
30b   | 1608.820    | 84.821     | 64.603
31a   | 1573.964    | 75.978     | 56.140
28a   | 1457.738    | 95.939     | 77.309
28b   | 1455.052    | 99.383     | 73.065
30a   | 1416.699    | 91.057     | 62.549


BTW, the hashmap from your patch could also be applied to the eqjoinsel_semi() function.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

Re: Use merge-based matching for MCVs in eqjoinsel

From: David Geier
Date:
Hi!

On 08.09.2025 15:45, Ilia Evdokimov wrote:
> I reran the benchmark on a clean cluster and collected the top slowest
> JOB queries — now the effect is clearly visible.
> 
> Merge (sum of all JOB queries)
> ==================
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------+---------------------+---------------------+-------------------
> 100                       | 1.00                | 1888.105            | 1879.431
> 1000                      | 1.14                | 2282.239            | 2009.114
> 2500                      | 2.10                | 5595.030            | 2668.530
> 5000                      | 5.56                | 18544.933           | 3333.252
> 7500                      | 9.17                | 37390.956           | 4076.390
> 10000                     | 16.10               | 69319.479           | 4306.417
>
> HashMap (sum of all JOB queries)
> ==================
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------+---------------------+---------------------+-------------------
> 100                       | 1.03                | 1888.105            | 1828.088
> 1000                      | 1.18                | 2282.239            | 1939.884
> 2500                      | 2.64                | 5595.030            | 2117.872
> 5000                      | 7.80                | 18544.933           | 2377.206
> 7500                      | 13.80               | 37390.956           | 2709.973
> 10000                     | 23.32               | 69319.479           | 2973.073
> 
> Top 10 slowest JOB queries (default_statistics_target = 10000)
> Query | master (ms) | merge (ms) | Hash (ms)
> ------+-------------+------------+-----------
> 29c   | 1904.586    | 144.135    | 100.473
> 29b   | 1881.392    | 117.891    | 89.028
> 29a   | 1868.805    | 112.242    | 83.913
> 31c   | 1867.234    | 76.498     | 56.140
> 30c   | 1646.630    | 88.494     | 62.549
> 30b   | 1608.820    | 84.821     | 64.603
> 31a   | 1573.964    | 75.978     | 56.140
> 28a   | 1457.738    | 95.939     | 77.309
> 28b   | 1455.052    | 99.383     | 73.065
> 30a   | 1416.699    | 91.057     | 62.549

This looks much better. Very nice!

> 
> BTW, the hashmap from your patch could also be applied to
> eqjoinsel_semi() function.
> 

Yep. The inner loop only runs until clamped_nvalues2 and it doesn't
compute matchprodfreq. I'll try to modify the patch such that it
accounts for these differences without being too hard to read.
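
Roughly, the semi-join variant would look like this (a sketch with a hypothetical hashset_* API, not the patch code; the inner list is clamped and only the matched outer frequencies are summed):

/*
 * Sketch: semi-join MCV matching. Only the first clamped_nvalues2
 * entries of the inner MCV list participate, and there is no
 * matchprodfreq accumulation -- we only sum the frequencies of the
 * outer MCVs that found a match. hashset_build/contains/free are
 * hypothetical helpers standing in for the patch's hash table.
 */
static double
semi_mcv_match_freq(const int64_t *keys1, const double *numbers1, int n1,
                    const int64_t *keys2, int clamped_nvalues2)
{
    hashset    *set = hashset_build(keys2, clamped_nvalues2);
    double      matchfreq1 = 0.0;

    for (int i = 0; i < n1; i++)
    {
        if (hashset_contains(set, keys1[i]))
            matchfreq1 += numbers1[i];  /* sum of matched outer MCV freqs */
    }

    hashset_free(set);
    return matchfreq1;
}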

Do you think anything else needs changes in the patch? Did you have a
chance to check tables with just a few MCVs, or are there any queries in
the JOB which regress with very small default_statistics_target?

--
David Geier



Re: Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:

Hi David,

On 08.09.2025 17:36, David Geier wrote:
> Do you think anything else needs changes in the patch?


From an architectural perspective, I think the patch is already in good shape. However, I have some suggestions regarding code style:

  1. I would move McvHashEntry, McvHashContext, the new hash table definition, hash_mcv and are_mcvs_equal to the top.
  2. I’m not sure get_hash_func_oid() is needed at all – it seems we could do without it.
  3. It would be better to name the parameters matchProductFrequencies -> matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1, hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().
  4. As I wrote earlier, since we now have a dedicated function eqjoinsel_inner_with_hashtable(), perhaps it could be used in both eqjoinsel_inner() and eqjoinsel_semi(). And since the hash-based search was factored out, maybe it would make sense to also factor out the O(N^2) nested loop implementation?
  5. I think it would be helpful to add a comment explaining that using a hash table is not efficient when the MCV array length equals 1:

if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
    return false;

> Did you have a chance to check tables with just a few MCVs, or are there
> any queries in the JOB which regress with very small default_statistics_target?


Sure. I need some time to check this.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

Re: Use merge-based matching for MCVs in eqjoinsel

From: David Geier
Date:
Hi!

Thanks for the review.

On 09.09.2025 09:29, Ilia Evdokimov wrote:
> From an architectural perspective, I think the patch is already in good
> shape. However, I have some suggestions regarding code style:
> 
> 1. I would move McvHashEntry, McvHashContext, the new hash table
>    definition, hash_mcv and are_mcvs_equal to the top.
Done. I've also moved up try_eqjoinsel_with_hashtable().

> 2. I’m not sure get_hash_func_oid() is needed at all – it seems we
>    could do without it.
Removed.

> 3. It would be better to name the parameters matchProductFrequencies ->
>    matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1,
>    hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().
Done.

> 4. As I wrote earlier, since we now have a dedicated function
>    eqjoinsel_inner_with_hashtable(), perhaps it could be used in both
>    eqjoinsel_inner() and eqjoinsel_semi(). And since the hash-based
Done.

The gains for SEMI join are even bigger because the function is called 3
times for, e.g., EXPLAIN SELECT * FROM foo f WHERE EXISTS (SELECT 1 FROM
bar b WHERE f.col = b.col); For that query the planning time for me goes
from ~1300 ms to ~12 ms.

>    search was factored out, maybe it would make sense to also factor
>    out the O(N^2) nested loop implementation?
Generally agreed, and while it's tempting, I've refrained from doing the
refactoring. Better to do this in a separate patch to keep things simple.

> 5. I think it would be helpful to add a comment explaining that using a
>    hash table is not efficient when the MCV array length equals 1:
> 
> if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
>     return false;
Done.
>> Did you have a
>> chance to check tables with just a few MCVs or are there any queries in
>> the JOB which regress with very small default_statistics_target?
> 
> 
> Sure. I need some time to check this.
> 
Could you please do that with the latest attached patch so that we test
it once more?

Could you also run the JOB benchmark once more to get some test coverage
on the SEMI join code (assuming it also uses SEMI joins)?

Once we've concluded on above and there are no objections, I will
register this patch in the commit fest.

--
David Geier
Attachments

Re: Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:
Hi!

On 09.09.2025 12:22, David Geier wrote:
> Could you please do that with the latest attached patch so that we test
> it once more?


LGTM. Yes, I'll test this patch.

>
> Could you also run once more the JOB benchmark to get some test coverage
> on the SEMI join code (assuming it also uses SEMI joins)?


Unfortunately, the JOB benchmark does not contain semi join nodes.
However, TPC-DS does. I'll look for the queries with the slowest planner
times there and check them.

I'll need some time to check both join and semi join cases with small 
and large default_statistics_target. I'll share the results later.

>
> Once we've concluded on above and there are no objections, I will
> register this patch in the commit fest.


Sure. No problem.

-- 
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com




Re: Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:
Hi hackers,

On 10.09.2025 16:56, Ilia Evdokimov wrote:
> Unfortunately, the JOB benchmark does not contain semi join nodes. 
> However, TPC-DS does. I'll look for the queries with the slowest planner
> times there and check them.
>
> I'll need some time to check both join and semi join cases with small 
> and large default_statistics_target. I'll share the results later.

JOIN
==============================

I’ve benchmarked the new implementation of eqjoinsel() with different 
values of default_statistics_target. On small targets (1, 5, 10, 25, 50, 
75, 100) the results are all within statistical noise, and I did not 
observe any regressions. In my view, it’s reasonable to keep the current 
condition that the hash table is not used for default_statistics_target 
= 1. Raising that threshold does not seem useful.

Here are the results for JOB queries (where the effect of semi join is 
not visible due to different data distributions):

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
1                         | 1.00                | 1846.643            | 1847.409
5                         | 1.00                | 1836.391            | 1828.318
10                        | 0.95                | 1841.750            | 1929.722
25                        | 0.99                | 1873.172            | 1890.741
50                        | 0.98                | 1869.897            | 1898.470
75                        | 1.02                | 1969.368            | 1929.521
100                       | 0.97                | 1857.890            | 1921.207
1000                      | 1.14                | 2279.700            | 1997.102
2500                      | 1.78                | 4682.658            | 2636.202
5000                      | 6.45                | 15943.696           | 2471.242
7500                      | 12.45               | 34350.855           | 2758.565
10000                     | 20.52               | 62519.342           | 3046.819

SEMI JOIN
==============================

Unfortunately, in TPC-DS it is not possible to clearly see improvements 
for semi joins. To address this, I designed a synthetic example where 
the data distribution forces the loop to run fully, without exiting 
early, which makes the effect on semi joins more visible. In this setup, 
I also ensured that the length of the MCV array is equal to the chosen 
default_statistics_target.

-- 90% of rows fall into 10,000 frequent ids (captured as MCVs); the
-- remainder is spread thinly over ~1M ids. t2's frequent ids are shifted
-- by 10000 so they never overlap t1's frequent ids, forcing the MCV
-- match loop to run to completion.
CREATE TABLE t1 AS
SELECT CASE
          WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 1
          ELSE (g % 1000000) + 10000
        END AS id
FROM generate_series(1, 3000000) g;

CREATE TABLE t2 AS
SELECT CASE
          WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 10001
          ELSE (g % 1000000) + 20000
        END AS id
FROM generate_series(1, 3000000) g;

ANALYZE t1, t2;

The results for the following query are:

SELECT * FROM t1
WHERE id IN (SELECT id FROM t2);

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
1                         | 1.12                | 1.191               | 1.062
5                         | 1.02                | 0.493               | 0.481
10                        | 0.92                | 0.431               | 0.471
25                        | 1.27                | 0.393               | 0.309
50                        | 1.04                | 0.432               | 0.416
75                        | 0.96                | 0.398               | 0.415
100                       | 0.95                | 0.450               | 0.473
1000                      | 9.42                | 6.742               | 0.716
2500                      | 19.15               | 21.621              | 1.129
5000                      | 46.74               | 85.667              | 1.833
7500                      | 73.26               | 194.806             | 2.659
10000                     | 107.95              | 349.981             | 3.242

-- 
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com




Re: Use merge-based matching for MCVs in eqjoinsel

From: Ilia Evdokimov
Date:
Hi David,

In the v2 patch, when the join is reversed we pass the commutator
operator Oid to eqjoinsel_semi(), and inside that function we
immediately call get_opcode(<commutator operator Oid>). Did you mean for
the function to take an operator Oid instead of an opfuncoid here?

If that was unintentional, perhaps the cleanest fix is to add a new
'operator' parameter to eqjoinsel_semi() so we can keep passing
'opfuncoid' as before and avoid changing the behavior.

-- 
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com