Discussion: Use merge-based matching for MCVs in eqjoinsel
Hi hackers,
While analyzing planner performance on JOB with default_statistics_target = 1000, I noticed that a significant portion of planning time is spent inside the eqjoinsel() function. According to perf, in most JOB queries at default_statistics_target = 1000, eqjoinsel() is the most expensive function during planning, accounting for approximately 8% of total CPU time. At default_statistics_target = 10000, the planner spends up to 75% of its time inside eqjoinsel(), making it one of the primary bottlenecks.
This overhead is caused by the O(N^2) nested-loop comparison of MCVs in var1 = var2 clauses.
I propose an optimization: when the column datatype supports ordering (i.e., has < and >), we can sort both MCV lists and apply a merge-style algorithm to detect matches. This reduces runtime from O(N^2) to O(N log N), where N is the number of MCV entries. The patch also applies the same optimization to semi-join clauses, which show similar performance behavior.
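To illustrate the idea, here is a simplified, self-contained sketch (plain C, with ints standing in for Datums; in the real code the comparisons go through the type's comparison support function, and the frequencies of the matched entries are then combined exactly as eqjoinsel() does today):

/*
 * Toy illustration of the merge step: both MCV value lists are assumed
 * to be sorted ascending and duplicate-free (true for the MCV list of a
 * single column).  One pass marks which entries have a partner on the
 * other side, replacing the O(N^2) nested loop.
 */
#include <stdbool.h>
#include <stddef.h>

static void
merge_mark_matches(const int *vals1, bool *hasmatch1, size_t n1,
                   const int *vals2, bool *hasmatch2, size_t n2)
{
    size_t  i = 0,
            j = 0;

    while (i < n1 && j < n2)
    {
        if (vals1[i] < vals2[j])
            i++;
        else if (vals1[i] > vals2[j])
            j++;
        else
        {
            /* equal values: remember the match on both sides */
            hasmatch1[i++] = true;
            hasmatch2[j++] = true;
        }
    }
}

Sorting the two lists first costs O(N log N); the merge pass itself is linear.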
On JOB, this change reduces planner time in most queries with complex joins and large MCV lists, with no observable effect on plan quality. I’ve also attached bar charts showing per-query planner time before and after the patch for default_statistics_target = 100, 1000, and 10000, along with query numbers for reference.
Any feedback or suggestions are welcome!
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Attachments
Following up on my previous message about optimizing eqjoinsel() for Var1 = Var2 and semijoin clauses, I’d like to share more detailed performance results across different values of default_statistics_target on the JOB benchmark.
The performance improvement grows as the number of MCV entries increases (i.e., with higher default_statistics_target). The table below shows total planner time summed over all 113 queries in JOB for each setting of default_statistics_target, before and after applying the patch:
Total planner time across all JOB queries
=========================================
default_statistics_target | Before Patch (ms) | After Patch (ms)
--------------------------+-------------------+------------------
100 | 1828.433 | 1820.556
1000 | 2194.282 | 1963.110
2500 | 4606.705 | 2140.126
5000 | 16661.581 | 2616.109
7500 | 35988.569 | 3061.161
10000 | 66616.620 | 3504.144
For default_statistics_target below 1000, planning time is essentially unchanged before and after the patch. At higher settings the optimization reduces planner time substantially - roughly 19x at default_statistics_target = 10000 - while the generated plans and selectivity calculations remain unchanged. To illustrate this, the table below shows the 10 slowest JOB queries (by planning time), along with their planning times before and after applying the patch.
Top 10 slowest queries at default_statistics_target = 10000
===========================================================
Query | Before Patch (ms) | After Patch (ms)
------+--------------------+-------------------
29c | 1939.282 | 111.219
29b | 1939.237 | 100.109
29a | 1931.870 | 100.224
31c | 1622.255 | 67.609
30c | 1602.156 | 70.942
28b | 1521.026 | 84.058
30b | 1519.972 | 68.777
30a | 1518.014 | 70.529
28a | 1514.908 | 86.662
31a | 1507.303 | 63.579
As shown, the total planner time for these top 10 queries drops substantially with the optimization.
I’ve identified and fixed two issues in the original v1 patch: in eqjoinsel_semi() the second MCV array was allocated with an incorrect size, and the initialization of FunctionCallInfoData is now done once, outside the comparator compare_mcv_items(), to avoid repeated setup during sorting. The updated v2 patch is attached.
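For context, the pattern described above looks roughly like this (a simplified sketch, not the actual patch; it assumes the values are sorted through the type's btree comparison support function, and the helper name sort_mcv_values() is made up):

/* Comparator reused by qsort_arg(); the FunctionCallInfo is set up once
 * by the caller, and only the arguments change per comparison. */
static int
compare_mcv_items(const void *a, const void *b, void *arg)
{
    FunctionCallInfo fcinfo = (FunctionCallInfo) arg;

    fcinfo->args[0].value = *(const Datum *) a;
    fcinfo->args[0].isnull = false;
    fcinfo->args[1].value = *(const Datum *) b;
    fcinfo->args[1].isnull = false;
    fcinfo->isnull = false;

    return DatumGetInt32(FunctionCallInvoke(fcinfo));
}

/* Hypothetical helper: initialize the fcinfo once, then sort. */
static void
sort_mcv_values(Datum *values, int nvalues, Oid cmp_proc_oid, Oid collation)
{
    FmgrInfo        cmp_flinfo;
    FunctionCallInfo fcinfo;

    fmgr_info(cmp_proc_oid, &cmp_flinfo);
    fcinfo = palloc0(SizeForFunctionCallInfo(2));
    InitFunctionCallInfoData(*fcinfo, &cmp_flinfo, 2, collation, NULL, NULL);

    qsort_arg(values, nvalues, sizeof(Datum), compare_mcv_items, fcinfo);

    pfree(fcinfo);
}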
Any suggestions?
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Attachments
Following up on my previous messages about optimizing eqjoinsel() and eqjoinsel_semi() for Var1 = Var2 clauses, I’d like to share detailed profiling results showing the effect of the patch on JOB for different values of default_statistics_target.
The first table shows the total planner time (summed over all 113 queries) before and after applying the patch, along with the speedup achieved:
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+--------------------
100 | 1.00x | 1828.433 | 1820.556
1000 | 1.12x | 2194.282 | 1963.110
2500 | 2.15x | 4606.705 | 2140.126
5000 | 6.37x | 16661.581 | 2616.109
7500 | 11.76x | 35988.569 | 3061.161
10000 | 19.01x | 66616.620 | 3504.144
The second table shows perf profiles of eqjoinsel(), demonstrating that the function, which dominates planning time at high statistics targets, becomes essentially negligible after the patch:
default_statistics_target | eqjoinsel() Before (perf) | eqjoinsel() After (perf)
--------------------------+---------------------------+--------------------------
100 | 0.01% | 0.04%
1000 | 6.23% | 0.06%
2500 | 35.45% | 0.23%
5000 | 66.14% | 0.53%
7500 | 72.70% | 0.97%
10000 | 75.42% | 1.25%
I’ve attached v3 of the patch. This version adds a check for NULL values when comparing MCV entries, ensuring correctness in edge cases.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Attachments
Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
> I’ve attached v3 of the patch. This version adds a check for NULL values
> when comparing MCV entries, ensuring correctness in edge cases.

Um ... what edge cases would those be?  We do not put NULL into
MCV arrays.

			regards, tom lane
On 03.09.2025 23:26, Tom Lane wrote:
> Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
>> I’ve attached v3 of the patch. This version adds a check for NULL values
>> when comparing MCV entries, ensuring correctness in edge cases.
> Um ... what edge cases would those be? We do not put NULL into
> MCV arrays.

You're right - MCV arrays never contain NULLs. However, comparing two MCV
values could theoretically return NULL, even though this is very unlikely.
This check existed even before my changes, and similar checks are used in
other selectivity-estimation functions in 'selfuncs.c'.

...
fcinfo->isnull = false;
fresult = FunctionCallInvoke(fcinfo);
if (!fcinfo->isnull && DatumGetBool(fresult))
...

By "edge cases" I was referring to this situation; I probably did not
choose the best wording.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi Ilia!

On 29.07.2025 16:07, Ilia Evdokimov wrote:
>
> On 21.07.2025 16:55, Ilia Evdokimov wrote:
>>
>> While analyzing planner performance on JOB with
>> default_statistics_target = 1000, I noticed that a significant portion
>> of planning time is spent inside the eqjoinsel() function. According
>> to perf, in most JOB queries at default_statistics_target = 1000,
>> eqjoinsel() is the most expensive function during planning, accounting
>> for approximately 8% of total CPU time. At default_statistics_target =
>> 10000, the planner spends up to 75% of its time inside eqjoinsel(),
>> making it one of the primary bottlenecks.
>>
>> This overhead is caused by the O(N^2) nested-loop comparison of MCVs
>> in var1 = var2 clauses.

Thanks for working on this. I've wanted to submit a patch for the very
same issue for a while. I've come across this issue multiple times in
the field.

>> I propose an optimization: when the column datatype supports
>> ordering (i.e., has < and >), we can sort both MCV lists and apply a
>> merge-style algorithm to detect matches. This reduces runtime from
>> O(N^2) to O(N log N), where N is the number of MCV entries. The patch
>> also applies the same optimization to semi-join clauses, which show
>> similar performance behavior.

Why do you sort both lists and then merge instead of putting the smaller
list into a hash map and then doing hash lookups (if the type is
hashable)?

There are more problems like this in the planner. For example
col IN (many values) is also quadratic because for every value in the
IN list all MCVs are checked. It would be great to fix this as well.

--
David Geier
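To make the hash-lookup suggestion concrete, here is a minimal self-contained sketch (plain C, with ints standing in for Datums; names are illustrative, and an actual patch would go through the datatype's hash and equality support functions and one of the backend hash table implementations):

/*
 * Toy illustration: insert the (smaller) first MCV list into an
 * open-addressing hash table, then probe it with the second list,
 * marking matched entries on both sides.  Expected cost is O(N1 + N2)
 * instead of O(N1 * N2).
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct McvSlot
{
    int     value;
    size_t  index;              /* position in the first MCV list */
    bool    used;
} McvSlot;

static size_t
slot_for(int value, size_t nslots)
{
    /* toy hash; real code would call the type's hash support function */
    return ((unsigned int) value * 2654435761u) % nslots;
}

static void
hash_mark_matches(const int *vals1, bool *hasmatch1, size_t n1,
                  const int *vals2, bool *hasmatch2, size_t n2)
{
    size_t   nslots = n1 * 2 + 1;   /* keep the load factor below 0.5 */
    McvSlot *slots = calloc(nslots, sizeof(McvSlot));

    /* build phase: MCV lists are duplicate-free, so every value is unique */
    for (size_t i = 0; i < n1; i++)
    {
        size_t  s = slot_for(vals1[i], nslots);

        while (slots[s].used)
            s = (s + 1) % nslots;
        slots[s].value = vals1[i];
        slots[s].index = i;
        slots[s].used = true;
    }

    /* probe phase: stop at the first empty slot, i.e. no match */
    for (size_t j = 0; j < n2; j++)
    {
        size_t  s = slot_for(vals2[j], nslots);

        while (slots[s].used)
        {
            if (slots[s].value == vals2[j])
            {
                hasmatch1[slots[s].index] = true;
                hasmatch2[j] = true;
                break;
            }
            s = (s + 1) % nslots;
        }
    }

    free(slots);
}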
Hi!

On 03.09.2025 18:53, Ilia Evdokimov wrote:
> Following up on my previous messages about optimizing eqjoinsel() and
> eqjoinsel_semi() for Var1 = Var2 clauses, I’d like to share detailed
> profiling results showing the effect of the patch on JOB for different
> values of default_statistics_target.
>
> The first table shows the total planner time (summed over all 113
> queries) before and after applying the patch, along with the speedup
> achieved:
>
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------+---------------------+---------------------+--------------------
> 100                       | 1.00x               | 1828.433            | 1820.556
> 1000                      | 1.12x               | 2194.282            | 1963.110
> 2500                      | 2.15x               | 4606.705            | 2140.126
> 5000                      | 6.37x               | 16661.581           | 2616.109
> 7500                      | 11.76x              | 35988.569           | 3061.161
> 10000                     | 19.01x              | 66616.620           | 3504.144

It looks to me like these results are with optimizations disabled? Can
you share the SQL script you used for testing?

--
David Geier
Hi Ilia!

On 05.09.2025 16:03, David Geier wrote:
>>> I propose an optimization: when the column datatype supports
>>> ordering (i.e., has < and >), we can sort both MCV lists and apply a
>>> merge-style algorithm to detect matches. This reduces runtime from
>>> O(N^2) to O(N log N), where N is the number of MCV entries. The patch
>>> also applies the same optimization to semi-join clauses, which show
>>> similar performance behavior.
>
> Why do you sort both lists and then merge instead of putting the smaller
> list into a hash map and then doing hash lookups (if the type is hashable)?

I've tested your and my code with the following script:

CREATE TABLE foo(col TEXT);
CREATE TABLE bar(col TEXT);
SET default_statistics_target = 10000;

-- Generate MCV values. PostgreSQL doesn't store MCVs if the table has
-- only a single value or every value has exactly the same cardinality.
DO $$
BEGIN
    FOR i IN 1..10000 LOOP
        FOR j IN 1..least(i, 50) LOOP
            INSERT INTO foo VALUES ('aaaaaaaaaaaaaaaaaaaa' || i::TEXT);
            INSERT INTO bar VALUES ('aaaaaaaaaaaaaaaaaaaa' || (i + 100000)::TEXT);
        END LOOP;
    END LOOP;
END;
$$;

ANALYZE foo, bar;
\timing on
EXPLAIN SELECT * FROM foo f, bar b WHERE f.col = b.col;

Results are:

- master:      433 ms
- Order+Merge:  11 ms
- Hash map:      4 ms

I've attached my draft patch.

--
David Geier
Attachments
On 08.09.2025 13:08, David Geier wrote:
> Hi Ilia!
>
> On 05.09.2025 16:03, David Geier wrote:
>>>> I propose an optimization: when the column datatype supports
>>>> ordering (i.e., has < and >), we can sort both MCV lists and apply a
>>>> merge-style algorithm to detect matches. This reduces runtime from
>>>> O(N^2) to O(N log N), where N is the number of MCV entries. The patch
>>>> also applies the same optimization to semi-join clauses, which show
>>>> similar performance behavior.
>> Why do you sort both lists and then merge instead of putting the smaller
>> list into a hash map and then doing hash lookups (if the type is hashable)?
> I've tested your and my code with the following script:
>
> CREATE TABLE foo(col TEXT);
> CREATE TABLE bar(col TEXT);
> SET default_statistics_target = 10000;
>
> -- Generate MCV values. PostgreSQL doesn't store MCVs if the table has
> -- only a single value or every value has exactly the same cardinality.
> DO $$
> BEGIN
>     FOR i IN 1..10000 LOOP
>         FOR j IN 1..least(i, 50) LOOP
>             INSERT INTO foo VALUES ('aaaaaaaaaaaaaaaaaaaa' || i::TEXT);
>             INSERT INTO bar VALUES ('aaaaaaaaaaaaaaaaaaaa' || (i + 100000)::TEXT);
>         END LOOP;
>     END LOOP;
> END;
> $$;
>
> ANALYZE foo, bar;
> \timing on
> EXPLAIN SELECT * FROM foo f, bar b WHERE f.col = b.col;
>
> Results are:
>
> - master:      433 ms
> - Order+Merge:  11 ms
> - Hash map:      4 ms
>
> I've attached my draft patch.
>
> --
> David Geier

Hi David,

Thank you for reviewing.

I have read all the previous messages - and yes, you are right. I don’t
know why I didn’t consider using a hash table approach initially. Your
idea makes a lot of sense.

To evaluate it, I ran benchmarks on JOB with three variants:

$ ./benchmark.sh master
$ ./benchmark.sh merge
$ ./benchmark.sh hash

I compared total planning time across all 113 queries.

$ python3 planning_time.py master hash

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
100                       | 1.00                | 1892.627            | 1886.969
1000                      | 1.09                | 2286.922            | 2100.099
2500                      | 1.94                | 4647.167            | 2400.711
5000                      | 6.15                | 17964.779           | 2919.914
7500                      | 10.58               | 38622.443           | 3650.375
10000                     | 16.33               | 69538.085           | 4257.864

$ python3 planning_time.py master merge

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
100                       | 1.00                | 1892.627            | 1898.622
1000                      | 1.12                | 2286.922            | 2033.553
2500                      | 1.92                | 4647.167            | 2423.552
5000                      | 5.94                | 17964.779           | 3025.739
7500                      | 10.48               | 38622.443           | 3684.262
10000                     | 16.72               | 69538.085           | 4159.418

Based on these results, I’d prefer the hash lookup implementation, so I
think it makes sense to improve your patch further and bring it into
good shape. Shall I take care of that, or would you prefer to do it
yourself?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Attachments
On 08.09.2025 13:35, Ilia Evdokimov wrote:
> Based on these results, I’d prefer the hash lookup implementation, so
> I think it makes sense to improve your patch further and bring it into
> good shape. Shall I take care of that, or would you prefer to do it
> yourself?

I realized I mistakenly copied the wrong results for the hash-map
version in my previous draft. Sorry about that. Here are the correct
benchmark results:

Merge

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
100                       | 1.00                | 1892.627            | 1898.622
1000                      | 1.12                | 2286.922            | 2033.553
2500                      | 1.92                | 4647.167            | 2423.552
5000                      | 5.94                | 17964.779           | 3025.739
7500                      | 10.48               | 38622.443           | 3684.262
10000                     | 16.72               | 69538.085           | 4159.418

Hash-Map

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
100                       | 1.00                | 1892.627            | 1886.969
1000                      | 1.09                | 2286.922            | 2100.099
2500                      | 1.94                | 4647.167            | 2400.711
5000                      | 6.15                | 17964.779           | 2919.914
7500                      | 10.58               | 38622.443           | 3650.375
10000                     | 16.33               | 69538.085           | 4257.864

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi Ilia!

> I have read all the previous messages - and yes, you are right. I don’t
> know why I didn’t consider using a hash table approach initially. Your
> idea makes a lot of sense.

Your solution would be beneficial on top, for cases where the data type
is not hashable. But I think that's overkill for a v1. I would start
with the hash-based version.

> To evaluate it, I ran benchmarks on JOB with three variants:
>
> $ ./benchmark.sh master
> $ ./benchmark.sh merge
> $ ./benchmark.sh hash
>
> I compared total planning time across all 113 queries.

Was this running with optimizations? How did you extract the planning
time?

> $ python3 planning_time.py master hash
>
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------------------------------------------------------------
> 100                       | 1.00                | 1892.627            | 1886.969
> 1000                      | 1.09                | 2286.922            | 2100.099
> 2500                      | 1.94                | 4647.167            | 2400.711
> 5000                      | 6.15                | 17964.779           | 2919.914
> 7500                      | 10.58               | 38622.443           | 3650.375
> 10000                     | 16.33               | 69538.085           | 4257.864
>
> $ python3 planning_time.py master merge
>
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------------------------------------------------------------
> 100                       | 1.00                | 1892.627            | 1898.622
> 1000                      | 1.12                | 2286.922            | 2033.553
> 2500                      | 1.92                | 4647.167            | 2423.552
> 5000                      | 5.94                | 17964.779           | 3025.739
> 7500                      | 10.48               | 38622.443           | 3684.262
> 10000                     | 16.72               | 69538.085           | 4159.418

I would have expected the delta between the "merge" and "hash" variant
to be bigger, especially for default_statistics_target=10000. My small
test also showed that. Any idea why this is not showing in your results?

> Based on these results, I’d prefer the hash lookup implementation, so I
> think it makes sense to improve your patch further and bring it into
> good shape. Shall I take care of that, or would you prefer to do it
> yourself?

I think process-wise it's best if you review my code and I do the
changes.

Could you as part of your review test tables with just a few MCVs to
make sure we're not regressing "small" cases? For now I'm only bailing
if one of the two MCV lists has just a single value. I'm expecting the
gains from fine tuning this value to be not measurable but let's double
check.

--
David Geier
On 08.09.2025 12:45, Ilia Evdokimov wrote:
>
> I realized I mistakenly copied the wrong results for the hash-map
> version in my previous draft. Sorry about that. Here are the correct
> benchmark results:
>
> Merge
>
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------------------------------------------------------------
> 100                       | 1.00                | 1892.627            | 1898.622
> 1000                      | 1.12                | 2286.922            | 2033.553
> 2500                      | 1.92                | 4647.167            | 2423.552
> 5000                      | 5.94                | 17964.779           | 3025.739
> 7500                      | 10.48               | 38622.443           | 3684.262
> 10000                     | 16.72               | 69538.085           | 4159.418
>
> Hash-Map
>
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------------------------------------------------------------
> 100                       | 1.00                | 1892.627            | 1886.969
> 1000                      | 1.09                | 2286.922            | 2100.099
> 2500                      | 1.94                | 4647.167            | 2400.711
> 5000                      | 6.15                | 17964.779           | 2919.914
> 7500                      | 10.58               | 38622.443           | 3650.375
> 10000                     | 16.33               | 69538.085           | 4257.864

It still seems to me like something is fishy with the numbers, or
something in the benchmark adds a lot of overhead so that small
differences in eqjoinsel_inner() don't show here. The delta between
"hash" and "merge" for default_statistics_target=10000 should be the
biggest but it's actually slower.

--
David Geier
>> To evaluate it, I ran benchmarks on JOB with three variants:
>>
>> $ ./benchmark.sh master
>> $ ./benchmark.sh merge
>> $ ./benchmark.sh hash
>>
>> I compared total planning time across all 113 queries.
>
> Was this running with optimizations? How did you extract the planning
> time?
I save all query plans using EXPLAIN (SUMMARY ON), then go through all the plans, read the 'Planning Time' from each, and sum them up.
> I would have expected the delta between the "merge" and "hash" variant
> to be bigger, especially for default_statistics_target=10000. My small
> test also showed that. Any idea why this is not showing in your results?
So would I. With default_statistics_target = 10000 and the selectivity in the JOB queries being close to zero, the difference should be noticeable. I can only explain the previous results by cache-related effects on my machine.
I reran the benchmark on a clean cluster and collected the top slowest JOB queries — now the effect is clearly visible.
Merge (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
100 | 1.00 | 1888.105 | 1879.431
1000 | 1.14 | 2282.239 | 2009.114
2500 | 2.10 | 5595.030 | 2668.530
5000 | 5.56 | 18544.933 | 3333.252
7500 | 9.17 | 37390.956 | 4076.390
10000 | 16.10 | 69319.479 | 4306.417
HashMap (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
100 | 1.03 | 1888.105 | 1828.088
1000 | 1.18 | 2282.239 | 1939.884
2500 | 2.64 | 5595.030 | 2117.872
5000 | 7.80 | 18544.933 | 2377.206
7500 | 13.80 | 37390.956 | 2709.973
10000 | 23.32 | 69319.479 | 2973.073
Top 10 slowest JOB queries (default_statistics_target = 10000)
Query | master (ms) | merge (ms) | Hash (ms)
------+-------------+------------+-----------
29c | 1904.586 | 144.135 | 100.473
29b | 1881.392 | 117.891 | 89.028
29a | 1868.805 | 112.242 | 83.913
31c | 1867.234 | 76.498 | 56.140
30c | 1646.630 | 88.494 | 62.549
30b | 1608.820 | 84.821 | 64.603
31a | 1573.964 | 75.978 | 56.140
28a | 1457.738 | 95.939 | 77.309
28b | 1455.052 | 99.383 | 73.065
30a | 1416.699 | 91.057 | 62.549
BTW, the hashmap from your patch could also be applied to the eqjoinsel_semi() function.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi!

On 08.09.2025 15:45, Ilia Evdokimov wrote:
> I reran the benchmark on a clean cluster and collected the top slowest
> JOB queries — now the effect is clearly visible.
>
> Merge (sum of all JOB queries)
> ==================
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------------------------------------------------------------
> 100                       | 1.00                | 1888.105            | 1879.431
> 1000                      | 1.14                | 2282.239            | 2009.114
> 2500                      | 2.10                | 5595.030            | 2668.530
> 5000                      | 5.56                | 18544.933           | 3333.252
> 7500                      | 9.17                | 37390.956           | 4076.390
> 10000                     | 16.10               | 69319.479           | 4306.417
>
> HashMap (sum of all JOB queries)
> ==================
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------------------------------------------------------------
> 100                       | 1.03                | 1888.105            | 1828.088
> 1000                      | 1.18                | 2282.239            | 1939.884
> 2500                      | 2.64                | 5595.030            | 2117.872
> 5000                      | 7.80                | 18544.933           | 2377.206
> 7500                      | 13.80               | 37390.956           | 2709.973
> 10000                     | 23.32               | 69319.479           | 2973.073
>
> Top 10 slowest JOB queries (default_statistics_target = 10000)
> Query | master (ms) | merge (ms) | Hash (ms)
> ------+-------------+------------+-----------
> 29c   | 1904.586    | 144.135    | 100.473
> 29b   | 1881.392    | 117.891    | 89.028
> 29a   | 1868.805    | 112.242    | 83.913
> 31c   | 1867.234    | 76.498     | 56.140
> 30c   | 1646.630    | 88.494     | 62.549
> 30b   | 1608.820    | 84.821     | 64.603
> 31a   | 1573.964    | 75.978     | 56.140
> 28a   | 1457.738    | 95.939     | 77.309
> 28b   | 1455.052    | 99.383     | 73.065
> 30a   | 1416.699    | 91.057     | 62.549

This looks much better. Very nice!

> BTW, the hashmap from your patch could also be applied to the
> eqjoinsel_semi() function.

Yep. The inner loop only runs until clamped_nvalues2 and it doesn't
compute matchprodfreq. I'll try to modify the patch such that it
accounts for these differences without being too hard to read.

Do you think anything else needs changes in the patch? Did you have a
chance to check tables with just a few MCVs or are there any queries in
the JOB which regress with very small default_statistics_target?

--
David Geier
Hi David,
> Do you think anything else needs changes in the patch?
From an architectural perspective, I think the patch is already in good shape. However, I have some suggestions regarding code style:
- I would move McvHashEntry, McvHashContext, the new hash table definition, hash_mcv and are_mcvs_equal to the top.
- I’m not sure get_hash_func_oid() is needed at all – it seems we could do without it.
- It would be better to name the parameters matchProductFrequencies -> matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1, hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().
- As I wrote earlier, since we now have a dedicated function eqjoinsel_inner_with_hashtable(), perhaps it could be used in both eqjoinsel_inner() and eqjoinsel_semi(). And since the hash-based search was factored out, maybe it would make sense to also factor out the O(N^2) nested loop implementation?
- I think it would be helpful to add a comment explaining that using a hash table is not efficient when the MCV array length equals 1:
if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
return false;
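For example, something along these lines (suggested wording only):

/*
 * Setting up and probing a hash table has a fixed cost, which cannot pay
 * off when one side contributes only a single MCV entry; fall back to the
 * plain nested-loop comparison in that case.
 */
if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
    return false;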
> Did you have a chance to check tables with just a few MCVs or are there
> any queries in the JOB which regress with very small default_statistics_target?
Sure. I need some time to check this.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi!

Thanks for the review.

On 09.09.2025 09:29, Ilia Evdokimov wrote:
> From an architectural perspective, I think the patch is already in good
> shape. However, I have some suggestions regarding code style:
>
> 1. I would move McvHashEntry, McvHashContext, the new hash table
>    definition, hash_mcv and are_mcvs_equal to the top.

Done. I've also moved up try_eqjoinsel_with_hashtable().

> 2. I’m not sure get_hash_func_oid() is needed at all – it seems we
>    could do without it.

Removed.

> 3. It would be better to name the parameters matchProductFrequencies ->
>    matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1,
>    hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().

Done.

> 4. As I wrote earlier, since we now have a dedicated function
>    eqjoinsel_inner_with_hashtable(), perhaps it could be used in both
>    eqjoinsel_inner() and eqjoinsel_semi().

Done. The gains for SEMI join are even bigger because the function is
called 3 times for e.g.

EXPLAIN SELECT * FROM foo f WHERE EXISTS (SELECT 1 FROM bar b WHERE f.col = b.col);

For that query the planning time for me goes from ~1300 ms -> 12 ms.

>    And since the hash-based search was factored out, maybe it would make
>    sense to also factor out the O(N^2) nested loop implementation?

Generally agreed and while tempting, I've refrained from doing the
refactoring. Let's better do this in a separate patch to keep things
simple.

> 5. I think it would be helpful to add a comment explaining that using a
>    hash table is not efficient when the MCV array length equals 1:
>
>    if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
>        return false;

Done.

>> Did you have a chance to check tables with just a few MCVs or are there
>> any queries in the JOB which regress with very small default_statistics_target?
>
> Sure. I need some time to check this.

Could you please do that with the latest attached patch so that we test
it once more?

Could you also run once more the JOB benchmark to get some test coverage
on the SEMI join code (assuming it also uses SEMI joins)?

Once we've concluded on the above and there are no objections, I will
register this patch in the commit fest.

--
David Geier
Attachments
Hi!

On 09.09.2025 12:22, David Geier wrote:
> Could you please do that with the latest attached patch so that we test
> it once more?

LGTM. Yes, I'll test this patch.

> Could you also run once more the JOB benchmark to get some test coverage
> on the SEMI join code (assuming it also uses SEMI joins)?

Unfortunately, the JOB benchmark does not contain semi join nodes.
However, TPC-DS does. I'll look for the queries with the slowest planner
times there and check them.

I'll need some time to check both join and semi join cases with small
and large default_statistics_target. I'll share the results later.

> Once we've concluded on the above and there are no objections, I will
> register this patch in the commit fest.

Sure. No problem.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi hackers,

On 10.09.2025 16:56, Ilia Evdokimov wrote:
> Unfortunately, the JOB benchmark does not contain semi join nodes.
> However, TPC-DS does. I'll look for the queries with the slowest planner
> times there and check them.
>
> I'll need some time to check both join and semi join cases with small
> and large default_statistics_target. I'll share the results later.

JOIN
==============================

I’ve benchmarked the new implementation of eqjoinsel() with different
values of default_statistics_target. On small targets (1, 5, 10, 25, 50,
75, 100) the results are all within statistical noise, and I did not
observe any regressions. In my view, it’s reasonable to keep the current
condition that the hash table is not used for default_statistics_target
= 1. Raising that threshold does not seem useful.

Here are the results for JOB queries (where the effect of semi join is
not visible due to different data distributions):

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
------------------------------------------------------------------------------------------
1                         | 1.00                | 1846.643            | 1847.409
5                         | 1.00                | 1836.391            | 1828.318
10                        | 0.95                | 1841.750            | 1929.722
25                        | 0.99                | 1873.172            | 1890.741
50                        | 0.98                | 1869.897            | 1898.470
75                        | 1.02                | 1969.368            | 1929.521
100                       | 0.97                | 1857.890            | 1921.207
1000                      | 1.14                | 2279.700            | 1997.102
2500                      | 1.78                | 4682.658            | 2636.202
5000                      | 6.45                | 15943.696           | 2471.242
7500                      | 12.45               | 34350.855           | 2758.565
10000                     | 20.52               | 62519.342           | 3046.819

SEMI JOIN
==============================

Unfortunately, in TPC-DS it is not possible to clearly see improvements
for semi joins. To address this, I designed a synthetic example where
the data distribution forces the loop to run fully, without exiting
early, which makes the effect on semi joins more visible. In this setup,
I also ensured that the length of the MCV array is equal to the chosen
default_statistics_target.

CREATE TABLE t1 AS
SELECT CASE
           WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 1
           ELSE (g % 1000000) + 10000
       END AS id
FROM generate_series(1, 3000000) g;

CREATE TABLE t2 AS
SELECT CASE
           WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 10001
           ELSE (g % 1000000) + 20000
       END AS id
FROM generate_series(1, 3000000) g;

ANALYZE t1, t2;

The results of the query are:

SELECT * FROM t1 WHERE id IN (SELECT id FROM t2);

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
------------------------------------------------------------------------------------------
1                         | 1.12                | 1.191               | 1.062
5                         | 1.02                | 0.493               | 0.481
10                        | 0.92                | 0.431               | 0.471
25                        | 1.27                | 0.393               | 0.309
50                        | 1.04                | 0.432               | 0.416
75                        | 0.96                | 0.398               | 0.415
100                       | 0.95                | 0.450               | 0.473
1000                      | 9.42                | 6.742               | 0.716
2500                      | 19.15               | 21.621              | 1.129
5000                      | 46.74               | 85.667              | 1.833
7500                      | 73.26               | 194.806             | 2.659
10000                     | 107.95              | 349.981             | 3.242

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi David,

In the v2 patch, when the join is reversed we pass the commutator
operator Oid to eqjoinsel_semi(), and inside that function we
immediately call get_opcode(<commutator operator Oid>). Did you mean for
the function to take an operator Oid instead of an opfuncoid here?

If that was unintentional, perhaps the cleanest fix is to add a new
'operator' parameter to eqjoinsel_semi() so we can keep passing
'opfuncoid' as before and avoid changing the behavior.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com