Thread: Use merge-based matching for MCVs in eqjoinsel
Hi hackers,
While analyzing planner performance on JOB with default_statistics_target = 1000, I noticed that a significant portion of planning time is spent inside the eqjoinsel() function. According to perf, in most JOB queries at default_statistics_target = 1000, eqjoinsel() is the most expensive function during planning, accounting for approximately 8% of total CPU time. At default_statistics_target = 10000, the planner spends up to 75% of its time inside eqjoinsel(), making it one of the primary bottlenecks.
This overhead is caused by the O(N^2) nested-loop comparison of MCVs in var1 = var2 clauses.
I propose an optimization: when the column datatype supports ordering (i.e., has < and >), we can sort both MCV lists and apply a merge-style algorithm to detect matches. This reduces runtime from O(N^2) to O(N log N), where N is the number of MCV entries. The patch also applies the same optimization to semi-join clauses, which show similar performance behavior.
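The idea can be sketched outside the planner like this. This is a minimal standalone illustration, not the patch itself: plain ints stand in for MCV datums, and a qsort() comparator stands in for the type's ordering operator.

```c
/* Sketch of merge-based MCV matching: sort both lists, then one merge
 * pass finds the common entries, replacing the O(N^2) nested loop with
 * an O(N log N) sort plus an O(N) merge. */
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
    int ia = *(const int *) a;
    int ib = *(const int *) b;

    return (ia > ib) - (ia < ib);
}

/* Count values present in both arrays (values within each array are
 * distinct, as they are in MCV lists). */
static int
count_matches(int *a, int na, int *b, int nb)
{
    int i = 0, j = 0, nmatches = 0;

    qsort(a, na, sizeof(int), cmp_int);
    qsort(b, nb, sizeof(int), cmp_int);

    while (i < na && j < nb)
    {
        if (a[i] < b[j])
            i++;
        else if (a[i] > b[j])
            j++;
        else
        {
            nmatches++;
            i++;
            j++;
        }
    }
    return nmatches;
}
```

In the real patch the comparator would of course invoke the datatype's comparison function via the fmgr interface rather than comparing ints.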
On JOB, these changes reduce planner time in most queries with complex joins and large MCV lists, with no observable effect on plan quality. I’ve also attached bar charts showing per-query planner time before and after the patch for default_statistics_target = 100, 1000, and 10000, along with query numbers for reference.
Any feedback or suggestions are welcome!
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Attachments
Following up on my previous message about optimizing eqjoinsel() for Var1 = Var2 and semijoin clauses, I’d like to share more detailed performance results across different values of default_statistics_target on the JOB benchmark.
The performance improvement grows as the number of MCV entries increases (i.e., with higher default_statistics_target). The table below shows total planner time summed over all 113 queries in JOB for each setting of default_statistics_target, before and after applying the patch:
Total planner time across all JOB queries
=========================================
default_statistics_target | Before Patch (ms) | After Patch (ms)
--------------------------+-------------------+------------------
                      100 |          1828.433 |         1820.556
                     1000 |          2194.282 |         1963.110
                     2500 |          4606.705 |         2140.126
                     5000 |         16661.581 |         2616.109
                     7500 |         35988.569 |         3061.161
                    10000 |         66616.620 |         3504.144
For default_statistics_target < 1000, planning time is essentially unchanged before and after the patch. The optimization reduces planner time substantially - by roughly 19X at default_statistics_target = 10000 - while the generated plans and selectivity calculations remain unchanged. To illustrate this, the table below shows the 10 slowest JOB queries (by planning time), along with their planning times before and after applying the patch.
Top 10 slowest queries at default_statistics_target = 10000
===========================================================
Query | Before Patch (ms) | After Patch (ms)
------+--------------------+-------------------
29c | 1939.282 | 111.219
29b | 1939.237 | 100.109
29a | 1931.870 | 100.224
31c | 1622.255 | 67.609
30c | 1602.156 | 70.942
28b | 1521.026 | 84.058
30b | 1519.972 | 68.777
30a | 1518.014 | 70.529
28a | 1514.908 | 86.662
31a | 1507.303 | 63.579
As shown, the total planner time for these top 10 queries drops substantially with the optimization.
I’ve identified and fixed two issues in the original v1 patch. First, in eqjoinsel_semi() the second MCV array was allocated with an incorrect size. Second, the initialization of FunctionCallInfoData has been moved outside the comparator compare_mcv_items() to avoid repeated setup during sorting. I've attached the updated v2 patch with these changes.
Any suggestions?
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Attachments
Following up on my previous messages about optimizing eqjoinsel() and eqjoinsel_semi() for Var1 = Var2 clauses, I’d like to share detailed profiling results showing the effect of the patch on JOB for different values of default_statistics_target.
The first table shows the total planner time (summed over all 113 queries) before and after applying the patch, along with the speedup achieved:
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+--------------------
                      100 |               1.00x |            1828.433 |           1820.556
                     1000 |               1.12x |            2194.282 |           1963.110
                     2500 |               2.15x |            4606.705 |           2140.126
                     5000 |               6.37x |           16661.581 |           2616.109
                     7500 |              11.76x |           35988.569 |           3061.161
                    10000 |              19.01x |           66616.620 |           3504.144
The second table shows the profiling of eqjoinsel() using perf, demonstrating that the function, which dominates planning at high statistics targets, becomes essentially negligible after the patch:
default_statistics_target | eqjoinsel() Before (perf) | eqjoinsel() After (perf)
--------------------------+---------------------------+--------------------------
                      100 |                     0.01% |                    0.04%
                     1000 |                     6.23% |                    0.06%
                     2500 |                    35.45% |                    0.23%
                     5000 |                    66.14% |                    0.53%
                     7500 |                    72.70% |                    0.97%
                    10000 |                    75.42% |                    1.25%
I’ve attached v3 of the patch. This version adds a check for NULL values when comparing MCV entries, ensuring correctness in edge cases.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Attachments
Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
> I’ve attached v3 of the patch. This version adds a check for NULL values
> when comparing MCV entries, ensuring correctness in edge cases.
Um ... what edge cases would those be? We do not put NULL into
MCV arrays.
regards, tom lane
On 03.09.2025 23:26, Tom Lane wrote:
> Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
>> I’ve attached v3 of the patch. This version adds a check for NULL values
>> when comparing MCV entries, ensuring correctness in edge cases.
> Um ... what edge cases would those be? We do not put NULL into
> MCV arrays.

You're right - MCV arrays never contain NULLs. However, comparing two
MCV values could theoretically return NULL, even though this is very
unlikely. This check existed even before my changes, and similar checks
are used in other selectivity-estimation functions in 'selfuncs.c'.

...
fcinfo->isnull = false;
fresult = FunctionCallInvoke(fcinfo);
if (!fcinfo->isnull && DatumGetBool(fresult))
...

By "edge cases" I was referring to this situation; I probably did not
choose the best wording.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi Ilia!

On 29.07.2025 16:07, Ilia Evdokimov wrote:
> On 21.07.2025 16:55, Ilia Evdokimov wrote:
>> While analyzing planner performance on JOB with
>> default_statistics_target = 1000, I noticed that a significant portion
>> of planning time is spent inside the eqjoinsel() function. According
>> to perf, in most JOB queries at default_statistics_target = 1000,
>> eqjoinsel() is the most expensive function during planning, accounting
>> for approximately 8% of total CPU time. At default_statistics_target =
>> 10000, the planner spends up to 75% of its time inside eqjoinsel(),
>> making it one of the primary bottlenecks.
>>
>> This overhead is caused by the O(N^2) nested-loop comparison of MCVs
>> in var1 = var2 clauses.

Thanks for working on this. I've wanted to submit a patch for the very
same issue for a while. I've come across this issue multiple times in
the field.

>> I propose an optimization: when the column datatype supports
>> ordering (i.e., has < and >), we can sort both MCV lists and apply a
>> merge-style algorithm to detect matches. This reduces runtime from
>> O(N^2) to O(N log N), where N is the number of MCV entries. The patch
>> also applies the same optimization to semi-join clauses, which show
>> similar performance behavior.

Why do you sort both lists and then merge instead of putting the smaller
list into a hash map and then doing hash lookups (if the type is
hashable)?

There are more problems like this in the planner. For example
col IN (many values) is also quadratic because for every value in the
IN list all MCVs are checked. It would be great to fix this as well.

--
David Geier
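The hash-based lookup suggested here can be sketched in the same standalone style as the merge idea: build a table from the smaller MCV list, probe it with the larger one. Again ints stand in for MCV datums, and a toy fixed-size open-addressing set stands in for whatever hash structure a real patch would use (e.g. PostgreSQL's simplehash), so every name below is illustrative.

```c
/* Sketch of hash-based MCV matching: O(nsmall) build + O(nlarge) probe,
 * expected, instead of the O(nsmall * nlarge) nested loop. */
#include <stdbool.h>
#include <stdlib.h>

#define TABSIZE 1024            /* power of two, comfortably > 2x entries */

static unsigned
hash_int(int v)
{
    unsigned    h = (unsigned) v;

    h ^= h >> 16;
    h *= 0x45d9f3b;
    h ^= h >> 16;
    return h;
}

static int
count_matches_hash(const int *smallvals, int nsmall,
                   const int *largevals, int nlarge)
{
    int        *slots = malloc(TABSIZE * sizeof(int));
    bool       *used = calloc(TABSIZE, sizeof(bool));
    int         nmatches = 0;

    /* build phase: insert the smaller list, linear probing on collision */
    for (int i = 0; i < nsmall; i++)
    {
        unsigned    pos = hash_int(smallvals[i]) & (TABSIZE - 1);

        while (used[pos] && slots[pos] != smallvals[i])
            pos = (pos + 1) & (TABSIZE - 1);
        slots[pos] = smallvals[i];
        used[pos] = true;
    }

    /* probe phase: look up each entry of the larger list */
    for (int j = 0; j < nlarge; j++)
    {
        unsigned    pos = hash_int(largevals[j]) & (TABSIZE - 1);

        while (used[pos])
        {
            if (slots[pos] == largevals[j])
            {
                nmatches++;
                break;
            }
            pos = (pos + 1) & (TABSIZE - 1);
        }
    }

    free(slots);
    free(used);
    return nmatches;
}
```

The trade-off against the merge variant is that this needs a hash function for the datatype but avoids both sorts, which is where the extra constant factor of the merge approach comes from.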
Hi!

On 03.09.2025 18:53, Ilia Evdokimov wrote:
> Following up on my previous messages about optimizing eqjoinsel() and
> eqjoinsel_semi() for Var1 = Var2 clauses, I’d like to share detailed
> profiling results showing the effect of the patch on JOB for different
> values of default_statistics_target.
>
> The first table shows the total planner time (summed over all 113
> queries) before and after applying the patch, along with the speedup
> achieved:
>
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> --------------------------+---------------------+---------------------+--------------------
>                       100 |               1.00x |            1828.433 |           1820.556
>                      1000 |               1.12x |            2194.282 |           1963.110
>                      2500 |               2.15x |            4606.705 |           2140.126
>                      5000 |               6.37x |           16661.581 |           2616.109
>                      7500 |              11.76x |           35988.569 |           3061.161
>                     10000 |              19.01x |           66616.620 |           3504.144

It looks to me like these results are with optimizations disabled? Can
you share the SQL script you used for testing?

--
David Geier
Hi Ilia!
On 05.09.2025 16:03, David Geier wrote:
>>> I propose an optimization: when the column datatype supports
>>> ordering(i.e., has < and >), we can sort both MCV lists and apply
>>> mege-style algorithm to detect matches. This reduces runtime from
>>> O(N^2) to O(NlogN), where N is the number of MCV entries. The patch
>>> also applies the same optimization to semi-join clauses, which show
>>> similar performance behavior.
>
> Why do you sort both lists and then merge instead of putting the smaller
> list into a hash map and then doing hash lookups (if the type is hashable)?
I've tested your and my code with the following script:
CREATE TABLE foo(col TEXT);
CREATE TABLE bar(col TEXT);
SET default_statistics_target = 10000;
-- Generate MCV values. PostgreSQL doesn't store MCVs if the table has
-- only a single value or every value has exactly the same cardinality.
DO $$
BEGIN
FOR i IN 1..10000 LOOP
FOR j IN 1..least(i, 50) LOOP
INSERT INTO foo VALUES ('aaaaaaaaaaaaaaaaaaaa' || i::TEXT);
INSERT INTO bar VALUES ('aaaaaaaaaaaaaaaaaaaa' || (i + 100000)::TEXT);
END LOOP;
END LOOP;
END;
$$;
ANALYZE foo, bar;
\timing on
EXPLAIN SELECT * FROM foo f, bar b WHERE f.col = b.col;
Results are:
- master: 433 ms
- Order+Merge: 11 ms
- Hash map: 4 ms
I've attached my draft patch.
--
David Geier
Attachments
On 08.09.2025 13:08, David Geier wrote:
> Hi Ilia!
>
> On 05.09.2025 16:03, David Geier wrote:
>>>> I propose an optimization: when the column datatype supports
>>>> ordering(i.e., has < and >), we can sort both MCV lists and apply
>>>> mege-style algorithm to detect matches. This reduces runtime from
>>>> O(N^2) to O(NlogN), where N is the number of MCV entries. The patch
>>>> also applies the same optimization to semi-join clauses, which show
>>>> similar performance behavior.
>> Why do you sort both lists and then merge instead of putting the smaller
>> list into a hash map and then doing hash lookups (if the type is hashable)?
> I've tested your and my code with the following script:
>
> CREATE TABLE foo(col TEXT);
> CREATE TABLE bar(col TEXT);
> SET default_statistics_target = 10000;
>
> -- Generate MCV values. PostgreSQL doesn't store MCVs if the table has
> -- only a single value or every value has exactly the same cardinality.
> DO $$
> BEGIN
> FOR i IN 1..10000 LOOP
> FOR j IN 1..least(i, 50) LOOP
> INSERT INTO foo VALUES ('aaaaaaaaaaaaaaaaaaaa' || i::TEXT);
> INSERT INTO bar VALUES ('aaaaaaaaaaaaaaaaaaaa' || (i + 100000)::TEXT);
> END LOOP;
> END LOOP;
> END;
> $$;
>
> ANALYZE foo, bar;
> \timing on
> EXPLAIN SELECT * FROM foo f, bar b WHERE f.col = b.col;
>
> Results are:
>
> - master: 433 ms
> - Order+Merge: 11 ms
> - Hash map: 4 ms
>
> I've attached my draft patch.
>
> --
> David Geier
Hi David,
Thank you for reviewing.
I have read all the previous messages - and yes, you are right. I don’t
know why I didn’t consider using a hash table approach initially. Your
idea makes a lot of sense.
To evaluate it, I ran benchmarks on JOB with three variants:
$ ./benchmark.sh master
$ ./benchmark.sh merge
$ ./benchmark.sh hash
I compared total planning time across all 113 queries.
$ python3 planning_time.py master hash
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
------------------------------------------------------------------------------------------
                      100 |                1.00 |            1892.627 |           1886.969
                     1000 |                1.09 |            2286.922 |           2100.099
                     2500 |                1.94 |            4647.167 |           2400.711
                     5000 |                6.15 |           17964.779 |           2919.914
                     7500 |               10.58 |           38622.443 |           3650.375
                    10000 |               16.33 |           69538.085 |           4257.864
$ python3 planning_time.py master merge
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
------------------------------------------------------------------------------------------
                      100 |                1.00 |            1892.627 |           1898.622
                     1000 |                1.12 |            2286.922 |           2033.553
                     2500 |                1.92 |            4647.167 |           2423.552
                     5000 |                5.94 |           17964.779 |           3025.739
                     7500 |               10.48 |           38622.443 |           3684.262
                    10000 |               16.72 |           69538.085 |           4159.418
Based on these results, I’d prefer the hash lookup implementation, so I
think it makes sense to improve your patch further and bring it into
good shape. Shall I take care of that, or would you prefer to do it
yourself?
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Attachments
On 08.09.2025 13:35, Ilia Evdokimov wrote:
> Based on these results, I’d prefer the hash lookup implementation, so
> I think it makes sense to improve your patch further and bring it into
> good shape. Shall I take care of that, or would you prefer to do it
> yourself?

I realized I mistakenly copied the wrong results for the hash-map
version in my previous draft. Sorry about that. Here are the correct
benchmark results:

Merge

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
------------------------------------------------------------------------------------------
                      100 |                1.00 |            1892.627 |           1898.622
                     1000 |                1.12 |            2286.922 |           2033.553
                     2500 |                1.92 |            4647.167 |           2423.552
                     5000 |                5.94 |           17964.779 |           3025.739
                     7500 |               10.48 |           38622.443 |           3684.262
                    10000 |               16.72 |           69538.085 |           4159.418

Hash-Map

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
------------------------------------------------------------------------------------------
                      100 |                1.00 |            1892.627 |           1886.969
                     1000 |                1.09 |            2286.922 |           2100.099
                     2500 |                1.94 |            4647.167 |           2400.711
                     5000 |                6.15 |           17964.779 |           2919.914
                     7500 |               10.58 |           38622.443 |           3650.375
                    10000 |               16.33 |           69538.085 |           4257.864

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi Ilia!

> I have read all the previous messages - and yes, you are right. I don’t
> know why I didn’t consider using a hash table approach initially. Your
> idea makes a lot of sense.

Your solution would be beneficial on top, for cases where the data type
is not hashable. But I think that's overkill for a v1. I would start
with the hash-based version.

> To evaluate it, I ran benchmarks on JOB with three variants:
>
> $ ./benchmark.sh master
> $ ./benchmark.sh merge
> $ ./benchmark.sh hash
>
> I compared total planning time across all 113 queries.

Was this running with optimizations? How did you extract the planning
time?

> $ python3 planning_time.py master hash
> [...]
>
> $ python3 planning_time.py master merge
> [...]

I would have expected the delta between the "merge" and "hash" variant
to be bigger, especially for default_statistics_target=10000. My small
test also showed that. Any idea why this is not showing in your results?

> Based on these results, I’d prefer the hash lookup implementation, so I
> think it makes sense to improve your patch further and bring it into
> good shape. Shall I take care of that, or would you prefer to do it
> yourself?
I think process-wise it's best if you review my code and I do the
changes.

Could you as part of your review test tables with just a few MCVs to
make sure we're not regressing "small" cases? For now I'm only bailing
if one of the two MCV lists has just a single value. I'm expecting the
gains from fine tuning this value to be not measurable but let's double
check.

--
David Geier
On 08.09.2025 12:45, Ilia Evdokimov wrote:
> I realized I mistakenly copied the wrong results for the hash-map
> version in my previous draft. Sorry about that. Here are the correct
> benchmark results:
>
> Merge
> [...]
>
> Hash-Map
> [...]

It still seems to me like something is fishy with the numbers or
something in the benchmark adds a lot of overhead so that small
differences in eqjoinsel_inner() don't show here. The delta between
"hash" and "merge" for default_statistics_target=10000 should be the
biggest but it's actually slower.

--
David Geier
>> To evaluate it, I ran benchmarks on JOB with three variants:
>>
>> $ ./benchmark.sh master
>> $ ./benchmark.sh merge
>> $ ./benchmark.sh hash
>>
>> I compared total planning time across all 113 queries.
>
> Was this running with optimizations? How did you extract the planning
> time?
I save all query plans using EXPLAIN (SUMMARY), then go through all the plans, read the 'Planning Time' for each, and sum them up.
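That summation step can be sketched like this. This is an illustrative stand-in, not the planning_time.py script from the thread; it only shows the "find each 'Planning Time: N ms' line and sum" part.

```c
/* Sum all "Planning Time: <n> ms" lines found in saved EXPLAIN output. */
#include <stdio.h>
#include <string.h>

static double
sum_planning_time(const char *explain_output)
{
    double      total = 0.0;
    const char *p = explain_output;

    while ((p = strstr(p, "Planning Time:")) != NULL)
    {
        double      ms;

        if (sscanf(p, "Planning Time: %lf ms", &ms) == 1)
            total += ms;
        p += strlen("Planning Time:");
    }
    return total;
}
```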
> I would have expected the delta between the "merge" and "hash" variant
> to be bigger, especially for default_statistics_target=10000. My small
> test also showed that. Any idea why this is not showing in your results?
So would I. With default_statistics_target = 10000 and the selectivity in the JOB queries being close to zero, the difference should be noticeable. I can only explain the previous results by cache-related effects on my machine.
I reran the benchmark on a clean cluster and collected the top slowest JOB queries — now the effect is clearly visible.
Merge (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
                      100 |                1.00 |            1888.105 |           1879.431
                     1000 |                1.14 |            2282.239 |           2009.114
                     2500 |                2.10 |            5595.030 |           2668.530
                     5000 |                5.56 |           18544.933 |           3333.252
                     7500 |                9.17 |           37390.956 |           4076.390
                    10000 |               16.10 |           69319.479 |           4306.417
HashMap (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
                      100 |                1.03 |            1888.105 |           1828.088
                     1000 |                1.18 |            2282.239 |           1939.884
                     2500 |                2.64 |            5595.030 |           2117.872
                     5000 |                7.80 |           18544.933 |           2377.206
                     7500 |               13.80 |           37390.956 |           2709.973
                    10000 |               23.32 |           69319.479 |           2973.073
Top 10 slowest JOB queries (default_statistics_target = 10000)
Query | master (ms) | merge (ms) | Hash (ms)
------+-------------+------------+-----------
29c | 1904.586 | 144.135 | 100.473
29b | 1881.392 | 117.891 | 89.028
29a | 1868.805 | 112.242 | 83.913
31c | 1867.234 | 76.498 | 56.140
30c | 1646.630 | 88.494 | 62.549
30b | 1608.820 | 84.821 | 64.603
31a | 1573.964 | 75.978 | 56.140
28a | 1457.738 | 95.939 | 77.309
28b | 1455.052 | 99.383 | 73.065
30a | 1416.699 | 91.057 | 62.549
BTW, the hash map from your patch could also be applied to the eqjoinsel_semi() function.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi!

On 08.09.2025 15:45, Ilia Evdokimov wrote:
> I reran the benchmark on a clean cluster and collected the top slowest
> JOB queries — now the effect is clearly visible.
>
> Merge (sum of all JOB queries)
> [...]
>
> HashMap (sum of all JOB queries)
> [...]
>
> Top 10 slowest JOB queries (default_statistics_target = 10000)
> [...]

This looks much better. Very nice!

> BTW, the hash map from your patch could also be applied to the
> eqjoinsel_semi() function.

Yep. The inner loop only runs until clamped_nvalues2 and it doesn't
compute matchprodfreq. I'll try to modify the patch such that it
accounts for these differences without being too hard to read.
Do you think anything else needs changes in the patch? Did you have a
chance to check tables with just a few MCVs or are there any queries in
the JOB which regress with very small default_statistics_target?

--
David Geier
Hi David,
> Do you think anything else needs changes in the patch?
From an architectural perspective, I think the patch is already in good shape. However, I have some suggestions regarding code style:
- I would move McvHashEntry, McvHashContext, the new hash table definition, hash_mcv and are_mcvs_equal to the top.
- I’m not sure get_hash_func_oid() is needed at all – it seems we could do without it.
- It would be better to name the parameters matchProductFrequencies -> matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1, hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().
- As I wrote earlier, since we now have a dedicated function eqjoinsel_inner_with_hashtable(), perhaps it could be used in both eqjoinsel_inner() and eqjoinsel_semi(). And since the hash-based search was factored out, maybe it would make sense to also factor out the O(N^2) nested loop implementation?
- I think it would be helpful to add a comment explaining that using a hash table is not efficient when the MCV array length equals 1:
if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
return false;
> Did you have a chance to check tables with just a few MCVs or are there
> any queries in the JOB which regress with very small
> default_statistics_target?
Sure. I need some time to check this.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi!

Thanks for the review.

On 09.09.2025 09:29, Ilia Evdokimov wrote:
> From an architectural perspective, I think the patch is already in good
> shape. However, I have some suggestions regarding code style:
>
> 1. I would move McvHashEntry, McvHashContext, the new hash table
>    definition, hash_mcv and are_mcvs_equal to the top.

Done. I've also moved up try_eqjoinsel_with_hashtable().

> 2. I’m not sure get_hash_func_oid() is needed at all – it seems we
>    could do without it.

Removed.

> 3. It would be better to name the parameters matchProductFrequencies ->
>    matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1,
>    hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().

Done.

> 4. As I wrote earlier, since we now have a dedicated function
>    eqjoinsel_inner_with_hashtable(), perhaps it could be used in both
>    eqjoinsel_inner() and eqjoinsel_semi().

Done. The gains for SEMI join are even bigger because the function is
called 3 times for e.g.

EXPLAIN SELECT * FROM foo f WHERE EXISTS (SELECT 1 FROM bar b where
f.col = b.col);

For that query the planning time for me goes from ~1300 ms -> 12 ms.

>    And since the hash-based search was factored out, maybe it would
>    make sense to also factor out the O(N^2) nested loop implementation?

Generally agreed and while tempting, I've refrained from doing the
refactoring. Let's better do this in a separate patch to keep things
simple.

> 5. I think it would be helpful to add a comment explaining that using a
>    hash table is not efficient when the MCV array length equals 1:
>
>    if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
>        return false;

Done.

>> Did you have a chance to check tables with just a few MCVs or are
>> there any queries in the JOB which regress with very small
>> default_statistics_target?
>
> Sure. I need some time to check this.

Could you please do that with the latest attached patch so that we test
it once more?
Could you also run once more the JOB benchmark to get some test coverage on the SEMI join code (assuming it also uses SEMI joins)? Once we've concluded on above and there are no objections, I will register this patch in the commit fest. -- David Geier
Attachments
Hi!

On 09.09.2025 12:22, David Geier wrote:
> Could you please do that with the latest attached patch so that we test
> it once more?

LGTM. Yes, I'll test this patch.

> Could you also run once more the JOB benchmark to get some test coverage
> on the SEMI join code (assuming it also uses SEMI joins)?

Unfortunately, the JOB benchmark does not contain semi join nodes.
However, TPC-DS does. I'll look for the queries with slowest planner
times there and check them.

I'll need some time to check both join and semi join cases with small
and large default_statistics_target. I'll share the results later.

> Once we've concluded on above and there are no objections, I will
> register this patch in the commit fest.

Sure. No problem.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi hackers,

On 10.09.2025 16:56, Ilia Evdokimov wrote:
> Unfortunately, the JOB benchmark does not contain semi join nodes.
> However, TPC-DS does. I'll look for the queries with slowest planner
> times there and check them.
>
> I'll need some time to check both join and semi join cases with small
> and large default_statistics_target. I'll share the results later.

JOIN
==============================

I’ve benchmarked the new implementation of eqjoinsel() with different
values of default_statistics_target. On small targets (1, 5, 10, 25, 50,
75, 100) the results are all within statistical noise, and I did not
observe any regressions. In my view, it’s reasonable to keep the current
condition that the hash table is not used for default_statistics_target
= 1. Raising that threshold does not seem useful.

Here are the results for JOB queries (where the effect of semi join is
not visible due to different data distributions):

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
------------------------------------------------------------------------------------------
                        1 |                1.00 |            1846.643 |           1847.409
                        5 |                1.00 |            1836.391 |           1828.318
                       10 |                0.95 |            1841.750 |           1929.722
                       25 |                0.99 |            1873.172 |           1890.741
                       50 |                0.98 |            1869.897 |           1898.470
                       75 |                1.02 |            1969.368 |           1929.521
                      100 |                0.97 |            1857.890 |           1921.207
                     1000 |                1.14 |            2279.700 |           1997.102
                     2500 |                1.78 |            4682.658 |           2636.202
                     5000 |                6.45 |           15943.696 |           2471.242
                     7500 |               12.45 |           34350.855 |           2758.565
                    10000 |               20.52 |           62519.342 |           3046.819

SEMI JOIN
==============================

Unfortunately, in TPC-DS it is not possible to clearly see improvements
for semi joins. To address this, I designed a synthetic example where
the data distribution forces the loop to run fully, without exiting
early, which makes the effect on semi joins more visible. In this setup,
I also ensured that the length of the MCV array is equal to the chosen
default_statistics_target.
CREATE TABLE t1 AS
    SELECT CASE
               WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 1
               ELSE (g % 1000000) + 10000
           END AS id
    FROM generate_series(1, 3000000) g;

CREATE TABLE t2 AS
    SELECT CASE
               WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 10001
               ELSE (g % 1000000) + 20000
           END AS id
    FROM generate_series(1, 3000000) g;

ANALYZE t1, t2;

The results of the query are:

SELECT * FROM t1 WHERE id IN (SELECT id FROM t2);

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
------------------------------------------------------------------------------------------
                        1 |                1.12 |               1.191 |              1.062
                        5 |                1.02 |               0.493 |              0.481
                       10 |                0.92 |               0.431 |              0.471
                       25 |                1.27 |               0.393 |              0.309
                       50 |                1.04 |               0.432 |              0.416
                       75 |                0.96 |               0.398 |              0.415
                      100 |                0.95 |               0.450 |              0.473
                     1000 |                9.42 |               6.742 |              0.716
                     2500 |               19.15 |              21.621 |              1.129
                     5000 |               46.74 |              85.667 |              1.833
                     7500 |               73.26 |             194.806 |              2.659
                    10000 |              107.95 |             349.981 |              3.242

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Hi David,

In the v2 patch, when the join is reversed we pass the commutator
operator Oid to eqjoinsel_semi(), and inside that function we
immediately call get_opcode(<commutator operator Oid>). Did you mean for
the function to take an operator Oid instead of an opfuncoid here?

If that was unintentional, perhaps the cleanest fix is to add a new
'operator' parameter to eqjoinsel_semi() so we can keep passing
'opfuncoid' as before and avoid changing the behavior.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
On 17.09.2025 12:40, Ilia Evdokimov wrote:
> Hi David,
>
> In the v2 patch, when the join is reversed we pass the commutator
> operator Oid to eqjoinsel_semi(), and inside that function we
> immediately call get_opcode(<commutator operator Oid>). Did you mean
> for the function to take an operator Oid instead of an opfuncoid here?
>
> If that was unintentional, perhaps the cleanest fix is to add a new
> 'operator' parameter to eqjoinsel_semi() so we can keep passing
> 'opfuncoid' as before and avoid changing the behavior.

This v3 patch fixes the confusion between operator and function Oids in
eqjoinsel_semi(). It restores the previous behavior by keeping the
function Oid as before and adds an explicit 'operator' parameter so both
values are available without extra behavior changes.

Do you have any further comments or suggestions on this version?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com
Attachments
Hi Ilia!

On 10.09.2025 15:56, Ilia Evdokimov wrote:
>
> LGTM. Yes, I'll test this patch.
>
> Unfortunately, the JOB benchmark does not contain semi join nodes.
> However, TPC-DS does. I'll look for the queries with slowest planner
> times there and check them.
>
> I'll need some time to check both join and semi join cases with small
> and large default_statistics_target. I'll share the results later.
>

Have you had a chance to test the above things?

I've seen that there's already a commit fest entry. I've adapted the entry (changed the title and added myself as author). Do you want to add yourself as reviewer?

--
David Geier
Hi David,

On 27.10.2025 18:50, David Geier wrote:
> Hi Ilia!
>
> On 10.09.2025 15:56, Ilia Evdokimov wrote:
>> LGTM. Yes, I'll test this patch.
>>
>> Unfortunately, the JOB benchmark does not contain semi join nodes.
>> However, TPC-DS does. I'll look for the queries with slowest planner
>> times there and check them.
>>
>> I'll need some time to check both join and semi join cases with small
>> and large default_statistics_target. I'll share the results later.
>>
> Have you had a chance to test above things?

Yes, I wrote about this here: https://www.postgresql.org/message-id/c3dbf2ab-d72d-4033-822a-60ad8023f499%40tantorlabs.com

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
> On 27.10.2025 18:50, David Geier wrote:
>> On 10.09.2025 15:56, Ilia Evdokimov wrote:
>>> I'll need some time to check both join and semi join cases with small
>>> and large default_statistics_target. I'll share the results later.

>> Have you had a chance to test above things?

> Yes, I wrote about this here:
> https://www.postgresql.org/message-id/c3dbf2ab-d72d-4033-822a-60ad8023f499%40tantorlabs.com

Hmm. Those results sure look like there is a performance regression up to at least 100 MCVs ... not a large one, but consistently a few percent. That's a bit sad for a patch purporting to improve performance. It looks to me like perhaps we should stick to the old algorithm up to 100 or possibly even more MCVs. Certainly the threshold needs to be higher than 1, as you have it now.

I eyeballed the patch itself very briefly, and have a couple quick comments:

* Is hash_msv a typo for hash_mcv? If not, maybe spell out what it's supposed to mean.

* The patch would be easier to read if it didn't reindent a couple large chunks of existing code. Can we change the factorization to avoid that? If not, I'd recommend submitting without that reindentation, and reminding the committer to reindent at the last moment.

* The calculation loops in eqjoinsel_inner and eqjoinsel_semi are not identical, which makes it look quite weird to be writing just one function that conditionally replaces both. I wonder if we should refactor to have just one copy (and just eat the extra cycles of calculating matchprodfreq).

* In fact ... I wonder if we should try harder to not do essentially identical calculations twice, remembering that eqjoinsel_semi is always used alongside eqjoinsel_inner. (Of course, we could only do that if clamped_nvalues2 is the same as sslot2->nvalues, but that's frequently true.)
I think the reason it's duplicative right now is that we regarded this semijoin calculation as an experiment and so didn't want to invest a lot of effort in it ... but this patch is exactly a lot of effort, so maybe it's time to deal with that unfinished business.

			regards, tom lane
Thanks for the detailed feedback!
On 04.11.2025 00:55, Tom Lane wrote:
Hmm. Those results sure look like there is a performance regression up to at least 100 MCVs ... not a large one, but consistently a few percent. That's a bit sad for a patch purporting to improve performance. It looks to me like perhaps we should stick to the old algorithm up to 100 or possibly even more MCVs. Certainly the threshold needs to be higher than 1, as you have it now.
I re-ran the benchmark on JOB with a threshold of 100. Here are the updated results:
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
------------------------------------------------------------------------------------------
                        1 |                1.00 |            2320.412 |           2318.377
                        5 |                0.99 |            2335.894 |           2360.890
                       10 |                1.00 |            2350.612 |           2347.154
                       25 |                1.01 |            2365.977 |           2342.312
                       50 |                0.99 |            2381.554 |           2405.262
                       75 |                1.00 |            2396.481 |           2399.828
                      100 |                1.00 |            2410.367 |           2412.456
                     1000 |                1.11 |            2850.853 |           2564.303
                     2500 |                2.04 |            5571.688 |           2731.545
                     5000 |                6.05 |           18850.084 |           3114.692
                     7500 |               11.96 |           39160.898 |           3273.688
                    10000 |               19.04 |           71334.113 |           3745.955
I eyeballed the patch itself very briefly, and have a couple quick comments: * Is hash_msv a typo for hash_mcv? If not, maybe spell out what it's supposed to mean.
Yes, that was a typo — fixed.
* The patch would be easier to read if it didn't reindent a couple large chunks of existing code. Can we change the factorization to avoid that? If not, I'd recommend submitting without that reindentation, and reminding the committer to reindent at the last moment.
Fixed as well. I’ve removed all reindentation changes. I will keep that in mind for future submissions.
* The calculation loops in eqjoinsel_inner and eqjoinsel_semi are not identical, which makes it look quite weird to be writing just one function that conditionally replaces both. I wonder if we should refactor to have just one copy (and just eat the extra cycles of calculating matchprodfreq).
Agreed. I’ve dropped the attempt to merge them into a single function.
* In fact ... I wonder if we should try harder to not do essentially identical calculations twice, remembering that eqjoinsel_semi is always used alongside eqjoinsel_inner. (Of course, we could only do that if clamped_nvalues2 is the same as sslot2->nvalues, but that's frequently true.) I think the reason it's duplicative right now is that we regarded this semijoin calculation as an experiment and so didn't want to invest a lot of effort in it ... but this patch is exactly a lot of effort, so maybe it's time to deal with that unfinished business. regards, tom lane
Good point. I addressed this in a separate patch: eqjoinsel_inner() now saves matchfreq1, matchfreq2, nmatches so that eqjoinsel_semi() can reuse them when (clamped_nvalues2 == sslot2->nvalues). If the MCV list on the RHS is clamped, we still recompute locally. If you have a cleaner idea for how to share these values between the two functions without passing them explicitly, I’d be happy to consider it.
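As a rough sketch of the reuse condition described above (the struct and helper names here are illustrative, not taken from the patch): eqjoinsel_inner() would stash its results, and eqjoinsel_semi() would consult them only when the RHS MCV list it sees is the same unclamped list.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical cache of eqjoinsel_inner's MCV-matching results, usable
 * by eqjoinsel_semi when the RHS MCV list was not clamped.
 */
typedef struct McvMatchCache
{
    double      matchfreq1;     /* summed freq of matched LHS MCVs */
    double      matchfreq2;     /* summed freq of matched RHS MCVs */
    int         nmatches;       /* number of matching MCV pairs */
    int         nvalues2;       /* RHS list length the cache was built for */
    bool        valid;          /* has eqjoinsel_inner filled this in? */
} McvMatchCache;

/*
 * Reuse is safe only when the semi-join path considers exactly the same
 * RHS entries; if clamped_nvalues2 < sslot2->nvalues, recompute locally.
 */
static bool
mcv_cache_usable(const McvMatchCache *cache, int clamped_nvalues2)
{
    return cache->valid && clamped_nvalues2 == cache->nvalues2;
}
```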
I’m attaching two patches:
1. v4-0001-Avoid-duplicate-MCV-matching-in-eqjoinsel_semi-an.patch - removes redundant MCV matching for semi/anti joins;
2. v4-0002-Optimize-MCV-matching-in-eqjoinsel_inner-and-eqjo.patch - adds hash-based MCV matching with a configurable threshold and includes fixes based on your comments.
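For readers following along, hash-based matching replaces the inner comparison loop with a single table probe, giving O(N + M) behavior. Below is a toy standalone sketch over non-negative ints with a fixed-size open-addressing table; the actual patch hashes datum values via PostgreSQL's simplehash.h, so everything here (names, table layout) is illustrative only.

```c
#include <assert.h>

#define TABSIZE 256             /* power of two, comfortably > 2 * list size */

/* Insert v (must be >= 0) into an open-addressing table; -1 marks empty. */
static void
ht_insert(int *tab, int v)
{
    unsigned    i = ((unsigned) v * 2654435761u) & (TABSIZE - 1);

    while (tab[i] != -1)
        i = (i + 1) & (TABSIZE - 1);
    tab[i] = v;
}

static int
ht_lookup(const int *tab, int v)
{
    unsigned    i = ((unsigned) v * 2654435761u) & (TABSIZE - 1);

    for (;; i = (i + 1) & (TABSIZE - 1))
    {
        if (tab[i] == -1)
            return 0;           /* hit an empty slot: not present */
        if (tab[i] == v)
            return 1;
    }
}

/* O(N + M) matching: hash the second MCV list, probe with the first */
static int
match_hash(const int *a, int na, const int *b, int nb)
{
    int         tab[TABSIZE];
    int         nmatches = 0;

    for (int i = 0; i < TABSIZE; i++)
        tab[i] = -1;
    for (int j = 0; j < nb; j++)
        ht_insert(tab, b[j]);
    for (int i = 0; i < na; i++)
        nmatches += ht_lookup(tab, a[i]);
    return nmatches;
}
```

The per-probe cost is what motivates the threshold: for short MCV lists the hash setup overhead exceeds the savings, which is why the patch falls back to the nested loop below the configurable cutoff.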
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
Attachments
Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
> Good point. I addressed this in a separate patch: eqjoinsel_inner() now
> saves matchfreq1, matchfreq2, nmatches so that eqjoinsel_semi() can
> reuse them when (clamped_nvalues2 == sslot2->nvalues). If the MCV list
> on the RHS is clamped, we still recompute locally. If you have a cleaner
> idea for how to share these values between the two functions without
> passing them explicitly, I’d be happy to consider it.
This didn't look very much like the refactorization that I had in
mind: I thought we should have one copy of the matching code, not two.
Also, after looking closer at your patch I realized you were just
punting for cross-type comparison operators, which I felt was kind
of sad. It's a little bit tricky to get simplehash.h to go along
with cross-type hashing, because it wants to use just one hash and
one equality function. But since those are interface routines we
are going to supply anyway, we can make them deal with the insert
and lookup cases differently.
So after a bit of hacking I ended up with the attached. I split up
the refactorization into several steps to make it easier to review.
(But I'd anticipate squashing these into one commit in the end,
so I didn't spend a lot of time on the commit messages.)
Also, 0001 in this series is not meant to be committed; what it
does is to add some debug logging to ease comparing runtimes of
different versions of eqjoinsel. I was able to use that to
convince myself that the refactoring steps didn't cost anything
meaningful in performance. Perhaps we could use it to investigate
the right hashing threshold more carefully, too.
There are still a couple of XXX comments in the attached, denoting
loose ends to look at. In particular, I wondered whether the
hash threshold check
if (Min(sslot1.nvalues, sslot2.nvalues) >= EQJOINSEL_MCV_HASH_THRESHOLD)
should use Max() instead --- that is, it might be safer to hash
if either MCV list is long. Or, holding one's head at a different
angle, perhaps the sum of the list lengths should be what's checked?
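To make the alternatives concrete, the three candidate gating predicates might look like this (threshold value and helper names are purely illustrative). They diverge most for lopsided list sizes: with lists of 10 and 1000 entries, Min() keeps the 10,000-comparison nested loop while Max() and the sum variant would hash.

```c
#include <assert.h>

#define EQJOINSEL_MCV_HASH_THRESHOLD 100    /* value still under discussion */

/* Hash only when both MCV lists are long (the check as written) */
static int
use_hash_min(int n1, int n2)
{
    return (n1 < n2 ? n1 : n2) >= EQJOINSEL_MCV_HASH_THRESHOLD;
}

/* Hash when either MCV list is long */
static int
use_hash_max(int n1, int n2)
{
    return (n1 > n2 ? n1 : n2) >= EQJOINSEL_MCV_HASH_THRESHOLD;
}

/* Hash when the combined size is large */
static int
use_hash_sum(int n1, int n2)
{
    return n1 + n2 >= EQJOINSEL_MCV_HASH_THRESHOLD;
}
```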
regards, tom lane
From e4f70adb688b5abf400f797ba0f15290a9529ac1 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 13 Nov 2025 12:18:19 -0500
Subject: [PATCH v5 1/5] Add some test scaffolding to join_selectivity().
This not-meant-for-commit patch adds some instrumentation to
plancat.c's join_selectivity() to log the result and runtime
of a join selectivity function. This is useful for manual
testing of performance patches in eqjoinsel().
To improve the accuracy of the runtime measurement, run the
function 1000 times in each call. The regression tests still
take a reasonable amount of time with this overhead, although
it's noticeably more than usual.
---
src/backend/optimizer/util/plancat.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d950bd93002..15c50e068a0 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -42,6 +42,7 @@
#include "parser/parse_relation.h"
#include "parser/parsetree.h"
#include "partitioning/partdesc.h"
+#include "portability/instr_time.h"
#include "rewrite/rewriteHandler.h"
#include "rewrite/rewriteManip.h"
#include "statistics/statistics.h"
@@ -2123,6 +2124,8 @@ join_selectivity(PlannerInfo *root,
{
RegProcedure oprjoin = get_oprjoin(operatorid);
float8 result;
+ instr_time start_time,
+ end_time;
/*
* if the oprjoin procedure is missing for whatever reason, use a
@@ -2131,6 +2134,10 @@ join_selectivity(PlannerInfo *root,
if (!oprjoin)
return (Selectivity) 0.5;
+ INSTR_TIME_SET_CURRENT(start_time);
+
+ for (int i = 0; i < 1000; i++)
+ {
result = DatumGetFloat8(OidFunctionCall5Coll(oprjoin,
inputcollid,
PointerGetDatum(root),
@@ -2138,6 +2145,21 @@ join_selectivity(PlannerInfo *root,
PointerGetDatum(args),
Int16GetDatum(jointype),
PointerGetDatum(sjinfo)));
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ /* Don't be chatty during initdb */
+ if (IsUnderPostmaster)
+ {
+ INSTR_TIME_SET_CURRENT(end_time);
+
+ INSTR_TIME_SUBTRACT(end_time, start_time);
+
+ /* multiply by 1e6 to get microsecs, divide by 1000 to cancel loop */
+ elog(LOG, "join_selectivity(opr %u, jointype %d) = %f, time %g us",
+ operatorid, jointype, result,
+ INSTR_TIME_GET_DOUBLE(end_time) * 1e3);
+ }
if (result < 0.0 || result > 1.0)
elog(ERROR, "invalid join selectivity: %f", result);
--
2.43.7
From 162f58fe782e60d548a0598b0228c096438b9a44 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 13 Nov 2025 12:40:53 -0500
Subject: [PATCH v5 2/5] Factor out duplicative code in
eqjoinsel_inner/eqjoinsel_semi.
These functions have essentially identical code for scanning the
two MCV lists and identifying which entries have matches in the
other list. While it's not a huge amount of code, it's 50 or
so lines, and will be more after an upcoming patch to use a hash
table with many MCVs. Let's reduce duplication by moving that
code into a common subroutine.
The one downside of doing this is that we must compute
sum(sslot1->numbers[i] * sslot2->numbers[j]) even though
eqjoinsel_semi won't need that. But the cost of that appears
negligible, so I didn't trouble to invent a way of avoiding it.
---
src/backend/utils/adt/selfuncs.c | 191 ++++++++++++++++---------------
1 file changed, 99 insertions(+), 92 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cb23ad52782..590b3a0c078 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -163,6 +163,11 @@ static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
Form_pg_statistic stats1, Form_pg_statistic stats2,
bool have_mcvs1, bool have_mcvs2,
RelOptInfo *inner_rel);
+static void eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+ AttStatsSlot *sslot1, AttStatsSlot *sslot2,
+ int nvalues1, int nvalues2,
+ bool *hasmatch1, bool *hasmatch2,
+ int *p_nmatches, double *p_matchprodfreq);
static bool estimate_multivariate_ndistinct(PlannerInfo *root,
RelOptInfo *rel, List **varinfos, double *ndistinct);
static bool convert_to_scalar(Datum value, Oid valuetypid, Oid collid,
@@ -2473,8 +2478,6 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
* results", Technical Report 1018, Computer Science Dept., University
* of Wisconsin, Madison, March 1991 (available from ftp.cs.wisc.edu).
*/
- LOCAL_FCINFO(fcinfo, 2);
- FmgrInfo eqproc;
bool *hasmatch1;
bool *hasmatch2;
double nullfrac1 = stats1->stanullfrac;
@@ -2491,55 +2494,17 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
int i,
nmatches;
- fmgr_info(opfuncoid, &eqproc);
-
- /*
- * Save a few cycles by setting up the fcinfo struct just once. Using
- * FunctionCallInvoke directly also avoids failure if the eqproc
- * returns NULL, though really equality functions should never do
- * that.
- */
- InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
- NULL, NULL);
- fcinfo->args[0].isnull = false;
- fcinfo->args[1].isnull = false;
-
+ /* Construct the match arrays */
hasmatch1 = (bool *) palloc0(sslot1->nvalues * sizeof(bool));
hasmatch2 = (bool *) palloc0(sslot2->nvalues * sizeof(bool));
- /*
- * Note we assume that each MCV will match at most one member of the
- * other MCV list. If the operator isn't really equality, there could
- * be multiple matches --- but we don't look for them, both for speed
- * and because the math wouldn't add up...
- */
- matchprodfreq = 0.0;
- nmatches = 0;
- for (i = 0; i < sslot1->nvalues; i++)
- {
- int j;
-
- fcinfo->args[0].value = sslot1->values[i];
-
- for (j = 0; j < sslot2->nvalues; j++)
- {
- Datum fresult;
-
- if (hasmatch2[j])
- continue;
- fcinfo->args[1].value = sslot2->values[j];
- fcinfo->isnull = false;
- fresult = FunctionCallInvoke(fcinfo);
- if (!fcinfo->isnull && DatumGetBool(fresult))
- {
- hasmatch1[i] = hasmatch2[j] = true;
- matchprodfreq += sslot1->numbers[i] * sslot2->numbers[j];
- nmatches++;
- break;
- }
- }
- }
+ eqjoinsel_find_matches(opfuncoid, collation,
+ sslot1, sslot2,
+ sslot1->nvalues, sslot2->nvalues,
+ hasmatch1, hasmatch2,
+ &nmatches, &matchprodfreq);
CLAMP_PROBABILITY(matchprodfreq);
+
/* Sum up frequencies of matched and unmatched MCVs */
matchfreq1 = unmatchfreq1 = 0.0;
for (i = 0; i < sslot1->nvalues; i++)
@@ -2700,12 +2665,11 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* lists. We still have to estimate for the remaining population, but
* in a skewed distribution this gives us a big leg up in accuracy.
*/
- LOCAL_FCINFO(fcinfo, 2);
- FmgrInfo eqproc;
bool *hasmatch1;
bool *hasmatch2;
double nullfrac1 = stats1->stanullfrac;
- double matchfreq1,
+ double matchprodfreq,
+ matchfreq1,
uncertainfrac,
uncertain;
int i,
@@ -2721,52 +2685,16 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
*/
clamped_nvalues2 = Min(sslot2->nvalues, nd2);
- fmgr_info(opfuncoid, &eqproc);
-
- /*
- * Save a few cycles by setting up the fcinfo struct just once. Using
- * FunctionCallInvoke directly also avoids failure if the eqproc
- * returns NULL, though really equality functions should never do
- * that.
- */
- InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
- NULL, NULL);
- fcinfo->args[0].isnull = false;
- fcinfo->args[1].isnull = false;
-
+ /* Construct the match arrays */
hasmatch1 = (bool *) palloc0(sslot1->nvalues * sizeof(bool));
hasmatch2 = (bool *) palloc0(clamped_nvalues2 * sizeof(bool));
- /*
- * Note we assume that each MCV will match at most one member of the
- * other MCV list. If the operator isn't really equality, there could
- * be multiple matches --- but we don't look for them, both for speed
- * and because the math wouldn't add up...
- */
- nmatches = 0;
- for (i = 0; i < sslot1->nvalues; i++)
- {
- int j;
-
- fcinfo->args[0].value = sslot1->values[i];
-
- for (j = 0; j < clamped_nvalues2; j++)
- {
- Datum fresult;
+ eqjoinsel_find_matches(opfuncoid, collation,
+ sslot1, sslot2,
+ sslot1->nvalues, clamped_nvalues2,
+ hasmatch1, hasmatch2,
+ &nmatches, &matchprodfreq);
- if (hasmatch2[j])
- continue;
- fcinfo->args[1].value = sslot2->values[j];
- fcinfo->isnull = false;
- fresult = FunctionCallInvoke(fcinfo);
- if (!fcinfo->isnull && DatumGetBool(fresult))
- {
- hasmatch1[i] = hasmatch2[j] = true;
- nmatches++;
- break;
- }
- }
- }
/* Sum up frequencies of matched MCVs */
matchfreq1 = 0.0;
for (i = 0; i < sslot1->nvalues; i++)
@@ -2830,6 +2758,85 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
return selec;
}
+/*
+ * Identify matching MCVs for eqjoinsel_inner or eqjoinsel_semi.
+ *
+ * Inputs:
+ * opfuncoid: OID of equality function to use (might be cross-type)
+ * collation: OID of collation to use
+ * sslot1, sslot2: MCV values for the lefthand and righthand inputs
+ * nvalues1, nvalues2: number of values to be considered (can be less than
+ * sslotN->nvalues, but not more)
+ * Outputs:
+ * hasmatch1[], hasmatch2[]: pre-zeroed arrays of lengths nvalues1, nvalues2;
+ * entries are set to true if that MCV has a match on the other side
+ * *p_nmatches: receives number of MCV pairs that match
+ * *p_matchprodfreq: receives sum(sslot1->numbers[i] * sslot2->numbers[j])
+ * for matching MCVs
+ *
+ * Note we assume that each MCV will match at most one member of the other
+ * MCV list. If the operator isn't really equality, there could be multiple
+ * matches --- but we don't look for them, both for speed and because the
+ * math wouldn't add up...
+ */
+static void
+eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+ AttStatsSlot *sslot1, AttStatsSlot *sslot2,
+ int nvalues1, int nvalues2,
+ bool *hasmatch1, bool *hasmatch2,
+ int *p_nmatches, double *p_matchprodfreq)
+{
+ LOCAL_FCINFO(fcinfo, 2);
+ FmgrInfo eqproc;
+ double matchprodfreq = 0.0;
+ int nmatches = 0;
+
+ fmgr_info(opfuncoid, &eqproc);
+
+ /*
+ * Save a few cycles by setting up the fcinfo struct just once. Using
+ * FunctionCallInvoke directly also avoids failure if the eqproc returns
+ * NULL, though really equality functions should never do that.
+ */
+ InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
+ NULL, NULL);
+ fcinfo->args[0].isnull = false;
+ fcinfo->args[1].isnull = false;
+
+ /*
+ * The reason for this extra level of braces will become apparent later.
+ * For now, it just prevents having to re-indent this chunk of code moved
+ * from eqjoinsel_inner.
+ */
+ {
+ for (int i = 0; i < nvalues1; i++)
+ {
+ fcinfo->args[0].value = sslot1->values[i];
+
+ for (int j = 0; j < nvalues2; j++)
+ {
+ Datum fresult;
+
+ if (hasmatch2[j])
+ continue;
+ fcinfo->args[1].value = sslot2->values[j];
+ fcinfo->isnull = false;
+ fresult = FunctionCallInvoke(fcinfo);
+ if (!fcinfo->isnull && DatumGetBool(fresult))
+ {
+ hasmatch1[i] = hasmatch2[j] = true;
+ matchprodfreq += sslot1->numbers[i] * sslot2->numbers[j];
+ nmatches++;
+ break;
+ }
+ }
+ }
+ }
+
+ *p_nmatches = nmatches;
+ *p_matchprodfreq = matchprodfreq;
+}
+
/*
* neqjoinsel - Join selectivity of "!="
*/
--
2.43.7
From c476028f989dd01d27ffa4b2c1a175c18f379d4a Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 13 Nov 2025 12:50:44 -0500
Subject: [PATCH v5 3/5] Rethink eqjoinsel's handling of reversed joins.
Formerly, if we needed to deal with a "reversed" join where the
outer-side variable is on the right hand of the given operator,
we looked up the operator's commutator and applied that, so that
eqjoinsel_semi could always treat "sslot1" as the outer-side
variable of the semijoin.
This isn't great, because we ended up punting to a poor estimate
if no commutator is recorded. It also doesn't play well with
later changes in this patch series. Instead, let's handle the
case by swapping the left and right input values just before
we call the comparison operator. While this theoretically adds
cycles to the inner comparison loop, with the coding proposed
here I don't see any real timing difference. (But I only tested
it on x86_64.)
---
src/backend/utils/adt/selfuncs.c | 44 +++++++++++++++++++++++---------
1 file changed, 32 insertions(+), 12 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 590b3a0c078..53653d2d05b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -156,6 +156,7 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
Form_pg_statistic stats1, Form_pg_statistic stats2,
bool have_mcvs1, bool have_mcvs2);
static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
+ bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -164,6 +165,7 @@ static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
bool have_mcvs1, bool have_mcvs2,
RelOptInfo *inner_rel);
static void eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+ bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
bool *hasmatch1, bool *hasmatch2,
@@ -2394,6 +2396,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
if (!join_is_reversed)
selec = eqjoinsel_semi(opfuncoid, collation,
+ false,
&vardata1, &vardata2,
nd1, nd2,
isdefault1, isdefault2,
@@ -2402,11 +2405,8 @@ eqjoinsel(PG_FUNCTION_ARGS)
have_mcvs1, have_mcvs2,
inner_rel);
else
- {
- Oid commop = get_commutator(operator);
- Oid commopfuncoid = OidIsValid(commop) ? get_opcode(commop) : InvalidOid;
-
- selec = eqjoinsel_semi(commopfuncoid, collation,
+ selec = eqjoinsel_semi(opfuncoid, collation,
+ true,
&vardata2, &vardata1,
nd2, nd1,
isdefault2, isdefault1,
@@ -2414,7 +2414,6 @@ eqjoinsel(PG_FUNCTION_ARGS)
stats2, stats1,
have_mcvs2, have_mcvs1,
inner_rel);
- }
/*
* We should never estimate the output of a semijoin to be more
@@ -2499,6 +2498,7 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
hasmatch2 = (bool *) palloc0(sslot2->nvalues * sizeof(bool));
eqjoinsel_find_matches(opfuncoid, collation,
+ false,
sslot1, sslot2,
sslot1->nvalues, sslot2->nvalues,
hasmatch1, hasmatch2,
@@ -2607,11 +2607,13 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
* eqjoinsel_semi --- eqjoinsel for semi join
*
* (Also used for anti join, which we are supposed to estimate the same way.)
- * Caller has ensured that vardata1 is the LHS variable.
- * Unlike eqjoinsel_inner, we have to cope with opfuncoid being InvalidOid.
+ * Caller has ensured that vardata1 is the LHS variable; however, opfuncoid
+ * is for the original join operator, which might now need to have the inputs
+ * swapped in order to apply correctly.
*/
static double
eqjoinsel_semi(Oid opfuncoid, Oid collation,
+ bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -2655,7 +2657,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
isdefault2 = false;
}
- if (have_mcvs1 && have_mcvs2 && OidIsValid(opfuncoid))
+ if (have_mcvs1 && have_mcvs2)
{
/*
* We have most-common-value lists for both relations. Run through
@@ -2690,6 +2692,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
hasmatch2 = (bool *) palloc0(clamped_nvalues2 * sizeof(bool));
eqjoinsel_find_matches(opfuncoid, collation,
+ op_is_reversed,
sslot1, sslot2,
sslot1->nvalues, clamped_nvalues2,
hasmatch1, hasmatch2,
@@ -2762,8 +2765,9 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* Identify matching MCVs for eqjoinsel_inner or eqjoinsel_semi.
*
* Inputs:
- * opfuncoid: OID of equality function to use (might be cross-type)
+ * opfuncoid: OID of equality function to use (might be reversed)
* collation: OID of collation to use
+ * op_is_reversed: indicates that opfuncoid compares right type to left type
* sslot1, sslot2: MCV values for the lefthand and righthand inputs
* nvalues1, nvalues2: number of values to be considered (can be less than
* sslotN->nvalues, but not more)
@@ -2781,6 +2785,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
*/
static void
eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+ bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
bool *hasmatch1, bool *hasmatch2,
@@ -2809,9 +2814,24 @@ eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
* from eqjoinsel_inner.
*/
{
+ int index1,
+ index2;
+
+ /* Set up to supply the values in the order the operator expects */
+ if (op_is_reversed)
+ {
+ index1 = 1;
+ index2 = 0;
+ }
+ else
+ {
+ index1 = 0;
+ index2 = 1;
+ }
+
for (int i = 0; i < nvalues1; i++)
{
- fcinfo->args[0].value = sslot1->values[i];
+ fcinfo->args[index1].value = sslot1->values[i];
for (int j = 0; j < nvalues2; j++)
{
@@ -2819,7 +2839,7 @@ eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
if (hasmatch2[j])
continue;
- fcinfo->args[1].value = sslot2->values[j];
+ fcinfo->args[index2].value = sslot2->values[j];
fcinfo->isnull = false;
fresult = FunctionCallInvoke(fcinfo);
if (!fcinfo->isnull && DatumGetBool(fresult))
--
2.43.7
From f9b45278ee55e03d363b45d8bce518e7695fb5f4 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 13 Nov 2025 12:59:21 -0500
Subject: [PATCH v5 4/5] Share more work between eqjoinsel_inner and
eqjoinsel_semi.
Originally, only one of eqjoinsel_inner and eqjoinsel_semi was
invoked per eqjoinsel call, so the fact that they duplicated a
good deal of work was irrelevant to performance. But since commit
a314c3407, the semi/antijoin case calls both, and that is really
expensive if there are a lot of MCVs to match. Refactor so that
we can re-use eqjoinsel_inner's matching results except in the
(uncommon) case where eqjoinsel_semi clamps the RHS MCV list size
because it's less than the expected number of rows to be fetched
from the RHS rel. This doesn't seem to create any performance
penalty for non-semijoin cases.
While at it, we can avoid doing fmgr_info twice too.
I considered also avoiding duplicate InitFunctionCallInfoData
calls, but desisted: that wouldn't save very much, and in my
tests it looks like there may be some performance advantage
if fcinfo is a local variable.
---
src/backend/utils/adt/selfuncs.c | 115 +++++++++++++++++++------------
1 file changed, 72 insertions(+), 43 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53653d2d05b..b4ed12a3737 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -148,14 +148,15 @@ get_relation_stats_hook_type get_relation_stats_hook = NULL;
get_index_stats_hook_type get_index_stats_hook = NULL;
static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
-static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
+static double eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
- bool have_mcvs1, bool have_mcvs2);
-static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
+ bool have_mcvs1, bool have_mcvs2,
+ bool *hasmatch1, bool *hasmatch2);
+static double eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
@@ -163,8 +164,9 @@ static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
bool have_mcvs1, bool have_mcvs2,
+ bool *hasmatch1, bool *hasmatch2,
RelOptInfo *inner_rel);
-static void eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+static void eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
@@ -2311,12 +2313,15 @@ eqjoinsel(PG_FUNCTION_ARGS)
bool isdefault1;
bool isdefault2;
Oid opfuncoid;
+ FmgrInfo eqproc;
AttStatsSlot sslot1;
AttStatsSlot sslot2;
Form_pg_statistic stats1 = NULL;
Form_pg_statistic stats2 = NULL;
bool have_mcvs1 = false;
bool have_mcvs2 = false;
+ bool *hasmatch1 = NULL;
+ bool *hasmatch2 = NULL;
bool get_mcv_stats;
bool join_is_reversed;
RelOptInfo *inner_rel;
@@ -2367,14 +2372,25 @@ eqjoinsel(PG_FUNCTION_ARGS)
ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
}
+ /* Prepare info usable by both eqjoinsel_inner and eqjoinsel_semi */
+ if (have_mcvs1 && have_mcvs2)
+ {
+ fmgr_info(opfuncoid, &eqproc);
+ hasmatch1 = (bool *) palloc0(sslot1.nvalues * sizeof(bool));
+ hasmatch2 = (bool *) palloc0(sslot2.nvalues * sizeof(bool));
+ }
+ else
+ memset(&eqproc, 0, sizeof(eqproc)); /* silence uninit-var warnings */
+
/* We need to compute the inner-join selectivity in all cases */
- selec_inner = eqjoinsel_inner(opfuncoid, collation,
+ selec_inner = eqjoinsel_inner(&eqproc, collation,
&vardata1, &vardata2,
nd1, nd2,
isdefault1, isdefault2,
&sslot1, &sslot2,
stats1, stats2,
- have_mcvs1, have_mcvs2);
+ have_mcvs1, have_mcvs2,
+ hasmatch1, hasmatch2);
switch (sjinfo->jointype)
{
@@ -2395,7 +2411,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
inner_rel = find_join_input_rel(root, sjinfo->min_righthand);
if (!join_is_reversed)
- selec = eqjoinsel_semi(opfuncoid, collation,
+ selec = eqjoinsel_semi(&eqproc, collation,
false,
&vardata1, &vardata2,
nd1, nd2,
@@ -2403,9 +2419,10 @@ eqjoinsel(PG_FUNCTION_ARGS)
&sslot1, &sslot2,
stats1, stats2,
have_mcvs1, have_mcvs2,
+ hasmatch1, hasmatch2,
inner_rel);
else
- selec = eqjoinsel_semi(opfuncoid, collation,
+ selec = eqjoinsel_semi(&eqproc, collation,
true,
&vardata2, &vardata1,
nd2, nd1,
@@ -2413,6 +2430,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
&sslot2, &sslot1,
stats2, stats1,
have_mcvs2, have_mcvs1,
+ hasmatch2, hasmatch1,
inner_rel);
/*
@@ -2441,6 +2459,11 @@ eqjoinsel(PG_FUNCTION_ARGS)
ReleaseVariableStats(vardata1);
ReleaseVariableStats(vardata2);
+ if (hasmatch1)
+ pfree(hasmatch1);
+ if (hasmatch2)
+ pfree(hasmatch2);
+
CLAMP_PROBABILITY(selec);
PG_RETURN_FLOAT8((float8) selec);
@@ -2449,17 +2472,22 @@ eqjoinsel(PG_FUNCTION_ARGS)
/*
* eqjoinsel_inner --- eqjoinsel for normal inner join
*
+ * In addition to computing the selectivity estimate, this will fill
+ * the hasmatch1[] and hasmatch2[] arrays (if have_mcvs1 && have_mcvs2).
+ * We may be able to re-use that data in eqjoinsel_semi.
+ *
* We also use this for LEFT/FULL outer joins; it's not presently clear
* that it's worth trying to distinguish them here.
*/
static double
-eqjoinsel_inner(Oid opfuncoid, Oid collation,
+eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
- bool have_mcvs1, bool have_mcvs2)
+ bool have_mcvs1, bool have_mcvs2,
+ bool *hasmatch1, bool *hasmatch2)
{
double selec;
@@ -2477,8 +2505,6 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
* results", Technical Report 1018, Computer Science Dept., University
* of Wisconsin, Madison, March 1991 (available from ftp.cs.wisc.edu).
*/
- bool *hasmatch1;
- bool *hasmatch2;
double nullfrac1 = stats1->stanullfrac;
double nullfrac2 = stats2->stanullfrac;
double matchprodfreq,
@@ -2493,11 +2519,8 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
int i,
nmatches;
- /* Construct the match arrays */
- hasmatch1 = (bool *) palloc0(sslot1->nvalues * sizeof(bool));
- hasmatch2 = (bool *) palloc0(sslot2->nvalues * sizeof(bool));
-
- eqjoinsel_find_matches(opfuncoid, collation,
+ /* Fill the match arrays */
+ eqjoinsel_find_matches(eqproc, collation,
false,
sslot1, sslot2,
sslot1->nvalues, sslot2->nvalues,
@@ -2526,8 +2549,6 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
}
CLAMP_PROBABILITY(matchfreq2);
CLAMP_PROBABILITY(unmatchfreq2);
- pfree(hasmatch1);
- pfree(hasmatch2);
/*
* Compute total frequency of non-null values that are not in the MCV
@@ -2607,12 +2628,13 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
* eqjoinsel_semi --- eqjoinsel for semi join
*
* (Also used for anti join, which we are supposed to estimate the same way.)
- * Caller has ensured that vardata1 is the LHS variable; however, opfuncoid
+ * Caller has ensured that vardata1 is the LHS variable; however, eqproc
* is for the original join operator, which might now need to have the inputs
- * swapped in order to apply correctly.
+ * swapped in order to apply correctly. Also, if have_mcvs1 && have_mcvs2
+ * then hasmatch1[] and hasmatch2[] were filled by eqjoinsel_inner.
*/
static double
-eqjoinsel_semi(Oid opfuncoid, Oid collation,
+eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
@@ -2620,6 +2642,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
bool have_mcvs1, bool have_mcvs2,
+ bool *hasmatch1, bool *hasmatch2,
RelOptInfo *inner_rel)
{
double selec;
@@ -2667,8 +2690,6 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* lists. We still have to estimate for the remaining population, but
* in a skewed distribution this gives us a big leg up in accuracy.
*/
- bool *hasmatch1;
- bool *hasmatch2;
double nullfrac1 = stats1->stanullfrac;
double matchprodfreq,
matchfreq1,
@@ -2687,16 +2708,29 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
*/
clamped_nvalues2 = Min(sslot2->nvalues, nd2);
- /* Construct the match arrays */
- hasmatch1 = (bool *) palloc0(sslot1->nvalues * sizeof(bool));
- hasmatch2 = (bool *) palloc0(clamped_nvalues2 * sizeof(bool));
-
- eqjoinsel_find_matches(opfuncoid, collation,
- op_is_reversed,
- sslot1, sslot2,
- sslot1->nvalues, clamped_nvalues2,
- hasmatch1, hasmatch2,
- &nmatches, &matchprodfreq);
+ /*
+ * If we did not set clamped_nvalues2 to less than sslot2->nvalues,
+ * then the hasmatch1[] and hasmatch2[] match flags computed by
+ * eqjoinsel_inner are still perfectly applicable, so we need not
+ * re-do the matching work. Note that it does not matter if
+ * op_is_reversed: we'd get the same answers.
+ *
+ * If we did clamp, then a different set of sslot2 values is to be
+ * compared, so we have to re-do the matching.
+ */
+ if (clamped_nvalues2 != sslot2->nvalues)
+ {
+ /* Must re-zero the arrays */
+ memset(hasmatch1, 0, sslot1->nvalues * sizeof(bool));
+ memset(hasmatch2, 0, clamped_nvalues2 * sizeof(bool));
+ /* Re-fill the match arrays */
+ eqjoinsel_find_matches(eqproc, collation,
+ op_is_reversed,
+ sslot1, sslot2,
+ sslot1->nvalues, clamped_nvalues2,
+ hasmatch1, hasmatch2,
+ &nmatches, &matchprodfreq);
+ }
/* Sum up frequencies of matched MCVs */
matchfreq1 = 0.0;
@@ -2706,8 +2740,6 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
matchfreq1 += sslot1->numbers[i];
}
CLAMP_PROBABILITY(matchfreq1);
- pfree(hasmatch1);
- pfree(hasmatch2);
/*
* Now we need to estimate the fraction of relation 1 that has at
@@ -2765,9 +2797,9 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* Identify matching MCVs for eqjoinsel_inner or eqjoinsel_semi.
*
* Inputs:
- * opfuncoid: OID of equality function to use (might be reversed)
+ * eqproc: FmgrInfo for equality function to use (might be reversed)
* collation: OID of collation to use
- * op_is_reversed: indicates that opfuncoid compares right type to left type
+ * op_is_reversed: indicates that eqproc compares right type to left type
* sslot1, sslot2: MCV values for the lefthand and righthand inputs
* nvalues1, nvalues2: number of values to be considered (can be less than
* sslotN->nvalues, but not more)
@@ -2784,7 +2816,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* math wouldn't add up...
*/
static void
-eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
@@ -2792,18 +2824,15 @@ eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
int *p_nmatches, double *p_matchprodfreq)
{
LOCAL_FCINFO(fcinfo, 2);
- FmgrInfo eqproc;
double matchprodfreq = 0.0;
int nmatches = 0;
- fmgr_info(opfuncoid, &eqproc);
-
/*
* Save a few cycles by setting up the fcinfo struct just once. Using
* FunctionCallInvoke directly also avoids failure if the eqproc returns
* NULL, though really equality functions should never do that.
*/
- InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
+ InitFunctionCallInfoData(*fcinfo, eqproc, 2, collation,
NULL, NULL);
fcinfo->args[0].isnull = false;
fcinfo->args[1].isnull = false;
--
2.43.7
From 331e8a35118699cb6d4076aa65ddbfa44a905665 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 13 Nov 2025 16:55:43 -0500
Subject: [PATCH v5 5/5] Use hashing to avoid O(N^2) matching work in
eqjoinsel.
Use a simplehash hash table if there are enough MCVs and the
join operator has associated hash functions. The threshold
for switching to hash mode perhaps could use more research.
---
src/backend/utils/adt/selfuncs.c | 245 ++++++++++++++++++++++++++++++-
src/tools/pgindent/typedefs.list | 3 +
2 files changed, 243 insertions(+), 5 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index b4ed12a3737..a68d5423874 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -143,12 +143,47 @@
#define DEFAULT_PAGE_CPU_MULTIPLIER 50.0
+/*
+ * In production builds, switch to hash-based MCV matching when lists are
+ * large enough to amortize hash setup cost. In debug builds, we use a
+ * smaller threshold so that the regression tests cover both paths well.
+ */
+#ifndef USE_ASSERT_CHECKING
+#define EQJOINSEL_MCV_HASH_THRESHOLD 100
+#else
+#define EQJOINSEL_MCV_HASH_THRESHOLD 10
+#endif
+
+/* Entries in the simplehash hash table used by eqjoinsel_find_matches */
+typedef struct McvHashEntry
+{
+ Datum value; /* the value represented by this entry */
+ int index; /* its index in the relevant AttStatsSlot */
+ uint32 hash; /* hash code for the Datum */
+ char status; /* status code used by simplehash.h */
+} McvHashEntry;
+
+/* private_data for the simplehash hash table */
+typedef struct McvHashContext
+{
+ FunctionCallInfo equal_fcinfo; /* the equality join operator */
+ FunctionCallInfo hash_fcinfo; /* the hash function to use */
+ bool op_is_reversed; /* equality compares hash type to probe type */
+ bool insert_mode; /* doing inserts or lookups? */
+ bool hash_typbyval; /* typbyval of hashed data type */
+ int16 hash_typlen; /* typlen of hashed data type */
+} McvHashContext;
+
+/* forward reference */
+typedef struct McvHashTable_hash McvHashTable_hash;
+
/* Hooks for plugins to get control when we ask for stats */
get_relation_stats_hook_type get_relation_stats_hook = NULL;
get_index_stats_hook_type get_index_stats_hook = NULL;
static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
static double eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -157,6 +192,7 @@ static double eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
bool have_mcvs1, bool have_mcvs2,
bool *hasmatch1, bool *hasmatch2);
static double eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
@@ -167,11 +203,14 @@ static double eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
bool *hasmatch1, bool *hasmatch2,
RelOptInfo *inner_rel);
static void eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
bool *hasmatch1, bool *hasmatch2,
int *p_nmatches, double *p_matchprodfreq);
+static uint32 hash_mcv(McvHashTable_hash *tab, Datum key);
+static bool mcvs_equal(McvHashTable_hash *tab, Datum key0, Datum key1);
static bool estimate_multivariate_ndistinct(PlannerInfo *root,
RelOptInfo *rel, List **varinfos, double *ndistinct);
static bool convert_to_scalar(Datum value, Oid valuetypid, Oid collid,
@@ -227,6 +266,20 @@ static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
static double btcost_correlation(IndexOptInfo *index,
VariableStatData *vardata);
+/* Define support routines for MCV hash tables */
+#define SH_PREFIX McvHashTable
+#define SH_ELEMENT_TYPE McvHashEntry
+#define SH_KEY_TYPE Datum
+#define SH_KEY value
+#define SH_HASH_KEY(tab,key) hash_mcv(tab, key)
+#define SH_EQUAL(tab,key0,key1) mcvs_equal(tab, key0, key1)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tab,ent) (ent)->hash
+#define SH_DEFINE
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
/*
* eqsel - Selectivity of "=" for any data types.
@@ -2314,6 +2367,8 @@ eqjoinsel(PG_FUNCTION_ARGS)
bool isdefault2;
Oid opfuncoid;
FmgrInfo eqproc;
+ Oid hashLeft = InvalidOid;
+ Oid hashRight = InvalidOid;
AttStatsSlot sslot1;
AttStatsSlot sslot2;
Form_pg_statistic stats1 = NULL;
@@ -2378,12 +2433,20 @@ eqjoinsel(PG_FUNCTION_ARGS)
fmgr_info(opfuncoid, &eqproc);
hasmatch1 = (bool *) palloc0(sslot1.nvalues * sizeof(bool));
hasmatch2 = (bool *) palloc0(sslot2.nvalues * sizeof(bool));
+
+ /*
+ * If the MCV lists are long enough to justify hashing, try to look up
+ * hash functions for the join operator. XXX should this be Max()?
+ */
+ if (Min(sslot1.nvalues, sslot2.nvalues) >= EQJOINSEL_MCV_HASH_THRESHOLD)
+ (void) get_op_hash_functions(operator, &hashLeft, &hashRight);
}
else
memset(&eqproc, 0, sizeof(eqproc)); /* silence uninit-var warnings */
/* We need to compute the inner-join selectivity in all cases */
selec_inner = eqjoinsel_inner(&eqproc, collation,
+ hashLeft, hashRight,
&vardata1, &vardata2,
nd1, nd2,
isdefault1, isdefault2,
@@ -2412,6 +2475,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
if (!join_is_reversed)
selec = eqjoinsel_semi(&eqproc, collation,
+ hashLeft, hashRight,
false,
&vardata1, &vardata2,
nd1, nd2,
@@ -2423,6 +2487,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
inner_rel);
else
selec = eqjoinsel_semi(&eqproc, collation,
+ hashLeft, hashRight,
true,
&vardata2, &vardata1,
nd2, nd1,
@@ -2481,6 +2546,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
*/
static double
eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -2521,6 +2587,7 @@ eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
/* Fill the match arrays */
eqjoinsel_find_matches(eqproc, collation,
+ hashLeft, hashRight,
false,
sslot1, sslot2,
sslot1->nvalues, sslot2->nvalues,
@@ -2635,6 +2702,7 @@ eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
*/
static double
eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
@@ -2725,6 +2793,7 @@ eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
memset(hasmatch2, 0, clamped_nvalues2 * sizeof(bool));
/* Re-fill the match arrays */
eqjoinsel_find_matches(eqproc, collation,
+ hashLeft, hashRight,
op_is_reversed,
sslot1, sslot2,
sslot1->nvalues, clamped_nvalues2,
@@ -2799,6 +2868,8 @@ eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
* Inputs:
* eqproc: FmgrInfo for equality function to use (might be reversed)
* collation: OID of collation to use
+ * hashLeft, hashRight: OIDs of hash functions associated with equality op,
+ * or InvalidOid if we're not to use hashing
* op_is_reversed: indicates that eqproc compares right type to left type
* sslot1, sslot2: MCV values for the lefthand and righthand inputs
* nvalues1, nvalues2: number of values to be considered (can be less than
@@ -2810,6 +2881,9 @@ eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
* *p_matchprodfreq: receives sum(sslot1->numbers[i] * sslot2->numbers[j])
* for matching MCVs
*
+ * Note that hashLeft is for the eqproc's left-hand input type, hashRight
+ * for its right, regardless of op_is_reversed.
+ *
* Note we assume that each MCV will match at most one member of the other
* MCV list. If the operator isn't really equality, there could be multiple
* matches --- but we don't look for them, both for speed and because the
@@ -2817,6 +2891,7 @@ eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
*/
static void
eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
@@ -2837,12 +2912,111 @@ eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
fcinfo->args[0].isnull = false;
fcinfo->args[1].isnull = false;
- /*
- * The reason for this extra level of braces will become apparent later.
- * For now, it just prevents having to re-indent this chunk of code moved
- * from eqjoinsel_inner.
- */
+ if (OidIsValid(hashLeft) && OidIsValid(hashRight))
+ {
+ /* Use a hash table to speed up the matching */
+ LOCAL_FCINFO(hash_fcinfo, 1);
+ FmgrInfo hash_proc;
+ McvHashContext hashContext;
+ McvHashTable_hash *hashTable;
+ AttStatsSlot *statsProbe;
+ AttStatsSlot *statsHash;
+ bool *hasMatchProbe;
+ bool *hasMatchHash;
+ int nvaluesProbe;
+ int nvaluesHash;
+
+ /* Make sure we build the hash table on the smaller array. */
+ if (sslot1->nvalues >= sslot2->nvalues)
+ {
+ statsProbe = sslot1;
+ statsHash = sslot2;
+ hasMatchProbe = hasmatch1;
+ hasMatchHash = hasmatch2;
+ nvaluesProbe = nvalues1;
+ nvaluesHash = nvalues2;
+ }
+ else
+ {
+ /* We'll have to reverse the direction of use of the operator. */
+ op_is_reversed = !op_is_reversed;
+ statsProbe = sslot2;
+ statsHash = sslot1;
+ hasMatchProbe = hasmatch2;
+ hasMatchHash = hasmatch1;
+ nvaluesProbe = nvalues2;
+ nvaluesHash = nvalues1;
+ }
+
+ /*
+ * Build the hash table on the smaller array, using the appropriate
+ * hash function for its data type.
+ */
+ fmgr_info(op_is_reversed ? hashLeft : hashRight, &hash_proc);
+ InitFunctionCallInfoData(*hash_fcinfo, &hash_proc, 1, collation,
+ NULL, NULL);
+ hash_fcinfo->args[0].isnull = false;
+
+ hashContext.equal_fcinfo = fcinfo;
+ hashContext.hash_fcinfo = hash_fcinfo;
+ hashContext.op_is_reversed = op_is_reversed;
+ hashContext.insert_mode = true;
+ get_typlenbyval(statsHash->valuetype,
+ &hashContext.hash_typlen,
+ &hashContext.hash_typbyval);
+
+ hashTable = McvHashTable_create(CurrentMemoryContext,
+ nvaluesHash,
+ &hashContext);
+
+ for (int i = 0; i < nvaluesHash; i++)
+ {
+ bool found = false;
+ McvHashEntry *entry = McvHashTable_insert(hashTable,
+ statsHash->values[i],
+ &found);
+
+ Assert(!found); /* XXX seems possibly unsafe */
+ entry->index = i;
+ }
+
+ /*
+ * Prepare to probe the hash table. If the probe values are of a
+ * different data type, then we need to change hash functions. (This
+ * code relies on the assumption that since we defined SH_STORE_HASH,
+ * simplehash.h will never need to compute hash values for existing
+ * hash table entries.)
+ */
+ hashContext.insert_mode = false;
+ if (hashLeft != hashRight)
+ {
+ fmgr_info(op_is_reversed ? hashRight : hashLeft, &hash_proc);
+ /* Resetting hash_fcinfo is probably unnecessary, but be safe */
+ InitFunctionCallInfoData(*hash_fcinfo, &hash_proc, 1, collation,
+ NULL, NULL);
+ hash_fcinfo->args[0].isnull = false;
+ }
+
+ /* Look up each probe value in turn. */
+ for (int i = 0; i < nvaluesProbe; i++)
+ {
+ McvHashEntry *entry = McvHashTable_lookup(hashTable,
+ statsProbe->values[i]);
+
+ /* As in the other code path, skip already-matched hash entries */
+ if (entry != NULL && !hasMatchHash[entry->index])
+ {
+ hasMatchHash[entry->index] = hasMatchProbe[i] = true;
+ nmatches++;
+ matchprodfreq += statsHash->numbers[entry->index] * statsProbe->numbers[i];
+ }
+ }
+
+ McvHashTable_destroy(hashTable);
+ }
+ else
{
+ /* We're not to use hashing, so do it the O(N^2) way */
int index1,
index2;
@@ -2886,6 +3060,67 @@ eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
*p_matchprodfreq = matchprodfreq;
}
+/*
+ * Support functions for the hash tables used by eqjoinsel_find_matches
+ */
+static uint32
+hash_mcv(McvHashTable_hash *tab, Datum key)
+{
+ McvHashContext *context = (McvHashContext *) tab->private_data;
+ FunctionCallInfo fcinfo = context->hash_fcinfo;
+ Datum fresult;
+
+ fcinfo->args[0].value = key;
+ fcinfo->isnull = false;
+ fresult = FunctionCallInvoke(fcinfo);
+ Assert(!fcinfo->isnull);
+ return DatumGetUInt32(fresult);
+}
+
+static bool
+mcvs_equal(McvHashTable_hash *tab, Datum key0, Datum key1)
+{
+ McvHashContext *context = (McvHashContext *) tab->private_data;
+
+ if (context->insert_mode)
+ {
+ /*
+ * During the insertion step, any comparisons will be between two
+ * Datums of the hash table's data type, so if the given operator is
+ * cross-type it will be the wrong thing to use. Fortunately, we can
+ * use datum_image_eq instead. The MCV values should all be distinct
+ * anyway, so it's mostly pro-forma to compare them at all.
+ */
+ return datum_image_eq(key0, key1,
+ context->hash_typbyval, context->hash_typlen);
+ }
+ else
+ {
+ FunctionCallInfo fcinfo = context->equal_fcinfo;
+ Datum fresult;
+
+ /*
+ * Apply the operator the correct way around. Although simplehash.h
+ * doesn't document this explicitly, during lookups key0 is from the
+ * hash table while key1 is the probe value, so we should compare them
+ * in that order only if op_is_reversed.
+ */
+ if (context->op_is_reversed)
+ {
+ fcinfo->args[0].value = key0;
+ fcinfo->args[1].value = key1;
+ }
+ else
+ {
+ fcinfo->args[0].value = key1;
+ fcinfo->args[1].value = key0;
+ }
+ fcinfo->isnull = false;
+ fresult = FunctionCallInvoke(fcinfo);
+ return (!fcinfo->isnull && DatumGetBool(fresult));
+ }
+}
+
/*
* neqjoinsel - Join selectivity of "!="
*/
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23bce72ae64..ac5c6ba9833 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1666,6 +1666,9 @@ ManyTestResourceKind
Material
MaterialPath
MaterialState
+McvHashContext
+McvHashEntry
+McvHashTable_hash
MdPathStr
MdfdVec
Memoize
--
2.43.7
Hi Ilia!

On 13.10.2025 12:08, Ilia Evdokimov wrote:
>
> On 17.09.2025 12:40, Ilia Evdokimov wrote:
>> Hi David,
>>
>> In v2 patch, when the join is reversed we pass the commutator operator
>> Oid to eqjoinsel_semi(), and inside that function we immediately call
>> get_opcode(<commutator operator Oid>). Did you mean for the function
>> to take an operator Oid instead of an here?
>>
>> If that was unintentional, perhaps the cleanest fix is to add a new
>> 'operator' parameter to eqjoinsel_semi() so we can keep passing
>> 'opfuncoid' as before and avoid changing the behavior.
>>
>
> This v3 patch fixes the confusion between operator and function Oids in
> eqjoinsel_semi(). This version restores the previous behavior by keeping
> the function Oid as before and adds an explicit 'operator' parameter so
> both values are available without extra behavior changes.
>
> Do you have any further comments or suggestions on this version?

I'm sorry for missing your email with the test results. I'll read up on
it as well as the v3 patch early next week and reply.

--
David Geier
This didn't look very much like the refactorization that I had in mind: I thought we should have one copy of the matching code, not two. Also, after looking closer at your patch I realized you were just punting for cross-type comparison operators, which I felt was kind of sad. It's a little bit tricky to get simplehash.h to go along with cross-type hashing, because it wants to use just one hash and one equality function. But since those are interface routines we are going to supply anyway, we can make them deal with the insert and lookup cases differently.
I had considered the cross-type comparison operators and I didn’t see a clean way to support them, so I intentionally excluded cross-type cases from hash probing. Your suggestion to switch the hash function in probe mode is clearly a more correct approach than simply rejecting those cases. Thanks for the explanation.
So after a bit of hacking I ended up with the attached. I split up the refactorization into several steps to make it easier to review. (But I'd anticipate squashing these into one commit in the end, so I didn't spend a lot of time on the commit messages.)
I reviewed patches 0002-0004 with the refactoring, and I think the overall approach is excellent. However, I noticed one issue: in eqjoinsel_semi() the variable 'nmatches' is not initialized, which can lead to undefined behavior when clamped_nvalues2 == sslot2->nvalues. Before the refactoring it was initialized to zero.
Also, 0001 in this series is not meant to be committed; what it does is to add some debug logging to ease comparing runtimes of different versions of eqjoinsel. I was able to use that to convince myself that the refactoring steps didn't cost anything meaningful in performance. Perhaps we could use it to investigate the right hashing threshold more carefully, too.
With the 0001 patch I tested the selectivity calculation time for SEMI JOIN after applying patches 0002-0004, and the time was cut in half. Thank you for the work on that.
There are still a couple of XXX comments in the attached, denoting
loose ends to look at. In particular, I wondered whether the
hash threshold check
if (Min(sslot1.nvalues, sslot2.nvalues) >= EQJOINSEL_MCV_HASH_THRESHOLD)
should use Max() instead --- that is, it might be safer to hash
if either MCV list is long. Or, holding one's head at a different
angle, perhaps the sum of the list lengths should be what's checked?
regards, tom lane
Hmm… using the sum actually seems like a good idea to me. It may provide a smoother switch-over point between the two MCV-scanning algorithms when both list lengths are below the threshold. But this definitely needs to be validated by measuring different MCV lengths around the threshold using the 0001 patch.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
Hi Ilia!

On 13.10.2025 12:08, Ilia Evdokimov wrote:
>
> On 17.09.2025 12:40, Ilia Evdokimov wrote:
>> In v2 patch, when the join is reversed we pass the commutator operator
>> Oid to eqjoinsel_semi(), and inside that function we immediately call
>> get_opcode(<commutator operator Oid>). Did you mean for the function
>> to take an operator Oid instead of an here?
>>
>> If that was unintentional, perhaps the cleanest fix is to add a new
>> 'operator' parameter to eqjoinsel_semi() so we can keep passing
>> 'opfuncoid' as before and avoid changing the behavior.
>>
>
> This v3 patch fixes the confusion between operator and function Oids in
> eqjoinsel_semi(). This version restores the previous behavior by keeping
> the function Oid as before and adds an explicit 'operator' parameter so
> both values are available without extra behavior changes.
>
> Do you have any further comments or suggestions on this version?

I believe it doesn't matter, because in try_eqjoinsel_with_hashtable()
we bail if the hash function of the LHS and the RHS of the equality
operator is not the same. Hence, whether we use the commutator operator
or not doesn't matter. But maybe I'm overlooking something. Can you
provide an example that fails without your change? If you can, we could
add that also to the regression tests.

Beyond that everything looks good to me. The regression tests also
passed for me.

--
David Geier
Hi Ilia!

On 16.09.2025 17:52, Ilia Evdokimov wrote:
> Hi hackers,
>
> On 10.09.2025 16:56, Ilia Evdokimov wrote:
>> Unfortunately, the JOB benchmark does not contain semi join nodes.
>> However, TPC-DS does. I'll look for the queries with slowest planner
>> times there and check them.
>>
>> I'll need some time to check both join and semi join cases with small
>> and large default_statistics_target. I'll share the results later.
>
> JOIN
> ==============================
>
> I’ve benchmarked the new implementation of eqjoinsel() with different
> values of default_statistics_target. On small targets (1, 5, 10, 25,
> 50, 75, 100) the results are all within statistical noise, and I did
> not observe any regressions. In my view, it’s reasonable to keep the
> current condition that the hash table is not used for
> default_statistics_target = 1. Raising that threshold does not seem
> useful.
>
> Here are the results for JOB queries (where the effect of semi join is
> not visible due to different data distributions):
>
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> ------------------------------------------------------------------------------------------
>                         1 |                1.00 |            1846.643 |           1847.409
>                         5 |                1.00 |            1836.391 |           1828.318
>                        10 |                0.95 |            1841.750 |           1929.722
>                        25 |                0.99 |            1873.172 |           1890.741
>                        50 |                0.98 |            1869.897 |           1898.470
>                        75 |                1.02 |            1969.368 |           1929.521
>                       100 |                0.97 |            1857.890 |           1921.207
>                      1000 |                1.14 |            2279.700 |           1997.102
>                      2500 |                1.78 |            4682.658 |           2636.202
>                      5000 |                6.45 |           15943.696 |           2471.242
>                      7500 |               12.45 |           34350.855 |           2758.565
>                     10000 |               20.52 |           62519.342 |           3046.819

Good that we've confirmed that.

> SEMI JOIN
> ==============================
>
> Unfortunately, in TPC-DS it is not possible to clearly see improvements
> for semi joins. To address this, I designed a synthetic example where
> the data distribution forces the loop to run fully, without exiting
> early, which makes the effect on semi joins more visible. In this
> setup, I also ensured that the length of the MCV array is equal to the
> chosen default_statistics_target.
>
> CREATE TABLE t1 AS
> SELECT CASE
>          WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 1
>          ELSE (g % 1000000) + 10000
>        END AS id
> FROM generate_series(1, 3000000) g;
>
> CREATE TABLE t2 AS
> SELECT CASE
>          WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 10001
>          ELSE (g % 1000000) + 20000
>        END AS id
> FROM generate_series(1, 3000000) g;
>
> ANALYZE t1, t2;
>
> The results of the query are:
>
> SELECT * FROM t1
> WHERE id IN (SELECT id FROM t2);
>
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> ------------------------------------------------------------------------------------------
>                         1 |                1.12 |               1.191 |              1.062
>                         5 |                1.02 |               0.493 |              0.481
>                        10 |                0.92 |               0.431 |              0.471
>                        25 |                1.27 |               0.393 |              0.309
>                        50 |                1.04 |               0.432 |              0.416
>                        75 |                0.96 |               0.398 |              0.415
>                       100 |                0.95 |               0.450 |              0.473
>                      1000 |                9.42 |               6.742 |              0.716
>                      2500 |               19.15 |              21.621 |              1.129
>                      5000 |               46.74 |              85.667 |              1.833
>                      7500 |               73.26 |             194.806 |              2.659
>                     10000 |              107.95 |             349.981 |              3.242

That's some decent speedups, considering that it's planning time.
Thanks for testing the code!

--
David Geier
Hi David,

It looks like there are some technical issues, but Tom and I are
currently discussing version 5 of the patches. Here is the link to the
ongoing discussion [0]. If you have any suggestions or feedback about
the patches, feel free to share them.

[0]: https://www.postgresql.org/message-id/3026409.1763072514%40sss.pgh.pa.us

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
> On 14.11.2025 01:21, Tom Lane wrote:
>> So after a bit of hacking I ended up with the attached. I split up
>> the refactorization into several steps to make it easier to review.
> I reviewed patches 0002-0004 with the refactoring, and I think the
> overall approach is excellent. However, I noticed one issue: in
> eqjoinsel_semi() the variable 'nmatches' is not initialized and can lead
> to undefined behavior when clamped_nvalues2 == sslot2->nvalues. Before
> the refactoring it was initialized by zero.
Argh! Can't believe I missed that.
I experimented with combining all of eqjoinsel_find_matches' outputs
into one struct, but decided that that was uglier than just adding one
more pass-by-reference argument to eqjoinsel_inner/semi.
>> There are still a couple of XXX comments in the attached, denoting
>> loose ends to look at.
In the attached v6, I cleaned up one of the XXX items, deciding that
duplicate entries in the hash table should be coped with not asserted
against. The reason is that we might be working with a comparison
operator that is not exactly the one used to build the MCV list:
the planner is usually pretty cavalier about applying stats that are
only fuzzy matches to what it needs. So it seems possible that we
could find entries that are equal according to the operator we're
using, even though they're unequal according to what ANALYZE used to
compute the MCV list. I didn't actively try to hit that Assert, but
I think a counterexample could be built by using a case-insensitive
collation in a join query.
>> In particular, I wondered whether the
>> hash threshold check
>> if (Min(sslot1.nvalues, sslot2.nvalues) >= EQJOINSEL_MCV_HASH_THRESHOLD)
>> should use Max() instead --- that is, it might be safer to hash
>> if either MCV list is long. Or, holding one's head at a different
>> angle, perhaps the sum of the list lengths should be what's checked?
> Hmm… using the sum actually seems like a good idea for me.
Actually, after sleeping on it it seems like the obvious thing is
to test "sslot1.nvalues * sslot2.nvalues", since the work we are
thinking about saving scales as that product. But I'm not sure
what threshold value to use if we do that. Maybe around 10000?
regards, tom lane
From 15f6c1df696438f5a8f84c8fe3bcf2b1dd1ea475 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon, 17 Nov 2025 12:06:49 -0500
Subject: [PATCH v6 1/5] Add some test scaffolding to join_selectivity().
This not-meant-for-commit patch adds some instrumentation to
plancat.c's join_selectivity() to log the result and runtime
of a join selectivity function. This is useful for manual
testing of performance patches in eqjoinsel().
To improve the accuracy of the runtime measurement, run the
function 1000 times in each call. The regression tests still
take a reasonable amount of time with this overhead, although
it's noticeably more than usual.
---
src/backend/optimizer/util/plancat.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d950bd93002..15c50e068a0 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -42,6 +42,7 @@
#include "parser/parse_relation.h"
#include "parser/parsetree.h"
#include "partitioning/partdesc.h"
+#include "portability/instr_time.h"
#include "rewrite/rewriteHandler.h"
#include "rewrite/rewriteManip.h"
#include "statistics/statistics.h"
@@ -2123,6 +2124,8 @@ join_selectivity(PlannerInfo *root,
{
RegProcedure oprjoin = get_oprjoin(operatorid);
float8 result;
+ instr_time start_time,
+ end_time;
/*
* if the oprjoin procedure is missing for whatever reason, use a
@@ -2131,6 +2134,10 @@ join_selectivity(PlannerInfo *root,
if (!oprjoin)
return (Selectivity) 0.5;
+ INSTR_TIME_SET_CURRENT(start_time);
+
+ for (int i = 0; i < 1000; i++)
+ {
result = DatumGetFloat8(OidFunctionCall5Coll(oprjoin,
inputcollid,
PointerGetDatum(root),
@@ -2138,6 +2145,21 @@ join_selectivity(PlannerInfo *root,
PointerGetDatum(args),
Int16GetDatum(jointype),
PointerGetDatum(sjinfo)));
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ /* Don't be chatty during initdb */
+ if (IsUnderPostmaster)
+ {
+ INSTR_TIME_SET_CURRENT(end_time);
+
+ INSTR_TIME_SUBTRACT(end_time, start_time);
+
+ /* multiply by 1e6 to get microsecs, divide by 1000 to cancel loop */
+ elog(LOG, "join_selectivity(opr %u, jointype %d) = %f, time %g us",
+ operatorid, jointype, result,
+ INSTR_TIME_GET_DOUBLE(end_time) * 1e3);
+ }
if (result < 0.0 || result > 1.0)
elog(ERROR, "invalid join selectivity: %f", result);
--
2.43.7
From 571d2ea49ec9388010fa26a8fa165a7faac94c94 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon, 17 Nov 2025 12:08:39 -0500
Subject: [PATCH v6 2/5] Factor out duplicative code in
eqjoinsel_inner/eqjoinsel_semi.
These functions have essentially identical code for scanning the
two MCV lists and identifying which entries have matches in the
other list. While it's not a huge amount of code, it's 50 or
so lines, and will be more after an upcoming patch to use a hash
table with many MCVs. Let's reduce duplication by moving that
code into a common subroutine.
The one downside of doing this is that we must compute
sum(sslot1->numbers[i] * sslot2->numbers[j]) even though
eqjoinsel_semi won't need that. But the cost of that appears
negligible, so I didn't trouble to invent a way of avoiding it.
---
src/backend/utils/adt/selfuncs.c | 191 ++++++++++++++++---------------
1 file changed, 99 insertions(+), 92 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cb23ad52782..590b3a0c078 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -163,6 +163,11 @@ static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
Form_pg_statistic stats1, Form_pg_statistic stats2,
bool have_mcvs1, bool have_mcvs2,
RelOptInfo *inner_rel);
+static void eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+ AttStatsSlot *sslot1, AttStatsSlot *sslot2,
+ int nvalues1, int nvalues2,
+ bool *hasmatch1, bool *hasmatch2,
+ int *p_nmatches, double *p_matchprodfreq);
static bool estimate_multivariate_ndistinct(PlannerInfo *root,
RelOptInfo *rel, List **varinfos, double *ndistinct);
static bool convert_to_scalar(Datum value, Oid valuetypid, Oid collid,
@@ -2473,8 +2478,6 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
* results", Technical Report 1018, Computer Science Dept., University
* of Wisconsin, Madison, March 1991 (available from ftp.cs.wisc.edu).
*/
- LOCAL_FCINFO(fcinfo, 2);
- FmgrInfo eqproc;
bool *hasmatch1;
bool *hasmatch2;
double nullfrac1 = stats1->stanullfrac;
@@ -2491,55 +2494,17 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
int i,
nmatches;
- fmgr_info(opfuncoid, &eqproc);
-
- /*
- * Save a few cycles by setting up the fcinfo struct just once. Using
- * FunctionCallInvoke directly also avoids failure if the eqproc
- * returns NULL, though really equality functions should never do
- * that.
- */
- InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
- NULL, NULL);
- fcinfo->args[0].isnull = false;
- fcinfo->args[1].isnull = false;
-
+ /* Construct the match arrays */
hasmatch1 = (bool *) palloc0(sslot1->nvalues * sizeof(bool));
hasmatch2 = (bool *) palloc0(sslot2->nvalues * sizeof(bool));
- /*
- * Note we assume that each MCV will match at most one member of the
- * other MCV list. If the operator isn't really equality, there could
- * be multiple matches --- but we don't look for them, both for speed
- * and because the math wouldn't add up...
- */
- matchprodfreq = 0.0;
- nmatches = 0;
- for (i = 0; i < sslot1->nvalues; i++)
- {
- int j;
-
- fcinfo->args[0].value = sslot1->values[i];
-
- for (j = 0; j < sslot2->nvalues; j++)
- {
- Datum fresult;
-
- if (hasmatch2[j])
- continue;
- fcinfo->args[1].value = sslot2->values[j];
- fcinfo->isnull = false;
- fresult = FunctionCallInvoke(fcinfo);
- if (!fcinfo->isnull && DatumGetBool(fresult))
- {
- hasmatch1[i] = hasmatch2[j] = true;
- matchprodfreq += sslot1->numbers[i] * sslot2->numbers[j];
- nmatches++;
- break;
- }
- }
- }
+ eqjoinsel_find_matches(opfuncoid, collation,
+ sslot1, sslot2,
+ sslot1->nvalues, sslot2->nvalues,
+ hasmatch1, hasmatch2,
+ &nmatches, &matchprodfreq);
CLAMP_PROBABILITY(matchprodfreq);
+
/* Sum up frequencies of matched and unmatched MCVs */
matchfreq1 = unmatchfreq1 = 0.0;
for (i = 0; i < sslot1->nvalues; i++)
@@ -2700,12 +2665,11 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* lists. We still have to estimate for the remaining population, but
* in a skewed distribution this gives us a big leg up in accuracy.
*/
- LOCAL_FCINFO(fcinfo, 2);
- FmgrInfo eqproc;
bool *hasmatch1;
bool *hasmatch2;
double nullfrac1 = stats1->stanullfrac;
- double matchfreq1,
+ double matchprodfreq,
+ matchfreq1,
uncertainfrac,
uncertain;
int i,
@@ -2721,52 +2685,16 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
*/
clamped_nvalues2 = Min(sslot2->nvalues, nd2);
- fmgr_info(opfuncoid, &eqproc);
-
- /*
- * Save a few cycles by setting up the fcinfo struct just once. Using
- * FunctionCallInvoke directly also avoids failure if the eqproc
- * returns NULL, though really equality functions should never do
- * that.
- */
- InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
- NULL, NULL);
- fcinfo->args[0].isnull = false;
- fcinfo->args[1].isnull = false;
-
+ /* Construct the match arrays */
hasmatch1 = (bool *) palloc0(sslot1->nvalues * sizeof(bool));
hasmatch2 = (bool *) palloc0(clamped_nvalues2 * sizeof(bool));
- /*
- * Note we assume that each MCV will match at most one member of the
- * other MCV list. If the operator isn't really equality, there could
- * be multiple matches --- but we don't look for them, both for speed
- * and because the math wouldn't add up...
- */
- nmatches = 0;
- for (i = 0; i < sslot1->nvalues; i++)
- {
- int j;
-
- fcinfo->args[0].value = sslot1->values[i];
-
- for (j = 0; j < clamped_nvalues2; j++)
- {
- Datum fresult;
+ eqjoinsel_find_matches(opfuncoid, collation,
+ sslot1, sslot2,
+ sslot1->nvalues, clamped_nvalues2,
+ hasmatch1, hasmatch2,
+ &nmatches, &matchprodfreq);
- if (hasmatch2[j])
- continue;
- fcinfo->args[1].value = sslot2->values[j];
- fcinfo->isnull = false;
- fresult = FunctionCallInvoke(fcinfo);
- if (!fcinfo->isnull && DatumGetBool(fresult))
- {
- hasmatch1[i] = hasmatch2[j] = true;
- nmatches++;
- break;
- }
- }
- }
/* Sum up frequencies of matched MCVs */
matchfreq1 = 0.0;
for (i = 0; i < sslot1->nvalues; i++)
@@ -2830,6 +2758,85 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
return selec;
}
+/*
+ * Identify matching MCVs for eqjoinsel_inner or eqjoinsel_semi.
+ *
+ * Inputs:
+ * opfuncoid: OID of equality function to use (might be cross-type)
+ * collation: OID of collation to use
+ * sslot1, sslot2: MCV values for the lefthand and righthand inputs
+ * nvalues1, nvalues2: number of values to be considered (can be less than
+ * sslotN->nvalues, but not more)
+ * Outputs:
+ * hasmatch1[], hasmatch2[]: pre-zeroed arrays of lengths nvalues1, nvalues2;
+ * entries are set to true if that MCV has a match on the other side
+ * *p_nmatches: receives number of MCV pairs that match
+ * *p_matchprodfreq: receives sum(sslot1->numbers[i] * sslot2->numbers[j])
+ * for matching MCVs
+ *
+ * Note we assume that each MCV will match at most one member of the other
+ * MCV list. If the operator isn't really equality, there could be multiple
+ * matches --- but we don't look for them, both for speed and because the
+ * math wouldn't add up...
+ */
+static void
+eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+ AttStatsSlot *sslot1, AttStatsSlot *sslot2,
+ int nvalues1, int nvalues2,
+ bool *hasmatch1, bool *hasmatch2,
+ int *p_nmatches, double *p_matchprodfreq)
+{
+ LOCAL_FCINFO(fcinfo, 2);
+ FmgrInfo eqproc;
+ double matchprodfreq = 0.0;
+ int nmatches = 0;
+
+ fmgr_info(opfuncoid, &eqproc);
+
+ /*
+ * Save a few cycles by setting up the fcinfo struct just once. Using
+ * FunctionCallInvoke directly also avoids failure if the eqproc returns
+ * NULL, though really equality functions should never do that.
+ */
+ InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
+ NULL, NULL);
+ fcinfo->args[0].isnull = false;
+ fcinfo->args[1].isnull = false;
+
+ /*
+ * The reason for this extra level of braces will become apparent later.
+ * For now, it just prevents having to re-indent this chunk of code moved
+ * from eqjoinsel_inner.
+ */
+ {
+ for (int i = 0; i < nvalues1; i++)
+ {
+ fcinfo->args[0].value = sslot1->values[i];
+
+ for (int j = 0; j < nvalues2; j++)
+ {
+ Datum fresult;
+
+ if (hasmatch2[j])
+ continue;
+ fcinfo->args[1].value = sslot2->values[j];
+ fcinfo->isnull = false;
+ fresult = FunctionCallInvoke(fcinfo);
+ if (!fcinfo->isnull && DatumGetBool(fresult))
+ {
+ hasmatch1[i] = hasmatch2[j] = true;
+ matchprodfreq += sslot1->numbers[i] * sslot2->numbers[j];
+ nmatches++;
+ break;
+ }
+ }
+ }
+ }
+
+ *p_nmatches = nmatches;
+ *p_matchprodfreq = matchprodfreq;
+}
+
/*
* neqjoinsel - Join selectivity of "!="
*/
--
2.43.7
From 60093f713fdecd329bc394cc0f864cec16cd8e79 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon, 17 Nov 2025 12:09:24 -0500
Subject: [PATCH v6 3/5] Rethink eqjoinsel's handling of reversed joins.
Formerly, if we needed to deal with a "reversed" join where the
outer-side variable is on the right hand of the given operator,
we looked up the operator's commutator and applied that, so that
eqjoinsel_semi could always treat "sslot1" as the outer-side
variable of the semijoin.
This isn't great, because we ended up punting to a poor estimate
if no commutator is recorded. It also doesn't play well with
later changes in this patch series. Instead, let's handle the
case by swapping the left and right input values just before
we call the comparison operator. While this theoretically adds
cycles to the inner comparison loop, with the coding proposed
here I don't see any real timing difference. (But I only tested
it on x86_64.)
---
src/backend/utils/adt/selfuncs.c | 44 +++++++++++++++++++++++---------
1 file changed, 32 insertions(+), 12 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 590b3a0c078..53653d2d05b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -156,6 +156,7 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
Form_pg_statistic stats1, Form_pg_statistic stats2,
bool have_mcvs1, bool have_mcvs2);
static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
+ bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -164,6 +165,7 @@ static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
bool have_mcvs1, bool have_mcvs2,
RelOptInfo *inner_rel);
static void eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+ bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
bool *hasmatch1, bool *hasmatch2,
@@ -2394,6 +2396,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
if (!join_is_reversed)
selec = eqjoinsel_semi(opfuncoid, collation,
+ false,
&vardata1, &vardata2,
nd1, nd2,
isdefault1, isdefault2,
@@ -2402,11 +2405,8 @@ eqjoinsel(PG_FUNCTION_ARGS)
have_mcvs1, have_mcvs2,
inner_rel);
else
- {
- Oid commop = get_commutator(operator);
- Oid commopfuncoid = OidIsValid(commop) ? get_opcode(commop) : InvalidOid;
-
- selec = eqjoinsel_semi(commopfuncoid, collation,
+ selec = eqjoinsel_semi(opfuncoid, collation,
+ true,
&vardata2, &vardata1,
nd2, nd1,
isdefault2, isdefault1,
@@ -2414,7 +2414,6 @@ eqjoinsel(PG_FUNCTION_ARGS)
stats2, stats1,
have_mcvs2, have_mcvs1,
inner_rel);
- }
/*
* We should never estimate the output of a semijoin to be more
@@ -2499,6 +2498,7 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
hasmatch2 = (bool *) palloc0(sslot2->nvalues * sizeof(bool));
eqjoinsel_find_matches(opfuncoid, collation,
+ false,
sslot1, sslot2,
sslot1->nvalues, sslot2->nvalues,
hasmatch1, hasmatch2,
@@ -2607,11 +2607,13 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
* eqjoinsel_semi --- eqjoinsel for semi join
*
* (Also used for anti join, which we are supposed to estimate the same way.)
- * Caller has ensured that vardata1 is the LHS variable.
- * Unlike eqjoinsel_inner, we have to cope with opfuncoid being InvalidOid.
+ * Caller has ensured that vardata1 is the LHS variable; however, opfuncoid
+ * is for the original join operator, which might now need to have the inputs
+ * swapped in order to apply correctly.
*/
static double
eqjoinsel_semi(Oid opfuncoid, Oid collation,
+ bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -2655,7 +2657,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
isdefault2 = false;
}
- if (have_mcvs1 && have_mcvs2 && OidIsValid(opfuncoid))
+ if (have_mcvs1 && have_mcvs2)
{
/*
* We have most-common-value lists for both relations. Run through
@@ -2690,6 +2692,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
hasmatch2 = (bool *) palloc0(clamped_nvalues2 * sizeof(bool));
eqjoinsel_find_matches(opfuncoid, collation,
+ op_is_reversed,
sslot1, sslot2,
sslot1->nvalues, clamped_nvalues2,
hasmatch1, hasmatch2,
@@ -2762,8 +2765,9 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* Identify matching MCVs for eqjoinsel_inner or eqjoinsel_semi.
*
* Inputs:
- * opfuncoid: OID of equality function to use (might be cross-type)
+ * opfuncoid: OID of equality function to use (might be reversed)
* collation: OID of collation to use
+ * op_is_reversed: indicates that opfuncoid compares right type to left type
* sslot1, sslot2: MCV values for the lefthand and righthand inputs
* nvalues1, nvalues2: number of values to be considered (can be less than
* sslotN->nvalues, but not more)
@@ -2781,6 +2785,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
*/
static void
eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+ bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
bool *hasmatch1, bool *hasmatch2,
@@ -2809,9 +2814,24 @@ eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
* from eqjoinsel_inner.
*/
{
+ int index1,
+ index2;
+
+ /* Set up to supply the values in the order the operator expects */
+ if (op_is_reversed)
+ {
+ index1 = 1;
+ index2 = 0;
+ }
+ else
+ {
+ index1 = 0;
+ index2 = 1;
+ }
+
for (int i = 0; i < nvalues1; i++)
{
- fcinfo->args[0].value = sslot1->values[i];
+ fcinfo->args[index1].value = sslot1->values[i];
for (int j = 0; j < nvalues2; j++)
{
@@ -2819,7 +2839,7 @@ eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
if (hasmatch2[j])
continue;
- fcinfo->args[1].value = sslot2->values[j];
+ fcinfo->args[index2].value = sslot2->values[j];
fcinfo->isnull = false;
fresult = FunctionCallInvoke(fcinfo);
if (!fcinfo->isnull && DatumGetBool(fresult))
--
2.43.7
From 8a0544353014d60979bca6a83c909bb680c5b043 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon, 17 Nov 2025 12:45:35 -0500
Subject: [PATCH v6 4/5] Share more work between eqjoinsel_inner and
eqjoinsel_semi.
Originally, only one of eqjoinsel_inner and eqjoinsel_semi was
invoked per eqjoinsel call, so the fact that they duplicated a
good deal of work was irrelevant to performance. But since commit
a314c3407, the semi/antijoin case calls both, and that is really
expensive if there are a lot of MCVs to match. Refactor so that
we can re-use eqjoinsel_inner's matching results except in the
(uncommon) case where eqjoinsel_semi clamps the RHS MCV list size
because the expected number of rows to be fetched from the RHS rel
is smaller than the list. This doesn't seem to create any performance
penalty for non-semijoin cases.
While at it, we can avoid doing fmgr_info twice too.
I considered also avoiding duplicate InitFunctionCallInfoData
calls, but desisted: that wouldn't save very much, and in my
tests it looks like there may be some performance advantage
if fcinfo is a local variable.
---
src/backend/utils/adt/selfuncs.c | 128 ++++++++++++++++++++-----------
1 file changed, 84 insertions(+), 44 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53653d2d05b..4335daf9a80 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -148,14 +148,16 @@ get_relation_stats_hook_type get_relation_stats_hook = NULL;
get_index_stats_hook_type get_index_stats_hook = NULL;
static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
-static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
+static double eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
- bool have_mcvs1, bool have_mcvs2);
-static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
+ bool have_mcvs1, bool have_mcvs2,
+ bool *hasmatch1, bool *hasmatch2,
+ int *p_nmatches);
+static double eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
@@ -163,8 +165,10 @@ static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
bool have_mcvs1, bool have_mcvs2,
+ bool *hasmatch1, bool *hasmatch2,
+ int *p_nmatches,
RelOptInfo *inner_rel);
-static void eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+static void eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
@@ -2311,12 +2315,16 @@ eqjoinsel(PG_FUNCTION_ARGS)
bool isdefault1;
bool isdefault2;
Oid opfuncoid;
+ FmgrInfo eqproc;
AttStatsSlot sslot1;
AttStatsSlot sslot2;
Form_pg_statistic stats1 = NULL;
Form_pg_statistic stats2 = NULL;
bool have_mcvs1 = false;
bool have_mcvs2 = false;
+ bool *hasmatch1 = NULL;
+ bool *hasmatch2 = NULL;
+ int nmatches = 0;
bool get_mcv_stats;
bool join_is_reversed;
RelOptInfo *inner_rel;
@@ -2367,14 +2375,26 @@ eqjoinsel(PG_FUNCTION_ARGS)
ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
}
+ /* Prepare info usable by both eqjoinsel_inner and eqjoinsel_semi */
+ if (have_mcvs1 && have_mcvs2)
+ {
+ fmgr_info(opfuncoid, &eqproc);
+ hasmatch1 = (bool *) palloc0(sslot1.nvalues * sizeof(bool));
+ hasmatch2 = (bool *) palloc0(sslot2.nvalues * sizeof(bool));
+ }
+ else
+ memset(&eqproc, 0, sizeof(eqproc)); /* silence uninit-var warnings */
+
/* We need to compute the inner-join selectivity in all cases */
- selec_inner = eqjoinsel_inner(opfuncoid, collation,
+ selec_inner = eqjoinsel_inner(&eqproc, collation,
&vardata1, &vardata2,
nd1, nd2,
isdefault1, isdefault2,
&sslot1, &sslot2,
stats1, stats2,
- have_mcvs1, have_mcvs2);
+ have_mcvs1, have_mcvs2,
+ hasmatch1, hasmatch2,
+ &nmatches);
switch (sjinfo->jointype)
{
@@ -2395,7 +2415,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
inner_rel = find_join_input_rel(root, sjinfo->min_righthand);
if (!join_is_reversed)
- selec = eqjoinsel_semi(opfuncoid, collation,
+ selec = eqjoinsel_semi(&eqproc, collation,
false,
&vardata1, &vardata2,
nd1, nd2,
@@ -2403,9 +2423,11 @@ eqjoinsel(PG_FUNCTION_ARGS)
&sslot1, &sslot2,
stats1, stats2,
have_mcvs1, have_mcvs2,
+ hasmatch1, hasmatch2,
+ &nmatches,
inner_rel);
else
- selec = eqjoinsel_semi(opfuncoid, collation,
+ selec = eqjoinsel_semi(&eqproc, collation,
true,
&vardata2, &vardata1,
nd2, nd1,
@@ -2413,6 +2435,8 @@ eqjoinsel(PG_FUNCTION_ARGS)
&sslot2, &sslot1,
stats2, stats1,
have_mcvs2, have_mcvs1,
+ hasmatch2, hasmatch1,
+ &nmatches,
inner_rel);
/*
@@ -2441,6 +2465,11 @@ eqjoinsel(PG_FUNCTION_ARGS)
ReleaseVariableStats(vardata1);
ReleaseVariableStats(vardata2);
+ if (hasmatch1)
+ pfree(hasmatch1);
+ if (hasmatch2)
+ pfree(hasmatch2);
+
CLAMP_PROBABILITY(selec);
PG_RETURN_FLOAT8((float8) selec);
@@ -2449,17 +2478,23 @@ eqjoinsel(PG_FUNCTION_ARGS)
/*
* eqjoinsel_inner --- eqjoinsel for normal inner join
*
+ * In addition to computing the selectivity estimate, this will fill
+ * hasmatch1[], hasmatch2[], and *p_nmatches (if have_mcvs1 && have_mcvs2).
+ * We may be able to re-use that data in eqjoinsel_semi.
+ *
* We also use this for LEFT/FULL outer joins; it's not presently clear
* that it's worth trying to distinguish them here.
*/
static double
-eqjoinsel_inner(Oid opfuncoid, Oid collation,
+eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
- bool have_mcvs1, bool have_mcvs2)
+ bool have_mcvs1, bool have_mcvs2,
+ bool *hasmatch1, bool *hasmatch2,
+ int *p_nmatches)
{
double selec;
@@ -2477,8 +2512,6 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
* results", Technical Report 1018, Computer Science Dept., University
* of Wisconsin, Madison, March 1991 (available from ftp.cs.wisc.edu).
*/
- bool *hasmatch1;
- bool *hasmatch2;
double nullfrac1 = stats1->stanullfrac;
double nullfrac2 = stats2->stanullfrac;
double matchprodfreq,
@@ -2493,16 +2526,14 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
int i,
nmatches;
- /* Construct the match arrays */
- hasmatch1 = (bool *) palloc0(sslot1->nvalues * sizeof(bool));
- hasmatch2 = (bool *) palloc0(sslot2->nvalues * sizeof(bool));
-
- eqjoinsel_find_matches(opfuncoid, collation,
+ /* Fill the match arrays */
+ eqjoinsel_find_matches(eqproc, collation,
false,
sslot1, sslot2,
sslot1->nvalues, sslot2->nvalues,
hasmatch1, hasmatch2,
- &nmatches, &matchprodfreq);
+ p_nmatches, &matchprodfreq);
+ nmatches = *p_nmatches;
CLAMP_PROBABILITY(matchprodfreq);
/* Sum up frequencies of matched and unmatched MCVs */
@@ -2526,8 +2557,6 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
}
CLAMP_PROBABILITY(matchfreq2);
CLAMP_PROBABILITY(unmatchfreq2);
- pfree(hasmatch1);
- pfree(hasmatch2);
/*
* Compute total frequency of non-null values that are not in the MCV
@@ -2607,12 +2636,14 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
* eqjoinsel_semi --- eqjoinsel for semi join
*
* (Also used for anti join, which we are supposed to estimate the same way.)
- * Caller has ensured that vardata1 is the LHS variable; however, opfuncoid
+ * Caller has ensured that vardata1 is the LHS variable; however, eqproc
* is for the original join operator, which might now need to have the inputs
- * swapped in order to apply correctly.
+ * swapped in order to apply correctly. Also, if have_mcvs1 && have_mcvs2
+ * then hasmatch1[], hasmatch2[], and *p_nmatches were filled by
+ * eqjoinsel_inner.
*/
static double
-eqjoinsel_semi(Oid opfuncoid, Oid collation,
+eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
@@ -2620,6 +2651,8 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
bool have_mcvs1, bool have_mcvs2,
+ bool *hasmatch1, bool *hasmatch2,
+ int *p_nmatches,
RelOptInfo *inner_rel)
{
double selec;
@@ -2667,8 +2700,6 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* lists. We still have to estimate for the remaining population, but
* in a skewed distribution this gives us a big leg up in accuracy.
*/
- bool *hasmatch1;
- bool *hasmatch2;
double nullfrac1 = stats1->stanullfrac;
double matchprodfreq,
matchfreq1,
@@ -2687,16 +2718,30 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
*/
clamped_nvalues2 = Min(sslot2->nvalues, nd2);
- /* Construct the match arrays */
- hasmatch1 = (bool *) palloc0(sslot1->nvalues * sizeof(bool));
- hasmatch2 = (bool *) palloc0(clamped_nvalues2 * sizeof(bool));
-
- eqjoinsel_find_matches(opfuncoid, collation,
- op_is_reversed,
- sslot1, sslot2,
- sslot1->nvalues, clamped_nvalues2,
- hasmatch1, hasmatch2,
- &nmatches, &matchprodfreq);
+ /*
+ * If we did not set clamped_nvalues2 to less than sslot2->nvalues,
+ * then the hasmatch1[] and hasmatch2[] match flags computed by
+ * eqjoinsel_inner are still perfectly applicable, so we need not
+ * re-do the matching work. Note that it does not matter if
+ * op_is_reversed: we'd get the same answers.
+ *
+ * If we did clamp, then a different set of sslot2 values is to be
+ * compared, so we have to re-do the matching.
+ */
+ if (clamped_nvalues2 != sslot2->nvalues)
+ {
+ /* Must re-zero the arrays */
+ memset(hasmatch1, 0, sslot1->nvalues * sizeof(bool));
+ memset(hasmatch2, 0, clamped_nvalues2 * sizeof(bool));
+ /* Re-fill the match arrays */
+ eqjoinsel_find_matches(eqproc, collation,
+ op_is_reversed,
+ sslot1, sslot2,
+ sslot1->nvalues, clamped_nvalues2,
+ hasmatch1, hasmatch2,
+ p_nmatches, &matchprodfreq);
+ }
+ nmatches = *p_nmatches;
/* Sum up frequencies of matched MCVs */
matchfreq1 = 0.0;
@@ -2706,8 +2751,6 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
matchfreq1 += sslot1->numbers[i];
}
CLAMP_PROBABILITY(matchfreq1);
- pfree(hasmatch1);
- pfree(hasmatch2);
/*
* Now we need to estimate the fraction of relation 1 that has at
@@ -2765,9 +2808,9 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* Identify matching MCVs for eqjoinsel_inner or eqjoinsel_semi.
*
* Inputs:
- * opfuncoid: OID of equality function to use (might be reversed)
+ * eqproc: FmgrInfo for equality function to use (might be reversed)
* collation: OID of collation to use
- * op_is_reversed: indicates that opfuncoid compares right type to left type
+ * op_is_reversed: indicates that eqproc compares right type to left type
* sslot1, sslot2: MCV values for the lefthand and righthand inputs
* nvalues1, nvalues2: number of values to be considered (can be less than
* sslotN->nvalues, but not more)
@@ -2784,7 +2827,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* math wouldn't add up...
*/
static void
-eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
+eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
@@ -2792,18 +2835,15 @@ eqjoinsel_find_matches(Oid opfuncoid, Oid collation,
int *p_nmatches, double *p_matchprodfreq)
{
LOCAL_FCINFO(fcinfo, 2);
- FmgrInfo eqproc;
double matchprodfreq = 0.0;
int nmatches = 0;
- fmgr_info(opfuncoid, &eqproc);
-
/*
* Save a few cycles by setting up the fcinfo struct just once. Using
* FunctionCallInvoke directly also avoids failure if the eqproc returns
* NULL, though really equality functions should never do that.
*/
- InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
+ InitFunctionCallInfoData(*fcinfo, eqproc, 2, collation,
NULL, NULL);
fcinfo->args[0].isnull = false;
fcinfo->args[1].isnull = false;
--
2.43.7
From 28a9e2dd123aee5e3c1217fb90bf5b51fdb5d427 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon, 17 Nov 2025 13:12:18 -0500
Subject: [PATCH v6 5/5] Use hashing to avoid O(N^2) matching work in
eqjoinsel.
Use a simplehash hash table if there are enough MCVs and the
join operator has associated hash functions. The threshold
for switching to hash mode perhaps could use more research.
---
src/backend/utils/adt/selfuncs.c | 253 ++++++++++++++++++++++++++++++-
src/tools/pgindent/typedefs.list | 3 +
2 files changed, 251 insertions(+), 5 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 4335daf9a80..3354fea6a76 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -143,12 +143,47 @@
#define DEFAULT_PAGE_CPU_MULTIPLIER 50.0
+/*
+ * In production builds, switch to hash-based MCV matching when lists are
+ * large enough to amortize hash setup cost. In debug builds, we use a
+ * smaller threshold so that the regression tests cover both paths well.
+ */
+#ifndef USE_ASSERT_CHECKING
+#define EQJOINSEL_MCV_HASH_THRESHOLD 100
+#else
+#define EQJOINSEL_MCV_HASH_THRESHOLD 10
+#endif
+
+/* Entries in the simplehash hash table used by eqjoinsel_find_matches */
+typedef struct MCVHashEntry
+{
+ Datum value; /* the value represented by this entry */
+ int index; /* its index in the relevant AttStatsSlot */
+ uint32 hash; /* hash code for the Datum */
+ char status; /* status code used by simplehash.h */
+} MCVHashEntry;
+
+/* private_data for the simplehash hash table */
+typedef struct MCVHashContext
+{
+ FunctionCallInfo equal_fcinfo; /* the equality join operator */
+ FunctionCallInfo hash_fcinfo; /* the hash function to use */
+ bool op_is_reversed; /* equality compares hash type to probe type */
+ bool insert_mode; /* doing inserts or lookups? */
+ bool hash_typbyval; /* typbyval of hashed data type */
+ int16 hash_typlen; /* typlen of hashed data type */
+} MCVHashContext;
+
+/* forward reference */
+typedef struct MCVHashTable_hash MCVHashTable_hash;
+
/* Hooks for plugins to get control when we ask for stats */
get_relation_stats_hook_type get_relation_stats_hook = NULL;
get_index_stats_hook_type get_index_stats_hook = NULL;
static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
static double eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -158,6 +193,7 @@ static double eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
bool *hasmatch1, bool *hasmatch2,
int *p_nmatches);
static double eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
@@ -169,11 +205,14 @@ static double eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
int *p_nmatches,
RelOptInfo *inner_rel);
static void eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
bool *hasmatch1, bool *hasmatch2,
int *p_nmatches, double *p_matchprodfreq);
+static uint32 hash_mcv(MCVHashTable_hash *tab, Datum key);
+static bool mcvs_equal(MCVHashTable_hash *tab, Datum key0, Datum key1);
static bool estimate_multivariate_ndistinct(PlannerInfo *root,
RelOptInfo *rel, List **varinfos, double *ndistinct);
static bool convert_to_scalar(Datum value, Oid valuetypid, Oid collid,
@@ -229,6 +268,20 @@ static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
static double btcost_correlation(IndexOptInfo *index,
VariableStatData *vardata);
+/* Define support routines for MCV hash tables */
+#define SH_PREFIX MCVHashTable
+#define SH_ELEMENT_TYPE MCVHashEntry
+#define SH_KEY_TYPE Datum
+#define SH_KEY value
+#define SH_HASH_KEY(tab,key) hash_mcv(tab, key)
+#define SH_EQUAL(tab,key0,key1) mcvs_equal(tab, key0, key1)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tab,ent) (ent)->hash
+#define SH_DEFINE
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
/*
* eqsel - Selectivity of "=" for any data types.
@@ -2316,6 +2369,8 @@ eqjoinsel(PG_FUNCTION_ARGS)
bool isdefault2;
Oid opfuncoid;
FmgrInfo eqproc;
+ Oid hashLeft = InvalidOid;
+ Oid hashRight = InvalidOid;
AttStatsSlot sslot1;
AttStatsSlot sslot2;
Form_pg_statistic stats1 = NULL;
@@ -2381,12 +2436,20 @@ eqjoinsel(PG_FUNCTION_ARGS)
fmgr_info(opfuncoid, &eqproc);
hasmatch1 = (bool *) palloc0(sslot1.nvalues * sizeof(bool));
hasmatch2 = (bool *) palloc0(sslot2.nvalues * sizeof(bool));
+
+ /*
+ * If the MCV lists are long enough to justify hashing, try to look up
+ * hash functions for the join operator. XXX should this be Max()?
+ */
+ if (Min(sslot1.nvalues, sslot2.nvalues) >= EQJOINSEL_MCV_HASH_THRESHOLD)
+ (void) get_op_hash_functions(operator, &hashLeft, &hashRight);
}
else
memset(&eqproc, 0, sizeof(eqproc)); /* silence uninit-var warnings */
/* We need to compute the inner-join selectivity in all cases */
selec_inner = eqjoinsel_inner(&eqproc, collation,
+ hashLeft, hashRight,
&vardata1, &vardata2,
nd1, nd2,
isdefault1, isdefault2,
@@ -2416,6 +2479,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
if (!join_is_reversed)
selec = eqjoinsel_semi(&eqproc, collation,
+ hashLeft, hashRight,
false,
&vardata1, &vardata2,
nd1, nd2,
@@ -2428,6 +2492,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
inner_rel);
else
selec = eqjoinsel_semi(&eqproc, collation,
+ hashLeft, hashRight,
true,
&vardata2, &vardata1,
nd2, nd1,
@@ -2487,6 +2552,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
*/
static double
eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -2528,6 +2594,7 @@ eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
/* Fill the match arrays */
eqjoinsel_find_matches(eqproc, collation,
+ hashLeft, hashRight,
false,
sslot1, sslot2,
sslot1->nvalues, sslot2->nvalues,
@@ -2644,6 +2711,7 @@ eqjoinsel_inner(FmgrInfo *eqproc, Oid collation,
*/
static double
eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
bool op_is_reversed,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
@@ -2735,6 +2803,7 @@ eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
memset(hasmatch2, 0, clamped_nvalues2 * sizeof(bool));
/* Re-fill the match arrays */
eqjoinsel_find_matches(eqproc, collation,
+ hashLeft, hashRight,
op_is_reversed,
sslot1, sslot2,
sslot1->nvalues, clamped_nvalues2,
@@ -2810,6 +2879,8 @@ eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
* Inputs:
* eqproc: FmgrInfo for equality function to use (might be reversed)
* collation: OID of collation to use
+ * hashLeft, hashRight: OIDs of hash functions associated with equality op,
+ * or InvalidOid if we're not to use hashing
* op_is_reversed: indicates that eqproc compares right type to left type
* sslot1, sslot2: MCV values for the lefthand and righthand inputs
* nvalues1, nvalues2: number of values to be considered (can be less than
@@ -2821,6 +2892,9 @@ eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
* *p_matchprodfreq: receives sum(sslot1->numbers[i] * sslot2->numbers[j])
* for matching MCVs
*
+ * Note that hashLeft is for the eqproc's left-hand input type, hashRight
+ * for its right, regardless of op_is_reversed.
+ *
* Note we assume that each MCV will match at most one member of the other
* MCV list. If the operator isn't really equality, there could be multiple
* matches --- but we don't look for them, both for speed and because the
@@ -2828,6 +2902,7 @@ eqjoinsel_semi(FmgrInfo *eqproc, Oid collation,
*/
static void
eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
+ Oid hashLeft, Oid hashRight,
bool op_is_reversed,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
int nvalues1, int nvalues2,
@@ -2848,12 +2923,119 @@ eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
fcinfo->args[0].isnull = false;
fcinfo->args[1].isnull = false;
- /*
- * The reason for this extra level of braces will become apparent later.
- * For now, it just prevents having to re-indent this chunk of code moved
- * from eqjoinsel_inner.
- */
+ if (OidIsValid(hashLeft) && OidIsValid(hashRight))
+ {
+ /* Use a hash table to speed up the matching */
+ LOCAL_FCINFO(hash_fcinfo, 1);
+ FmgrInfo hash_proc;
+ MCVHashContext hashContext;
+ MCVHashTable_hash *hashTable;
+ AttStatsSlot *statsProbe;
+ AttStatsSlot *statsHash;
+ bool *hasMatchProbe;
+ bool *hasMatchHash;
+ int nvaluesProbe;
+ int nvaluesHash;
+
+ /* Make sure we build the hash table on the smaller array. */
+ if (sslot1->nvalues >= sslot2->nvalues)
+ {
+ statsProbe = sslot1;
+ statsHash = sslot2;
+ hasMatchProbe = hasmatch1;
+ hasMatchHash = hasmatch2;
+ nvaluesProbe = nvalues1;
+ nvaluesHash = nvalues2;
+ }
+ else
+ {
+ /* We'll have to reverse the direction of use of the operator. */
+ op_is_reversed = !op_is_reversed;
+ statsProbe = sslot2;
+ statsHash = sslot1;
+ hasMatchProbe = hasmatch2;
+ hasMatchHash = hasmatch1;
+ nvaluesProbe = nvalues2;
+ nvaluesHash = nvalues1;
+ }
+
+ /*
+ * Build the hash table on the smaller array, using the appropriate
+ * hash function for its data type.
+ */
+ fmgr_info(op_is_reversed ? hashLeft : hashRight, &hash_proc);
+ InitFunctionCallInfoData(*hash_fcinfo, &hash_proc, 1, collation,
+ NULL, NULL);
+ hash_fcinfo->args[0].isnull = false;
+
+ hashContext.equal_fcinfo = fcinfo;
+ hashContext.hash_fcinfo = hash_fcinfo;
+ hashContext.op_is_reversed = op_is_reversed;
+ hashContext.insert_mode = true;
+ get_typlenbyval(statsHash->valuetype,
+ &hashContext.hash_typlen,
+ &hashContext.hash_typbyval);
+
+ hashTable = MCVHashTable_create(CurrentMemoryContext,
+ nvaluesHash,
+ &hashContext);
+
+ for (int i = 0; i < nvaluesHash; i++)
+ {
+ bool found = false;
+ MCVHashEntry *entry = MCVHashTable_insert(hashTable,
+ statsHash->values[i],
+ &found);
+
+ /*
+ * We're not really expecting duplicates in the MCV list, but it
+ * seems possible that there could be entries that are equal
+ * according to the operator we're testing even though they are
+ * unequal according to the datatype's default opclass. If we
+ * find a dup, just ignore it, leaving the hash entry's index
+ * pointing at the first occurrence.
+ */
+ if (!found)
+ entry->index = i;
+ }
+
+ /*
+ * Prepare to probe the hash table. If the probe values are of a
+ * different data type, then we need to change hash functions. (This
+ * code relies on the assumption that since we defined SH_STORE_HASH,
+ * simplehash.h will never need to compute hash values for existing
+ * hash table entries.)
+ */
+ hashContext.insert_mode = false;
+ if (hashLeft != hashRight)
+ {
+ fmgr_info(op_is_reversed ? hashRight : hashLeft, &hash_proc);
+ /* Resetting hash_fcinfo is probably unnecessary, but be safe */
+ InitFunctionCallInfoData(*hash_fcinfo, &hash_proc, 1, collation,
+ NULL, NULL);
+ hash_fcinfo->args[0].isnull = false;
+ }
+
+ /* Look up each probe value in turn. */
+ for (int i = 0; i < nvaluesProbe; i++)
+ {
+ MCVHashEntry *entry = MCVHashTable_lookup(hashTable,
+ statsProbe->values[i]);
+
+ /* As in the other code path, skip already-matched hash entries */
+ if (entry != NULL && !hasMatchHash[entry->index])
+ {
+ hasMatchHash[entry->index] = hasMatchProbe[i] = true;
+ nmatches++;
+ matchprodfreq += statsHash->numbers[entry->index] * statsProbe->numbers[i];
+ }
+ }
+
+ MCVHashTable_destroy(hashTable);
+ }
+ else
{
+ /* We're not to use hashing, so do it the O(N^2) way */
int index1,
index2;
@@ -2897,6 +3079,67 @@ eqjoinsel_find_matches(FmgrInfo *eqproc, Oid collation,
*p_matchprodfreq = matchprodfreq;
}
+/*
+ * Support functions for the hash tables used by eqjoinsel_find_matches
+ */
+static uint32
+hash_mcv(MCVHashTable_hash *tab, Datum key)
+{
+ MCVHashContext *context = (MCVHashContext *) tab->private_data;
+ FunctionCallInfo fcinfo = context->hash_fcinfo;
+ Datum fresult;
+
+ fcinfo->args[0].value = key;
+ fcinfo->isnull = false;
+ fresult = FunctionCallInvoke(fcinfo);
+ Assert(!fcinfo->isnull);
+ return DatumGetUInt32(fresult);
+}
+
+static bool
+mcvs_equal(MCVHashTable_hash *tab, Datum key0, Datum key1)
+{
+ MCVHashContext *context = (MCVHashContext *) tab->private_data;
+
+ if (context->insert_mode)
+ {
+ /*
+ * During the insertion step, any comparisons will be between two
+ * Datums of the hash table's data type, so if the given operator is
+ * cross-type it will be the wrong thing to use. Fortunately, we can
+ * use datum_image_eq instead. The MCV values should all be distinct
+ * anyway, so it's mostly pro-forma to compare them at all.
+ */
+ return datum_image_eq(key0, key1,
+ context->hash_typbyval, context->hash_typlen);
+ }
+ else
+ {
+ FunctionCallInfo fcinfo = context->equal_fcinfo;
+ Datum fresult;
+
+ /*
+ * Apply the operator the correct way around. Although simplehash.h
+ * doesn't document this explicitly, during lookups key0 is from the
+ * hash table while key1 is the probe value, so we should compare them
+ * in that order only if op_is_reversed.
+ */
+ if (context->op_is_reversed)
+ {
+ fcinfo->args[0].value = key0;
+ fcinfo->args[1].value = key1;
+ }
+ else
+ {
+ fcinfo->args[0].value = key1;
+ fcinfo->args[1].value = key0;
+ }
+ fcinfo->isnull = false;
+ fresult = FunctionCallInvoke(fcinfo);
+ return (!fcinfo->isnull && DatumGetBool(fresult));
+ }
+}
+
/*
* neqjoinsel - Join selectivity of "!="
*/
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23bce72ae64..57f2a9ccdc5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1649,6 +1649,9 @@ LtreeGistOptions
LtreeSignature
MAGIC
MBuf
+MCVHashContext
+MCVHashEntry
+MCVHashTable_hash
MCVItem
MCVList
MEMORY_BASIC_INFORMATION
--
2.43.7