Обсуждение: [PATCH] Add support for SAOP in the optimizer for partial index paths

Поиск
Список
Период
Сортировка

[PATCH] Add support for SAOP in the optimizer for partial index paths

От
Jim Vanns
Дата:
Hi Postgres hackers,

This is my first patch to the project and I've been sitting on it for 6 months!

This patch was produced via:

git diff -p -U 4d936c3fff1ac8dead2cc240ba3da2ed6337257c

The branch point was 4d936c3fff1ac8dead2cc240ba3da2ed6337257c (master
as of 05/12/2025 1445 GMT)

The patch, though a single diff, was generated from 7 logically
distinct commits (feature, tests, expected output etc.).

I hope I've read the submission guides sufficiently. The code change
was based heavily on the existing code in indxpath.c.

Here's a summary of the feature:

    Prior to this patch, only BitmapOr paths were considered for partial
    indexes. With this patch, we now support ScalarArrayOpExpr clauses
    too (i.e. ANY() and IN()).

    I found no entry for this feature in the TODO list here;
    - https://wiki.postgresql.org/wiki/Todo

    However, it has previously been reported/raised here;
    -
https://www.postgresql.org/message-id/flat/c128bd06-a246-4129-914c-3dee7b13417a%40vondra.me#5b3f3b7e90d6de8c39a095afaea6b460

    The new function, generate_bitmap_saop_paths, was largely based on the
    existing generate_bitmap_or_paths() function while also glancing at
    other array handling code such as that found in backend/utils/adt/xml.c
    plus some additional false-starts in backend/optimizer/util/predtest.c

    The C code was formatted via;
    src/tools/pgindent/pgindent --indent=src/tools/pg_bsd_indent/pg_bsd_indent

Cheers,

Jim Vanns

Вложения

Re: [PATCH] Add support for SAOP in the optimizer for partial index paths

От
Jim Vanns
Дата:
Hello again,

Hope you don't mind me bumping this a little, but I wondered if I
should have requested a shepherd/mentor in getting this patch through
the review process? Or await a commitfest?

Cheers,

Jim

On Fri, 5 Dec 2025 at 14:59, Jim Vanns <james.vanns@gmail.com> wrote:
>
> Hi Postgres hackers,
>
> This is my first patch to the project and I've been sitting on it for 6 months!
>
> This patch was produced via:
>
> git diff -p -U 4d936c3fff1ac8dead2cc240ba3da2ed6337257c
>
> The branch point was 4d936c3fff1ac8dead2cc240ba3da2ed6337257c (master as of 05/12/2025 1445 GMT)
>
> The patch, though a single diff, was generated from 7 logically distinct commits (feature, tests, expected output
etc.).
>
> I hope I've read the submission guides sufficiently. The code change was based heavily on the existing code in
indxpath.c.
>
> Here's a summary of the feature:
>
>     Prior to this patch, only BitmapOr paths were considered for partial
>     indexes. With this patch, we now support ScalarArrayOpExpr clauses
>     too (i.e. ANY() and IN()).
>
>     I found no entry for this feature in the TODO list here;
>     - https://wiki.postgresql.org/wiki/Todo
>
>     However, it has previously been reported/raised here;
>     -
https://www.postgresql.org/message-id/flat/c128bd06-a246-4129-914c-3dee7b13417a%40vondra.me#5b3f3b7e90d6de8c39a095afaea6b460
>
>     The new function, generate_bitmap_saop_paths, was largely based on the
>     existing generate_bitmap_or_paths() function while also glancing at
>     other array handling code such as that found in backend/utils/adt/xml.c
>     plus some additional false-starts in backend/optimizer/util/predtest.c
>
>     The C code was formatted via;
>     src/tools/pgindent/pgindent --indent=src/tools/pg_bsd_indent/pg_bsd_indent
>
> Cheers,
>
> Jim Vanns



Re: [PATCH] Add support for SAOP in the optimizer for partial index paths

От
Jim Vanns
Дата:
Just another gentle nudge in the home somebody might bite. I rebased
it into master again this morning and that all worked fine, so I don't
think there is any clashing code, yet. I'm happy to address feedback
etc. where guided.

Cheers

Jim

On Fri, 5 Dec 2025 at 14:59, Jim Vanns <james.vanns@gmail.com> wrote:
>
> Hi Postgres hackers,
>
> This is my first patch to the project and I've been sitting on it for 6 months!
>
> This patch was produced via:
>
> git diff -p -U 4d936c3fff1ac8dead2cc240ba3da2ed6337257c
>
> The branch point was 4d936c3fff1ac8dead2cc240ba3da2ed6337257c (master
> as of 05/12/2025 1445 GMT)
>
> The patch, though a single diff, was generated from 7 logically
> distinct commits (feature, tests, expected output etc.).
>
> I hope I've read the submission guides sufficiently. The code change
> was based heavily on the existing code in indxpath.c.
>
> Here's a summary of the feature:
>
>     Prior to this patch, only BitmapOr paths were considered for partial
>     indexes. With this patch, we now support ScalarArrayOpExpr clauses
>     too (i.e. ANY() and IN()).
>
>     I found no entry for this feature in the TODO list here;
>     - https://wiki.postgresql.org/wiki/Todo
>
>     However, it has previously been reported/raised here;
>     -
https://www.postgresql.org/message-id/flat/c128bd06-a246-4129-914c-3dee7b13417a%40vondra.me#5b3f3b7e90d6de8c39a095afaea6b460
>
>     The new function, generate_bitmap_saop_paths, was largely based on the
>     existing generate_bitmap_or_paths() function while also glancing at
>     other array handling code such as that found in backend/utils/adt/xml.c
>     plus some additional false-starts in backend/optimizer/util/predtest.c
>
>     The C code was formatted via;
>     src/tools/pgindent/pgindent --indent=src/tools/pg_bsd_indent/pg_bsd_indent
>
> Cheers,
>
> Jim Vanns



Re: [PATCH] Add support for SAOP in the optimizer for partial index paths

От
David Rowley
Дата:
On Sat, 6 Dec 2025 at 03:59, Jim Vanns <james.vanns@gmail.com> wrote:
> This is my first patch to the project and I've been sitting on it for 6 months!

Welcome!

> Here's a summary of the feature:
>
>     Prior to this patch, only BitmapOr paths were considered for partial
>     indexes. With this patch, we now support ScalarArrayOpExpr clauses
>     too (i.e. ANY() and IN()).
>
>     I found no entry for this feature in the TODO list here;
>     - https://wiki.postgresql.org/wiki/Todo
>
>     However, it has previously been reported/raised here;
>     -
https://www.postgresql.org/message-id/flat/c128bd06-a246-4129-914c-3dee7b13417a%40vondra.me#5b3f3b7e90d6de8c39a095afaea6b460
>
>     The new function, generate_bitmap_saop_paths, was largely based on the
>     existing generate_bitmap_or_paths() function while also glancing at
>     other array handling code such as that found in backend/utils/adt/xml.c
>     plus some additional false-starts in backend/optimizer/util/predtest.c

I had a quick look and the idea seems reasonable.

A couple of things:

1. It's probably worth having generate_bitmap_saop_paths() do a
precheck for suitable partial and bitmap supporting indexes before
looping over each element of the SOAP array. Maybe just before the
"elem_type = ARR_ELEMTYPE(arrayval);" where the more expensive stuff
starts to happen. You could also record the List's array element
indexes of the possibly suitable partial indexes in a Bitmapset and
loop over those ones with a bms_next_member() loop rather than all
'indexes'. I think partial indexes are rare enough to warrant the
short circuit before getting in too deep. Also, not having to re-find
the indexes you're interested in for each SOAP array element seems
worthwhile.

2. For your tests, I think you can lump all these new tests into
bitmapops.sql. Please shrink the row counts down to much smaller than
10k rows. There's probably no need for any rows if you disable
enable_seqscan and enable_indexscan. The existing test in that file
has to have quite a large row count as it's testing lossy bitmaps.

I would expect this extra processing to add quite a bit of overhead in
certain scenarios. Can you test this and include the SQL scripts you
used to test that? We need to establish the performance of a
reasonable worst-case for this doesn't unreasonably slow the planner
down. Perhaps a few dozen indexes and test with a 100-element SOAP
array and extract the average planning time from EXPLAIN (SUMMARY ON)
with and without the patch.

If you do see quite a bit of overhead, then that might also trigger
you to consider what other short-circuits are possible.

Also, please register the patch in [1]. Unfortunately, the January CF
has started now, but if you get it in March's then it shouldn't get
forgotten.

David

[1] https://commitfest.postgresql.org/58/



Re: [PATCH] Add support for SAOP in the optimizer for partial index paths

От
Jim Vanns
Дата:
Thank you so much, David. I'll give this reply a more thorough read through and address the points you've raised over the next few days or so.

Cheers

Jim

On Sat, 3 Jan 2026, 00:38 David Rowley, <dgrowleyml@gmail.com> wrote:
On Sat, 6 Dec 2025 at 03:59, Jim Vanns <james.vanns@gmail.com> wrote:
> This is my first patch to the project and I've been sitting on it for 6 months!

Welcome!

> Here's a summary of the feature:
>
>     Prior to this patch, only BitmapOr paths were considered for partial
>     indexes. With this patch, we now support ScalarArrayOpExpr clauses
>     too (i.e. ANY() and IN()).
>
>     I found no entry for this feature in the TODO list here;
>     - https://wiki.postgresql.org/wiki/Todo
>
>     However, it has previously been reported/raised here;
>     - https://www.postgresql.org/message-id/flat/c128bd06-a246-4129-914c-3dee7b13417a%40vondra.me#5b3f3b7e90d6de8c39a095afaea6b460
>
>     The new function, generate_bitmap_saop_paths, was largely based on the
>     existing generate_bitmap_or_paths() function while also glancing at
>     other array handling code such as that found in backend/utils/adt/xml.c
>     plus some additional false-starts in backend/optimizer/util/predtest.c

I had a quick look and the idea seems reasonable.

A couple of things:

1. It's probably worth having generate_bitmap_saop_paths() do a
precheck for suitable partial and bitmap supporting indexes before
looping over each element of the SOAP array. Maybe just before the
"elem_type = ARR_ELEMTYPE(arrayval);" where the more expensive stuff
starts to happen. You could also record the List's array element
indexes of the possibly suitable partial indexes in a Bitmapset and
loop over those ones with a bms_next_member() loop rather than all
'indexes'. I think partial indexes are rare enough to warrant the
short circuit before getting in too deep. Also, not having to re-find
the indexes you're interested in for each SOAP array element seems
worthwhile.

2. For your tests, I think you can lump all these new tests into
bitmapops.sql. Please shrink the row counts down to much smaller than
10k rows. There's probably no need for any rows if you disable
enable_seqscan and enable_indexscan. The existing test in that file
has to have quite a large row count as it's testing lossy bitmaps.

I would expect this extra processing to add quite a bit of overhead in
certain scenarios. Can you test this and include the SQL scripts you
used to test that? We need to establish the performance of a
reasonable worst-case for this doesn't unreasonably slow the planner
down. Perhaps a few dozen indexes and test with a 100-element SOAP
array and extract the average planning time from EXPLAIN (SUMMARY ON)
with and without the patch.

If you do see quite a bit of overhead, then that might also trigger
you to consider what other short-circuits are possible.

Also, please register the patch in [1]. Unfortunately, the January CF
has started now, but if you get it in March's then it shouldn't get
forgotten.

David

[1] https://commitfest.postgresql.org/58/

Re: [PATCH] Add support for SAOP in the optimizer for partial index paths

От
Jim Vanns
Дата:
Hi David and other PG hackers,

Before I continue with the other suggestions of consolidating the test
and benchmarking, I've made the code change you suggested and used a
bitmap for recording positions in the list of candidate indexes. Can
you check and make sure I'm on the right track?

I've rebased my patchset on top of 'master' as of today
(e5a5e0a90750d665cab417322b9f85c806430d85) and it all still appears to
work as before (make check succeeds still!).

I've attached the full patch from 'git diff -p -U
e5a5e0a90750d665cab417322b9f85c806430d85' as version 2.

Cheers,

Jim

On Sat, 3 Jan 2026 at 00:38, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sat, 6 Dec 2025 at 03:59, Jim Vanns <james.vanns@gmail.com> wrote:
> > This is my first patch to the project and I've been sitting on it for 6 months!
>
> Welcome!
>
> > Here's a summary of the feature:
> >
> >     Prior to this patch, only BitmapOr paths were considered for partial
> >     indexes. With this patch, we now support ScalarArrayOpExpr clauses
> >     too (i.e. ANY() and IN()).
> >
> >     I found no entry for this feature in the TODO list here;
> >     - https://wiki.postgresql.org/wiki/Todo
> >
> >     However, it has previously been reported/raised here;
> >     -
https://www.postgresql.org/message-id/flat/c128bd06-a246-4129-914c-3dee7b13417a%40vondra.me#5b3f3b7e90d6de8c39a095afaea6b460
> >
> >     The new function, generate_bitmap_saop_paths, was largely based on the
> >     existing generate_bitmap_or_paths() function while also glancing at
> >     other array handling code such as that found in backend/utils/adt/xml.c
> >     plus some additional false-starts in backend/optimizer/util/predtest.c
>
> I had a quick look and the idea seems reasonable.
>
> A couple of things:
>
> 1. It's probably worth having generate_bitmap_saop_paths() do a
> precheck for suitable partial and bitmap supporting indexes before
> looping over each element of the SOAP array. Maybe just before the
> "elem_type = ARR_ELEMTYPE(arrayval);" where the more expensive stuff
> starts to happen. You could also record the List's array element
> indexes of the possibly suitable partial indexes in a Bitmapset and
> loop over those ones with a bms_next_member() loop rather than all
> 'indexes'. I think partial indexes are rare enough to warrant the
> short circuit before getting in too deep. Also, not having to re-find
> the indexes you're interested in for each SOAP array element seems
> worthwhile.
>
> 2. For your tests, I think you can lump all these new tests into
> bitmapops.sql. Please shrink the row counts down to much smaller than
> 10k rows. There's probably no need for any rows if you disable
> enable_seqscan and enable_indexscan. The existing test in that file
> has to have quite a large row count as it's testing lossy bitmaps.
>
> I would expect this extra processing to add quite a bit of overhead in
> certain scenarios. Can you test this and include the SQL scripts you
> used to test that? We need to establish the performance of a
> reasonable worst-case for this doesn't unreasonably slow the planner
> down. Perhaps a few dozen indexes and test with a 100-element SOAP
> array and extract the average planning time from EXPLAIN (SUMMARY ON)
> with and without the patch.
>
> If you do see quite a bit of overhead, then that might also trigger
> you to consider what other short-circuits are possible.
>
> Also, please register the patch in [1]. Unfortunately, the January CF
> has started now, but if you get it in March's then it shouldn't get
> forgotten.
>
> David
>
> [1] https://commitfest.postgresql.org/58/

Вложения

Re: [PATCH] Add support for SAOP in the optimizer for partial index paths

От
David Rowley
Дата:
On Sun, 11 Jan 2026 at 06:03, Jim Vanns <james.vanns@gmail.com> wrote:
> Before I continue with the other suggestions of consolidating the test
> and benchmarking, I've made the code change you suggested and used a
> bitmap for recording positions in the list of candidate indexes. Can
> you check and make sure I'm on the right track?

Just a quick look;

1. There doesn't seem to be any consideration that there may be many
partial indexes which are suitable for the SAOP element:

drop table if exists t;
create table t (a int);
insert into t select x/1000 from generate_series(1,1000000)x;
create index t_eq_1 on t (a) where a = 1;
create index t_eq_2 on t (a) where a = 2;
create index t_eq_3 on t (a) where a = 3;

create index t_le_2 on t (a) where a <= 2;

explain select * from t where a in(1,2,3); -- Uses t_le_2 twice rather
than the other two more suitable indexes.

drop index t_le_2;
explain select * from t where a in(1,2,3); -- What I'd expect the
above query to produce.

See: compare_path_costs_fuzzily()

2. Is there any point in trying the index again when this condition is
true: if (!clauseset.nonempty). Since you'll be looking for the same
column for the next element, shouldn't you do bms_del_member() on that
index? Then put an "if (bms_is_empty(suitable_indexes)) break;" before
the while loop so that you don't needlessly process the entire SAOP
array when you run out of suitable indexes.

3. Styistically, instead of using int index_pos, you can use
foreach_current_index(idx_lc).

4. I think the following code puts too much faith into there only
being 1 path produced. From a quick skim of the current code in
build_index_paths(), because you're requesting ST_BITMAPSCAN, we don't
seem to ever produce more than 1 path, but if that changed, then your
code would make the list contain too many paths.

per_saop_paths = list_concat(per_saop_paths, indexpaths);

5. Minor detail, but there's a bit of inconsistency in how you're
checking for empty Lists. The preferred way is: list != NIL.

6. Are you sure you want to use predOK == true indexes? Do you have a
case where this new code can produce a better plan than if the predOK
index was used directly by the existing Path generation code? If so,
please provide examples.

David



Re: [PATCH] Add support for SAOP in the optimizer for partial index paths

От
Jim Vanns
Дата:
Thanks David. Apologies for the tardy response. As you can see, I don't get a whole lot of spare time at the moment. Is there some better or more suitable reference code or docs I should be looking at to help address your points? I'll try and take another pass soon.

Cheers

Jim

On Mon, 12 Jan 2026, 00:54 David Rowley, <dgrowleyml@gmail.com> wrote:
On Sun, 11 Jan 2026 at 06:03, Jim Vanns <james.vanns@gmail.com> wrote:
> Before I continue with the other suggestions of consolidating the test
> and benchmarking, I've made the code change you suggested and used a
> bitmap for recording positions in the list of candidate indexes. Can
> you check and make sure I'm on the right track?

Just a quick look;

1. There doesn't seem to be any consideration that there may be many
partial indexes which are suitable for the SAOP element:

drop table if exists t;
create table t (a int);
insert into t select x/1000 from generate_series(1,1000000)x;
create index t_eq_1 on t (a) where a = 1;
create index t_eq_2 on t (a) where a = 2;
create index t_eq_3 on t (a) where a = 3;

create index t_le_2 on t (a) where a <= 2;

explain select * from t where a in(1,2,3); -- Uses t_le_2 twice rather
than the other two more suitable indexes.

drop index t_le_2;
explain select * from t where a in(1,2,3); -- What I'd expect the
above query to produce.

See: compare_path_costs_fuzzily()

2. Is there any point in trying the index again when this condition is
true: if (!clauseset.nonempty). Since you'll be looking for the same
column for the next element, shouldn't you do bms_del_member() on that
index? Then put an "if (bms_is_empty(suitable_indexes)) break;" before
the while loop so that you don't needlessly process the entire SAOP
array when you run out of suitable indexes.

3. Styistically, instead of using int index_pos, you can use
foreach_current_index(idx_lc).

4. I think the following code puts too much faith into there only
being 1 path produced. From a quick skim of the current code in
build_index_paths(), because you're requesting ST_BITMAPSCAN, we don't
seem to ever produce more than 1 path, but if that changed, then your
code would make the list contain too many paths.

per_saop_paths = list_concat(per_saop_paths, indexpaths);

5. Minor detail, but there's a bit of inconsistency in how you're
checking for empty Lists. The preferred way is: list != NIL.

6. Are you sure you want to use predOK == true indexes? Do you have a
case where this new code can produce a better plan than if the predOK
index was used directly by the existing Path generation code? If so,
please provide examples.

David

Re: [PATCH] Add support for SAOP in the optimizer for partial index paths

От
Jim Vanns
Дата:
OK, I've had a considerable refactor which hopefully addresses all your points.

The function now proceeds as follows;

1) Preparatory/rudimentary checks outside of main loop (OR-based SAOP
clause, array dimensionality, element count etc.)
2) Pre-filter bitmap indices for partials or planner implied
3) For each element IN() now look for best choice index via
compare_path_costs (not just first as before)
3a) Check clause fits general index structure (affecting all elements)
3b) Check index matches predicate value/element
4) Build bitmap OR path for this candidate
5) Tests now moved to existing bitmapopts sql/out

I've extended the existing test too to include the multiple-choice
path for the planner as you suggested, which this patch should now
handle (before it was greedy).

It's rebased on top of the current master
(aecc558666ad62fbecb08ff7af1394656811a581) to remain
relevant/up-to-date.

I've not yet tested the performance of it (preferring
correctness/acceptance first!) in the face of many indexes and 100
elements in the array but before I do so, is there a best place for
this kind of test in the source tree? Somewhere beneath src/test
again? I looked but couldn't see an obvious place (bar the regression
tests).

Cheers,

Jim


Jim

On Mon, 12 Jan 2026 at 00:54, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sun, 11 Jan 2026 at 06:03, Jim Vanns <james.vanns@gmail.com> wrote:
> > Before I continue with the other suggestions of consolidating the test
> > and benchmarking, I've made the code change you suggested and used a
> > bitmap for recording positions in the list of candidate indexes. Can
> > you check and make sure I'm on the right track?
>
> Just a quick look;
>
> 1. There doesn't seem to be any consideration that there may be many
> partial indexes which are suitable for the SAOP element:
>
> drop table if exists t;
> create table t (a int);
> insert into t select x/1000 from generate_series(1,1000000)x;
> create index t_eq_1 on t (a) where a = 1;
> create index t_eq_2 on t (a) where a = 2;
> create index t_eq_3 on t (a) where a = 3;
>
> create index t_le_2 on t (a) where a <= 2;
>
> explain select * from t where a in(1,2,3); -- Uses t_le_2 twice rather
> than the other two more suitable indexes.
>
> drop index t_le_2;
> explain select * from t where a in(1,2,3); -- What I'd expect the
> above query to produce.
>
> See: compare_path_costs_fuzzily()
>
> 2. Is there any point in trying the index again when this condition is
> true: if (!clauseset.nonempty). Since you'll be looking for the same
> column for the next element, shouldn't you do bms_del_member() on that
> index? Then put an "if (bms_is_empty(suitable_indexes)) break;" before
> the while loop so that you don't needlessly process the entire SAOP
> array when you run out of suitable indexes.
>
> 3. Styistically, instead of using int index_pos, you can use
> foreach_current_index(idx_lc).
>
> 4. I think the following code puts too much faith into there only
> being 1 path produced. From a quick skim of the current code in
> build_index_paths(), because you're requesting ST_BITMAPSCAN, we don't
> seem to ever produce more than 1 path, but if that changed, then your
> code would make the list contain too many paths.
>
> per_saop_paths = list_concat(per_saop_paths, indexpaths);
>
> 5. Minor detail, but there's a bit of inconsistency in how you're
> checking for empty Lists. The preferred way is: list != NIL.
>
> 6. Are you sure you want to use predOK == true indexes? Do you have a
> case where this new code can produce a better plan than if the predOK
> index was used directly by the existing Path generation code? If so,
> please provide examples.
>
> David

Вложения