Обсуждение: Partial hash index is not used for implied qual.

Поиск
Список
Период
Сортировка

Partial hash index is not used for implied qual.

От
Sergei Glukhov
Дата:
Hi!

Partial hash index is not used if qual is an implied qual
since this qual is not added to indrestrictinfo and we cannot
get the keys needed to make hash index scan possible.
Suggested fix is to add implied qual for the indexes
which requires the presence of a key to scan the index.

How to repeat:

CREATE TABLE hash_partial(x) AS
        SELECT x as y from generate_series(1, 1000) as x;
ANALYZE hash_partial;
CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
EXPLAIN (COSTS OFF) SELECT x FROM hash_partial WHERE x = 1;
...
          QUERY PLAN
--------------------------
  Seq Scan on hash_partial
    Filter: (x = 1)
  ...

  Regards,
  Sergei Glukhov


Вложения

Re: Partial hash index is not used for implied qual.

От
David Rowley
Дата:
On Mon, 24 Nov 2025 at 20:33, Sergei Glukhov <s.glukhov@postgrespro.ru> wrote:
> Partial hash index is not used if qual is an implied qual
> since this qual is not added to indrestrictinfo and we cannot
> get the keys needed to make hash index scan possible.
> Suggested fix is to add implied qual for the indexes
> which requires the presence of a key to scan the index.
>
> How to repeat:

That's not very good. I see we abort building the index path at:

/*
 * If no clauses match the first index column, check for amoptionalkey
 * restriction.  We can't generate a scan over an index with
 * amoptionalkey = false unless there's at least one index clause.
 * (When working on columns after the first, this test cannot fail. It
 * is always okay for columns after the first to not have any
 * clauses.)
 */
if (index_clauses == NIL && !index->amoptionalkey)
    return NIL;

and that's there due to the fact that Hash doesn't support full
clauseless scans. Without the above, you'd get:

SELECT x FROM hash_partial WHERE x = 1;
ERROR:  hash indexes do not support whole-index scans

so, that leads me to believe the location you're adjusting is probably
the best place to fix this issue.

As for the patch, you didn't update the comment to include the reason
you're keeping the restrictinfo clause. That's not great. I think you
should break that new test out into a new "if" test, maybe like:

/*
 * Keep the restrictinfo for non-amoptionalkey index types as
 * dropping the clause could result in having no clauses to use to
 * scan the index.  That's unsupported by non-amoptionalkey
 * indexes, so if we dropped this qual, we'd fail to build a Path
 * for this index later in planning.
 */
if (!index->amoptionalkey)
    index->indrestrictinfo = lappend(index->indrestrictinfo, rinfo);

/* predicate_implied_by() assumes first arg is immutable */
else if (contain_mutable_functions((Node *) rinfo->clause) ||
           !predicate_implied_by(list_make1(rinfo->clause),
                                               index->indpred, false))
    index->indrestrictinfo = lappend(index->indrestrictinfo, rinfo);

David



Re: Partial hash index is not used for implied qual.

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> so, that leads me to believe the location you're adjusting is probably
> the best place to fix this issue.

Wouldn't it be better to handle it more like the is_target_rel logic
a few lines further up?  I also object to putting the test between
the contain_mutable_functions and predicate_implied_by calls; that's
both confusing and probably wrong.  We're only calling
contain_mutable_functions to guard an assumption that
predicate_implied_by makes.

A larger point is that I think leaving such quals in indrestrictinfo
probably distorts our estimates of indexscan costs: we are likely to
think they contribute selectivity when they don't.  Maybe that's a
problem to address separately, but it should be looked at.  We skated
past the same problem for is_target_rel cases on the grounds that that
consideration affects all indexes on the rel equally; but as proposed,
this will probably result in an improper bias towards a partial hash
index.

            regards, tom lane



Re: Partial hash index is not used for implied qual.

От
Tom Lane
Дата:
I wrote:
> Wouldn't it be better to handle it more like the is_target_rel logic
> a few lines further up?

Actually, after thinking a bit longer, it'd be better to do something
like the attached so that we don't keep redundant quals unless they'd
*all* be excluded.

There's definitely something fishy about the costing though.
I experimented with this variant of Sergei's example:

regression=# CREATE TABLE hash_partial(x) AS SELECT x % 100 as y from generate_series(1, 1000) as x;
SELECT 1000
regression=# ANALYZE hash_partial;
ANALYZE
regression=# CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
CREATE INDEX
regression=# set enable_seqscan TO 0;  -- else we'll go for a seqscan
SET
regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Bitmap Heap Scan on hash_partial  (cost=24.08..32.56 rows=10 width=4)
   Recheck Cond: (x = 1)
   ->  Bitmap Index Scan on partial_idx  (cost=0.00..24.07 rows=10 width=0)
         Index Cond: (x = 1)
(4 rows)

regression=# drop index partial_idx;
DROP INDEX
regression=# CREATE INDEX ON hash_partial USING hash(x);
CREATE INDEX
regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on hash_partial  (cost=4.08..12.56 rows=10 width=4)
   Recheck Cond: (x = 1)
   ->  Bitmap Index Scan on hash_partial_x_idx  (cost=0.00..4.08 rows=10 width=0)
         Index Cond: (x = 1)
(4 rows)

Why are we thinking that a non-partial index would be substantially
cheaper to scan?  That seems surely wrong, and it runs counter to my
intuition about why this fix is incomplete.  (I expected an unfair
bias towards the partial index, not against it.)

            regards, tom lane

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c62e3f87724..c2687dec425 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4038,6 +4038,7 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
     foreach(lc, rel->indexlist)
     {
         IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+        List       *newrestrictinfo;
         ListCell   *lcr;

         if (index->indpred == NIL)
@@ -4051,8 +4052,8 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
         if (is_target_rel)
             continue;

-        /* Else compute indrestrictinfo as the non-implied quals */
-        index->indrestrictinfo = NIL;
+        /* Else compute new indrestrictinfo as the non-implied quals */
+        newrestrictinfo = NIL;
         foreach(lcr, rel->baserestrictinfo)
         {
             RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr);
@@ -4061,8 +4062,18 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
             if (contain_mutable_functions((Node *) rinfo->clause) ||
                 !predicate_implied_by(list_make1(rinfo->clause),
                                       index->indpred, false))
-                index->indrestrictinfo = lappend(index->indrestrictinfo, rinfo);
+                newrestrictinfo = lappend(newrestrictinfo, rinfo);
         }
+
+        /*
+         * If we excluded every qual as implied by the predicate, and the
+         * index cannot do full-index scans, then it's better to leave
+         * indrestrictinfo alone so that we can still build a scan on this
+         * index.  XXX We will underestimate the cost of such an indexscan;
+         * what can we do about that?
+         */
+        if (!(newrestrictinfo == NIL && !index->amoptionalkey))
+            index->indrestrictinfo = newrestrictinfo;
     }
 }


Re: Partial hash index is not used for implied qual.

От
Sergei Glukhov
Дата:
Hi,

On 11/25/25 6:01 AM, Tom Lane wrote:
> I wrote:
>> Wouldn't it be better to handle it more like the is_target_rel logic
>> a few lines further up?
> Actually, after thinking a bit longer, it'd be better to do something
> like the attached so that we don't keep redundant quals unless they'd
> *all* be excluded.
>
> There's definitely something fishy about the costing though.
> I experimented with this variant of Sergei's example:
>
> regression=# CREATE TABLE hash_partial(x) AS SELECT x % 100 as y from generate_series(1, 1000) as x;
> SELECT 1000
> regression=# ANALYZE hash_partial;
> ANALYZE
> regression=# CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
> CREATE INDEX
> regression=# set enable_seqscan TO 0;  -- else we'll go for a seqscan
> SET
> regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
>                                   QUERY PLAN
> ----------------------------------------------------------------------------
>   Bitmap Heap Scan on hash_partial  (cost=24.08..32.56 rows=10 width=4)
>     Recheck Cond: (x = 1)
>     ->  Bitmap Index Scan on partial_idx  (cost=0.00..24.07 rows=10 width=0)
>           Index Cond: (x = 1)
> (4 rows)
>
> regression=# drop index partial_idx;
> DROP INDEX
> regression=# CREATE INDEX ON hash_partial USING hash(x);
> CREATE INDEX
> regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
>                                      QUERY PLAN
> ----------------------------------------------------------------------------------
>   Bitmap Heap Scan on hash_partial  (cost=4.08..12.56 rows=10 width=4)
>     Recheck Cond: (x = 1)
>     ->  Bitmap Index Scan on hash_partial_x_idx  (cost=0.00..4.08 rows=10 width=0)
>           Index Cond: (x = 1)
> (4 rows)
>
> Why are we thinking that a non-partial index would be substantially
> cheaper to scan?  That seems surely wrong, and it runs counter to my
> intuition about why this fix is incomplete.  (I expected an unfair
> bias towards the partial index, not against it.)
>
>             regards, tom lane
>

Thanks for the fix. It seems there is another case for investigation:

DROP TABLE hash_partial;
CREATE TABLE hash_partial(x, y) AS
SELECT x, x + x as y from generate_series(1, 1000) as x;
ANALYZE hash_partial;
CREATE INDEX partial_idx  ON hash_partial USING hash(x) WHERE x = 1;
SET enable_seqscan TO 0;
EXPLAIN SELECT x FROM hash_partial WHERE x = 1 and y < 0;
--------------------------------------------------------------------------------
Seq Scan on hash_partial  (cost=0.00..23.00 rows=1 width=4)
    Disabled: true
    Filter: ((y < 0) AND (x = 1))
(3 rows)


  Regarding strangeness of the cost,
  cost is depends on numIndexPages and
  in genericcostestimate() we calulate numIndexPages:

  numIndexPages = ceil(numIndexTuples * index->pages / index->tuples);

  For non-partial index index->pages = 6 and index->tuples = 1000
  and for partial index index->pages = 6 and index->tuples = 10.
  Number of pages depends on index relation size and
  initial size is 6 * BLCKSZ for both, partial and non-partial hash indexes
  Initial size of the hash index relation, in turn,
  depends on total number of tuples in the table.

  Regards,
  Sergei Glukhov





Re: Partial hash index is not used for implied qual.

От
David Rowley
Дата:
On Tue, 25 Nov 2025 at 15:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Actually, after thinking a bit longer, it'd be better to do something
> like the attached so that we don't keep redundant quals unless they'd
> *all* be excluded.

I think your 1st patch was more along the correct lines. With the 2nd
one, I think you're maybe assuming that the non-empty newrestrictinfo
must contain an indexable qual, but the list we're working with at
that point is the rel's baserestrictinfo, which could contain a bunch
of stuff that'll never match to the index. If you continue to remove
the implied qual when there's a non-indexable base qual in the list,
then we'll still have the same issue that Sergei is trying to fix.
Just try adding an unrelated qual with your 2nd patch. Something like:
select * from hash_partial where x=1 and ctid >= '(0,0)';

I think you might have tried the 2nd method because you didn't see the
point in adding >1 indexable qual to scan the index with when we just
want ==1. If you wanted to do that, then maybe match_clause_to_index()
would be the place to ensure the list_length() doesn't go above 1 for
non-amoptionalkey indexes. Is that really worthwhile? hash indexes
only support equality anyway, so in what scenario would there be more
than 1 qual?

David



Re: Partial hash index is not used for implied qual.

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 25 Nov 2025 at 15:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Actually, after thinking a bit longer, it'd be better to do something
>> like the attached so that we don't keep redundant quals unless they'd
>> *all* be excluded.

> I think your 1st patch was more along the correct lines. With the 2nd
> one, I think you're maybe assuming that the non-empty newrestrictinfo
> must contain an indexable qual, but the list we're working with at
> that point is the rel's baserestrictinfo, which could contain a bunch
> of stuff that'll never match to the index. If you continue to remove
> the implied qual when there's a non-indexable base qual in the list,
> then we'll still have the same issue that Sergei is trying to fix.

Right, as Sergei also noted.  We should just do it as attached.

I checked the costing calculations and it's basically that
genericcostestimate doesn't understand about hash buckets.
For the partial index, it correctly estimates that we'll visit
all 10 of the tuples in the index, so it supposes that that
means we'll visit all of the index's pages.  For the non-partial
index, it correctly estimates that we'll visit 10 of the 1000
tuples in the index, so it supposes that that means we'll visit
1/100th of the index's pages (rounded up to 1).  This error is
compounded by the toy example, which only has 6 pages in either index
(the minimum size of a hash index, I think).  There may or may not be
anything we can usefully do to improve that situation ... and for that
matter, it's not clear that preferring the partial index would really
be a win.  As constructed, this test case has only one hash value in
the partial index, which I think is not exactly a case where you want
a hash index.  I tried scaling up the table size, and got a badly
bloated partial index (about half as big as the non-partial one),
which seems to indicate that the code is vainly splitting and
re-splitting trying to divide that one huge bucket.

So I'm inclined to apply the attached and just call it good.

Should we back-patch?  I'm unsure.  Clearly it's a bug that we
cannot generate an indexscan plan in this case, but we've learned
that changing plans in released branches is often not wanted.
And given the lack of field complaints, nobody is using the case
anyway.

            regards, tom lane

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c62e3f87724..2654c59c4c6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4051,6 +4051,16 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
         if (is_target_rel)
             continue;

+        /*
+         * If index is !amoptionalkey, also leave indrestrictinfo as set
+         * above.  Otherwise we risk removing all quals for the first index
+         * key and then not being able to generate an indexscan at all.  It
+         * would be better to be more selective, but we've not yet identified
+         * which if any of the quals match the first index key.
+         */
+        if (!index->amoptionalkey)
+            continue;
+
         /* Else compute indrestrictinfo as the non-implied quals */
         index->indrestrictinfo = NIL;
         foreach(lcr, rel->baserestrictinfo)

Re: Partial hash index is not used for implied qual.

От
David Rowley
Дата:
On Thu, 27 Nov 2025 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I checked the costing calculations and it's basically that
> genericcostestimate doesn't understand about hash buckets.
> For the partial index, it correctly estimates that we'll visit
> all 10 of the tuples in the index, so it supposes that that
> means we'll visit all of the index's pages.  For the non-partial
> index, it correctly estimates that we'll visit 10 of the 1000

I assume you must mean using your "x % 100" case rather than Sergei's case.

> tuples in the index, so it supposes that that means we'll visit
> 1/100th of the index's pages (rounded up to 1).  This error is
> compounded by the toy example, which only has 6 pages in either index
> (the minimum size of a hash index, I think).  There may or may not be
> anything we can usefully do to improve that situation

I'm not so clear on why this is bad. The index has 1000 tuples on 6
pages and we want to scan 1% of the rows in the index. As a result,
genericcostestimate() calculates that's 1 page. How many pages would
you expect to scan?

> ... and for that
> matter, it's not clear that preferring the partial index would really
> be a win.  As constructed, this test case has only one hash value in
> the partial index, which I think is not exactly a case where you want
> a hash index.  I tried scaling up the table size, and got a badly
> bloated partial index (about half as big as the non-partial one),
> which seems to indicate that the code is vainly splitting and
> re-splitting trying to divide that one huge bucket.

I assumed this was some attempt at finding a cheap way to find rows
matching the index's predicate without having to have an index on all
rows.

> So I'm inclined to apply the attached and just call it good.

I think the patch looks fine.

> Should we back-patch?  I'm unsure.  Clearly it's a bug that we
> cannot generate an indexscan plan in this case, but we've learned
> that changing plans in released branches is often not wanted.
> And given the lack of field complaints, nobody is using the case
> anyway.

I feel like anyone adding a partial hash index has done so quite
purposefully. I suspect they might be surprised if there's no means
whatsoever to use that index in scans, so perhaps it's ok to
backpatch.

Sergei, can you confirm if this was something he noticed when playing
around on master, or if this came from a field report?

David



Re: Partial hash index is not used for implied qual.

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 27 Nov 2025 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I checked the costing calculations and it's basically that
>> genericcostestimate doesn't understand about hash buckets.

> I assume you must mean using your "x % 100" case rather than Sergei's case.

Right.

>> ... tuples in the index, so it supposes that that means we'll visit
>> 1/100th of the index's pages (rounded up to 1).

> I'm not so clear on why this is bad. The index has 1000 tuples on 6
> pages and we want to scan 1% of the rows in the index. As a result,
> genericcostestimate() calculates that's 1 page. How many pages would
> you expect to scan?

To be clear, neither of genericcostestimate's estimates are bad as
a generic estimate.  And I'm not even sure that we could do better
with a hash-aware estimate implemented in hashcostestimate.  I think
the key thing about this test case is that the partial index ends
up with all its entries in one hash bucket.  I'm not sure that we
could reasonably expect to know that in the cost estimator.  Even
if we could, should we really expend a lot of effort on the case?
It's a textbook example of when you should not use a hash index.

I'm interested in fixing this can't-generate-this-plan-shape bug
because it probably impacts more-realistic use-cases.  But I
think the cost estimation problem is probably specific to cases
where you shouldn't have used a hash index, so I'm okay with
ignoring that part.  Until more evidence arrives, anyway.

            regards, tom lane



Re: Partial hash index is not used for implied qual.

От
Sergei Glukhov
Дата:
Hi,

On 11/27/25 7:01 AM, David Rowley wrote:
> On Thu, 27 Nov 2025 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So I'm inclined to apply the attached and just call it good.
> I think the patch looks fine.

+1, verified, thanks a lot!

>
>> Should we back-patch?  I'm unsure.  Clearly it's a bug that we
>> cannot generate an indexscan plan in this case, but we've learned
>> that changing plans in released branches is often not wanted.
>> And given the lack of field complaints, nobody is using the case
>> anyway.
> I feel like anyone adding a partial hash index has done so quite
> purposefully. I suspect they might be surprised if there's no means
> whatsoever to use that index in scans, so perhaps it's ok to
> backpatch.
>
> Sergei, can you confirm if this was something he noticed when playing
> around on master, or if this came from a field report?

It was reported for v16.


Regards,
Sergei Glukhov




Re: Partial hash index is not used for implied qual.

От
Tom Lane
Дата:
Sergei Glukhov <s.glukhov@postgrespro.ru> writes:
> On 11/27/25 7:01 AM, David Rowley wrote:
>> On Thu, 27 Nov 2025 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Should we back-patch?  I'm unsure.  Clearly it's a bug that we
>>> cannot generate an indexscan plan in this case, but we've learned
>>> that changing plans in released branches is often not wanted.

>> Sergei, can you confirm if this was something he noticed when playing
>> around on master, or if this came from a field report?

> It was reported for v16.

OK, I'll backpatch then.

            regards, tom lane