Обсуждение: Why isn't PG using an index-only scan?

Поиск
Список
Период
Сортировка

Why isn't PG using an index-only scan?

От
Jean-Christophe BOGGIO
Дата:

Hello,

I have this very simple query:

INSERT INTO copyrightad (idoeu, idad, role, mechowned, perfowned, iscontrolled)
SELECT o.idoeu, c.idad, LEFT(sipa_capacity1,3), sipa_mech_owned, sipa_perf_owned, sipa_controlled='Y'
FROM imaestro.msipfl ip
JOIN oeu o ON o.imworkid=ip.sipa_song_code
JOIN ad c ON c.imcompid=ip.sipa_ip_code
WHERE ip.sipa_comp_or_publ='C';

And here are the number of elements in each table:

imaestro.msipfl: 1550019

oeu: 1587533

ad: 304986

The explain plan is saying this:

                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------
Insert on copyrightad  (cost=613790.59..4448045.97 rows=0 width=0)
  ->  Merge Join  (cost=613790.59..4448045.97 rows=84972138 width=328)
        Merge Cond: (((c.imcompid)::numeric) = ip.sipa_ip_code)
        ->  Sort  (cost=35712.97..36475.44 rows=304986 width=8)
              Sort Key: ((c.imcompid)::numeric)
              ->  Index Only Scan using ix_ad_imcompid on ad c  (cost=0.42..7931.21 rows=304986 width=8)
        ->  Materialize  (cost=578077.61..594521.17 rows=3288712 width=23)
              ->  Sort  (cost=578077.61..586299.39 rows=3288712 width=23)
                    Sort Key: ip.sipa_ip_code
                    ->  Hash Join  (cost=56043.04..154644.48 rows=3288712 width=23)
                          Hash Cond: ((o.imworkid)::numeric = ip.sipa_song_code)
                          ->  Seq Scan on oeu o  (cost=0.00..41901.33 rows=1587533 width=8)
                          ->  Hash  (cost=48542.24..48542.24 rows=600064 width=25)
                                ->  Seq Scan on msipfl ip  (cost=0.00..48542.24 rows=600064 width=25)
                                      Filter: ((sipa_comp_or_publ)::text = 'C'::text)

Table ad has this index:

"ix_ad_imcompid" btree (imcompid, idad)

Table oeu has this one:

"ix_oeu_imcompid" btree (imworkid, idoeu)

idad and idoeu are both primary keys of, respectively, ad and oeu.

The resultset should be 591615 rows because:

select sipa_comp_or_publ, count(*) from imaestro.msipfl group by 1;
sipa_comp_or_publ | count   
-------------------+--------
C                 | 591615
P                 | 958404

Is it normal that PG is doing a seq scan on table oeu but an index-only scan on ad? I had to stop the query after 5 hours, how can I make this faster? Of course I ran VACUUM ANALYZE.

These are the memory settings I have, but I have plenty of unused RAM. Should I bump something up?

shared_buffers = 16GB                   # min 128kB                                                                                                  
#huge_pages = try                       # on, off, or try                                                                                            
#huge_page_size = 0                     # zero for system default                                                                                    
#temp_buffers = 8MB                     # min 800kB                                                                                                  
#max_prepared_transactions = 0          # zero disables the feature                                                                                  
work_mem = 128MB                                # min 64kB          

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           125Gi        18Gi       3.1Gi        15Gi       121Gi       106Gi
Swap:          4.0Gi       2.0Gi       2.0Gi

Thanks for your help,

JC

Re: Why isn't PG using an index-only scan?

От
David Rowley
Дата:
On Thu, 18 Sept 2025 at 18:36, Jean-Christophe BOGGIO
<postgresql@thefreecat.org> wrote:
> Insert on copyrightad  (cost=613790.59..4448045.97 rows=0 width=0)
>   ->  Merge Join  (cost=613790.59..4448045.97 rows=84972138 width=328)
>         Merge Cond: (((c.imcompid)::numeric) = ip.sipa_ip_code)
>         ->  Sort  (cost=35712.97..36475.44 rows=304986 width=8)
>               Sort Key: ((c.imcompid)::numeric)
>               ->  Index Only Scan using ix_ad_imcompid on ad c  (cost=0.42..7931.21 rows=304986 width=8)
>         ->  Materialize  (cost=578077.61..594521.17 rows=3288712 width=23)
>               ->  Sort  (cost=578077.61..586299.39 rows=3288712 width=23)
>                     Sort Key: ip.sipa_ip_code
>                     ->  Hash Join  (cost=56043.04..154644.48 rows=3288712 width=23)
>                           Hash Cond: ((o.imworkid)::numeric = ip.sipa_song_code)
>                           ->  Seq Scan on oeu o  (cost=0.00..41901.33 rows=1587533 width=8)
>                           ->  Hash  (cost=48542.24..48542.24 rows=600064 width=25)
>                                 ->  Seq Scan on msipfl ip  (cost=0.00..48542.24 rows=600064 width=25)
>                                       Filter: ((sipa_comp_or_publ)::text = 'C'::text)

> Is it normal that PG is doing a seq scan on table oeu but an index-only scan on ad? I had to stop the query after 5
hours,how can I make this faster? Of course I ran VACUUM ANALYZE.
 

Yes. Since *all* records of "oeu" are required and they're not
required in any particular order, then Seq Scan should be the fastest
way to access those records.

5 hours seems very slow for the estimated number of records. Have you
tried running the SELECT using EXPLAIN ANALYZE without the INSERT
part? Even if the 84 million Merge Join row estimate is accurate, 5
hours seems excessively long.

If it still takes a long time, you might try SET enable_mergejoin = 0;
and run the EXPLAIN ANALYZE SELECT .. part.  That'll at least give us
more accurate row counts of what we're actually working with.

David



Re: Why isn't PG using an index-only scan?

От
Jean-Christophe BOGGIO
Дата:
Thanks David,

Le 18/09/2025 à 09:20, David Rowley a écrit :
> Yes. Since *all* records of "oeu" are required and they're not
> required in any particular order, then Seq Scan should be the fastest
> way to access those records.

Ok but then why is it doing it on the AD table? Is it because of the 
number of rows?

> 5 hours seems very slow for the estimated number of records. Have you
> tried running the SELECT using EXPLAIN ANALYZE without the INSERT
> part? Even if the 84 million Merge Join row estimate is accurate, 5
> hours seems excessively long.

I tried this, it took 1 second. And then I discovered that I had an old 
trigger on the copyrightad table.

The insert took 2 seconds.

Very sorry for wasting your time and thanks again for putting me on the 
right track.

Have a great day!



Re: Why isn't PG using an index-only scan?

От
Andrei Lepikhov
Дата:
On 18/9/2025 09:20, David Rowley wrote:
> On Thu, 18 Sept 2025 at 18:36, Jean-Christophe BOGGIO
> If it still takes a long time, you might try SET enable_mergejoin = 0;
> and run the EXPLAIN ANALYZE SELECT .. part.  That'll at least give us
> more accurate row counts of what we're actually working with.This appears to be a good example of a missing feature:
thein-execution 
 
callback or hook.
Imagine if we had a hook within the ExecProcNode. In that scenario, we 
could create a trivial extension that would stop the query after, let's 
say, 10 minutes of execution and display the current state. This would 
give us more reliable data on estimation and the state of the plan tree.

What are your thoughts?

-- 
regards, Andrei Lepikhov



Re: Why isn't PG using an index-only scan?

От
David Rowley
Дата:
On Thu, 18 Sept 2025 at 19:45, Jean-Christophe BOGGIO
<postgresql@thefreecat.org> wrote:
> Ok but then why is it doing it on the AD table? Is it because of the
> number of rows?

It's hard to tell as I don't have a clear view on which columns are
from which tables.  Perhaps "ad" can Index Only Scan because all of
the columns required are in an index but with "oeu" there's a column
being used that's not indexed?

David



Re: Why isn't PG using an index-only scan?

От
David Rowley
Дата:
On Thu, 18 Sept 2025 at 19:55, Andrei Lepikhov <lepihov@gmail.com> wrote:
> Imagine if we had a hook within the ExecProcNode. In that scenario, we
> could create a trivial extension that would stop the query after, let's
> say, 10 minutes of execution and display the current state. This would
> give us more reliable data on estimation and the state of the plan tree.

I recall something along those lines existing once in the extension
world. Or maybe just from a previous employer. I don't recall many of
the details. Maybe something like a function you pass an SQL string
and if you cancelled the query it reported the EXPLAIN ANALYZE done so
far. I assume it must have done something like LOG it as it couldn't
have shown the EXPLAIN as query results on cancel.

David



Re: Why isn't PG using an index-only scan?

От
Andrei Lepikhov
Дата:
On 18/9/2025 13:35, David Rowley wrote:
> On Thu, 18 Sept 2025 at 19:55, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Imagine if we had a hook within the ExecProcNode. In that scenario, we
>> could create a trivial extension that would stop the query after, let's
>> say, 10 minutes of execution and display the current state. This would
>> give us more reliable data on estimation and the state of the plan tree.
> 
> I recall something along those lines existing once in the extension
> world. Or maybe just from a previous employer. I don't recall many of
> the details. Maybe something like a function you pass an SQL string
> and if you cancelled the query it reported the EXPLAIN ANALYZE done so
> far. I assume it must have done something like LOG it as it couldn't
> have shown the EXPLAIN as query results on cancel.
It looks like a makeshift solution. By implementing a callback, we could 
elevate 'interrupter' to a first-class feature, enabling us to monitor 
the state of the entire query tree (it is especially cool in EXPLAIN 
ANALYZE mode when we may be OK with a partial result).
What's more interesting if Robert's work on nodes' extensibility 
successes, it enables modules to save planning decisions in the plan. 
Thereby, the in-executor callback provides a means to kick off the query 
planned with incorrect estimations (e.g., too many spilled tuples).

Perhaps we should start working on introducing this type of callback/ hook?

-- 
regards, Andrei Lepikhov



Re: Why isn't PG using an index-only scan?

От
David Rowley
Дата:
On Thu, 18 Sept 2025 at 23:55, Andrei Lepikhov <lepihov@gmail.com> wrote:
> It looks like a makeshift solution. By implementing a callback, we could
> elevate 'interrupter' to a first-class feature, enabling us to monitor
> the state of the entire query tree (it is especially cool in EXPLAIN
> ANALYZE mode when we may be OK with a partial result).
> What's more interesting if Robert's work on nodes' extensibility
> successes, it enables modules to save planning decisions in the plan.
> Thereby, the in-executor callback provides a means to kick off the query
> planned with incorrect estimations (e.g., too many spilled tuples).
>
> Perhaps we should start working on introducing this type of callback/ hook?

I don't really have a fully formed opinion here, but I expect that if
there is something that needs a new hook function, it might be easier
to convince everyone that it's a good idea to accept that change if
you first write a patch to add the hook, then go off and write the
extension that uses it and then come back with that as evidence that
the hook is useful enough to be committed to core.

There's certainly places where you could add a hook that would just
add an unacceptable overhead that we couldn't stomach. I expect
ExecProcNode would be one of those places.

I do agree that trying to come up with something for this is worthy of
some time and effort. Helping people with performance issues when they
can't even get EXPLAIN ANALYZE to finish is quite tricky.

David



Re: Why isn't PG using an index-only scan?

От
Andrei Lepikhov
Дата:
On 19/9/2025 03:05, David Rowley wrote:
> On Thu, 18 Sept 2025 at 23:55, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Perhaps we should start working on introducing this type of callback/ hook?
> There's certainly places where you could add a hook that would just
> add an unacceptable overhead that we couldn't stomach. I expect
> ExecProcNode would be one of those places.
That seems a bit odd. It is not a significant issue to instruct the 
compiler to trust the 'else' branch of the "if (PlanState::hook != 
NULL)" condition and reduce overhead to zero. However, if the hook is 
set, it indicates that the user (module or extension) genuinely needs it 
to be called for this specific node.
> I do agree that trying to come up with something for this is worthy of
> some time and effort. Helping people with performance issues when they
> can't even get EXPLAIN ANALYZE to finish is quite tricky.That's all I need; thank you. Starting a PostgreSQL project
without
 
positive feedback from a committer usually results in wasted time, 
unless you are purely conducting research ;).

-- regards, Andrei Lepikhov