Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
Дата
Msg-id CAApHDvoTSiiYDuEY3gxZ_j6pnxtp6jGwguvLet4+ix6L1Z7aQg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-performance
On Fri, 8 May 2020 at 10:00, David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Thu, May 7, 2020 at 11:07 AM Amarendra Konda <amar.vijaya@gmail.com> wrote:
>>
>> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)  SELECT pa.process_activity_id AS pa_process_activity_id  FROM
process_activitypa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00'  AND EXISTS ( SELECT 1
FROMprocess_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id  AND pi.user_id
='137074931866340') ORDER BY pa.process_instance_id,  pa.created limit 50; 
>>
>>
>>                ->  Index Scan using process_activity_process_instance_id_app_id_created_idx on
public.process_activitypa  (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2) 
>
> > Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND
(m.created> '1970-01-01 00:00:00'::timestamp without time zone)) 
>
> I suppose during the nested loop the inner index scan could limit itself to the first 50 entries it finds (since the
firsttwo index columns are being held constant on each scan, m.created should define the traversal order...) so that
theoutput of the nested loop ends up being (max 2 x 50) 100 entries which are then sorted and only the top 50 returned. 
>
> Whether the executor could but isn't doing that here or isn't programmed to do that (or my logic is totally off) I do
notknow. 

I think the planner is likely not putting the process_activity table
on the outer side of the nested loop join due to the poor row
estimates.  If it knew that so many rows would match the join then it
likely would have done that to save from having to perform the sort at
all.  However, because the planner has put the process_instance on the
outer side of the nested loop join, it's the pathkeys from that path
that the nested loop node has, which is not the same as what the ORDER
BY needs, so the planner must add a sort step, which means that all
rows from the nested loop plan must be read so that they can be
sorted.

It might be worth trying: create index on process_instance
(user_id,app_id); as that might lower the cost of performing the join
in the opposite order and have the planner prefer that order instead.
If doing that, the OP could then ditch the
fki_conv_konotor_user_user_id index to save space.

If that's not enough to convince the planner that the opposite order
is better then certainly SET enable_sort TO off; would.

David



В списке pgsql-performance по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
Следующее
От: Laurenz Albe
Дата:
Сообщение: Re: pg_attribute, pg_class, pg_depend grow huge in count and sizewith multiple tenants.