Обсуждение: Query optimization advice for beginners

Поиск
Список
Период
Сортировка

Query optimization advice for beginners

От
Kemal Ortanca
Дата:
Hello mail group members, 

I started a new job as PostgreSQL DBA. This is my first mail, I hope the mail I sent meets the rules. 

There is a query that runs slowly when I look at the logs of the database. When I check the resources of the system, there is no problem in the resources, but this query running slowly.  There is no "Seq Scan" in the queries, so the tables are already indexed. But I did not fully understand if the indexes were made correctly. When I analyze the query result on explain.depesz, it seems that the query is taking too long. 

How should I fix the query below? How should I read the output of explain.depesz? 

Thank you in advance for your help.

select pro.id as pro_id
, pro.code
, coalesce(s.is_pick, false)
, coalesce(sum(sb.quantity), 0) as pick_quantity
from mainproduct_productmetaproduction pro, order_basketitemdetail bid
left join shelf_shelvedproductbatch sb on sb.basketitem_detail_id = bid.id
left join shelf_shelvedproducts sp on sp.id = sb.shelved_product_id
left join shelf_shelf s on s.id = sp.shelf_id
where pro.id = bid.production_id
and (
select coalesce(sum(bid.quantity), 0)
from order_basketitem bi
, order_basketitemdetail bid
, order_order o
where o.type in (2,7,9) and o.id = bi.order_id
and o.is_cancelled = false
and bi.is_cancelled = false
and o.is_closed = false
and o.is_picked = false
and o.is_invoiced = false
and o.is_sent = false
and bi.id = bid.basketitem_id
and bid.quantity > (
select coalesce(sum(picked_quantity),0)
from order_basketitembatch bib
where bib.detail_id=bid.id
)
and bid.code = pro.code
) > 0
group by 1,2,3 --,bid.pallet_item_quantity
having coalesce(s.is_pick, false)
and round((coalesce(sum(sb.quantity), 0) / GREATEST(MAX(bid.pallet_item_quantity), 1)::float)::numeric, 2) <= 0.15

https://explain.depesz.com/s/G4vq

Yours truly,
Kemal Ortanca

Re: Query optimization advice for beginners

От
Andreas Kretschmer
Дата:

Am 27.01.20 um 14:15 schrieb Kemal Ortanca:
>
> https://explain.depesz.com/s/G4vq
>
>

the estimates and the real values are very different, seems like 
problems with autoanalyze.

which version?



Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: Query optimization advice for beginners

От
Kemal Ortanca
Дата:
Firstly, thank you for coming back.

PostgreSQL version = 11.5

Is there a resource or postgresql configuration you want me to check in addition?



From: Andreas Kretschmer <andreas@a-kretschmer.de>
Sent: Monday, January 27, 2020 3:57 PM
To: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>
Subject: Re: Query optimization advice for beginners
 


Am 27.01.20 um 14:15 schrieb Kemal Ortanca:
>
> https://explain.depesz.com/s/G4vq
>
>

the estimates and the real values are very different, seems like
problems with autoanalyze.

which version?



Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: Query optimization advice for beginners

От
Michael Lewis
Дата:
You've got two references to order_basketitemdetail both aliased to bid and ALSO a table called order_basketitembatch aliased to bib. I assume that confuses the planner, but even if it doesn't it certainly confuses any new developers trying to understand the query's intention.

The biggest thing that leaps out at me on the explain plan is the 822 thousand loops on index order_basketitembatch_detail_id_9268ccff. That seems to be the subquery in the where clause of the subquery in the main where clause. I never get great results when I nest sub-queries multiple levels. Without knowing your data, we can only make guesses about restructuring the query so it performs better.

select bi.id AS basketitem_id --coalesce(sum(bid.quantity), 0)
from order_basketitem bi
--, order_basketitemdetail bid
, order_order o
where o.type in (2,7,9) and o.id = bi.order_id
and o.is_cancelled = false
and bi.is_cancelled = false
and o.is_closed = false
and o.is_picked = false
and o.is_invoiced = false
and o.is_sent = false
--and bi.id = bid.basketitem_id

For a query like the above, how restrictive is it? That is, of ALL the records in order_basketitem table, how many are returned by the above condition? I would think that the number of orders that have been picked or invoiced or sent or closed or cancelled would be LARGE and so this query may eliminate most of the orders from being considered. Not to mention the order type id restriction.

If I found that the above query resulted in 1% of the table being returned perhaps, there are a number of ways to influence the planner to do this work first such as-

1) put this in a sub-query as the FROM and include OFFSET 0 hack to prevent in-lining
2) put in a CTE using the WITH keyword (note- need to use MATERIALIZED option once on PG12 since default behavior changes)
3) if the number of records returned is large (10 thousand maybe?) and the optimizer is making bad choices on the rest of the query that uses this result set, put this query into a temp table, analyze it, and then use it.

Re: Query optimization advice for beginners

От
Laurenz Albe
Дата:
On Mon, 2020-01-27 at 13:15 +0000, Kemal Ortanca wrote:
> There is a query that runs slowly when I look at the logs of the database. When I check the
> resources of the system, there is no problem in the resources, but this query running slowly.
> There is no "Seq Scan" in the queries, so the tables are already indexed. But I did not
> fully understand if the indexes were made correctly. When I analyze the query result on
> explain.depesz, it seems that the query is taking too long. 
> 
> How should I fix the query below? How should I read the output of explain.depesz? 
> 
> https://explain.depesz.com/s/G4vq

Normally you focus on where the time is spent and the mis-estimates.

The mis-estimates are notable, but this time not the reason for a
wrong choice of join strategy: evern though there are overestimates,
a nested loop join is chosen.

The time is spent in the 16979 executions of the outer subquery,
particularly in the inner subquery.

Because the query uses correlated subqueries, PostgreSQL has to execute
these conditions in the fashion of a nested loop, that is, the subquery
is executed for every row found.

If you manage to rewrite the query so that it uses (outer) joins instead
of correlated subqueries, the optimizer can use different strategies
that may be more efficient.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com