Обсуждение: Query optimization advice for beginners

Поиск

Список

Период

Сортировка

Query optimization advice for beginners

От

Kemal Ortanca

Дата:

27 января 2020 г., 16:15:59

Hello mail group members,

I started a new job as PostgreSQL DBA. This is my first mail, I hope the mail I sent meets the rules.

There is a query that runs slowly when I look at the logs of the database. When I check the resources of the system, there is no problem in the resources, but this query running slowly. There is no "Seq Scan" in the queries, so the tables are already indexed. But I did not fully understand if the indexes were made correctly. When I analyze the query result on explain.depesz, it seems that the query is taking too long.

How should I fix the query below? How should I read the output of explain.depesz?

Thank you in advance for your help.

select pro.id as pro_id
   , pro.code
   , coalesce(s.is_pick, false)
   , coalesce(sum(sb.quantity), 0) as pick_quantity
from mainproduct_productmetaproduction pro, order_basketitemdetail bid
left join shelf_shelvedproductbatch sb on sb.basketitem_detail_id = bid.id
left join shelf_shelvedproducts sp on sp.id = sb.shelved_product_id
left join shelf_shelf s on s.id = sp.shelf_id
where pro.id = bid.production_id
and (
		  select coalesce(sum(bid.quantity), 0)
		  from order_basketitem bi
			   , order_basketitemdetail bid
			   , order_order o
		  where o.type in (2,7,9) and o.id = bi.order_id
		  and o.is_cancelled = false
		  and bi.is_cancelled = false
		  and o.is_closed = false
          and o.is_picked = false
          and o.is_invoiced = false
          and o.is_sent = false
		  and bi.id = bid.basketitem_id
		  and bid.quantity > (
								select coalesce(sum(picked_quantity),0)
								from order_basketitembatch bib
								where bib.detail_id=bid.id
							 )
		  and bid.code = pro.code
	  ) > 0
group by 1,2,3 --,bid.pallet_item_quantity
having coalesce(s.is_pick, false)
and round((coalesce(sum(sb.quantity), 0) / GREATEST(MAX(bid.pallet_item_quantity), 1)::float)::numeric, 2) <= 0.15

https://explain.depesz.com/s/G4vq

Yours truly,
Kemal Ortanca

Re: Query optimization advice for beginners

От

Andreas Kretschmer

Дата:

27 января 2020 г., 16:57:11


Am 27.01.20 um 14:15 schrieb Kemal Ortanca:
>
> https://explain.depesz.com/s/G4vq
>
>

the estimates and the real values are very different, seems like 
problems with autoanalyze.

which version?



Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com

Re: Query optimization advice for beginners

От

Kemal Ortanca

Дата:

27 января 2020 г., 18:33:29

Firstly, thank you for coming back.

PostgreSQL version = 11.5

Is there a resource or postgresql configuration you want me to check in addition?

From: Andreas Kretschmer <andreas@a-kretschmer.de>
Sent: Monday, January 27, 2020 3:57 PM
To: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>
Subject: Re: Query optimization advice for beginners

Am 27.01.20 um 14:15 schrieb Kemal Ortanca:
>
> https://explain.depesz.com/s/G4vq
>
>

the estimates and the real values are very different, seems like
problems with autoanalyze.

which version?

Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com

Re: Query optimization advice for beginners

От

Michael Lewis

Дата:

27 января 2020 г., 19:46:18

You've got two references to order_basketitemdetail both aliased to bid and ALSO a table called order_basketitembatch aliased to bib. I assume that confuses the planner, but even if it doesn't it certainly confuses any new developers trying to understand the query's intention.

The biggest thing that leaps out at me on the explain plan is the 822 thousand loops on index order_basketitembatch_detail_id_9268ccff. That seems to be the subquery in the where clause of the subquery in the main where clause. I never get great results when I nest sub-queries multiple levels. Without knowing your data, we can only make guesses about restructuring the query so it performs better.

select bi.id AS basketitem_id --coalesce(sum(bid.quantity), 0)
		  from order_basketitem bi
			   --, order_basketitemdetail bid
			   , order_order o
		  where o.type in (2,7,9) and o.id = bi.order_id
		  and o.is_cancelled = false
		  and bi.is_cancelled = false
		  and o.is_closed = false
          and o.is_picked = false
          and o.is_invoiced = false
          and o.is_sent = false
		  --and bi.id = bid.basketitem_id

For a query like the above, how restrictive is it? That is, of ALL the records in order_basketitem table, how many are returned by the above condition? I would think that the number of orders that have been picked or invoiced or sent or closed or cancelled would be LARGE and so this query may eliminate most of the orders from being considered. Not to mention the order type id restriction.

If I found that the above query resulted in 1% of the table being returned perhaps, there are a number of ways to influence the planner to do this work first such as-

1) put this in a sub-query as the FROM and include OFFSET 0 hack to prevent in-lining

2) put in a CTE using the WITH keyword (note- need to use MATERIALIZED option once on PG12 since default behavior changes)

3) if the number of records returned is large (10 thousand maybe?) and the optimizer is making bad choices on the rest of the query that uses this result set, put this query into a temp table, analyze it, and then use it.

Re: Query optimization advice for beginners

От

Laurenz Albe

Дата:

28 января 2020 г., 11:37:42

On Mon, 2020-01-27 at 13:15 +0000, Kemal Ortanca wrote:
> There is a query that runs slowly when I look at the logs of the database. When I check the
> resources of the system, there is no problem in the resources, but this query running slowly.
> There is no "Seq Scan" in the queries, so the tables are already indexed. But I did not
> fully understand if the indexes were made correctly. When I analyze the query result on
> explain.depesz, it seems that the query is taking too long. 
> 
> How should I fix the query below? How should I read the output of explain.depesz? 
> 
> https://explain.depesz.com/s/G4vq

Normally you focus on where the time is spent and the mis-estimates.

The mis-estimates are notable, but this time not the reason for a
wrong choice of join strategy: evern though there are overestimates,
a nested loop join is chosen.

The time is spent in the 16979 executions of the outer subquery,
particularly in the inner subquery.

Because the query uses correlated subqueries, PostgreSQL has to execute
these conditions in the fashion of a nested loop, that is, the subquery
is executed for every row found.

If you manage to rewrite the query so that it uses (outer) joins instead
of correlated subqueries, the optimizer can use different strategies
that may be more efficient.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Query optimization advice for beginners

Query optimization advice for beginners

Re: Query optimization advice for beginners

Re: Query optimization advice for beginners

Re: Query optimization advice for beginners

Re: Query optimization advice for beginners