Обсуждение: postgresql-14 slow query

Поиск
Список
Период
Сортировка

postgresql-14 slow query

От
Kenny Bachman
Дата:
Hello Team,

How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following explain analyze output is without an index. It takes 13 seconds if I create that index. Could you help me?

Thank you so much for your help.

SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date  AS c_date
FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
ORDER BY order_history.order_id DESC;

EXPLAIN ANALYZE output:

 Unique  (cost=672007.46..1519683.55 rows=206423 width=12) (actual time=1701.420..3439.095 rows=2049357 loops=1)
   ->  Gather Merge  (cost=672007.46..1502346.48 rows=6934827 width=12) (actual time=1701.419..2989.243 rows=6891551 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Sort  (cost=671007.40..675341.67 rows=1733707 width=12) (actual time=1657.609..1799.723 rows=1378310 loops=5)
               Sort Key: order_id DESC
               Sort Method: external merge  Disk: 38960kB
               Worker 0:  Sort Method: external merge  Disk: 31488kB
               Worker 1:  Sort Method: external merge  Disk: 36120kB
               Worker 2:  Sort Method: external merge  Disk: 31368kB
               Worker 3:  Sort Method: external merge  Disk: 36152kB
               ->  Parallel Seq Scan on order_history  (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.041..1211.485 rows=1378310 loops=5)
                     Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
                     Rows Removed by Filter: 3268432
 Planning Time: 0.405 ms
 Execution Time: 3510.433 ms

Re: postgresql-14 slow query

От
Mladen Gogala
Дата:
On 4/15/22 16:59, Kenny Bachman wrote:
Hello Team,

How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following explain analyze output is without an index. It takes 13 seconds if I create that index. Could you help me?

Thank you so much for your help.

SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date  AS c_date
FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
ORDER BY order_history.order_id DESC;

EXPLAIN ANALYZE output:

 Unique  (cost=672007.46..1519683.55 rows=206423 width=12) (actual time=1701.420..3439.095 rows=2049357 loops=1)
   ->  Gather Merge  (cost=672007.46..1502346.48 rows=6934827 width=12) (actual time=1701.419..2989.243 rows=6891551 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Sort  (cost=671007.40..675341.67 rows=1733707 width=12) (actual time=1657.609..1799.723 rows=1378310 loops=5)
               Sort Key: order_id DESC
               Sort Method: external merge  Disk: 38960kB
               Worker 0:  Sort Method: external merge  Disk: 31488kB
               Worker 1:  Sort Method: external merge  Disk: 36120kB
               Worker 2:  Sort Method: external merge  Disk: 31368kB
               Worker 3:  Sort Method: external merge  Disk: 36152kB
               ->  Parallel Seq Scan on order_history  (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.041..1211.485 rows=1378310 loops=5)
                     Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
                     Rows Removed by Filter: 3268432
 Planning Time: 0.405 ms
 Execution Time: 3510.433 ms

Parallel sequential scan of 1.7M rows, followed by a sort, all done in 3.5 sec? Doesn't look slow to me. Would indexing state_id be an option?

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: postgresql-14 slow query

От
Doug Reynolds
Дата:
I would move the WHERE filter into a subquery and then run the ORDER BY/DISTINCT on the subquery result.  I like to
avoidDISTINCT on queries using an  ANTIJOIN or SEMIJOIN if possible.  It is hard to recommend without knowing your
data.

Sent from my iPhone

> On Apr 15, 2022, at 4:59 PM, Kenny Bachman <kenny.bachman17@gmail.com> wrote:
>
> 
> Hello Team,
>
> How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following
explainanalyze output is without an index. It takes 13 seconds if I create that index. Could you help me? 
>
> Thank you so much for your help.
>
> SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date  AS c_date
> FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND
order_history.is_false= 0 
> ORDER BY order_history.order_id DESC;
>
> EXPLAIN ANALYZE output:
>
>  Unique  (cost=672007.46..1519683.55 rows=206423 width=12) (actual time=1701.420..3439.095 rows=2049357 loops=1)
>    ->  Gather Merge  (cost=672007.46..1502346.48 rows=6934827 width=12) (actual time=1701.419..2989.243 rows=6891551
loops=1)
>          Workers Planned: 4
>          Workers Launched: 4
>          ->  Sort  (cost=671007.40..675341.67 rows=1733707 width=12) (actual time=1657.609..1799.723 rows=1378310
loops=5)
>                Sort Key: order_id DESC
>                Sort Method: external merge  Disk: 38960kB
>                Worker 0:  Sort Method: external merge  Disk: 31488kB
>                Worker 1:  Sort Method: external merge  Disk: 36120kB
>                Worker 2:  Sort Method: external merge  Disk: 31368kB
>                Worker 3:  Sort Method: external merge  Disk: 36152kB
>                ->  Parallel Seq Scan on order_history  (cost=0.00..473993.00 rows=1733707 width=12) (actual
time=0.041..1211.485rows=1378310 loops=5) 
>                      Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
>                      Rows Removed by Filter: 3268432
>  Planning Time: 0.405 ms
>  Execution Time: 3510.433 ms




Re: postgresql-14 slow query

От
Bhupendra Babu
Дата:
Add is_false as 2nd column in index.
So your index will be state_id,is_false ,ordrr_id with or without desc.
Since your plan shows lot of data to filter. New index wi help filter less rows. Of course you need to check and analyze the volum of how your data is organized.

On Fri, Apr 15, 2022, 1:59 PM Kenny Bachman <kenny.bachman17@gmail.com> wrote:
Hello Team,

How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following explain analyze output is without an index. It takes 13 seconds if I create that index. Could you help me?

Thank you so much for your help.

SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date  AS c_date
FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
ORDER BY order_history.order_id DESC;

EXPLAIN ANALYZE output:

 Unique  (cost=672007.46..1519683.55 rows=206423 width=12) (actual time=1701.420..3439.095 rows=2049357 loops=1)
   ->  Gather Merge  (cost=672007.46..1502346.48 rows=6934827 width=12) (actual time=1701.419..2989.243 rows=6891551 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Sort  (cost=671007.40..675341.67 rows=1733707 width=12) (actual time=1657.609..1799.723 rows=1378310 loops=5)
               Sort Key: order_id DESC
               Sort Method: external merge  Disk: 38960kB
               Worker 0:  Sort Method: external merge  Disk: 31488kB
               Worker 1:  Sort Method: external merge  Disk: 36120kB
               Worker 2:  Sort Method: external merge  Disk: 31368kB
               Worker 3:  Sort Method: external merge  Disk: 36152kB
               ->  Parallel Seq Scan on order_history  (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.041..1211.485 rows=1378310 loops=5)
                     Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
                     Rows Removed by Filter: 3268432
 Planning Time: 0.405 ms
 Execution Time: 3510.433 ms

Re: postgresql-14 slow query

От
Jeff Janes
Дата:
On Fri, Apr 15, 2022 at 4:59 PM Kenny Bachman <kenny.bachman17@gmail.com> wrote:
Hello Team,

How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following explain analyze output is without an index. It takes 13 seconds if I create that index. Could you help me?

Thank you so much for your help.

SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date  AS c_date
FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
ORDER BY order_history.order_id DESC;

This query doesn't make much sense to me.  You are selecting an arbitrary creation_date for each order_id, which seems like a weird thing to do on purpose.  Is your ORDER BY supposed to list another column in it to break the ties?

I wonder how much benefit you are actually getting from the parallel workers.  If you lower max_parallel_workers_per_gather, does the plan take proportionally longer?

You should turn on track_io_timing, then repeat the query with EXPLAIN (ANALYZE, BUFFERS).

Cheers,

Jeff

Re: postgresql-14 slow query

От
Avinash Vallarapu
Дата:
While I do not have much clarity into the schema and a lot of other insights, you might want to see my points below.

On Fri, Apr 15, 2022 at 4:59 PM Kenny Bachman <kenny.bachman17@gmail.com> wrote:
Hello Team,

How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following explain analyze output is without an index. It takes 13 seconds if I create that index. Could you help me?

Thank you so much for your help.

SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date  AS c_date
FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
ORDER BY order_history.order_id DESC;

EXPLAIN ANALYZE output:

 Unique  (cost=672007.46..1519683.55 rows=206423 width=12) (actual time=1701.420..3439.095 rows=2049357 loops=1)
Planner estimates vs actual rows seems to have some variation.
Firstly, have you manually updated statistics and verified if estimates can be almost near to actual ? 

VACUUM ANALYZE work.order_history;

   ->  Gather Merge  (cost=672007.46..1502346.48 rows=6934827 width=12) (actual time=1701.419..2989.243 rows=6891551 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Sort  (cost=671007.40..675341.67 rows=1733707 width=12) (actual time=1657.609..1799.723 rows=1378310 loops=5)
               Sort Key: order_id DESC
               Sort Method: external merge  Disk: 38960kB
               Worker 0:  Sort Method: external merge  Disk: 31488kB
               Worker 1:  Sort Method: external merge  Disk: 36120kB
               Worker 2:  Sort Method: external merge  Disk: 31368kB
               Worker 3:  Sort Method: external merge  Disk: 36152kB
Again, I am not suggesting a query rewrite or a change in schema (indexing or etc) as I do not have much clarity and thus not discussing it here.
However, setting a session level work_mem of 40MB should show some immediate difference.

SET work_mem TO '40MB';
<Run EXPLAIN ANALYZE again and check>

               ->  Parallel Seq Scan on order_history  (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.041..1211.485 rows=1378310 loops=5)
                     Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
                     Rows Removed by Filter: 3268432
 Planning Time: 0.405 ms
 Execution Time: 3510.433 ms


--
Regards,
Avinash Vallarapu
CEO,
MigOps, Inc.

Re: postgresql-14 slow query

От
Paul Smith
Дата:
"for fun" try doing 

Set enable_seqscan=off

And try the queries again. This will discourage it from doing sequential scans and use indexes if possible. If it still does a sequential scan, then there's some reason it can't use the indexes. If it uses indexes now, then the planner must think that the sequential scan would be quicker 

On 15 April 2022 22:00:07 Kenny Bachman <kenny.bachman17@gmail.com> wrote:

Hello Team,

How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following explain analyze output is without an index. It takes 13 seconds if I create that index. Could you help me?

Thank you so much for your help.

SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date  AS c_date
FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
ORDER BY order_history.order_id DESC;

EXPLAIN ANALYZE output:

 Unique  (cost=672007.46..1519683.55 rows=206423 width=12) (actual time=1701.420..3439.095 rows=2049357 loops=1)
   ->  Gather Merge  (cost=672007.46..1502346.48 rows=6934827 width=12) (actual time=1701.419..2989.243 rows=6891551 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Sort  (cost=671007.40..675341.67 rows=1733707 width=12) (actual time=1657.609..1799.723 rows=1378310 loops=5)
               Sort Key: order_id DESC
               Sort Method: external merge  Disk: 38960kB
               Worker 0:  Sort Method: external merge  Disk: 31488kB
               Worker 1:  Sort Method: external merge  Disk: 36120kB
               Worker 2:  Sort Method: external merge  Disk: 31368kB
               Worker 3:  Sort Method: external merge  Disk: 36152kB
               ->  Parallel Seq Scan on order_history  (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.041..1211.485 rows=1378310 loops=5)
                     Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
                     Rows Removed by Filter: 3268432
 Planning Time: 0.405 ms
 Execution Time: 3510.433 ms


--

Paul Smith Computer Services
Tel: 01484 855800
Vat No: GB 685 6987 53

Re: postgresql-14 slow query

От
Kenny Bachman
Дата:
Thank you for your all e-mails. I want to provide more information about the query and table.
Table count(*) is = 24M
is_false column doesn't selectivity, so is_false=0 count(*) is ~24M.
My work _mem is 64MB

Also, my original query is below. I changed the query to ( distinct on ) in order to avoid hashaggregate.

SELECT order_history.order_id,
max(order_history.creation_date) AS c_date
FROM work.order_history
WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
GROUP BY order_history.order_id;

EXPLAIN (ANALYZE ,BUFFERS) SELECT order_history.order_id,
max(order_history.creation_date) AS c_date
FROM work.order_history
WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
GROUP BY order_history.order_id;
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=572423.43..574487.66 rows=206423 width=12) (actual time=2579.826..3394.787 rows=2049357 loops=1)
   Group Key: order_id
   Batches: 5  Memory Usage: 65585kB  Disk Usage: 56504kB
   Buffers: shared hit=350539, temp read=6850 written=12167
   ->  Gather  (cost=483661.54..568294.97 rows=825692 width=12) (actual time=1650.341..1961.442 rows=2085884 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         Buffers: shared hit=350539
         ->  Partial HashAggregate  (cost=482661.54..484725.77 rows=206423 width=12) (actual time=1641.882..1730.569 rows=417177 loops=5)
               Group Key: order_id
               Batches: 1  Memory Usage: 61457kB
               Buffers: shared hit=350539
               Worker 0:  Batches: 1  Memory Usage: 45073kB
               Worker 1:  Batches: 1  Memory Usage: 57361kB
               Worker 2:  Batches: 1  Memory Usage: 57361kB
               Worker 3:  Batches: 1  Memory Usage: 61457kB
               ->  Parallel Seq Scan on order_history  (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.025..1279.366 rows=1378310 loops=5)
                     Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
                     Rows Removed by Filter: 3268432
                     Buffers: shared hit=350539
 Planning:
   Buffers: shared hit=7
 Planning Time: 0.644 ms
 Execution Time: 3482.662 ms


And the execution plan after creating an index;

CREATE INDEX ON work.order_history (state_id ,order_id);
CREATE INDEX
ANALYZE work.order_history;
ANALYZE
EXPLAIN (ANALYZE ,BUFFERS) SELECT order_history.order_id,
max(order_history.creation_date) AS c_date
FROM work.order_history
WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
GROUP BY order_history.order_id;
                                                                                               QUERY PLAN                                                                                                
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=558809.02..560882.30 rows=207328 width=12) (actual time=6050.759..7598.165 rows=2049357 loops=1)
   Group Key: order_id
   Batches: 5  Memory Usage: 65585kB  Disk Usage: 226272kB
   Buffers: shared hit=6007222 read=9864, temp read=28099 written=49004
   ->  Index Scan using order_history_state_id_order_id_idx on order_history  (cost=0.44..524241.93 rows=6913417 width=12) (actual time=0.079..4137.626 rows=6891551 loops=1)
         Index Cond: (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[]))
         Filter: (is_false = 0)
         Rows Removed by Filter: 5301
         Buffers: shared hit=6007222 read=9864
 Planning:
   Buffers: shared hit=53 read=2
 Planning Time: 0.634 ms
 Execution Time: 7695.625 ms

Paul Smith <paul@pscs.co.uk>, 16 Nis 2022 Cmt, 11:47 tarihinde şunu yazdı:
"for fun" try doing 

Set enable_seqscan=off

And try the queries again. This will discourage it from doing sequential scans and use indexes if possible. If it still does a sequential scan, then there's some reason it can't use the indexes. If it uses indexes now, then the planner must think that the sequential scan would be quicker 

On 15 April 2022 22:00:07 Kenny Bachman <kenny.bachman17@gmail.com> wrote:

Hello Team,

How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following explain analyze output is without an index. It takes 13 seconds if I create that index. Could you help me?

Thank you so much for your help.

SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date  AS c_date
FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
ORDER BY order_history.order_id DESC;

EXPLAIN ANALYZE output:

 Unique  (cost=672007.46..1519683.55 rows=206423 width=12) (actual time=1701.420..3439.095 rows=2049357 loops=1)
   ->  Gather Merge  (cost=672007.46..1502346.48 rows=6934827 width=12) (actual time=1701.419..2989.243 rows=6891551 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Sort  (cost=671007.40..675341.67 rows=1733707 width=12) (actual time=1657.609..1799.723 rows=1378310 loops=5)
               Sort Key: order_id DESC
               Sort Method: external merge  Disk: 38960kB
               Worker 0:  Sort Method: external merge  Disk: 31488kB
               Worker 1:  Sort Method: external merge  Disk: 36120kB
               Worker 2:  Sort Method: external merge  Disk: 31368kB
               Worker 3:  Sort Method: external merge  Disk: 36152kB
               ->  Parallel Seq Scan on order_history  (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.041..1211.485 rows=1378310 loops=5)
                     Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
                     Rows Removed by Filter: 3268432
 Planning Time: 0.405 ms
 Execution Time: 3510.433 ms


--

Paul Smith Computer Services
Tel: 01484 855800
Vat No: GB 685 6987 53