Discussion: postgresql-14 slow query

postgresql-14 slow query

From

Kenny Bachman

Date:

April 15, 2022, 20:59:25

Hello Team,

How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following explain analyze output is without an index. It takes 13 seconds if I create that index. Could you help me?

Thank you so much for your help.

SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date AS c_date
FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
ORDER BY order_history.order_id DESC;

EXPLAIN ANALYZE output:

Unique  (cost=672007.46..1519683.55 rows=206423 width=12) (actual time=1701.420..3439.095 rows=2049357 loops=1)
  ->  Gather Merge  (cost=672007.46..1502346.48 rows=6934827 width=12) (actual time=1701.419..2989.243 rows=6891551 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Sort  (cost=671007.40..675341.67 rows=1733707 width=12) (actual time=1657.609..1799.723 rows=1378310 loops=5)
              Sort Key: order_id DESC
              Sort Method: external merge  Disk: 38960kB
              Worker 0:  Sort Method: external merge  Disk: 31488kB
              Worker 1:  Sort Method: external merge  Disk: 36120kB
              Worker 2:  Sort Method: external merge  Disk: 31368kB
              Worker 3:  Sort Method: external merge  Disk: 36152kB
              ->  Parallel Seq Scan on order_history  (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.041..1211.485 rows=1378310 loops=5)
                    Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
                    Rows Removed by Filter: 3268432
Planning Time: 0.405 ms
Execution Time: 3510.433 ms

Re: postgresql-14 slow query

From

Mladen Gogala

Date:

April 15, 2022, 22:07:16

Parallel sequential scan of 1.7M rows, followed by a sort, all done in 3.5 sec? Doesn't look slow to me. Would indexing state_id be an option?

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: postgresql-14 slow query

From

Doug Reynolds

Date:

April 15, 2022, 22:10:33

I would move the WHERE filter into a subquery and then run the ORDER BY/DISTINCT on the subquery result. I like to avoid DISTINCT on queries using an ANTIJOIN or SEMIJOIN if possible. It is hard to recommend without knowing your data.
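
A rough sketch of what such a rewrite might look like, assuming the table and columns from the original query. The alias "filtered" is a made-up name for illustration. The planner may flatten the subquery and end up with the same plan, so it is worth comparing both versions with EXPLAIN ANALYZE before drawing conclusions.

```sql
-- Filter first in a subquery, then apply DISTINCT ON / ORDER BY to its result.
-- "filtered" is an illustrative alias, not part of the original schema.
SELECT DISTINCT ON (filtered.order_id)
       filtered.order_id,
       filtered.creation_date AS c_date
FROM (
    SELECT order_id, creation_date
    FROM work.order_history
    WHERE state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])
      AND is_false = 0
) AS filtered
ORDER BY filtered.order_id DESC;
```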

Re: postgresql-14 slow query

From

Bhupendra Babu

Date:

April 16, 2022, 04:14:25

Add is_false as the 2nd column in the index.

So your index will be (state_id, is_false, order_id), with or without DESC.

Your plan shows a lot of rows being removed by the filter. The new index will help filter fewer rows. Of course, you need to check and analyze the volume and how your data is organized.
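
For reference, the suggested index could be created along these lines. This is only a sketch: the index name is hypothetical, and whether the extra column helps depends on how selective is_false really is in the data.

```sql
-- Index name is hypothetical; the column order follows the suggestion above.
CREATE INDEX order_history_state_false_order_idx
    ON work.order_history (state_id, is_false, order_id DESC);

-- Refresh planner statistics after creating the index.
ANALYZE work.order_history;
```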

Re: postgresql-14 slow query

From

Jeff Janes

Date:

April 16, 2022, 04:37:10

On Fri, Apr 15, 2022 at 4:59 PM Kenny Bachman <kenny.bachman17@gmail.com> wrote:

Hello Team,

How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following explain analyze output is without an index. It takes 13 seconds if I create that index. Could you help me?

Thank you so much for your help.

SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date AS c_date
FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
ORDER BY order_history.order_id DESC;

This query doesn't make much sense to me. You are selecting an arbitrary creation_date for each order_id, which seems like a weird thing to do on purpose. Is your ORDER BY supposed to list another column in it to break the ties?
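
As an illustration of the tie-break point, here is a sketch that assumes the intent is to keep the most recent creation_date for each order_id. That intent is a guess; with a different second sort column, DISTINCT ON would keep a different row.

```sql
-- DISTINCT ON keeps the first row of each order_id group in ORDER BY order.
-- Adding creation_date DESC as a tie-breaker makes that row the newest one
-- (assumed intent; not confirmed by the original poster).
SELECT DISTINCT ON (order_id)
       order_id,
       creation_date AS c_date
FROM work.order_history
WHERE state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])
  AND is_false = 0
ORDER BY order_id DESC, creation_date DESC;
```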

I wonder how much benefit you are actually getting from the parallel workers. If you lower max_parallel_workers_per_gather, does the plan take proportionally longer?

You should turn on track_io_timing, then repeat the query with EXPLAIN (ANALYZE, BUFFERS).
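
A possible way to run that experiment in a single session is sketched below. Changing track_io_timing in a session typically requires superuser rights (or an explicitly granted privilege on newer versions), and the worker counts to try are just examples.

```sql
-- Session-level settings; nothing here is persisted to postgresql.conf.
SET track_io_timing = on;
SET max_parallel_workers_per_gather = 0;  -- repeat with e.g. 2 and 4 and compare

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (order_id) order_id, creation_date AS c_date
FROM work.order_history
WHERE state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])
  AND is_false = 0
ORDER BY order_id DESC;

-- Return to the configured defaults afterwards.
RESET max_parallel_workers_per_gather;
RESET track_io_timing;
```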

Cheers,

Jeff

Re: postgresql-14 slow query

From

Avinash Vallarapu

Date:

April 16, 2022, 05:21:52

While I do not have much clarity into the schema and a lot of other insights, you might want to see my points below.

On Fri, Apr 15, 2022 at 4:59 PM Kenny Bachman <kenny.bachman17@gmail.com> wrote:

Hello Team,

How can I tune this query? It got even slower when I created the index for (state_id, order_id desc). The following explain analyze output is without an index. It takes 13 seconds if I create that index. Could you help me?

Thank you so much for your help.

SELECT DISTINCT ON (order_history.order_id) order_id, order_history.creation_date AS c_date
FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
ORDER BY order_history.order_id DESC;

EXPLAIN ANALYZE output:

Unique (cost=672007.46..1519683.55 rows=206423 width=12) (actual time=1701.420..3439.095 rows=2049357 loops=1)

The planner's row estimate (206423) differs noticeably from the actual row count (2049357).

Firstly, have you manually updated the statistics and verified whether the estimates then come close to the actual values?

VACUUM ANALYZE work.order_history;

-> Gather Merge (cost=672007.46..1502346.48 rows=6934827 width=12) (actual time=1701.419..2989.243 rows=6891551 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Sort (cost=671007.40..675341.67 rows=1733707 width=12) (actual time=1657.609..1799.723 rows=1378310 loops=5)
Sort Key: order_id DESC
Sort Method: external merge Disk: 38960kB
Worker 0: Sort Method: external merge Disk: 31488kB
Worker 1: Sort Method: external merge Disk: 36120kB
Worker 2: Sort Method: external merge Disk: 31368kB
Worker 3: Sort Method: external merge Disk: 36152kB

Again, I am not suggesting a query rewrite or a change in schema (indexing or etc) as I do not have much clarity and thus not discussing it here.

However, setting a session level work_mem of 40MB should show some immediate difference.

SET work_mem TO '40MB';

-> Parallel Seq Scan on order_history (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.041..1211.485 rows=1378310 loops=5)
Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
Rows Removed by Filter: 3268432
Planning Time: 0.405 ms
Execution Time: 3510.433 ms

Regards,
Avinash Vallarapu

CEO,

MigOps, Inc.

www.migops.com

Re: postgresql-14 slow query

From

Paul Smith

Date:

April 16, 2022, 08:41:50

"for fun" try doing

Set enable_seqscan=off

And try the queries again. This will discourage it from doing sequential scans and use indexes if possible. If it still does a sequential scan, then there's some reason it can't use the indexes. If it uses indexes now, then the planner must think that the sequential scan would be quicker
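
One way to try this without affecting the rest of the session is to scope the setting to a transaction, roughly as sketched here. The query is the one from the original post; the transaction wrapper is just a convenience, not something the experiment requires.

```sql
BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT DISTINCT ON (order_id) order_id, creation_date AS c_date
FROM work.order_history
WHERE state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])
  AND is_false = 0
ORDER BY order_id DESC;

ROLLBACK;  -- enable_seqscan reverts to its previous value here
```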

Paul Smith Computer Services

Tel: 01484 855800

Vat No: GB 685 6987 53

Re: postgresql-14 slow query

From

Kenny Bachman

Date:

April 16, 2022, 09:26:02

Thank you for all your e-mails. I want to provide more information about the query and the table.

The table has about 24M rows (count(*)).

The is_false column is not selective: is_false = 0 matches ~24M rows.

My work_mem is 64MB.

Also, my original query is below. I changed the query to DISTINCT ON in order to avoid the HashAggregate.

SELECT order_history.order_id,
max(order_history.creation_date) AS c_date
FROM work.order_history
WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
GROUP BY order_history.order_id;

EXPLAIN (ANALYZE ,BUFFERS) SELECT order_history.order_id,
max(order_history.creation_date) AS c_date
FROM work.order_history
WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
GROUP BY order_history.order_id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize HashAggregate  (cost=572423.43..574487.66 rows=206423 width=12) (actual time=2579.826..3394.787 rows=2049357 loops=1)
  Group Key: order_id
  Batches: 5  Memory Usage: 65585kB  Disk Usage: 56504kB
  Buffers: shared hit=350539, temp read=6850 written=12167
  ->  Gather  (cost=483661.54..568294.97 rows=825692 width=12) (actual time=1650.341..1961.442 rows=2085884 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        Buffers: shared hit=350539
        ->  Partial HashAggregate  (cost=482661.54..484725.77 rows=206423 width=12) (actual time=1641.882..1730.569 rows=417177 loops=5)
              Group Key: order_id
              Batches: 1  Memory Usage: 61457kB
              Buffers: shared hit=350539
              Worker 0:  Batches: 1  Memory Usage: 45073kB
              Worker 1:  Batches: 1  Memory Usage: 57361kB
              Worker 2:  Batches: 1  Memory Usage: 57361kB
              Worker 3:  Batches: 1  Memory Usage: 61457kB
              ->  Parallel Seq Scan on order_history  (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.025..1279.366 rows=1378310 loops=5)
                    Filter: ((is_false = 0) AND (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[])))
                    Rows Removed by Filter: 3268432
                    Buffers: shared hit=350539
Planning:
  Buffers: shared hit=7
Planning Time: 0.644 ms
Execution Time: 3482.662 ms

And here is the execution plan after creating an index:

CREATE INDEX ON work.order_history (state_id ,order_id);
CREATE INDEX
ANALYZE work.order_history;
ANALYZE
EXPLAIN (ANALYZE ,BUFFERS) SELECT order_history.order_id,
max(order_history.creation_date) AS c_date
FROM work.order_history
WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
GROUP BY order_history.order_id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=558809.02..560882.30 rows=207328 width=12) (actual time=6050.759..7598.165 rows=2049357 loops=1)
  Group Key: order_id
  Batches: 5  Memory Usage: 65585kB  Disk Usage: 226272kB
  Buffers: shared hit=6007222 read=9864, temp read=28099 written=49004
  ->  Index Scan using order_history_state_id_order_id_idx on order_history  (cost=0.44..524241.93 rows=6913417 width=12) (actual time=0.079..4137.626 rows=6891551 loops=1)
        Index Cond: (state_id = ANY ('{30,51,63,136,195,233,348}'::integer[]))
        Filter: (is_false = 0)
        Rows Removed by Filter: 5301
        Buffers: shared hit=6007222 read=9864
Planning:
  Buffers: shared hit=53 read=2
Planning Time: 0.634 ms
Execution Time: 7695.625 ms
