Обсуждение: Query taking seq scan on a table

Поиск

Список

Период

Сортировка

Query taking seq scan on a table

От

Shrikant Bhende

Дата:

20 сентября 2020 г., 13:18:46

Hello all,

I am facing issues with one of the queries running on Amazon Aurora (PG VERSION 9.6) which is taking more than a minute to complete. As per the initial investigation I assume that one part of the query is taking a sequential scan on a table("wldbowner.member") which is consuming most of the execution time. I tried to use index (Btree and GIN ) as well but none of them were helping to get the query to run faster. Attached is the query along with the explain analyze of the same, any help on the same would be appreciated.

NOTE : I have done the vacuum on the table already to get rid of bloat issues.

Reindex and test with more work mem is also helping.

Thanks and regards

Вложения

Query_Explain.docx

Re: Query taking seq scan on a table

От

Avinash Kumar

Дата:

20 сентября 2020 г., 13:25:25

On Sun., Sep. 20, 2020, 10:19 a.m. Shrikant Bhende, <shrikantpostgresql@gmail.com> wrote:

Hello all,

I am facing issues with one of the queries running on Amazon Aurora (PG VERSION 9.6) which is taking more than a minute to complete. As per the initial investigation I assume that one part of the query is taking a sequential scan on a table("wldbowner.member") which is consuming most of the execution time. I tried to use index (Btree and GIN ) as well but none of them were helping to get the query to run faster. Attached is the query along with the explain analyze of the same, any help on the same would be appreciated.

NOTE : I have done the vacuum on the table already to get rid of bloat issues.
Reindex and test with more work mem is also helping.

Just wanted to clarify one thing before looking at the query.

What is random_page_cost set to ?

Thanks and regards

Re: Query taking seq scan on a table

От

Shrikant Bhende

Дата:

20 сентября 2020 г., 13:48:22

Hello,

show seq_page_cost; 1

show random_page_cost; 4

storage type : SSD

Table size : 39 GB

There are no columns added, also I haven't found anything where we need to update the data for older rows.

Thanks

On Sun, Sep 20, 2020 at 6:55 PM Avinash Kumar <avinash.vallarapu@gmail.com> wrote:

On Sun., Sep. 20, 2020, 10:19 a.m. Shrikant Bhende, <shrikantpostgresql@gmail.com> wrote:
Hello all,

I am facing issues with one of the queries running on Amazon Aurora (PG VERSION 9.6) which is taking more than a minute to complete. As per the initial investigation I assume that one part of the query is taking a sequential scan on a table("wldbowner.member") which is consuming most of the execution time. I tried to use index (Btree and GIN ) as well but none of them were helping to get the query to run faster. Attached is the query along with the explain analyze of the same, any help on the same would be appreciated.

NOTE : I have done the vacuum on the table already to get rid of bloat issues.
Reindex and test with more work mem is also helping.
Just wanted to clarify one thing before looking at the query.

What is random_page_cost set to ?

Thanks and regards

Re: Query taking seq scan on a table

От

Avinash Kumar

Дата:

20 сентября 2020 г., 13:55:12

On Sun., Sep. 20, 2020, 10:48 a.m. Shrikant Bhende, <shrikantpostgresql@gmail.com> wrote:

Hello,

show seq_page_cost; 1

show random_page_cost; 4

Set this to 1 and rerun the explain analyze and let us know what you see.

You are using an SSD and the value of 4 should not be apt.

storage type : SSD

Table size : 39 GB

There are no columns added, also I haven't found anything where we need to update the data for older rows.

Thanks

On Sun, Sep 20, 2020 at 6:55 PM Avinash Kumar <avinash.vallarapu@gmail.com> wrote:

On Sun., Sep. 20, 2020, 10:19 a.m. Shrikant Bhende, <shrikantpostgresql@gmail.com> wrote:
Hello all,

I am facing issues with one of the queries running on Amazon Aurora (PG VERSION 9.6) which is taking more than a minute to complete. As per the initial investigation I assume that one part of the query is taking a sequential scan on a table("wldbowner.member") which is consuming most of the execution time. I tried to use index (Btree and GIN ) as well but none of them were helping to get the query to run faster. Attached is the query along with the explain analyze of the same, any help on the same would be appreciated.

NOTE : I have done the vacuum on the table already to get rid of bloat issues.
Reindex and test with more work mem is also helping.
Just wanted to clarify one thing before looking at the query.

What is random_page_cost set to ?

Thanks and regards

Re: Query taking seq scan on a table

От

Shrikant Bhende

Дата:

21 сентября 2020 г., 04:51:25

Hello,

Below is the explain analyze after setting random_page_cost to 1;

Limit (cost=11638306.20..11638311.56 rows=102 width=209) (actual time=73231.121..73231.122 rows=1 loops=1)
-> Unique (cost=11638306.20..11638329.83 rows=450 width=209) (actual time=73231.120..73231.120 rows=1 loops=1)
-> Sort (cost=11638306.20..11638307.33 rows=450 width=209) (actual time=73231.120..73231.120 rows=1 loops=1)
Sort Key: m.mem_id, m.member_type_id, m.lname, m.fname, m.email, m.addr, m.zip, m.join_date, p.product_desc, b.product_name, c.cln_name, m.member_end_date, (CASE mc2.member_cancel_type_id WHEN '2'::numeric THEN mc2.member_cancel_type_id ELSE mc1.member_cancel_type_id END), m.active_fl, (CASE WHEN (ca.mem_id IS NULL) THEN NULL::integer ELSE 66 END), (CASE WHEN (gmc.mem_id IS NOT NULL) THEN 1 ELSE 0 END), (CASE WHEN (ccpa.mem_id IS NOT NULL) THEN 1 ELSE 0 END), gmc.declassification_start_date, gmc.anonymization_date, gmc.request_source
Sort Method: quicksort Memory: 25kB
-> Nested Loop Left Join (cost=1.46..11638286.37 rows=450 width=209) (actual time=73231.063..73231.098 rows=1 loops=1)
Join Filter: (m.mem_id = ccpa.mem_id)
-> Nested Loop Left Join (cost=1.46..11638267.69 rows=450 width=213) (actual time=73231.053..73231.088 rows=1 loops=1)
Join Filter: (m.mem_id = gmc.mem_id)
-> Nested Loop Left Join (cost=1.46..11637236.72 rows=450 width=167) (actual time=73231.048..73231.083 rows=1 loops=1)
-> Nested Loop (cost=1.02..11636127.22 rows=450 width=162) (actual time=73231.037..73231.071 rows=1 loops=1)
Join Filter: (s.sol_template_id = st.sol_template_id)
Rows Removed by Join Filter: 5565
-> Nested Loop (cost=1.02..11598155.50 rows=515 width=167) (actual time=73228.007..73228.038 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.73..11597971.55 rows=515 width=174) (actual time=73227.994..73228.024 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.29..11596701.79 rows=515 width=169) (actual time=73227.969..73227.999 rows=1 loops=1)
Join Filter: (ca.mem_id = m.mem_id)
-> Nested Loop (cost=0.29..11582996.41 rows=515 width=151) (actual time=73227.962..73227.991 rows=1 loops=1)
Join Filter: (websites.wbs_client_id = c.cln_client_id)
Rows Removed by Join Filter: 2343
-> Seq Scan on clients c (cost=0.00..46.75 rows=2375 width=21) (actual time=0.006..0.184 rows=2344 loops=1)
-> Materialize (cost=0.29..11564604.07 rows=515 width=142) (actual time=31.240..31.240 rows=1 loops=2344)
-> Nested Loop (cost=0.29..11564601.50 rows=515 width=142) (actual time=73226.808..73226.836 rows=1 loops=1)
Join Filter: (campaigns.cam_website_id = websites.wbs_website_id)
Rows Removed by Join Filter: 3721
-> Seq Scan on websites (cost=0.00..97.68 rows=3768 width=12) (actual time=0.002..0.350 rows=3722 loops=1)
-> Materialize (cost=0.29..11535397.31 rows=515 width=142) (actual time=11.071..19.674 rows=1 loops=3722)
-> Nested Loop (cost=0.29..11535394.73 rows=515 width=142) (actual time=41204.676..73224.912 rows=1 loops=1)
-> Nested Loop (cost=0.00..11534442.93 rows=515 width=130) (actual time=41204.658..73224.892 rows=1 loops=1)
Join Filter: (m.brn_id = b.brn_id)
Rows Removed by Join Filter: 202
-> Nested Loop (cost=0.00..11532858.96 rows=515 width=116) (actual time=41204.622..73224.764 rows=1 loops=1)
Join Filter: (m.product_id = p.product_id)
Rows Removed by Join Filter: 114
-> Seq Scan on member m (cost=0.00..11531974.88 rows=515 width=100) (actual time=41204.577..73224.687 rows=1 loops=1)
Filter: ((lower(unaccent_string((lname)::text)) ~~ 'info%'::text) AND (lower(unaccent_string((fname)::text)) ~~ 'travel%'::text))
Rows Removed by Filter: 20595444
-> Materialize (cost=0.00..3.71 rows=114 width=27) (actual time=0.012..0.054 rows=115 loops=1)
-> Seq Scan on product p (cost=0.00..3.14 rows=114 width=27) (actual time=0.006..0.021 rows=115 loops=1)
-> Materialize (cost=0.00..24.03 rows=202 width=26) (actual time=0.006..0.088 rows=203 loops=1)
-> Seq Scan on brand b (cost=0.00..23.02 rows=202 width=26) (actual time=0.002..0.054 rows=203 loops=1)
-> Index Scan using pk_campaigns on campaigns (cost=0.29..1.84 rows=1 width=12) (actual time=0.013..0.014 rows=1 loops=1)
Index Cond: (cam_campaign_id = m.sol_id)
-> Materialize (cost=0.00..36.55 rows=1770 width=18) (actual time=0.007..0.007 rows=0 loops=1)
-> Seq Scan on iot_2009_ca_member ca (cost=0.00..27.70 rows=1770 width=18) (actual time=0.002..0.002 rows=0 loops=1)
-> Index Scan using pk_member_cancel on member_cancel mc1 (cost=0.44..2.46 rows=1 width=11) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (m.mem_id = mem_id)
Filter: (member_cancel_type_id = '1'::numeric)
-> Index Scan using pk_solicitation on solicitation s (cost=0.29..0.35 rows=1 width=11) (actual time=0.009..0.010 rows=1 loops=1)
Index Cond: (sol_id = campaigns.cam_campaign_id)
-> Materialize (cost=0.00..285.92 rows=4880 width=5) (actual time=0.018..2.349 rows=5566 loops=1)
-> Seq Scan on solicitation_template st (cost=0.00..261.52 rows=4880 width=5) (actual time=0.009..1.547 rows=5566 loops=1)
Filter: (country_id = ANY ('{1,2,3,121,121,4,5,6,7,8,9,10,11,12,13,14}'::numeric[]))
-> Index Scan using pk_member_cancel on member_cancel mc2 (cost=0.44..2.46 rows=1 width=11) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (m.mem_id = mem_id)
Filter: (member_cancel_type_id = '2'::numeric)
-> Materialize (cost=0.00..25.59 rows=149 width=46) (actual time=0.004..0.004 rows=0 loops=1)
-> Seq Scan on gdpr_member_classification gmc (cost=0.00..24.85 rows=149 width=46) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((declassification_start_date IS NOT NULL) AND (current_fl = 1) AND (date_trunc('day'::text, declassification_start_date) <= to_date('9/17/2020'::text, 'MM/DD/YYYY'::text)) AND ((declassification_end_date IS NULL) OR (date_trunc('day'::text, declassification_end_date) > to_date('9/17/2020'::text, 'MM/DD/YYYY'::text))))
-> Materialize (cost=0.00..10.81 rows=1 width=18) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on ccpa_member_classification ccpa (cost=0.00..10.80 rows=1 width=18) (actual time=0.003..0.003 rows=0 loops=1)
Filter: ((anonymization_date IS NOT NULL) AND (current_fl = 1) AND (date_trunc('day'::text, anonymization_date) <= to_date('9/17/2020'::text, 'MM/DD/YYYY'::text)))
Planning time: 7.622 ms
Execution time: 73231.463 ms

Also I have tried to add a GIN index for better text search as below,

CREATE INDEX idx_fnmae_lname_gin_composite ON wldbowner.member USING gin
(lower((((fname)::text || ' '::text) || (lname)::text)) rdsadmin.gin_trgm_ops);

Thanks

On Sun, Sep 20, 2020 at 7:25 PM Avinash Kumar <avinash.vallarapu@gmail.com> wrote:

On Sun., Sep. 20, 2020, 10:48 a.m. Shrikant Bhende, <shrikantpostgresql@gmail.com> wrote:
Hello,

show seq_page_cost; 1

show random_page_cost; 4
Set this to 1 and rerun the explain analyze and let us know what you see.
You are using an SSD and the value of 4 should not be apt.

storage type : SSD

Table size : 39 GB

There are no columns added, also I haven't found anything where we need to update the data for older rows.

Thanks

On Sun, Sep 20, 2020 at 6:55 PM Avinash Kumar <avinash.vallarapu@gmail.com> wrote:

On Sun., Sep. 20, 2020, 10:19 a.m. Shrikant Bhende, <shrikantpostgresql@gmail.com> wrote:
Hello all,

I am facing issues with one of the queries running on Amazon Aurora (PG VERSION 9.6) which is taking more than a minute to complete. As per the initial investigation I assume that one part of the query is taking a sequential scan on a table("wldbowner.member") which is consuming most of the execution time. I tried to use index (Btree and GIN ) as well but none of them were helping to get the query to run faster. Attached is the query along with the explain analyze of the same, any help on the same would be appreciated.

NOTE : I have done the vacuum on the table already to get rid of bloat issues.
Reindex and test with more work mem is also helping.
Just wanted to clarify one thing before looking at the query.

What is random_page_cost set to ?

Thanks and regards

Re: Query taking seq scan on a table

От

Shrikant Bhende

Дата:

21 сентября 2020 г., 05:04:12

Hello,

Configuration of instance :

Type : Amazon Aurora

DB instance : db.r4Large

Vcpu : 2
ECU : 7
Total Memory : 15.25 GB

Shared buffers are set to 75% of total RAM as per the AWS recommendations and random_page_cost is already altered to 1 and shared the relevant Explain analyze plan for the same.

Thanks

On Mon, Sep 21, 2020 at 10:21 AM Shrikant Bhende <shrikantpostgresql@gmail.com> wrote:

Hello,

Below is the explain analyze after setting random_page_cost to 1;

Limit (cost=11638306.20..11638311.56 rows=102 width=209) (actual time=73231.121..73231.122 rows=1 loops=1)
-> Unique (cost=11638306.20..11638329.83 rows=450 width=209) (actual time=73231.120..73231.120 rows=1 loops=1)
-> Sort (cost=11638306.20..11638307.33 rows=450 width=209) (actual time=73231.120..73231.120 rows=1 loops=1)
Sort Key: m.mem_id, m.member_type_id, m.lname, m.fname, m.email, m.addr, m.zip, m.join_date, p.product_desc, b.product_name, c.cln_name, m.member_end_date, (CASE mc2.member_cancel_type_id WHEN '2'::numeric THEN mc2.member_cancel_type_id ELSE mc1.member_cancel_type_id END), m.active_fl, (CASE WHEN (ca.mem_id IS NULL) THEN NULL::integer ELSE 66 END), (CASE WHEN (gmc.mem_id IS NOT NULL) THEN 1 ELSE 0 END), (CASE WHEN (ccpa.mem_id IS NOT NULL) THEN 1 ELSE 0 END), gmc.declassification_start_date, gmc.anonymization_date, gmc.request_source
Sort Method: quicksort Memory: 25kB
-> Nested Loop Left Join (cost=1.46..11638286.37 rows=450 width=209) (actual time=73231.063..73231.098 rows=1 loops=1)
Join Filter: (m.mem_id = ccpa.mem_id)
-> Nested Loop Left Join (cost=1.46..11638267.69 rows=450 width=213) (actual time=73231.053..73231.088 rows=1 loops=1)
Join Filter: (m.mem_id = gmc.mem_id)
-> Nested Loop Left Join (cost=1.46..11637236.72 rows=450 width=167) (actual time=73231.048..73231.083 rows=1 loops=1)
-> Nested Loop (cost=1.02..11636127.22 rows=450 width=162) (actual time=73231.037..73231.071 rows=1 loops=1)
Join Filter: (s.sol_template_id = st.sol_template_id)
Rows Removed by Join Filter: 5565
-> Nested Loop (cost=1.02..11598155.50 rows=515 width=167) (actual time=73228.007..73228.038 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.73..11597971.55 rows=515 width=174) (actual time=73227.994..73228.024 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.29..11596701.79 rows=515 width=169) (actual time=73227.969..73227.999 rows=1 loops=1)
Join Filter: (ca.mem_id = m.mem_id)
-> Nested Loop (cost=0.29..11582996.41 rows=515 width=151) (actual time=73227.962..73227.991 rows=1 loops=1)
Join Filter: (websites.wbs_client_id = c.cln_client_id)
Rows Removed by Join Filter: 2343
-> Seq Scan on clients c (cost=0.00..46.75 rows=2375 width=21) (actual time=0.006..0.184 rows=2344 loops=1)
-> Materialize (cost=0.29..11564604.07 rows=515 width=142) (actual time=31.240..31.240 rows=1 loops=2344)
-> Nested Loop (cost=0.29..11564601.50 rows=515 width=142) (actual time=73226.808..73226.836 rows=1 loops=1)
Join Filter: (campaigns.cam_website_id = websites.wbs_website_id)
Rows Removed by Join Filter: 3721
-> Seq Scan on websites (cost=0.00..97.68 rows=3768 width=12) (actual time=0.002..0.350 rows=3722 loops=1)
-> Materialize (cost=0.29..11535397.31 rows=515 width=142) (actual time=11.071..19.674 rows=1 loops=3722)
-> Nested Loop (cost=0.29..11535394.73 rows=515 width=142) (actual time=41204.676..73224.912 rows=1 loops=1)
-> Nested Loop (cost=0.00..11534442.93 rows=515 width=130) (actual time=41204.658..73224.892 rows=1 loops=1)
Join Filter: (m.brn_id = b.brn_id)
Rows Removed by Join Filter: 202
-> Nested Loop (cost=0.00..11532858.96 rows=515 width=116) (actual time=41204.622..73224.764 rows=1 loops=1)
Join Filter: (m.product_id = p.product_id)
Rows Removed by Join Filter: 114
-> Seq Scan on member m (cost=0.00..11531974.88 rows=515 width=100) (actual time=41204.577..73224.687 rows=1 loops=1)
Filter: ((lower(unaccent_string((lname)::text)) ~~ 'info%'::text) AND (lower(unaccent_string((fname)::text)) ~~ 'travel%'::text))
Rows Removed by Filter: 20595444
-> Materialize (cost=0.00..3.71 rows=114 width=27) (actual time=0.012..0.054 rows=115 loops=1)
-> Seq Scan on product p (cost=0.00..3.14 rows=114 width=27) (actual time=0.006..0.021 rows=115 loops=1)
-> Materialize (cost=0.00..24.03 rows=202 width=26) (actual time=0.006..0.088 rows=203 loops=1)
-> Seq Scan on brand b (cost=0.00..23.02 rows=202 width=26) (actual time=0.002..0.054 rows=203 loops=1)
-> Index Scan using pk_campaigns on campaigns (cost=0.29..1.84 rows=1 width=12) (actual time=0.013..0.014 rows=1 loops=1)
Index Cond: (cam_campaign_id = m.sol_id)
-> Materialize (cost=0.00..36.55 rows=1770 width=18) (actual time=0.007..0.007 rows=0 loops=1)
-> Seq Scan on iot_2009_ca_member ca (cost=0.00..27.70 rows=1770 width=18) (actual time=0.002..0.002 rows=0 loops=1)
-> Index Scan using pk_member_cancel on member_cancel mc1 (cost=0.44..2.46 rows=1 width=11) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (m.mem_id = mem_id)
Filter: (member_cancel_type_id = '1'::numeric)
-> Index Scan using pk_solicitation on solicitation s (cost=0.29..0.35 rows=1 width=11) (actual time=0.009..0.010 rows=1 loops=1)
Index Cond: (sol_id = campaigns.cam_campaign_id)
-> Materialize (cost=0.00..285.92 rows=4880 width=5) (actual time=0.018..2.349 rows=5566 loops=1)
-> Seq Scan on solicitation_template st (cost=0.00..261.52 rows=4880 width=5) (actual time=0.009..1.547 rows=5566 loops=1)
Filter: (country_id = ANY ('{1,2,3,121,121,4,5,6,7,8,9,10,11,12,13,14}'::numeric[]))
-> Index Scan using pk_member_cancel on member_cancel mc2 (cost=0.44..2.46 rows=1 width=11) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (m.mem_id = mem_id)
Filter: (member_cancel_type_id = '2'::numeric)
-> Materialize (cost=0.00..25.59 rows=149 width=46) (actual time=0.004..0.004 rows=0 loops=1)
-> Seq Scan on gdpr_member_classification gmc (cost=0.00..24.85 rows=149 width=46) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((declassification_start_date IS NOT NULL) AND (current_fl = 1) AND (date_trunc('day'::text, declassification_start_date) <= to_date('9/17/2020'::text, 'MM/DD/YYYY'::text)) AND ((declassification_end_date IS NULL) OR (date_trunc('day'::text, declassification_end_date) > to_date('9/17/2020'::text, 'MM/DD/YYYY'::text))))
-> Materialize (cost=0.00..10.81 rows=1 width=18) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on ccpa_member_classification ccpa (cost=0.00..10.80 rows=1 width=18) (actual time=0.003..0.003 rows=0 loops=1)
Filter: ((anonymization_date IS NOT NULL) AND (current_fl = 1) AND (date_trunc('day'::text, anonymization_date) <= to_date('9/17/2020'::text, 'MM/DD/YYYY'::text)))
Planning time: 7.622 ms
Execution time: 73231.463 ms

Also I have tried to add a GIN index for better text search as below,

CREATE INDEX idx_fnmae_lname_gin_composite ON wldbowner.member USING gin
(lower((((fname)::text || ' '::text) || (lname)::text)) rdsadmin.gin_trgm_ops);

Thanks

On Sun, Sep 20, 2020 at 7:25 PM Avinash Kumar <avinash.vallarapu@gmail.com> wrote:

On Sun., Sep. 20, 2020, 10:48 a.m. Shrikant Bhende, <shrikantpostgresql@gmail.com> wrote:
Hello,

show seq_page_cost; 1

show random_page_cost; 4
Set this to 1 and rerun the explain analyze and let us know what you see.
You are using an SSD and the value of 4 should not be apt.

storage type : SSD

Table size : 39 GB

There are no columns added, also I haven't found anything where we need to update the data for older rows.

Thanks

On Sun, Sep 20, 2020 at 6:55 PM Avinash Kumar <avinash.vallarapu@gmail.com> wrote:

On Sun., Sep. 20, 2020, 10:19 a.m. Shrikant Bhende, <shrikantpostgresql@gmail.com> wrote:
Hello all,

I am facing issues with one of the queries running on Amazon Aurora (PG VERSION 9.6) which is taking more than a minute to complete. As per the initial investigation I assume that one part of the query is taking a sequential scan on a table("wldbowner.member") which is consuming most of the execution time. I tried to use index (Btree and GIN ) as well but none of them were helping to get the query to run faster. Attached is the query along with the explain analyze of the same, any help on the same would be appreciated.

NOTE : I have done the vacuum on the table already to get rid of bloat issues.
Reindex and test with more work mem is also helping.
Just wanted to clarify one thing before looking at the query.

What is random_page_cost set to ?

Thanks and regards

Re: Query taking seq scan on a table

От

Jim Gmail

Дата:

21 сентября 2020 г., 06:40:37

If you can , avoid using DISTINCT and the sorting this invokes

Sent from my iPhone

On 21 Sep 2020, at 15:04, Shrikant Bhende <shrikantpostgresql@gmail.com> wrote:

Hello,

Configuration of instance :
Type : Amazon Aurora
DB instance : db.r4Large
Vcpu : 2
ECU : 7
Total Memory : 15.25 GB

Shared buffers are set to 75% of total RAM as per the AWS recommendations and random_page_cost is already altered to 1 and shared the relevant Explain analyze plan for the same.

Thanks

On Mon, Sep 21, 2020 at 10:21 AM Shrikant Bhende <shrikantpostgresql@gmail.com> wrote:
Hello,

Below is the explain analyze after setting random_page_cost to 1;

Limit (cost=11638306.20..11638311.56 rows=102 width=209) (actual time=73231.121..73231.122 rows=1 loops=1)
-> Unique (cost=11638306.20..11638329.83 rows=450 width=209) (actual time=73231.120..73231.120 rows=1 loops=1)
-> Sort (cost=11638306.20..11638307.33 rows=450 width=209) (actual time=73231.120..73231.120 rows=1 loops=1)
Sort Key: m.mem_id, m.member_type_id, m.lname, m.fname, m.email, m.addr, m.zip, m.join_date, p.product_desc, b.product_name, c.cln_name, m.member_end_date, (CASE mc2.member_cancel_type_id WHEN '2'::numeric THEN mc2.member_cancel_type_id ELSE mc1.member_cancel_type_id END), m.active_fl, (CASE WHEN (ca.mem_id IS NULL) THEN NULL::integer ELSE 66 END), (CASE WHEN (gmc.mem_id IS NOT NULL) THEN 1 ELSE 0 END), (CASE WHEN (ccpa.mem_id IS NOT NULL) THEN 1 ELSE 0 END), gmc.declassification_start_date, gmc.anonymization_date, gmc.request_source
Sort Method: quicksort Memory: 25kB
-> Nested Loop Left Join (cost=1.46..11638286.37 rows=450 width=209) (actual time=73231.063..73231.098 rows=1 loops=1)
Join Filter: (m.mem_id = ccpa.mem_id)
-> Nested Loop Left Join (cost=1.46..11638267.69 rows=450 width=213) (actual time=73231.053..73231.088 rows=1 loops=1)
Join Filter: (m.mem_id = gmc.mem_id)
-> Nested Loop Left Join (cost=1.46..11637236.72 rows=450 width=167) (actual time=73231.048..73231.083 rows=1 loops=1)
-> Nested Loop (cost=1.02..11636127.22 rows=450 width=162) (actual time=73231.037..73231.071 rows=1 loops=1)
Join Filter: (s.sol_template_id = st.sol_template_id)
Rows Removed by Join Filter: 5565
-> Nested Loop (cost=1.02..11598155.50 rows=515 width=167) (actual time=73228.007..73228.038 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.73..11597971.55 rows=515 width=174) (actual time=73227.994..73228.024 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.29..11596701.79 rows=515 width=169) (actual time=73227.969..73227.999 rows=1 loops=1)
Join Filter: (ca.mem_id = m.mem_id)
-> Nested Loop (cost=0.29..11582996.41 rows=515 width=151) (actual time=73227.962..73227.991 rows=1 loops=1)
Join Filter: (websites.wbs_client_id = c.cln_client_id)
Rows Removed by Join Filter: 2343
-> Seq Scan on clients c (cost=0.00..46.75 rows=2375 width=21) (actual time=0.006..0.184 rows=2344 loops=1)
-> Materialize (cost=0.29..11564604.07 rows=515 width=142) (actual time=31.240..31.240 rows=1 loops=2344)
-> Nested Loop (cost=0.29..11564601.50 rows=515 width=142) (actual time=73226.808..73226.836 rows=1 loops=1)
Join Filter: (campaigns.cam_website_id = websites.wbs_website_id)
Rows Removed by Join Filter: 3721
-> Seq Scan on websites (cost=0.00..97.68 rows=3768 width=12) (actual time=0.002..0.350 rows=3722 loops=1)
-> Materialize (cost=0.29..11535397.31 rows=515 width=142) (actual time=11.071..19.674 rows=1 loops=3722)
-> Nested Loop (cost=0.29..11535394.73 rows=515 width=142) (actual time=41204.676..73224.912 rows=1 loops=1)
-> Nested Loop (cost=0.00..11534442.93 rows=515 width=130) (actual time=41204.658..73224.892 rows=1 loops=1)
Join Filter: (m.brn_id = b.brn_id)
Rows Removed by Join Filter: 202
-> Nested Loop (cost=0.00..11532858.96 rows=515 width=116) (actual time=41204.622..73224.764 rows=1 loops=1)
Join Filter: (m.product_id = p.product_id)
Rows Removed by Join Filter: 114
-> Seq Scan on member m (cost=0.00..11531974.88 rows=515 width=100) (actual time=41204.577..73224.687 rows=1 loops=1)
Filter: ((lower(unaccent_string((lname)::text)) ~~ 'info%'::text) AND (lower(unaccent_string((fname)::text)) ~~ 'travel%'::text))
Rows Removed by Filter: 20595444
-> Materialize (cost=0.00..3.71 rows=114 width=27) (actual time=0.012..0.054 rows=115 loops=1)
-> Seq Scan on product p (cost=0.00..3.14 rows=114 width=27) (actual time=0.006..0.021 rows=115 loops=1)
-> Materialize (cost=0.00..24.03 rows=202 width=26) (actual time=0.006..0.088 rows=203 loops=1)
-> Seq Scan on brand b (cost=0.00..23.02 rows=202 width=26) (actual time=0.002..0.054 rows=203 loops=1)
-> Index Scan using pk_campaigns on campaigns (cost=0.29..1.84 rows=1 width=12) (actual time=0.013..0.014 rows=1 loops=1)
Index Cond: (cam_campaign_id = m.sol_id)
-> Materialize (cost=0.00..36.55 rows=1770 width=18) (actual time=0.007..0.007 rows=0 loops=1)
-> Seq Scan on iot_2009_ca_member ca (cost=0.00..27.70 rows=1770 width=18) (actual time=0.002..0.002 rows=0 loops=1)
-> Index Scan using pk_member_cancel on member_cancel mc1 (cost=0.44..2.46 rows=1 width=11) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (m.mem_id = mem_id)
Filter: (member_cancel_type_id = '1'::numeric)
-> Index Scan using pk_solicitation on solicitation s (cost=0.29..0.35 rows=1 width=11) (actual time=0.009..0.010 rows=1 loops=1)
Index Cond: (sol_id = campaigns.cam_campaign_id)
-> Materialize (cost=0.00..285.92 rows=4880 width=5) (actual time=0.018..2.349 rows=5566 loops=1)
-> Seq Scan on solicitation_template st (cost=0.00..261.52 rows=4880 width=5) (actual time=0.009..1.547 rows=5566 loops=1)
Filter: (country_id = ANY ('{1,2,3,121,121,4,5,6,7,8,9,10,11,12,13,14}'::numeric[]))
-> Index Scan using pk_member_cancel on member_cancel mc2 (cost=0.44..2.46 rows=1 width=11) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (m.mem_id = mem_id)
Filter: (member_cancel_type_id = '2'::numeric)
-> Materialize (cost=0.00..25.59 rows=149 width=46) (actual time=0.004..0.004 rows=0 loops=1)
-> Seq Scan on gdpr_member_classification gmc (cost=0.00..24.85 rows=149 width=46) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((declassification_start_date IS NOT NULL) AND (current_fl = 1) AND (date_trunc('day'::text, declassification_start_date) <= to_date('9/17/2020'::text, 'MM/DD/YYYY'::text)) AND ((declassification_end_date IS NULL) OR (date_trunc('day'::text, declassification_end_date) > to_date('9/17/2020'::text, 'MM/DD/YYYY'::text))))
-> Materialize (cost=0.00..10.81 rows=1 width=18) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on ccpa_member_classification ccpa (cost=0.00..10.80 rows=1 width=18) (actual time=0.003..0.003 rows=0 loops=1)
Filter: ((anonymization_date IS NOT NULL) AND (current_fl = 1) AND (date_trunc('day'::text, anonymization_date) <= to_date('9/17/2020'::text, 'MM/DD/YYYY'::text)))
Planning time: 7.622 ms
Execution time: 73231.463 ms

Also I have tried to add a GIN index for better text search as below,

CREATE INDEX idx_fnmae_lname_gin_composite ON wldbowner.member USING gin
(lower((((fname)::text || ' '::text) || (lname)::text)) rdsadmin.gin_trgm_ops);

Thanks

On Sun, Sep 20, 2020 at 7:25 PM Avinash Kumar <avinash.vallarapu@gmail.com> wrote:

On Sun., Sep. 20, 2020, 10:48 a.m. Shrikant Bhende, <shrikantpostgresql@gmail.com> wrote:
Hello,

show seq_page_cost; 1

show random_page_cost; 4
Set this to 1 and rerun the explain analyze and let us know what you see.
You are using an SSD and the value of 4 should not be apt.

storage type : SSD

Table size : 39 GB

There are no columns added, also I haven't found anything where we need to update the data for older rows.

Thanks

On Sun, Sep 20, 2020 at 6:55 PM Avinash Kumar <avinash.vallarapu@gmail.com> wrote:

On Sun., Sep. 20, 2020, 10:19 a.m. Shrikant Bhende, <shrikantpostgresql@gmail.com> wrote:
Hello all,

I am facing issues with one of the queries running on Amazon Aurora (PG VERSION 9.6) which is taking more than a minute to complete. As per the initial investigation I assume that one part of the query is taking a sequential scan on a table("wldbowner.member") which is consuming most of the execution time. I tried to use index (Btree and GIN ) as well but none of them were helping to get the query to run faster. Attached is the query along with the explain analyze of the same, any help on the same would be appreciated.

NOTE : I have done the vacuum on the table already to get rid of bloat issues.
Reindex and test with more work mem is also helping.
Just wanted to clarify one thing before looking at the query.

What is random_page_cost set to ?

Thanks and regards

Re: Query taking seq scan on a table

От

Jeff Janes

Дата:

21 сентября 2020 г., 14:31:44

On Mon, Sep 21, 2020 at 12:51 AM Shrikant Bhende <shrikantpostgresql@gmail.com> wrote:

Also I have tried to add a GIN index for better text search as below,

CREATE INDEX idx_fnmae_lname_gin_composite ON wldbowner.member USING gin
(lower((((fname)::text || ' '::text) || (lname)::text)) rdsadmin.gin_trgm_ops);

You index does not match your query:

((lower(unaccent_string((lname)::text)) ~~ 'info%'::text) AND (lower(unaccent_string((fname)::text)) ~~ 'travel%'::text))

Your index is not passing the columns through unaccent_string, and it is concatenating the columns while the query is treating them separately. You need to make the index (or indexes, as you might want one for each column) match the query.

If the wildcard is always at the end of the search-pattern strings, you could instead use btree indexes with text_pattern_ops.

Cheers,

Jeff

Re: Query taking seq scan on a table

От

Shrikant Bhende

Дата:

22 сентября 2020 г., 12:59:19

Hi Jeff,

Thanks for the update.

create index concurrently lname_test_btree_txt_pat_ops on wldbowner.member (lname text_pattern_ops)
where fname like LOWER(unaccent_string(lname) || '%')

Is this the correct way to create a b-tree index with text_pattern_ops for my requirement ?

Thanks

On Mon, Sep 21, 2020 at 8:01 PM Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Sep 21, 2020 at 12:51 AM Shrikant Bhende <shrikantpostgresql@gmail.com> wrote:
Also I have tried to add a GIN index for better text search as below,

CREATE INDEX idx_fnmae_lname_gin_composite ON wldbowner.member USING gin
(lower((((fname)::text || ' '::text) || (lname)::text)) rdsadmin.gin_trgm_ops);

You index does not match your query:

((lower(unaccent_string((lname)::text)) ~~ 'info%'::text) AND (lower(unaccent_string((fname)::text)) ~~ 'travel%'::text))

Your index is not passing the columns through unaccent_string, and it is concatenating the columns while the query is treating them separately. You need to make the index (or indexes, as you might want one for each column) match the query.

If the wildcard is always at the end of the search-pattern strings, you could instead use btree indexes with text_pattern_ops.

Cheers,

Jeff

Re: Query taking seq scan on a table

От

Shrikant Bhende

Дата:

22 сентября 2020 г., 13:12:11

Hello,

I have created indexes on both columns and tried again but still the same result,
below is the explain analyze plan after the btree index,

Limit (cost=11638346.65..11638352.00 rows=102 width=209) (actual time=74248.165..74248.166 rows=1 loops=1)
-> Unique (cost=11638346.65..11638370.27 rows=450 width=209) (actual time=74248.163..74248.163 rows=1 loops=1)
-> Sort (cost=11638346.65..11638347.77 rows=450 width=209) (actual time=74248.163..74248.163 rows=1 loops=1)
Sort Key: m.mem_id, m.member_type_id, m.lname, m.fname, m.email, m.addr, m.zip, m.join_date, p.product_desc, b.product_name, c.cln_name, m.member_end_date, (CASE mc2.member_cancel_type_id WHEN '2'::numeric THEN mc2.member_cancel_type_id ELSE mc1.member_cancel_type_id END), m.active_fl, (CASE WHEN (ca.mem_id IS NULL) THEN NULL::integer ELSE 66 END), (CASE WHEN (gmc.mem_id IS NOT NULL) THEN 1 ELSE 0 END), (CASE WHEN (ccpa.mem_id IS NOT NULL) THEN 1 ELSE 0 END), gmc.declassification_start_date, gmc.anonymization_date, gmc.request_source
Sort Method: quicksort Memory: 25kB
-> Nested Loop Left Join (cost=1.46..11638326.81 rows=450 width=209) (actual time=74248.077..74248.114 rows=1 loops=1)
Join Filter: (m.mem_id = ccpa.mem_id)
-> Nested Loop Left Join (cost=1.46..11638308.14 rows=450 width=213) (actual time=74248.069..74248.106 rows=1 loops=1)
Join Filter: (m.mem_id = gmc.mem_id)
-> Nested Loop Left Join (cost=1.46..11637277.16 rows=450 width=167) (actual time=74248.067..74248.103 rows=1 loops=1)
-> Nested Loop (cost=1.02..11636167.66 rows=450 width=162) (actual time=74248.058..74248.094 rows=1 loops=1)
Join Filter: (s.sol_template_id = st.sol_template_id)
Rows Removed by Join Filter: 5565
-> Nested Loop (cost=1.02..11598195.94 rows=515 width=167) (actual time=74245.083..74245.113 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.73..11598011.99 rows=515 width=174) (actual time=74245.067..74245.097 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.29..11596742.22 rows=515 width=169) (actual time=74245.050..74245.080 rows=1 loops=1)
Join Filter: (ca.mem_id = m.mem_id)
-> Nested Loop (cost=0.29..11583036.85 rows=515 width=151) (actual time=74245.023..74245.052 rows=1 loops=1)
Join Filter: (websites.wbs_client_id = c.cln_client_id)
Rows Removed by Join Filter: 2343
-> Seq Scan on clients c (cost=0.00..46.75 rows=2375 width=21) (actual time=0.005..0.193 rows=2344 loops=1)
-> Materialize (cost=0.29..11564644.51 rows=515 width=142) (actual time=31.674..31.674 rows=1 loops=2344)
-> Nested Loop (cost=0.29..11564641.94 rows=515 width=142) (actual time=74243.846..74243.876 rows=1 loops=1)
Join Filter: (campaigns.cam_website_id = websites.wbs_website_id)
Rows Removed by Join Filter: 3721
-> Seq Scan on websites (cost=0.00..97.68 rows=3768 width=12) (actual time=0.005..0.345 rows=3722 loops=1)
-> Materialize (cost=0.29..11535437.75 rows=515 width=142) (actual time=8.316..19.947 rows=1 loops=3722)
-> Nested Loop (cost=0.29..11535435.17 rows=515 width=142) (actual time=30952.952..74241.900 rows=1 loops=1)
-> Nested Loop (cost=0.00..11534483.37 rows=515 width=130) (actual time=30952.931..74241.878 rows=1 loops=1)
Join Filter: (m.brn_id = b.brn_id)
Rows Removed by Join Filter: 202
-> Nested Loop (cost=0.00..11532899.40 rows=515 width=116) (actual time=30952.895..74241.751 rows=1 loops=1)
Join Filter: (m.product_id = p.product_id)
Rows Removed by Join Filter: 114
-> Seq Scan on member m (cost=0.00..11532015.32 rows=515 width=100) (actual time=30952.851..74241.676 rows=1 loops=1)
Filter: ((lower(unaccent_string((lname)::text)) ~~ 'info%'::text) AND (lower(unaccent_string((fname)::text)) ~~ 'travel%'::text))
Rows Removed by Filter: 20595516
-> Materialize (cost=0.00..3.71 rows=114 width=27) (actual time=0.013..0.050 rows=115 loops=1)
-> Seq Scan on product p (cost=0.00..3.14 rows=114 width=27) (actual time=0.007..0.019 rows=115 loops=1)
-> Materialize (cost=0.00..24.03 rows=202 width=26) (actual time=0.005..0.100 rows=203 loops=1)
-> Seq Scan on brand b (cost=0.00..23.02 rows=202 width=26) (actual time=0.003..0.060 rows=203 loops=1)
-> Index Scan using pk_campaigns on campaigns (cost=0.29..1.84 rows=1 width=12) (actual time=0.013..0.014 rows=1 loops=1)
Index Cond: (cam_campaign_id = m.sol_id)
-> Materialize (cost=0.00..36.55 rows=1770 width=18) (actual time=0.025..0.025 rows=0 loops=1)
-> Seq Scan on iot_2009_ca_member ca (cost=0.00..27.70 rows=1770 width=18) (actual time=0.002..0.002 rows=0 loops=1)
-> Index Scan using pk_member_cancel on member_cancel mc1 (cost=0.44..2.46 rows=1 width=11) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: (m.mem_id = mem_id)
Filter: (member_cancel_type_id = '1'::numeric)
-> Index Scan using pk_solicitation on solicitation s (cost=0.29..0.35 rows=1 width=11) (actual time=0.011..0.011 rows=1 loops=1)
Index Cond: (sol_id = campaigns.cam_campaign_id)
-> Materialize (cost=0.00..285.92 rows=4880 width=5) (actual time=0.016..2.278 rows=5566 loops=1)
-> Seq Scan on solicitation_template st (cost=0.00..261.52 rows=4880 width=5) (actual time=0.011..1.500 rows=5566 loops=1)
Filter: (country_id = ANY ('{1,2,3,121,121,4,5,6,7,8,9,10,11,12,13,14}'::numeric[]))
-> Index Scan using pk_member_cancel on member_cancel mc2 (cost=0.44..2.46 rows=1 width=11) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (m.mem_id = mem_id)
Filter: (member_cancel_type_id = '2'::numeric)
-> Materialize (cost=0.00..25.59 rows=149 width=46) (actual time=0.001..0.001 rows=0 loops=1)
-> Seq Scan on gdpr_member_classification gmc (cost=0.00..24.85 rows=149 width=46) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((declassification_start_date IS NOT NULL) AND (current_fl = 1) AND (date_trunc('day'::text, declassification_start_date) <= to_date('9/17/2020'::text, 'MM/DD/YYYY'::text)) AND ((declassification_end_date IS NULL) OR (date_trunc('day'::text, declassification_end_date) > to_date('9/17/2020'::text, 'MM/DD/YYYY'::text))))
-> Materialize (cost=0.00..10.81 rows=1 width=18) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on ccpa_member_classification ccpa (cost=0.00..10.80 rows=1 width=18) (actual time=0.003..0.003 rows=0 loops=1)
Filter: ((anonymization_date IS NOT NULL) AND (current_fl = 1) AND (date_trunc('day'::text, anonymization_date) <= to_date('9/17/2020'::text, 'MM/DD/YYYY'::text)))
Planning time: 11.217 ms
Execution time: 74248.572 ms

On Tue, Sep 22, 2020 at 6:29 PM Shrikant Bhende <shrikantpostgresql@gmail.com> wrote:

Hi Jeff,

Thanks for the update.

create index concurrently lname_test_btree_txt_pat_ops on wldbowner.member (lname text_pattern_ops)
where fname like LOWER(unaccent_string(lname) || '%')

Is this the correct way to create a b-tree index with text_pattern_ops for my requirement ?

Thanks

On Mon, Sep 21, 2020 at 8:01 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Sep 21, 2020 at 12:51 AM Shrikant Bhende <shrikantpostgresql@gmail.com> wrote:
Also I have tried to add a GIN index for better text search as below,

CREATE INDEX idx_fnmae_lname_gin_composite ON wldbowner.member USING gin
(lower((((fname)::text || ' '::text) || (lname)::text)) rdsadmin.gin_trgm_ops);

You index does not match your query:

((lower(unaccent_string((lname)::text)) ~~ 'info%'::text) AND (lower(unaccent_string((fname)::text)) ~~ 'travel%'::text))

Your index is not passing the columns through unaccent_string, and it is concatenating the columns while the query is treating them separately. You need to make the index (or indexes, as you might want one for each column) match the query.

If the wildcard is always at the end of the search-pattern strings, you could instead use btree indexes with text_pattern_ops.

Cheers,

Jeff

Re: Query taking seq scan on a table

От

Tom Lane

Дата:

22 сентября 2020 г., 14:07:26

Shrikant Bhende <shrikantpostgresql@gmail.com> writes:
> *create index concurrently lname_test_btree_txt_pat_ops on wldbowner.member
> (lname text_pattern_ops) where fname like LOWER(unaccent_string(lname) ||
> '%')*
> Is this the correct way to create a b-tree index with text_pattern_ops for
> my requirement ?

No.  What you're trying to optimize is

                            Filter: ((lower(unaccent_string((lname)::text))
~~ 'info%'::text) AND (lower(unaccent_string((fname)::text)) ~~
'travel%'::text))

so you need one or both of

create index on member (lower(unaccent_string(lname)) text_pattern_ops);
create index on member (lower(unaccent_string(fname)) text_pattern_ops);

If one of those two conditions is reliably more selective than the
other, perhaps just one index would do.

            regards, tom lane

Re: Query taking seq scan on a table

От

Shrikant Bhende

Дата:

22 сентября 2020 г., 14:50:16

Hello,

It worked fine for me.

Thanks for the correction and all the help!

On Tue, Sep 22, 2020 at 7:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Shrikant Bhende <shrikantpostgresql@gmail.com> writes:
> *create index concurrently lname_test_btree_txt_pat_ops on wldbowner.member
> (lname text_pattern_ops) where fname like LOWER(unaccent_string(lname) ||
> '%')*
> Is this the correct way to create a b-tree index with text_pattern_ops for
> my requirement ?

No. What you're trying to optimize is

Filter: ((lower(unaccent_string((lname)::text))
~~ 'info%'::text) AND (lower(unaccent_string((fname)::text)) ~~
'travel%'::text))

so you need one or both of

create index on member (lower(unaccent_string(lname)) text_pattern_ops);
create index on member (lower(unaccent_string(fname)) text_pattern_ops);

If one of those two conditions is reliably more selective than the
other, perhaps just one index would do.

regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Query taking seq scan on a table

Вложения