Обсуждение: improving performance for a delete
Hi all;
I have 2 tables where I basically want to delete from the first table
(seg_id_tmp7) any rows where the entire row already exists in the
second table (sl_cd_segment_dim)
I have a query that looks like this (and it's slow):
delete from seg_id_tmp7
where
customer_srcid::text ||
show_srcid::text ||
show_name::text ||
season_srcid::text ||
season_name::text ||
episode_srcid::text ||
episode_name::text ||
segment_type_id::text ||
segment_type::text ||
segment_srcid::text ||
segment_name::text
in
( select
customer_srcid::text ||
show_srcid::text ||
show_name::text ||
season_srcid::text ||
season_name::text ||
episode_srcid::text ||
episode_name::text ||
segment_type_id::text ||
segment_type::text ||
segment_srcid::text ||
segment_name::text
from sl_cd_location_dim )
;
Here's the query plan for it:
QUERY PLAN
-----------------------------------------------------------------------------------
Seq Scan on seg_id_tmp7 (cost=0.00..138870701.56 rows=2136 width=6)
Filter: (subplan)
SubPlan
-> Seq Scan on sl_cd_location_dim (cost=0.00..63931.60
rows=433040 width=8)
(4 rows)
I also tried this:
delete from seg_id_tmp7
where
( customer_srcid ,
show_srcid ,
show_name ,
season_srcid ,
season_name ,
episode_srcid ,
episode_name ,
segment_type_id ,
segment_type ,
segment_srcid ,
segment_name )
in
( select
customer_srcid ,
show_srcid ,
show_name ,
season_srcid ,
season_name ,
episode_srcid ,
episode_name ,
segment_type_id ,
segment_type ,
segment_srcid ,
segment_name
from sl_cd_location_dim )
;
and I get this query plan:
QUERY PLAN
-----------------------------------------------------------------------------------
Seq Scan on seg_id_tmp7 (cost=0.00..87997034.20 rows=2136 width=6)
Filter: (subplan)
SubPlan
-> Seq Scan on sl_cd_location_dim (cost=0.00..40114.40
rows=433040 width=8)
(4 rows)
If it helps here's the describe's (including indexes) for both tables:
# \d seg_id_tmp7
Table "public.seg_id_tmp7"
Column | Type | Modifiers
-----------------+-----------------------------+-----------
customer_srcid | bigint |
show_srcid | bigint |
show_name | character varying |
season_srcid | bigint |
season_name | character varying |
episode_srcid | bigint |
episode_name | character varying |
segment_type_id | bigint |
segment_type | character varying |
segment_srcid | bigint |
segment_name | character varying |
create_dt | timestamp without time zone |
# \d sl_cd_segment_dim
Table
"public.sl_cd_segment_dim"
Column | Type
| Modifiers
----------------------+-----------------------------
+-------------------------------------------------------------
sl_cd_segment_dim_id | bigint | not null
default nextval('sl_cd_segment_dim_seq'::regclass)
customer_srcid | bigint | not null
show_srcid | bigint | not null
show_name | character varying(500) | not null
season_srcid | bigint | not null
season_name | character varying(500) | not null
episode_srcid | bigint | not null
episode_name | character varying(500) | not null
segment_type_id | integer |
segment_type | character varying(500) |
segment_srcid | bigint |
segment_name | character varying(500) |
effective_dt | timestamp without time zone | not null
default now()
inactive_dt | timestamp without time zone |
last_update_dt | timestamp without time zone | not null
default now()
Indexes:
"sl_cd_segment_dim_pk" PRIMARY KEY, btree (sl_cd_segment_dim_id)
"seg1" btree (customer_srcid)
"seg2" btree (show_srcid)
"seg3" btree (season_srcid)
"seg4" btree (episode_srcid)
"seg5" btree (segment_srcid)
"sl_cd_segment_dim_ix1" btree (customer_srcid)
Any thoughts, suggestions, etc on how to improve performance for this
delete ?
Thanks in advance..
/Kevin
Version 8.3.1
On May 20, 2008, at 1:51 PM, kevin kempter wrote:
> Hi all;
>
> I have 2 tables where I basically want to delete from the first
> table (seg_id_tmp7) any rows where the entire row already exists in
> the second table (sl_cd_segment_dim)
>
> I have a query that looks like this (and it's slow):
>
>
> delete from seg_id_tmp7
> where
> customer_srcid::text ||
> show_srcid::text ||
> show_name::text ||
> season_srcid::text ||
> season_name::text ||
> episode_srcid::text ||
> episode_name::text ||
> segment_type_id::text ||
> segment_type::text ||
> segment_srcid::text ||
> segment_name::text
> in
> ( select
> customer_srcid::text ||
> show_srcid::text ||
> show_name::text ||
> season_srcid::text ||
> season_name::text ||
> episode_srcid::text ||
> episode_name::text ||
> segment_type_id::text ||
> segment_type::text ||
> segment_srcid::text ||
> segment_name::text
> from sl_cd_location_dim )
> ;
>
>
>
>
>
> Here's the query plan for it:
>
> QUERY PLAN
> -----------------------------------------------------------------------------------
> Seq Scan on seg_id_tmp7 (cost=0.00..138870701.56 rows=2136 width=6)
> Filter: (subplan)
> SubPlan
> -> Seq Scan on sl_cd_location_dim (cost=0.00..63931.60
> rows=433040 width=8)
> (4 rows)
>
>
>
>
>
>
>
>
> I also tried this:
>
> delete from seg_id_tmp7
> where
> ( customer_srcid ,
> show_srcid ,
> show_name ,
> season_srcid ,
> season_name ,
> episode_srcid ,
> episode_name ,
> segment_type_id ,
> segment_type ,
> segment_srcid ,
> segment_name )
> in
> ( select
> customer_srcid ,
> show_srcid ,
> show_name ,
> season_srcid ,
> season_name ,
> episode_srcid ,
> episode_name ,
> segment_type_id ,
> segment_type ,
> segment_srcid ,
> segment_name
> from sl_cd_location_dim )
> ;
>
>
> and I get this query plan:
>
> QUERY PLAN
> -----------------------------------------------------------------------------------
> Seq Scan on seg_id_tmp7 (cost=0.00..87997034.20 rows=2136 width=6)
> Filter: (subplan)
> SubPlan
> -> Seq Scan on sl_cd_location_dim (cost=0.00..40114.40
> rows=433040 width=8)
> (4 rows)
>
>
>
> If it helps here's the describe's (including indexes) for both tables:
>
> # \d seg_id_tmp7
> Table "public.seg_id_tmp7"
> Column | Type | Modifiers
> -----------------+-----------------------------+-----------
> customer_srcid | bigint |
> show_srcid | bigint |
> show_name | character varying |
> season_srcid | bigint |
> season_name | character varying |
> episode_srcid | bigint |
> episode_name | character varying |
> segment_type_id | bigint |
> segment_type | character varying |
> segment_srcid | bigint |
> segment_name | character varying |
> create_dt | timestamp without time zone |
>
>
>
>
> # \d sl_cd_segment_dim
> Table
> "public.sl_cd_segment_dim"
> Column | Type
> | Modifiers
> ----------------------+-----------------------------
> +-------------------------------------------------------------
> sl_cd_segment_dim_id | bigint | not null
> default nextval('sl_cd_segment_dim_seq'::regclass)
> customer_srcid | bigint | not null
> show_srcid | bigint | not null
> show_name | character varying(500) | not null
> season_srcid | bigint | not null
> season_name | character varying(500) | not null
> episode_srcid | bigint | not null
> episode_name | character varying(500) | not null
> segment_type_id | integer |
> segment_type | character varying(500) |
> segment_srcid | bigint |
> segment_name | character varying(500) |
> effective_dt | timestamp without time zone | not null
> default now()
> inactive_dt | timestamp without time zone |
> last_update_dt | timestamp without time zone | not null
> default now()
> Indexes:
> "sl_cd_segment_dim_pk" PRIMARY KEY, btree (sl_cd_segment_dim_id)
> "seg1" btree (customer_srcid)
> "seg2" btree (show_srcid)
> "seg3" btree (season_srcid)
> "seg4" btree (episode_srcid)
> "seg5" btree (segment_srcid)
> "sl_cd_segment_dim_ix1" btree (customer_srcid)
>
>
>
>
>
>
> Any thoughts, suggestions, etc on how to improve performance for
> this delete ?
>
>
> Thanks in advance..
>
> /Kevin
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
On Tue, 20 May 2008 22:03:30 +0200, kevin kempter
<kevin@kevinkempterllc.com> wrote:
> Version 8.3.1
>
>
> On May 20, 2008, at 1:51 PM, kevin kempter wrote:
>
>> Hi all;
>>
>> I have 2 tables where I basically want to delete from the first table
>> (seg_id_tmp7) any rows where the entire row already exists in the
>> second table (sl_cd_segment_dim)
>>
>> I have a query that looks like this (and it's slow):
>>
>>
>> delete from seg_id_tmp7
>> where
>> customer_srcid::text ||
Besides being slow as hell and not able to use any indexes, the string
concatenation can also yield incorrect results, for instance :
season_name::text || episode_srcid::text
Will have the same contents for
season_name='season 1' episode_srcid=12
season_name='season 11' episode_srcid=2
I suggest doing it the right way, one possibility being :
test=> EXPLAIN DELETE from test where (id,value) in (select id,value from
test2);
QUERY PLAN
-------------------------------------------------------------------------
Hash IN Join (cost=2943.00..6385.99 rows=2 width=6)
Hash Cond: ((test.id = test2.id) AND (test.value = test2.value))
-> Seq Scan on test (cost=0.00..1442.99 rows=99999 width=14)
-> Hash (cost=1443.00..1443.00 rows=100000 width=8)
-> Seq Scan on test2 (cost=0.00..1443.00 rows=100000 width=8)
Thanks to the hash it is very fast, one seq scan on both tables, instead
of one seq scan PER ROW in your query.
Another solution would be :
test=> EXPLAIN DELETE FROM test USING test2 WHERE test.id=test2.id AND
test.value=test2.value;
QUERY PLAN
-------------------------------------------------------------------------
Hash Join (cost=2943.00..6385.99 rows=2 width=6)
Hash Cond: ((test.id = test2.id) AND (test.value = test2.value))
-> Seq Scan on test (cost=0.00..1442.99 rows=99999 width=14)
-> Hash (cost=1443.00..1443.00 rows=100000 width=8)
-> Seq Scan on test2 (cost=0.00..1443.00 rows=100000 width=8)
Which chooses the same plan here, quite logically, as it is the best one
in this particular case.