Re: improving performance for a delete
От | PFC |
---|---|
Тема | Re: improving performance for a delete |
Дата | |
Msg-id | op.ubg0cxr9cigqcu@apollo13.peufeu.com обсуждение исходный текст |
Ответ на | Re: improving performance for a delete (kevin kempter <kevin@kevinkempterllc.com>) |
Список | pgsql-performance |
On Tue, 20 May 2008 22:03:30 +0200, kevin kempter <kevin@kevinkempterllc.com> wrote: > Version 8.3.1 > > > On May 20, 2008, at 1:51 PM, kevin kempter wrote: > >> Hi all; >> >> I have 2 tables where I basically want to delete from the first table >> (seg_id_tmp7) any rows where the entire row already exists in the >> second table (sl_cd_segment_dim) >> >> I have a query that looks like this (and it's slow): >> >> >> delete from seg_id_tmp7 >> where >> customer_srcid::text || Besides being slow as hell and not able to use any indexes, the string concatenation can also yield incorrect results, for instance : season_name::text || episode_srcid::text Will have the same contents for season_name='season 1' episode_srcid=12 season_name='season 11' episode_srcid=2 I suggest doing it the right way, one possibility being : test=> EXPLAIN DELETE from test where (id,value) in (select id,value from test2); QUERY PLAN ------------------------------------------------------------------------- Hash IN Join (cost=2943.00..6385.99 rows=2 width=6) Hash Cond: ((test.id = test2.id) AND (test.value = test2.value)) -> Seq Scan on test (cost=0.00..1442.99 rows=99999 width=14) -> Hash (cost=1443.00..1443.00 rows=100000 width=8) -> Seq Scan on test2 (cost=0.00..1443.00 rows=100000 width=8) Thanks to the hash it is very fast, one seq scan on both tables, instead of one seq scan PER ROW in your query. Another solution would be : test=> EXPLAIN DELETE FROM test USING test2 WHERE test.id=test2.id AND test.value=test2.value; QUERY PLAN ------------------------------------------------------------------------- Hash Join (cost=2943.00..6385.99 rows=2 width=6) Hash Cond: ((test.id = test2.id) AND (test.value = test2.value)) -> Seq Scan on test (cost=0.00..1442.99 rows=99999 width=14) -> Hash (cost=1443.00..1443.00 rows=100000 width=8) -> Seq Scan on test2 (cost=0.00..1443.00 rows=100000 width=8) Which chooses the same plan here, quite logically, as it is the best one in this particular case.
В списке pgsql-performance по дате отправления: