Re: Duplicate deletion optimizations
From | Jochen Erwied |
---|---|
Subject | Re: Duplicate deletion optimizations |
Date | |
Msg-id | 1723406361.20120107151837@erwied.eu |
In reply to | Re: Duplicate deletion optimizations ("Marc Mamin" <M.Mamin@intershop.de>) |
List | pgsql-performance |
Saturday, January 7, 2012, 1:21:02 PM you wrote:

> where t_imp.id is null and test.id=t_imp.id;
> =>
> where t_imp.id is not null and test.id=t_imp.id;

You're right, I overlooked that one. But the increase in execution time is, maybe not completely surprisingly, minimal.

Because the query updating the id column of t_imp fetches all rows from test that are to be updated, they are already cached, and the second query runs completely from cache. I suppose you will get a severe performance hit when the table cannot be cached...

I ran the loop again; after 30 minutes I'm at about 3-5 seconds per loop, as long as the server isn't doing something else. Under load it's at about 10-20 seconds, with a ratio of 40% updates, 60% inserts.

> and a partial index on matching rows might help (should be tested):
> (after the first update)
> create index t_imp_ix on t_imp(t_value,t_record,output_id) where t_imp.id is not null.

I don't think this will help much, since t_imp is scanned sequentially anyway, so creating an index is just unneeded overhead.

-- 
Jochen Erwied | home: jochen@erwied.eu +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 | work: joe@mbs-software.de +49-2151-7294-24, FAX: -50
D-45470 Muelheim | mobile: jochen.erwied@vodafone.de +49-173-5404164
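
The predicate fix quoted above can be checked with a small sketch. It uses SQLite through Python's sqlite3 module only so that it is self-contained; the behaviour shown (a comparison with NULL is never true) is standard SQL and holds in PostgreSQL as well. The table layouts and sample rows are assumptions: only the column names t_value, t_record, output_id and id appear in the thread, and the helper `matched` is hypothetical.

```python
import sqlite3

# Assumed, simplified schemas; only the column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test  (id INTEGER PRIMARY KEY, t_value INTEGER,
                        t_record INTEGER, output_id INTEGER);
    CREATE TABLE t_imp (id INTEGER, t_value INTEGER,
                        t_record INTEGER, output_id INTEGER);
    INSERT INTO test VALUES (1, 10, 1, 1), (2, 20, 2, 1);
    -- t_imp.id is set for rows that already exist in test, NULL for new rows
    INSERT INTO t_imp VALUES (1, 11, 1, 1), (NULL, 30, 3, 1);
""")

def matched(predicate):
    """Count the test/t_imp pairs selected by the given predicate on t_imp.id."""
    sql = ("SELECT count(*) FROM test, t_imp "
           f"WHERE {predicate} AND test.id = t_imp.id")
    return conn.execute(sql).fetchone()[0]

# Original predicate: test.id = NULL is never true, so nothing matches.
wrong_matches = matched("t_imp.id IS NULL")
# Corrected predicate: the one existing row is matched and could be updated.
fixed_matches = matched("t_imp.id IS NOT NULL")

print(wrong_matches, fixed_matches)  # prints: 0 1
```

With the original predicate the update step would silently touch no rows, which is why the correction changes the result but, with the rows already cached, adds little run time.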