Re: Performance problems testing with Spamassassin 3.1.0
От | Andreas Pflug |
---|---|
Тема | Re: Performance problems testing with Spamassassin 3.1.0 |
Дата | |
Msg-id | 42ED161F.1020408@pse-consulting.de обсуждение исходный текст |
Ответ на | Re: Performance problems testing with Spamassassin 3.1.0 ("Jim C. Nasby" <decibel@decibel.org>) |
Список | pgsql-performance |
Jim C. Nasby wrote: > On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote: > >>Ok, here is the current plan. >> >>Change the spamassassin API to pass a hash of tokens into the storage >>module, pass the tokens to the proc as an array, start a transaction, >>load the tokens into a temp table using copy, select the tokens distinct >>into the token table for new tokens, update the token table for known >>tokens, then commit. > > > You might consider: > UPDATE tokens > FROM temp_table (this updates existing records) > > INSERT INTO tokens > SELECT ... > FROM temp_table > WHERE NOT IN (SELECT ... FROM tokens) > > This way you don't do an update to newly inserted tokens, which helps > keep vacuuming needs in check. The subselect might be quite a big set, so avoiding a full table scan and materialization by DELETE temp_table WHERE key IN (select key FROM tokens JOIN temp_table); INSERT INTO TOKENS SELECT * FROM temp_table; or INSERT INTO TOKENS SELECT temp_table.* FROM temp_table LEFT JOIN tokens USING (key) WHERE tokens.key IS NULL might be an additional win, assuming that only a small fraction of tokens is inserted and updated. Regards, Andreas
В списке pgsql-performance по дате отправления: