Re: Performance problems testing with Spamassassin 3.1.0
От | John A Meinel |
---|---|
Тема | Re: Performance problems testing with Spamassassin 3.1.0 |
Дата | |
Msg-id | 42F299EC.8080809@arbash-meinel.com обсуждение исходный текст |
Ответ на | Re: Performance problems testing with Spamassassin 3.1.0 (Matthew Schumacher <matt.s@aptalaska.net>) |
Список | pgsql-performance |
Matthew Schumacher wrote: > John A Meinel wrote: > > >>Surely this isn't what you have. You have *no* loop here, and you have >>stuff like: >> AND >> (bayes_token_tmp) NOT IN (SELECT token FROM bayes_token); >> >>I'm guessing this isn't your last version of the function. >> >>As far as putting the CREATE TEMP TABLE inside the function, I think the >>problem is that the first time it runs, it compiles the function, and >>when it gets to the UPDATE/INSERT with the temporary table name, at >>compile time it hard-codes that table id. >> >>I tried getting around it by using "EXECUTE" which worked, but it made >>the function horribly slow. So I don't recommend it. >> >>Anyway, if you want us to evaluate it, you really need to send us the >>real final function. >> >>John >>=:-> > > > It is the final function. It doesn't need a loop because of the > bayes_token_tmp function I added. The array is passed to it and it > returns a record set so I can work off of it like it's a table. So the > function works the same way it before, but instead of using SELECT > intoken from TEMPTABLE, you use SELECT bayes_token_tmp from > bayes_token_tmp(intokenary). > > I think this is more efficient than the create table overhead, > especially because the incoming record set won't be to big. > > Thanks, > > schu > > Well, I would at least recommend that you change the "WHERE bayes_token_tmp NOT IN (SELECT token FROM bayes_token)" with a "WHERE NOT EXISTS (SELECT toke FROM bayes_token WHERE token=bayes_token_tmp)" You might try experimenting with the differences, but on my system the NOT IN has to do a full sequential scan on bayes_token and load all entries into a list, while NOT EXISTS can do effectively a nested loop. The nested loop requires that there is an index on bayes_token(token), but I'm pretty sure there is anyway. Again, in my testing, it was a difference of 4200ms versus 180ms. (800k rows in my big table, 2k in the temp one) John =:->
Вложения
В списке pgsql-performance по дате отправления: