Re: [GENERAL] speeding up INTERSECT/EXCEPT
От | Mike Mascari |
---|---|
Тема | Re: [GENERAL] speeding up INTERSECT/EXCEPT |
Дата | |
Msg-id | 19990725222249.18643.rocketmail@web125.yahoomail.com обсуждение исходный текст |
Список | pgsql-general |
--- Carl Hauser <chauser@parc.xerox.com> wrote: > I've searched the archives and docs without finding > any help on this. > > I want to use: > > select * from rel1 except select * from rel2; > > where rel1 and and rel2 each currently have about > 2000 records and are likely to grow to twice that > size or more. > > The query works but takes inordinately long. Looking > at the query plan, it seems that the statement is > being executed using a nested scan, so the query is > taking time proportional to the product of the sizes > of the two relations. The result set will usually be > a few hundred records. > > My question: are there any indices I could define on > the relations that would speed up the query? Since > noting the problem, I've added an index on one of > the fields (the same for both relations). Another > index covers 7 of the fields in both relations (7 > because that is the maximum for an index). Creating > these indices made no difference in the query plan > nor in the execution time of the query. > > Should I pursue this approach further or do the > differencing outside of the database? > > -- Carl Hauser > -- Xerox Palo Alto Research Center > > If there is a primary key on each table that could be used as the oprands in a comparison operator, then I think you might find a correlated subquery significantly faster than EXCEPT using the EXISTS clause. For example, I have two tables, supplies and deletedsupplies, with the primary key being the supply field: SELECT * FROM supplies WHERE NOT EXISTS (SELECT deletedsupplies.supply FROM deletedsupplies WHERE deletedsupplies.supply = supplies.supply) Because I have an index on the deletedsupplies' "supply" field, the plan performs a Sequential scan on the supplies table, and, for each record, performs an Index scan for the corresponding record in the deletedsupplies table. So, for your example, the query might be: SELECT * FROM rel1 WHERE NOT EXISTS (SELECT * FROM rel2 WHERE rel2.key = re1l.key) with indexes on rel1.key and rel2.key. If the ENTIRE record must match before it is excluded then you must use a multi-field index and write the query like: CREATE INDEX k_rel2 ON rel2(field1,field2,...) SELECT * FROM rel1 WHERE NOT EXISTS (SELECT * FROM rel2 WHERE (rel2.field1,rel2.field2,...) = (rel1.field1,rel1.field2,...)) I have found that the EXISTS implementation is usually good enough to yield the same functionality (with superior performance) as intersect, except, and as a poor-man's substitue for outer-joins... Hope that helps, Mike Mascari (mascarim@yahoo.com) _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
В списке pgsql-general по дате отправления: