Re: EXCEPT Queries
От | Peter Childs |
---|---|
Тема | Re: EXCEPT Queries |
Дата | |
Msg-id | Pine.LNX.4.44.0302200822130.5033-100000@RedDragon.Childs обсуждение исходный текст |
Ответ на | EXCEPT Queries (Mark Mitchell <mark@lapcrew.com>) |
Список | pgsql-sql |
On Wed, 19 Feb 2003, Mark Mitchell wrote: > I have a query that I must dedup using data in another table. > This can be done quiet easily using an EXCEPT clause as long as both > queries return the same result set, the problem with that is I only want > to dedup based on a single columns value not dedup based on entire rows. > The only other way I can see of doing this is using a NOT IN () clause, > this seems to take much more time and resources than an EXCEPT > statement. Is there any way to quickly dedup two tables based on only > one row? > > Here is an example of what I'm currently doing. > > TABLE "A" > "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER" > -------------------------------------- > BOB | 000001 > JOE | 000002 > > TABLE "B" > "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER" > -------------------------------------- > BOB | 000001 > > To dedup table "A" using the data in table "B" I could use the > following, except that the dedup takes place on the whole row when I > only want it to take place on the "ACCOUNT_NUMBER" column. > > SELECT > "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER" > FROM "A" > EXCEPT > SELECT > "B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER" > FROM "B" > > > I could use this but it takes WAY to long when both result sets contain > more than a few hundred records. It seems to increase exponentially the > more records you add to each result set. The data sets I'm working with > contain anywhere from 0 to 500000 records. > > SELECT > "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER" > FROM "A" > WHERE > "A"."ACCOUNT_NUMBER" > NOT IN > ( > SELECT > "B"."ACCOUNT_NUMBER" > FROM "B" > ) > Big sub queries are a bad idea. Since we know where the number is going to be why does this not work? SELECT a.subscriber_name, a.account_number FROM a WHERE EXISTS (SELECT b.account_number FROM b WHERE b.account_number=a.account_number); it should be a lot quicker if your account_number columns are indexed. I hope that helps Peter Childs
В списке pgsql-sql по дате отправления: