Re: Multiple selects versus join
От | scott.marlowe |
---|---|
Тема | Re: Multiple selects versus join |
Дата | |
Msg-id | Pine.LNX.4.33.0208200215410.2333-100000@css120.ihs.com обсуждение исходный текст |
Ответ на | Re: Multiple selects versus join (Adrian 'Dagurashibanipal' von Bidder <avbidder@fortytwo.ch>) |
Список | pgsql-general |
On 20 Aug 2002, Adrian 'Dagurashibanipal' von Bidder wrote: > On Tue, 2002-08-20 at 04:00, Jean-Christian Imbeault wrote: > > Is it more efficient to do multiple selects on multiple tables to get > > data or do a join of those tables and extract the data from the > > resulting temp table? > > > > For example: > > > > select name from a where id=1; > > select pub_date from b where id=1; > > pub_id = select publisher_id from c where id=1; > > select pub_name from d where id = pub_id; > > > > or (I don't know the syntax for join so this is just pseudo-sql); > > > > select name, pub_date, pub_name from join((join(A, B, C) on id), D) on > > pub_id where id = 1; > > > > All my tables are related to each other so I am wondering which is more > > efficient, do multile selects or joining the tables. (Joining the table > > would return just one row). > > Yo! > > If you *know* that you will get just one row, you know more than the > database system can know. Generally: a join of tables with n and m rows > can return at most n*m rows, so joins are not very efficient compared to > multiple selects. A join is basically a select on both tables with > additional processing done. If you do multiple selects and can do the > join 'by hand' because you know much about the data you expect, this > will almost always be faster. Not true. What if you're joining on fields that have a unique constraint on them? There can be only one (to paraphrase Highlander) so you know how many rows you're gonna get back. It's quite possible you're searching for a single employee record and all the tables joining that one (even if there are dozens) are all joined on a unique field that references the main employee table. In that case, it WILL be faster to let the database do it in one query, unless it has a brain dead planner. The real issue is how many rows are you tossing around versus how many is the database tossing around. So, just like you'd pick the table that would return the fewest rows (most likely anyway) to process first by hand, so you should build your query using join syntax that forces postgresql to use that table first. Indexes make all the difference on <10% table being returned too. Basically, if you're gonna return many of the rows in a table, then it may well be faster to do it in code, but if you know it will be <10% it's much faster to let the database do it. Of course, the real reason to let the database do it is to make sure it gets done right. It's way likelier that you or I will write code that makes some fatal mistake we don't see than it is that the database will do that. If you make a mistake in your join syntax you'll know right up front, when you get 0 or 1,000,000,000,000 rows :-)
В списке pgsql-general по дате отправления: