Re: Left Outer Join much faster than non-outer Join?
От | Ron Mayer |
---|---|
Тема | Re: Left Outer Join much faster than non-outer Join? |
Дата | |
Msg-id | 424BA115.8050606@cheapcomplexdevices.com обсуждение исходный текст |
Ответ на | Re: Left Outer Join much faster than non-outer Join? (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Left Outer Join much faster than non-outer Join?
|
Список | pgsql-performance |
Tom Lane wrote: > rm_pg@cheapcomplexdevices.com writes: > >> select * >> from streetname_lookup as sl >> join city_lookup as cl on (true) >> left outer join tlid_smaller as ts on (sl.geo_streetname_id = ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id) >> where str_name='alamo' and city='san antonio' and state='TX' >>; > > That's a fairly odd query; I think it's a very common type of query in data warehousing. It's reasonably typical of a traditional star schema where "streetname_lookup" and "city_lookup" are dimension tables and "tlid_smaller" is the central fact table. > why don't you have any join condition between > streetname_lookup and city_lookup? Those two tables shared no data. They merely get the "id"s for looking things up in the much larger central table. Unique indexes on the city_lookup and street_lookup make the cartesian join harmless (they each return only 1 value); and the huge fact table has a multi-column index that takes both of the ids from those lookups. With the tables I have (shown below), how else could one efficiently fetch the data for "Main St" "San Francisco"? streetname_lookup (for every street name used in the country) streetid | name | type ----------+--------+------ 1 | Main | St 2 | 1st | St city_lookup (for every city name used in the country) cityid | name | state --------+---------+------ 1 | Boston | MA 2 | Alameda| CA tlid_smaller (containing a record for every city block in the country) city_id | street_id | addresses | demographics, etc. --------+------------+-----------+---------------------- 1 | 1 | 100 block | [lots of columns] 1 | 1 | 200 block | [lots of columns] 1 | 1 | 300 block | [lots of columns] 1 | 2 | 100 block | [lots of columns] 1 | 2 | 100 block | [lots of columns] > The planner won't consider Cartesian joins unless forced to, which is > why it fails to consider the join order "((sl join cl) join ts)" unless > you have an outer join in the mix. I think that's generally a good > heuristic, and am disinclined to remove it ... IMHO it's a shame it doesn't even consider it when the estimated results are very small. I think often joins that merely look up IDs would be useful to consider for the purpose of making potential multi-column indexes (as shown in the previous email's explain analyze result where the cartesian join was 30X faster than the other approach since it could use the multi-column index on the very large table). Ron
В списке pgsql-performance по дате отправления: