Re: Performance improvement hints
| From | Tom Lane |
|---|---|
| Subject | Re: Performance improvement hints |
| Date | |
| Msg-id | 7154.968767944@sss.pgh.pa.us |
| In reply to | Performance improvement hints (devik@cdi.cz) |
| List | pgsql-hackers |
devik@cdi.cz writes:
> 1) when optimizer computes size of join it does it as
> card(R1)*card(R2)*selectivity. Suppose two relations
> (R1 & R2) each 10000 rows. If you (inner) join them
> using equality operator, the result is at most 10000
> rows (min(card(R1), card(R2))). But pg estimates
> 1 000 000 (uses selectivity 0.01 here).

0.01 is only the default estimate used if you've never done a VACUUM
ANALYZE (hint hint). After ANALYZE, there are column statistics
available that will give a better estimate.

Note that your claim above is incorrect unless you are joining on
unique columns, anyway. In the extreme case, if all the entries have
the same value in the column being used, you'd get card(R1)*card(R2)
output rows. I'm unwilling to make the system assume column uniqueness
without evidence to back it up, because the consequences of assuming an
overly small output row count are a lot worse than those of assuming an
overly large one.

One form of evidence that the planner should take into account here is
the existence of a UNIQUE index on a column --- if one has been
created, we could assume column uniqueness even if no VACUUM ANALYZE
has ever been done on the table. This is on the to-do list, but I
don't feel it's a real high priority. The planner's results are pretty
much going to suck in the absence of VACUUM ANALYZE stats anyway :-(

> Then when computing cost it will result in very high
> cost in case of hash and loop join BUT low (right)
> cost for merge join. It is because for hash and loop
> joins the cost is estimated from row count but merge
> join uses another estimation (as it always knows that
> merge join can be done only on equality op).
> It then leads to use of mergejoin for majority of joins.
> Unfortunately I found that in majority of such cases
> the hash join is two times faster.

The mergejoin cost calculation may be overly optimistic. The cost
estimates certainly need further work.
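The arithmetic being argued over can be sketched as follows --- a minimal illustration (not PostgreSQL source code; the function name and numbers are made up for the example) of how the selectivity assumption drives the estimated join size:

```python
def estimated_join_rows(card_r1, card_r2, selectivity):
    """Planner-style estimate: |R1 JOIN R2| ~= |R1| * |R2| * selectivity."""
    return card_r1 * card_r2 * selectivity

card_r1 = card_r2 = 10_000

# Default selectivity with no statistics (no VACUUM ANALYZE): 0.01.
print(estimated_join_rows(card_r1, card_r2, 0.01))        # 1,000,000 rows

# If both join columns are unique, an equijoin selectivity of
# 1 / card gives min(card_r1, card_r2) output rows -- the poster's case.
print(estimated_join_rows(card_r1, card_r2, 1 / 10_000))  # 10,000 rows

# Degenerate case Tom describes: every row carries the same value in
# the join column, so every pair matches -- the full cross product.
print(estimated_join_rows(card_r1, card_r2, 1.0))         # 100,000,000 rows
```

Underestimating (assuming uniqueness without evidence) is the dangerous direction: a plan built for 10,000 rows that actually produces 100,000,000 can be catastrophically slow, while the reverse error mostly wastes some memory.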
> But indexscan always lookups actual record in heap even if
> all needed attributes are contained in the index.
> Oracle and even MSSQL reads attributes directly from index
> without looking for actual tuple at heap.

Doesn't work in Postgres' storage management scheme --- the heap tuple
must be consulted to see if it's still valid.

			regards, tom lane
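The reason the heap visit is unavoidable can be sketched with a toy model (illustrative only; field and function names are invented, and real PostgreSQL visibility rules are far more involved): under MVCC, the transaction information that decides whether a tuple is visible to a given snapshot lives in the heap tuple header, not in the index entry.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeapTuple:
    xmin: int            # xid of the transaction that inserted the tuple
    xmax: Optional[int]  # xid of the deleting transaction, or None if live
    values: dict

def tuple_is_visible(tup, snapshot_xid):
    """Grossly simplified snapshot check: visible if inserted at or
    before our snapshot and not deleted at or before it."""
    if tup.xmin > snapshot_xid:
        return False
    if tup.xmax is not None and tup.xmax <= snapshot_xid:
        return False
    return True

# An index stores only (key, heap pointer). Even if the index held every
# column the query needs, we would still have to fetch the heap tuple to
# read xmin/xmax and run a check like the one above.
heap = [HeapTuple(xmin=5, xmax=None, values={"id": 1}),
        HeapTuple(xmin=5, xmax=8, values={"id": 2})]  # deleted by xid 8

visible = [t.values["id"] for t in heap if tuple_is_visible(t, snapshot_xid=9)]
print(visible)  # [1] -- the deleted tuple's index entry still exists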