Re: 7.4beta2 vs 7.3.3
От | Tom Lane |
---|---|
Тема | Re: 7.4beta2 vs 7.3.3 |
Дата | |
Msg-id | 23810.1064070752@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: 7.4beta2 vs 7.3.3 (Gaetano Mendola <mendola@bigfoot.com>) |
Ответы |
Re: 7.4beta2 vs 7.3.3
|
Список | pgsql-hackers |
Gaetano Mendola <mendola@bigfoot.com> writes: > What about the wrong row expected ? After I looked more closely, I realized that the planner hasn't any hope of getting a really correct answer on that. You've got WHERE ... ud.id_class = cd.id_class AND cd.id_provider = 39; Now the planner doesn't have any problem figuring out that this will select one "cd" row, but the number of "ud" rows matched varies wildly depending on which one cd.id_class value happens to be involved. Without actually pre-executing the query it has no way to know which value will be involved, and so it has to fall back to a default estimate, which is IIRC (number of rows in ud)/(number of distinct values). Then there's a similar problem with estimating the number of rows retrieved from ul. > Anyway if the rows expected are 400 ( instead of 43 ) why not an index > scan, with 400 rows on 1500000 seems a good choise do an index scan, > isn't it ? The trouble here is that because of the very skewed data statistics (in both ud and ul), the planner can't really be sure that this query will retrieve only a few rows from either table. There are other values in both tables that would have retrieved vastly more data. The hash join may be slower for this particular id_provider value, but it won't get very much worse with other id_provider values --- a nestloop plan will. Your idea of reducing id_provider to id_class using a separate query seems like a good one to me --- that will allow the planner to generate different plans depending on which id_class value is involved. regards, tom lane
В списке pgsql-hackers по дате отправления: