Re: HashJoin w/option to unique-ify inner rel
From: Tom Lane
Subject: Re: HashJoin w/option to unique-ify inner rel
Msg-id: 24756.1240627763@sss.pgh.pa.us
In response to: Re: HashJoin w/option to unique-ify inner rel (Robert Haas <robertmhaas@gmail.com>)
Responses:
  Re: HashJoin w/option to unique-ify inner rel
  Re: HashJoin w/option to unique-ify inner rel
List: pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> As far as I can tell, the focus on trying to estimate the number of
> tuples per bucket is entirely misguided.  Supposing the relation is
> mostly unique so that the values don't cluster too much, the right
> answer is (of course) NTUP_PER_BUCKET.

But the entire point of that code is to arrive at a sane estimate when
the inner relation *isn't* mostly unique and *does* cluster.  So I think
you're being much too hasty to conclude that it's wrong.

> Because the extra tuples that get thrown into the bucket
> generally don't have the same hash value (or if they did, they would
> have been in the bucket either way...) and get rejected with a simple
> integer comparison, which is much cheaper than
> hash_qual_cost.per_tuple.

Yeah, we are charging more than we ought to for bucket entries that can
be rejected on the basis of hashcode comparisons.  The difficulty is to
arrive at a reasonable guess of what fraction of the bucket entries will
be so rejected, versus those that will incur a comparison-function call.
I'm leery of assuming there are no hash collisions, which is what you
seem to be proposing.

			regards, tom lane
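[Editor's illustration] The cheap-rejection point under discussion can be sketched outside PostgreSQL. This is a minimal Python model, not the actual executor code: `probe_bucket` and `hashval` are hypothetical names, and the deliberately tiny bucket count stands in for an overfull hash table. It shows that when many distinct keys land in one bucket, most entries are skipped with a plain integer comparison of stored hash values, and only entries whose full hash matches incur the expensive equality test (the part costed as hash_qual_cost.per_tuple). Tom's caveat still applies: a genuine hash collision would also pass the integer check and force an equality call, so the number of equality calls cannot be assumed equal to the number of true matches.

```python
def probe_bucket(bucket, probe_key, probe_hash, eq_calls):
    """Scan one bucket for probe_key, counting expensive equality calls."""
    matches = []
    for stored_hash, stored_key in bucket:
        if stored_hash != probe_hash:   # cheap integer comparison
            continue
        eq_calls[0] += 1                # expensive qual evaluation
        if stored_key == probe_key:
            matches.append(stored_key)
    return matches

def hashval(key):
    # Stand-in 32-bit hash function for the sketch.
    return hash(key) & 0xFFFFFFFF

# Deliberately few buckets, so ~25 distinct keys cluster per bucket.
NBUCKETS = 4
buckets = [[] for _ in range(NBUCKETS)]
for key in range(100):
    h = hashval(key)
    buckets[h % NBUCKETS].append((h, key))

eq_calls = [0]
probe = 42
h = hashval(probe)
found = probe_bucket(buckets[h % NBUCKETS], probe, h, eq_calls)

# The bucket holds ~25 tuples, but (barring full-hash collisions) only
# the matching tuple triggers an equality-function call.
print(len(buckets[h % NBUCKETS]), eq_calls[0], found)
```

In this toy run all but one bucket entry are rejected by the integer comparison; the open question in the thread is how to estimate that rejected fraction in the cost model without assuming collisions never happen.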