Tom, Greg, Robert,
Here's my suggestion:
1. First, estimate the node's cost using a very pessimistic (50%?)
selectivity for the calculation.
2. If that cost exceeds a certain threshold, run the actual estimation
by evaluating the calculation against the histogram.
That way, we avoid the subtransaction and other overhead on very small sets.
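To make the two-phase idea concrete, here's an illustrative sketch in Python (the names, threshold, and cost model are mine, not PostgreSQL internals):

```python
# Hypothetical sketch of the two-phase approach described above.
# Names and constants are illustrative, not actual planner code.

PESSIMISTIC_SELECTIVITY = 0.5   # assume half the rows match
COST_THRESHOLD = 10000.0        # cutoff above which the expensive path runs


def estimate_selectivity(node_rows, per_row_cost, expr, histogram):
    """Two-phase selectivity estimate for an expression node."""
    # Phase 1: cheap, pessimistic cost estimate.
    pessimistic_cost = node_rows * PESSIMISTIC_SELECTIVITY * per_row_cost
    if pessimistic_cost < COST_THRESHOLD:
        # Small set: the pessimistic guess is cheap and good enough,
        # so we skip the histogram evaluation (and its overhead) entirely.
        return PESSIMISTIC_SELECTIVITY

    # Phase 2: evaluate the expression against the histogram sample values.
    matches = sum(1 for v in histogram if expr(v))
    return matches / len(histogram) if histogram else PESSIMISTIC_SELECTIVITY
```

For a 100-row node the pessimistic cost stays under the threshold and the cheap 0.5 guess is returned; for a million-row node the same call falls through to the histogram and returns the measured fraction.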
Also:
> Trying it on the MCVs makes a lot of sense. I'm not so sure about
> trying it on the histogram entries. There's no reason to assume that
> those cluster in any way that will be useful. (For example, suppose
> that I have the numbers 1 through 10,000 in some particular column and
> my expression is col % 100.)
Yes, but for seriously skewed column distributions, the difference in
frequency between the MCVs and a "random" sample distribution will be
huge. And it's precisely those distributions that the query planner
currently gets wrong.
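To illustrate the point (toy Python, not planner code): in a heavily skewed column the MCV's frequency dwarfs anything a uniform column shows, which is exactly the signal MCV-based estimation can exploit.

```python
# Toy illustration of why MCVs matter for skewed distributions.
from collections import Counter

# Skewed column: 90% of rows hold a single value, the rest are distinct.
skewed = ["hot"] * 9000 + [f"v{i}" for i in range(1000)]
# Uniform column: 100 values, each appearing equally often.
uniform = [f"v{i % 100}" for i in range(10000)]


def mcv_frequency(col):
    """Frequency of the single most common value in the column."""
    value, count = Counter(col).most_common(1)[0]
    return count / len(col)


mcv_frequency(skewed)   # 0.9  -- the MCV dominates the column
mcv_frequency(uniform)  # 0.01 -- no value stands out
```

With a 0.9 MCV frequency, evaluating the expression on just that one value already pins down most of the selectivity; with the uniform column, the MCV list tells you almost nothing, which is the case the quoted objection is about.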
--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com