Manfred Koizar <mkoi-pg@aon.at> writes:
> If you know that there is a tendency for your data to be physically
> ordered by index value, you can put in a counterweight in favour of
> index scans by lowering random_page_cost. Of course this won't work
> if you have multiple indices implying very different sort orders.
Of course, that's a hack that is quite unrelated to the real problem...
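(For anyone who wants to try the workaround anyway: the knob can be
experimented with per-session before committing anything to
postgresql.conf. The value 2 below is purely illustrative; the shipped
default in that era was 4.)

```sql
-- Illustrative only: bias this session's planner toward index scans,
-- then check whether the plan actually changes.
SET random_page_cost = 2;
EXPLAIN SELECT ...;
```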
> I thought that the planner had a notion of "clustering", but I cannot
> recall where I got this idea from.
It does, as of 7.2, but it's entirely possible that the fudge-factor
being applied for that is all wrong. I have not had any time to work on
that problem recently, and so the equation that made it into 7.2 was
just a crude first hack with no theory behind it. See the
indexCorrelation adjustment code in cost_index() in
src/backend/optimizer/path/costsize.c if you're interested in fooling
with it.
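The shape of that adjustment, as I recall, is an interpolation between
the best-case I/O cost (perfectly correlated index, nearly sequential
heap fetches) and the worst-case cost (uncorrelated, all-random
fetches), weighted by the square of the correlation. A simplified
sketch of that idea, not a copy of the cost_index() code:

```python
def interpolate_io_cost(min_io_cost, max_io_cost, index_correlation):
    """Blend best-case (|correlation| = 1) and worst-case
    (correlation = 0) I/O cost by the squared correlation.
    This mirrors the shape of the 7.2 fudge factor, nothing more."""
    csquared = index_correlation ** 2
    return max_io_cost + csquared * (min_io_cost - max_io_cost)

# Perfect correlation collapses to the best case,
# zero correlation to the worst case:
print(interpolate_io_cost(10.0, 400.0, 1.0))  # 10.0
print(interpolate_io_cost(10.0, 400.0, 0.0))  # 400.0
```

Squaring the correlation is exactly the kind of untheoretical choice
referred to above: it keeps the estimate pessimistic except when the
correlation is quite strong.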
Even in the uncorrelated case, the estimation equation *does* consider
the probability of multiple hits on the same heap page. Before you
assert that "the planner believes that one random page read is necessary
for each tuple", I suggest reading the code...
regards, tom lane