> > > > What is actually estimated wrong here seems to be the estimated
> > > > effective cache size, and thus the cache ratio of page fetches.
> > >
> > > Good point, but I think the estimates are only marginally sensitive
> > > to estimated cache size (if they're not, we have a problem, considering
> > > how poorly we can estimate the kernel's disk buffer size). It would
> > > be interesting for Daniel to try a few different settings of
> > > effective_cache_size and see how much the EXPLAIN costs change.
> >
> > Well, the number I told him (29370) should clearly make it prefer the
> > index. The estimate is very sensitive to this value :-(
> > With 29370 (= 229 MB) the index cost is 1,364 instead of 3,887 with
> > the default of 1000 pages ==> index scan.
>
> But... if I understand it right (effective_cache_size is supposed to reflect
> the kernel's buffer space), the estimates differ from reality: my buffer
> cache is ca. 50 MB, and I still get at least twice the performance with an
> index scan instead of a sequential scan, whereas by Tom's explanation things
> should be much worse.
Since pg only reads one 8k page at a time, the I/O performance of a seq scan is
probably not nearly as good as it could be when a lot of other I/O is going on
on the same drive.
The first thing you should verify is whether there is actually a measurable
difference in I/O throughput on the pg drive during the seq scan vs. the index
scan (iostat).
If there is not, then random_page_cost is too high in your scenario.
(All assuming your data is not still clustered like Tom suggested)
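To make the random_page_cost point concrete, here is a deliberately simplified sketch (NOT PostgreSQL's actual cost model; the table and page counts are made up) of how lowering random_page_cost shifts the planner's balance toward the index scan:

```python
# Deliberately simplified sketch, not the real PostgreSQL cost model.
# All page counts are hypothetical, chosen only for illustration.

def seq_scan_cost(table_pages, seq_page_cost=1.0):
    # Sequential reads are charged seq_page_cost per page.
    return table_pages * seq_page_cost

def index_scan_cost(pages_fetched, random_page_cost=4.0):
    # Randomly fetched pages are charged random_page_cost per page.
    return pages_fetched * random_page_cost

TABLE_PAGES = 10_000    # hypothetical heap size in pages
PAGES_FETCHED = 3_000   # hypothetical pages touched via the index

for rpc in (4.0, 2.0, 1.0):
    seq = seq_scan_cost(TABLE_PAGES)
    idx = index_scan_cost(PAGES_FETCHED, rpc)
    winner = "index" if idx < seq else "seq"
    print(f"random_page_cost={rpc}: seq={seq}, index={idx} -> {winner} scan")
```

With these made-up numbers the index scan only wins once random_page_cost drops below about 3.3, which is the sense in which a too-high random_page_cost pushes the planner toward seq scans even when the index is faster in practice.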
Andreas