Re: COUNT(*) and index-only scans
От | Tom Lane |
---|---|
Тема | Re: COUNT(*) and index-only scans |
Дата | |
Msg-id | 13161.1318429779@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: COUNT(*) and index-only scans (Greg Stark <stark@mit.edu>) |
Ответы |
Re: COUNT(*) and index-only scans
|
Список | pgsql-hackers |
Greg Stark <stark@mit.edu> writes: > Assuming you're in a steady-state situation the amount of all-visible > blocks will fluctuate from a high just after vacuum to a low just > before the next vacuum. There are other ways a block can be marked > all-visible but for the most part I would expect the fraction to go > steadily down until vacuum comes along and cleans things up. > So if vacuum tracked the fraction of blocks marked all-visible > *before* it processed them and the fraction it marked all-visible > after processing we have an upper and lower bound. If we knew how long > it's been since vacuum we could interpolate between those, or we could > just take the mean, or we could take the lower bound as a conservative > estimate. I thought of that too, but we don't do the comparable thing for dead tuple counts, and I am not convinced that we should do it for visibility. I'd rather have a simple rule that "it's right immediately after VACUUM", so that at least trivial cases like read-only tables work correctly. >> What I suggest as a first cut for that is: simply derate the visibility fraction as the fraction >> of the table expected to be scanned gets smaller. > I think there's a statistically more rigorous way of accomplishing the > same thing. If you treat the pages we estimate we're going to read as > a random sample of the population of pages then your expected value is > the fraction of the overall population that is all-visible but your > 95th percentile confidence interval will be, uh, a simple formula we > can compute but I don't recall off-hand. The problem is precisely that the pages a query is going to read are likely to *not* be a random sample, but to be correlated with recently-dirtied pages. > ... It currently uses all expected values but in many > cases it would be valuable if the planner knew what the standard > deviation of those estimates was. No doubt, but I'm not volunteering to fix that before we can have a non-toy estimate for index-only scans. regards, tom lane
В списке pgsql-hackers по дате отправления: