Обсуждение: Use "average field correlation per hard disk page" instead of global one?

Поиск
Список
Период
Сортировка

Use "average field correlation per hard disk page" instead of global one?

От
Alexey Nalbat
Дата:
Hello.

I have a table of 2'500'000 tuples and 100'000 pages, and an index
on non-unique field, to each key value corresponds approximately
50'000 tuples.

Due to the updating algorithm the physical order of tuples in the
table happens to be such that all equal keys are placed together,
but not ordered globally. Correlation computed by "VACUUM ANALYZE"
is 0.15.

When computing indexscan cost for query with clause "key = ?"
the planner makes it closer to "Mackert and Lohman formula" value
than to "selectivity * pages". As a result it chooses seqscan
rather than indexscan while in fact indexscan is 20 times faster.

The question is, which is the best way to correct this behavior?

Maybe "VACUUM ANALYZE" could calculate some average of "field
correlation per page" and even use this value somewhere inside
(not outside) "Mackert and Lohman formula"?

Are there any better ideas?



Re: Use "average field correlation per hard disk page" instead of global one?

От
Tom Lane
Дата:
Alexey Nalbat <alexey@price.ru> writes:
> Due to the updating algorithm the physical order of tuples in the
> table happens to be such that all equal keys are placed together,
> but not ordered globally.

Hmm... this is of course a weak spot of the correlation-based estimation
method.  If you were doing a range query then the computed correlation
might have some bearing on the cost, but when probing for a single key
value, your table will behave much differently than the correlation model
can guess.

> Are there any better ideas?

None at the moment, but I'm open to suggestions.  It seems like we might
need different stats for equality probes than range probes.
        regards, tom lane


Re: Use "average field correlation per hard disk

От
Philip Warner
Дата:
At 04:08 PM 10/03/2004, Tom Lane wrote:
>None at the moment, but I'm open to suggestions.  It seems like we might
>need different stats for equality probes than range probes.

What about my suggestion from August 2000:
    "There might be a way to side-step the issue here. I assume that    the index nodes contain a pointer to a record
ina file, which    has some kind of file position. By comparing the file positions    on one leaf node, and then
averagingthe node cluster values,    you might be able to get a pretty good idea of the *real* clustering."
 

I don't use the CLUSTER command, but I have clustered data and would like 
to be able to take advantage of the fact if possible. *If* the record 
pointers can be used to indicate closeness, then the same approach of 
randomly sampling index nodes would seem to work. Then again, maybe I don't 
know enough about the storage techniques...




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp.mit.edu:11371       |/