Re: Optimizer confusion?

Поиск
Список
Период
Сортировка
От Philip Warner
Тема Re: Optimizer confusion?
Дата
Msg-id 3.0.5.32.20000814114219.01fd1450@mail.rhyme.com.au
обсуждение исходный текст
Ответ на Re: Optimizer confusion?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
At 10:02 13/08/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> There might be a way to side-step the issue here. I assume that the index
>> nodes contain a pointer to a record in a file, which has some kind of file
>> position. By comparing the file positions on one leaf node, and then
>> averaging the node cluster values, you might be able to get a pretty good
>> idea of the *real* clustering.
>
>Hmm.  I have been thinking that an easier way of gathering statistics
>for the optimizer (on a column for which there is a btree index) is to
>scan the index sequentially.  This makes it trivial to determine the
>column min, max, and most common value, whereas right now we have very
>little chance of getting accurate MCV stats if there are more than a
>few distinct values.  If we do that we could also calculate some
>statistic about how well-ordered the pointers to main-table tuples are.

There are probably a couple of things to look for, in the sense that
well-ordered is not important, and neither is a partial sub-ordering - what
you need is 'well-clumped'.

ie. some indication of how many pages would need to be read in order to
read all records pointed to by an index node (unfortunately, this ignores
IOs from toasted values that are stored elsewhere).

The process is complicated a little by how you handle nodes with records in
adjacent pages (does this count as one IO), and in a related manner, by how
big the index entries are. But these seem pretty easy to deal with in a
VACUUM pass. Finally, you may also want to see how well sequential nodes
are clumped, but this is probably only important if the individual index
entries are large.


>Indeed one could argue that it's *more* accurate to include the deleted
>index entries than not, since they'll still provoke main-table accesses
>when scanned, which is exactly the thing we're trying to estimate.

And if we ever start reusing space automatically, then the index tree can
be updated appropriately to reflect the new costs.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Optimizer confusion?
Следующее
От: Zeugswetter Andreas SB
Дата:
Сообщение: AW: Optimizer confusion?