Re: Optimizer confusion?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Optimizer confusion?
Дата
Msg-id 8209.966175378@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Optimizer confusion?  (Philip Warner <pjw@rhyme.com.au>)
Ответы Re: Optimizer confusion?  (Philip Warner <pjw@rhyme.com.au>)
Список pgsql-hackers
Philip Warner <pjw@rhyme.com.au> writes:
> There might be a way to side-step the issue here. I assume that the index
> nodes contain a pointer to a record in a file, which has some kind of file
> position. By comparing the file positions on one leaf node, and then
> averaging the node cluster values, you might be able to get a pretty good
> idea of the *real* clustering.

Hmm.  I have been thinking that an easier way of gathering statistics
for the optimizer (on a column for which there is a btree index) is to
scan the index sequentially.  This makes it trivial to determine the
column min, max, and most common value, whereas right now we have very
little chance of getting accurate MCV stats if there are more than a
few distinct values.  If we do that we could also calculate some
statistic about how well-ordered the pointers to main-table tuples are.

The nifty thing about doing this during ANALYZE is that you'd only have
to read the index, not the main table, so it should be reasonably quick.
In most contexts that would be tres uncool because you'd not be able to
tell index entries for deleted tuples from those for live tuples --- but
for ANALYZE I think it'd be perfectly acceptable to just count 'em all.
Indeed one could argue that it's *more* accurate to include the deleted
index entries than not, since they'll still provoke main-table accesses
when scanned, which is exactly the thing we're trying to estimate.

>>> I don't suppose I can get the backend to tell me how many logical IOs and
>>> how much CPU it used?
>> 
>> Yes you can.  Run psql with
>> PGOPTIONS="-s"
>> and look in the postmaster log.  There's also -tparse, -tplan,
>> -texec if you'd rather see the query time broken down by stages.

> Thanks for this; I see almost no file IO, but lots of paging; is this a
> feature of the way Linux does file buffering?

Could be.  IIRC, it's possible to tell from the stats how many page
accesses are short-circuited by Postgres' own disk buffers (vs being
given to the kernel) and at least on HPUX it's also possible to tell
how many of the kernel requests actually resulted in physical reads
(vs being satisfied out of kernel disk buffers).  But it takes a certain
amount of reading between the lines 'cause the numbers aren't real well
labeled.  Dunno about how it works on Linux --- comments anyone?
        regards, tom lane

PS: I am leaving town in an hour to go to LinuxWorld.  Will be seeing
email erratically if at all this week, so don't be surprised at lack of
response.  Any of y'all planning to be at LinuxWorld, don't forget to
stop by the Great Bridge booth!


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Philip Warner
Дата:
Сообщение: Re: Optimizer confusion?
Следующее
От: Philip Warner
Дата:
Сообщение: Re: Optimizer confusion?