Re: question about index cost estimates
From | Tom Lane |
---|---|
Subject | Re: question about index cost estimates |
Date | |
Msg-id | 18977.958601273@sss.pgh.pa.us |
In reply to | question about index cost estimates (Jeff Hoffmann <jeff@propertykey.com>) |
List | pgsql-hackers |
Jeff Hoffmann <jeff@propertykey.com> writes:
> calculation of pages fetched from the relation seems high.  i'm not sure
> what the actual should be, but some queries have it at 2-4x the number
> of pages in the relation, which seemed high, so i started looking at
> that.

If the table is bigger than the disk cache, and we are hitting pages
more than once, then we are likely to have to fetch some pages more than
once.  The model assumes that the tuples we want are scattered more or
less randomly over the whole table.  If #tuples to be fetched approaches
or exceeds #pages in the table, then clearly we are going to touch some
pages more than once, because we want more than one tuple from them.
The $64 question is: did the page fall out of the buffer cache between
references?  Worst case (if your table vastly exceeds your available
buffer space), practically every tuple fetch will require a separate
physical read, and then the number of page fetches is essentially the
number of tuples returned --- which of course can be a lot more than the
number of pages in the table.

Right now these considerations are divided between cost_index() and
cost_nonsequential_access() in a way that might well be wrong.  I've
been intending to dig into the literature and try to find a better cost
model, but haven't gotten to it yet.

> it just seems to me that there should be some notion of tuple size
> figured in to know how many tuples fit in a page.

It seemed to me that the critical ratios are #tuples fetched vs. #pages
in the table, and table size vs. cache size.  I could be wrong though...

			regards, tom lane
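[Editor's note: the reasoning in the message can be restated as a small back-of-the-envelope model. The C sketch below is an illustration only, under the assumptions stated above (tuples scattered randomly over the table; a page that has fallen out of cache must be re-read). The function name `estimated_page_fetches` and the numbers in `main()` are hypothetical; this is not the actual `cost_index()` or `cost_nonsequential_access()` code.]

```c
#include <stdio.h>

/*
 * Back-of-the-envelope estimate of physical page fetches for an index
 * scan, following the reasoning in the message above.  Illustrative
 * sketch only; not PostgreSQL source code.
 */
static double
estimated_page_fetches(double tuples_fetched,
                       double pages_in_table,
                       double cache_pages)
{
    if (pages_in_table <= 0 || tuples_fetched <= 0)
        return 0.0;

    if (pages_in_table <= cache_pages)
    {
        /*
         * Whole table fits in cache: a page is physically read at most
         * once, so fetches are bounded by the table size even when we
         * touch some pages repeatedly.
         */
        return (tuples_fetched < pages_in_table)
            ? tuples_fetched
            : pages_in_table;
    }

    /*
     * Table vastly exceeds cache: assume a page has usually been evicted
     * before we come back to it, so nearly every tuple costs a separate
     * physical read.  This pessimistic bound is why the estimate can
     * exceed the number of pages in the table.
     */
    return tuples_fetched;
}

int
main(void)
{
    /* Hypothetical numbers: 10,000-page table, 1,000-page cache. */
    printf("%.0f\n", estimated_page_fetches(500.0, 10000.0, 1000.0));   /* 500 */
    printf("%.0f\n", estimated_page_fetches(25000.0, 10000.0, 1000.0)); /* 25000 */
    return 0;
}
```

The second call shows the worst case described above: 25,000 estimated page fetches against a 10,000-page table, i.e. an estimate well above the relation's size, which matches the 2-4x observation in the quoted question when the table does not fit in cache.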