Sorting costs (was Caution: tonight's commits force initdb)
От | Tom Lane |
---|---|
Тема | Sorting costs (was Caution: tonight's commits force initdb) |
Дата | |
Msg-id | 2084.935506730@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | RE: [HACKERS] Caution: tonight's commits force initdb ("Hiroshi Inoue" <Inoue@tpf.co.jp>) |
Список | pgsql-hackers |
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > Hmm,Index scan is chosen to select all rows. > AFAIK,sequential scan + sort is much faster than index scan in > most cases. > cost of index scan < cost of sequential scan + cost of sort > I have felt that the current cost estimation of index scan is too small, > though I have no alternative. Hmm. Well, it's still a step forward that the system is able to consider this query plan --- if it's choosing a plan that's actually slower, then that indicates we have a problem with our cost estimates. The current cost estimate for a sort (see optimizer/path/costsize.c) is basically just P log P disk accesses (P being the estimated relation size in pages) plus N log N tuple comparisons (N being the estimated tuple count). This is fairly bogus --- for one thing it does not account for the fact that sorts smaller than SortMem kilobytes are done in-memory without temp files. I doubt that the amount of I/O for a larger sort is quite right either. We need to look at exactly what usage psort.c makes of temp files and revise the I/O estimates accordingly. I am also suspicious that indexscan costs are underestimated. The cost of reading the index is probably not too far off, but the cost of accessing the main table is bogus. Worst case, for a table whose tuples are thoroughly scattered, you would have a main-table page fetch for *each* returned tuple. In practice it's probably not anywhere near that bad, since you may have some clustering of tuples (so that the same page is hit several times in a row), and also the Postgres buffers and the underlying Unix system's disk cache will save trips to disk if there is any locality of reference at all. I have no idea how to estimate that effect --- anyone? But cost_index is clearly producing a ridiculously optimistic estimate at the moment. regards, tom lane
В списке pgsql-hackers по дате отправления: