Re: benchmarking the query planner
От | Greg Stark |
---|---|
Тема | Re: benchmarking the query planner |
Дата | |
Msg-id | 4136ffa0812121001x3fde1871t558fd8c58070b7d@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: benchmarking the query planner (Simon Riggs <simon@2ndQuadrant.com>) |
Ответы |
Re: benchmarking the query planner
|
Список | pgsql-hackers |
On Fri, Dec 12, 2008 at 5:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Incidentally we *do* do block sampling. We pick random blocks and then pick >> random records within those blocks. This was new in, uh, 7.4? 8.0? Sometime >> around then. It dramatically reduced the i/o requirements but there were long >> discussions of how to do it without introducing biases. > > No, we pick random rows. On bigger tables, they get further apart > typically and so we miss any clustering. I mean that we should pick a > random block and read all rows on it. I think what's happening here is that our sampling method is based on the assumption that a records location is not related to its value. Consider a table which is clustered and has two copies of each value. When we look at a block and see n/2 values and we know there are 1000 blocks then a human would conjecture that there are 1000*n/2 distinct values throughout the table and every value is represented twice throughout the whole table. But if we're assuming the records are randomly distributed then looking at the whole block will actually throw us off completely. We'll deduce from the fact that we saw every value twice that there must be hundreds of copies spread throughout the database and there must be a lot less than 1000*n/2 distinct values. I think you need to find two different formulas, one which represents a clustered table and one which represents randomly distributed data. Then you need a way to measure just how clustered the data is so you know how much weight to give each formula. Perhaps comparing the number of duplicates in whole-block samples versus overall random selections would give that measure. -- greg
В списке pgsql-hackers по дате отправления: