Re: benchmarking the query planner
From | Ron Mayer |
---|---|
Subject | Re: benchmarking the query planner |
Date | |
Msg-id | 4942BCD2.9070306@cheapcomplexdevices.com |
In reply to | Re: benchmarking the query planner (Gregory Stark <stark@enterprisedb.com>) |
List | pgsql-hackers |
Gregory Stark wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> The amount of I/O could stay the same, just sample all rows on block. [....]
>
> It will also introduce strange biases. For instance in a clustered table it'll
> think there are a lot more duplicates than there really are because it'll see
> lots of similar values.

But for ndistinct, it seems it could only help. If the ndistinct guesser just picks max(the-current-one-row-per-block-guess, a-guess-based-on-all-the-rows-on-the-blocks), it seems we'd be no worse off for clustered tables, and much better off for randomly organized tables.

In some ways I fear *not* sampling all rows on the block also introduces strange biases, by largely overlooking the fact that the table is clustered. In my tables clustered on zip code, we don't notice information like "state='AZ' is present in well under 1% of blocks in the table", while if we did scan all rows on the blocks, it might guess this. But I guess a histogram of blocks would be an additional stat rather than an improved one.
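The max-of-two-guesses idea can be illustrated with a small simulation. This is a minimal sketch, not PostgreSQL code: it assumes the Haas-Stokes "Duj1" estimator, n*d / ((n - f1) + f1*n/N), which is the form ANALYZE is understood to use for ndistinct (treat the exact form as an assumption). It also models the current sampler as literally one row per sampled block, which is a simplification of the real two-stage sampler. The function names (`duj1_ndistinct`, `ndistinct_max_guess`, `block_fraction_containing`, `make_table`) and all sizes are hypothetical. Both guesses read the same sampled blocks, so the I/O is unchanged. `block_fraction_containing` sketches the separate "fraction of blocks holding a value" statistic mentioned for the zip-code example.

```python
import random
from collections import Counter


def duj1_ndistinct(sample, total_rows):
    """Assumed Haas-Stokes Duj1 estimate: n*d / ((n - f1) + f1*n/N).

    n = sample size, d = distinct values in the sample,
    f1 = values seen exactly once, N = total rows in the table.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    estimate = (n * d) / ((n - f1) + f1 * n / total_rows)
    # Clamp to what the data supports: at least what we saw, at most N.
    return min(max(estimate, float(d)), float(total_rows))


def ndistinct_max_guess(blocks, num_blocks, rng):
    """Hypothetical combined guess over the SAME sampled blocks:
    max(one-row-per-block estimate, all-rows-on-the-blocks estimate)."""
    total_rows = sum(len(b) for b in blocks)
    chosen = rng.sample(range(len(blocks)), num_blocks)
    one_per_block = [rng.choice(blocks[i]) for i in chosen]
    all_rows = [v for i in chosen for v in blocks[i]]
    one = duj1_ndistinct(one_per_block, total_rows)
    every = duj1_ndistinct(all_rows, total_rows)
    return one, every, max(one, every)


def block_fraction_containing(blocks, chosen, value):
    """Fraction of the chosen blocks holding at least one row equal to value."""
    return sum(1 for i in chosen if value in blocks[i]) / len(chosen)


def make_table(num_blocks, rows_per_block, num_values, clustered, rng):
    """Hypothetical table model: a list of blocks, each a list of column values."""
    rows = [rng.randrange(num_values) for _ in range(num_blocks * rows_per_block)]
    if clustered:
        rows.sort()  # stand-in for a table CLUSTERed on this column
    return [rows[i:i + rows_per_block] for i in range(0, len(rows), rows_per_block)]


if __name__ == "__main__":
    rng = random.Random(1)
    for clustered in (False, True):
        table = make_table(2000, 50, 20000, clustered, rng)
        one, every, combined = ndistinct_max_guess(table, 300, rng)
        true_d = len({v for b in table for v in b})
        print(f"clustered={clustered} true={true_d} "
              f"one-per-block={one:.0f} all-rows={every:.0f} max={combined:.0f}")
```

Comparing the printed `one-per-block` and `all-rows` columns against `true` for the random and clustered layouts is a quick way to probe the argument above. How much the max helps depends on the sample size and the value distribution, so the numbers should be read as illustrative only.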