Re: benchmarking the query planner
From | Greg Stark |
---|---|
Subject | Re: benchmarking the query planner |
Date | |
Msg-id | 4136ffa0812121033o105f8d6bie0f2a87082c77df7@mail.gmail.com |
In response to | Re: benchmarking the query planner (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Fri, Dec 12, 2008 at 6:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I seem to recall Greg suggesting that there were ways to estimate
> ndistinct without sorting, but short of a fundamental algorithm change
> there's not going to be a win here.

Not sure if you meant me, but I can say the approach I saw documented
before involved keeping a hash of all values seen in a full table scan.
When the hash reached a maximum size you discard half the hash key space
and from then on ignore any values which hash into that space. When you
reach the end you have a hash table with a random sample of 1/2^n
distinct values (where n is the number of times the hash table
overflowed) and the counts for those values.

If you're just interested in counting distinct values in the sample I
suppose you could do it using a Bloom filter just as we've talked about
for other hash tables. You scan through the list and increment an
ndistinct counter for every value you find where the bits aren't all
already set (then set those bits). That would undercount slightly and
I'm not sure how much faster than qsort it would actually be. The log(n)
factor in qsort's average case is pretty small when we're talking about
small samples.

--
greg
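
A minimal C sketch of the adaptive distinct-sampling idea Greg describes in the first paragraph: keep a bounded set of (value, count) entries, and whenever the set overflows, discard half of the remaining hash key space and ignore any further values hashing into the discarded half. All names, sizes, and the hash function here are made up for illustration; this is not PostgreSQL code, just one way the described scheme could look.

    #include <stdio.h>
    #include <stdint.h>

    #define MAX_ENTRIES 4           /* tiny on purpose, to force overflows */

    typedef struct
    {
        uint32_t    hash;
        int         value;
        int         count;
    } Entry;

    static Entry entries[MAX_ENTRIES];
    static int   nentries = 0;
    static int   level = 0;         /* n: times the key space has been halved */

    /* toy integer hash; any reasonable hash would do */
    static uint32_t
    hash_int(int v)
    {
        uint32_t    h = (uint32_t) v;

        h ^= h >> 16;
        h *= 0x45d9f3b;
        h ^= h >> 16;
        return h;
    }

    /* a hash "survives" level n if its low n bits are all zero */
    static int
    survives(uint32_t h)
    {
        return (h & (((uint64_t) 1 << level) - 1)) == 0;
    }

    static void
    add_value(int v)
    {
        uint32_t    h = hash_int(v);
        int         i;

        if (!survives(h))
            return;                 /* hashes into the discarded key space */

        for (i = 0; i < nentries; i++)
        {
            if (entries[i].hash == h && entries[i].value == v)
            {
                entries[i].count++;
                return;
            }
        }

        /* new distinct value: halve the key space until there is room */
        while (nentries >= MAX_ENTRIES)
        {
            int         kept = 0;

            level++;
            for (i = 0; i < nentries; i++)
            {
                if (survives(entries[i].hash))
                    entries[kept++] = entries[i];
            }
            nentries = kept;
        }

        if (survives(h))            /* the new value may itself be discarded */
        {
            entries[nentries].hash = h;
            entries[nentries].value = v;
            entries[nentries].count = 1;
            nentries++;
        }
    }

    int
    main(void)
    {
        int         data[] = {1, 2, 3, 4, 5, 1, 2, 6, 7, 8, 9, 10, 3, 3};
        int         i;

        for (i = 0; i < (int) (sizeof(data) / sizeof(data[0])); i++)
            add_value(data[i]);

        /* each surviving entry stands for roughly 2^level distinct values */
        printf("estimated ndistinct = %d (level %d, %d entries kept)\n",
               nentries << level, level, nentries);
        return 0;
    }

The end state is what the message describes: a random sample of about 1/2^n of the distinct values together with their counts, from which ndistinct can be scaled back up by 2^n.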
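An equally rough sketch of the Bloom-filter idea in the second paragraph: walk the sampled values once and bump an ndistinct counter whenever a value's bits are not already all set in the filter, then set them. As the message notes, this can only undercount, since a false positive makes a genuinely new value look already seen. Filter size, hash count, and the mixing function are again illustrative assumptions, not anything from the PostgreSQL tree.

    #include <stdio.h>
    #include <stdint.h>

    #define FILTER_BITS  (1 << 16)      /* 64 Kbit filter for the sketch */
    #define NUM_HASHES   3

    static uint8_t bloom[FILTER_BITS / 8];

    static uint32_t
    mix(uint32_t h)
    {
        h ^= h >> 16;
        h *= 0x7feb352d;
        h ^= h >> 15;
        h *= 0x846ca68b;
        h ^= h >> 16;
        return h;
    }

    /* returns 1 if all bits for v were already set, 0 otherwise; sets them */
    static int
    bloom_test_and_set(int v)
    {
        int         seen = 1;
        uint32_t    h = (uint32_t) v;
        int         k;

        for (k = 0; k < NUM_HASHES; k++)
        {
            uint32_t    bit;

            h = mix(h + (uint32_t) k);
            bit = h % FILTER_BITS;
            if (!(bloom[bit / 8] & (1 << (bit % 8))))
            {
                seen = 0;
                bloom[bit / 8] |= 1 << (bit % 8);
            }
        }
        return seen;
    }

    int
    main(void)
    {
        int         sample[] = {4, 8, 15, 16, 23, 42, 4, 8, 42, 7};
        int         ndistinct = 0;
        int         i;

        for (i = 0; i < (int) (sizeof(sample) / sizeof(sample[0])); i++)
        {
            if (!bloom_test_and_set(sample[i]))
                ndistinct++;        /* some bit was unset: count as new */
        }

        printf("ndistinct estimate over sample = %d\n", ndistinct);
        return 0;
    }

Whether this beats a qsort-based count in practice is exactly the open question in the message: the filter pass is linear, but qsort's log(n) factor is small for the sample sizes involved.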