Re: No merge sort?
От | cbbrowne@cbbrowne.com |
---|---|
Тема | Re: No merge sort? |
Дата | |
Msg-id | 20030414165257.D226D56FE4@cbbrowne.com обсуждение исходный текст |
Ответ на | Re: No merge sort? ("Ron Peacetree" <rjpeace@earthlink.net>) |
Список | pgsql-hackers |
Ron Peacetree wrote: > ""Dann Corbit"" <DCorbit@connx.com> wrote in message > news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co > m... > > > From: Ron Peacetree [mailto:rjpeace@earthlink.net]=20 > > > Sent: Wednesday, April 09, 2003 12:38 PM > > > You only need as many bins as you have unique key values=20 > > > silly ;-) Remember, at its core Radix sort is just a=20 > > > distribution counting sort (that's the name I learned for the=20 > > > general technique). The simplest implementation uses bits as=20 > > > the atomic unit, but there's nothing saying you have to...=20=20 > > > In a DB, we know all the values of the fields we currently=20 > > > have stored in the DB. We can take advantage of that. > > > > By what magic do we know this? If a database knew a-priori what all > the > > distinct values were, it would indeed be excellent magic. > For any table already in the DB, this is self evident. If you are > talking about sorting data =before= it gets put into the DB (say for a > load), then things are different, and the best you can do is used > comparison based methods (and probably some reasonably sophisticated > external merging routines in addition if the data set to be sorted and > loaded is big enough). The original question as I understood it was > about a sort as part of a query. That means everything to be sorted > is in the DB, and we can take advantage of what we know. > > > With 80 bytes, you have 2^320 possible values. There is no way > > around that. If you are going to count them or use them as a > > radix, you will have to classify them. The only way you will > > know how many unique values you have in > > "Company+Division" is to ... > > Either sort them or by some means discover all that are distinct > Ummm, Indexes, particularly Primary Key Indexes, anyone? Finding the > unique values in an index should be perceived as trivial... ...and you > often have the index in memory for other reasons already. But this does not remove the fact that a radix sort on an 80 byte field requires O(2^320) space, that is, O(N), where N is the size of the state space of the key... Perhaps you need to reread Gonnet; it tells you that... -- output = reverse("moc.enworbbc@" "enworbbc") http://www3.sympatico.ca/cbbrowne/multiplexor.html Rules of the Evil Overlord #28. "My pet monster will be kept in a secure cage from which it cannot escape and into which I could not accidentally stumble." <http://www.eviloverlord.com/>
В списке pgsql-hackers по дате отправления: