Re: qsort again (was Re: [PERFORM] Strange Create Index
От | Craig A. James |
---|---|
Тема | Re: qsort again (was Re: [PERFORM] Strange Create Index |
Дата | |
Msg-id | 43F4A7D8.4050907@modgraph-usa.com обсуждение исходный текст |
Ответ на | Re: qsort again (was Re: [PERFORM] Strange Create Index (Markus Schaber <schabi@logix-tt.com>) |
Ответы |
Re: qsort again (was Re: [PERFORM] Strange Create Index
|
Список | pgsql-hackers |
Markus Schaber wrote: > Ron wrote: >>...and of course if you know enough about the data to be sorted so as to >>constrain it appropriately, one should use a non comparison based O(N) >>sorting algorithm rather than any of the general comparison based >>O(NlgN) methods. > > Sounds interesting, could you give us some pointers (names, URLs, > papers) to such algorithms? Most of these techniques boil down to good ol' "bucket sort". A simple example: suppose you have a column of integer percentages,range zero to 100. You know there are only 101 distinct values. So create 101 "buckets" (e.g. linked lists),make a single pass through your data and drop each row's ID into the right bucket, then make a second pass throughthe buckets, and write the row ID's out in bucket order. This is an O(N) sort technique. Any time you have a restricted data range, you can do this. Say you have 100 million rows of scientific results known tobe good to only three digits -- it can have at most 1,000 distinct values (regardless of the magnitude of the values),so you can do this with 1,000 buckets and just two passes through the data. You can also use this trick when the optimizer is asked for "fastest first result." Say you have a cursor on a column ofnumbers with good distribution. If you do a bucket sort on the first two or three digits only, you know the first "page"of results will be in the first bucket. So you only need to apply qsort to that first bucket (which is very fast,since it's small), and you can deliver the first page of data to the application. This can be particularly effectivein interactive situations, where the user typically looks at a few pages of data and then abandons the search. I doubt this is very relevant to Postgres. A relational database has to be general purpose, and it's hard to give it "hints"that would tell it when to use this particular optimization. Craig
В списке pgsql-hackers по дате отправления: