Re: [HACKERS] sorting big tables :(

From dg@illustra.com (David Gould)
Subject Re: [HACKERS] sorting big tables :(
Date
Msg-id 9805170818.AA02027@hawk.illustra.com
In response to Re: [HACKERS] sorting big tables :(  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [HACKERS] sorting big tables :(  (The Hermit Hacker <scrappy@hub.org>)
List pgsql-hackers
>
> >
> > On Fri, 15 May 1998, Bruce Momjian wrote:
> >
> > > > I have a big table. 40M rows.
> > > > On the disk, its size is:
> > > >  2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
> > > > How should one decide based on table size how much room is needed?
> >
> > > It is taking so much disk space because it is using a TAPE sorting
> > > method, by breaking the file into tape chunks and sorting in pieces, the
> > The files grow until I have 6 files of almost a gig each. At that point, I
> > start running out of space...
> > This TAPE sorting method: is it a simple merge sort? Do you know of a way
> > this could be done while using constant space and no more complexity in
> > the algorithm? Even if it is a little slower, the DBMS could decide based
> > on the table size whether it should use the tape sort or another one...
> > Bubble sort would not be my first choice tho :)
>
> Tape sort is a standard sorting method from Knuth.  It basically sorts in
> pieces, and merges.  If you don't do this, access performance gets very poor
> as you page fault all over the file, and the cache becomes useless.
>
> There is something optimal about having seven sort files.  Not sure what
> to suggest.  No one has complained about this before.

I think this is a bug. A sort should not need much more than three times the
input size in disk space: the input file, the run files, and the output file.
If we are not able to sort a 2 gig table on a 9 gig partition we need to fix
it. I suspect we have a bug in the implementation, but perhaps we need to
look at our choice of algorithm. In any case this problem should go on the
todo list.
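
To make the three-times figure concrete, here is a minimal sketch in Python.
It is not the PostgreSQL code and not its tape sort; it is a plain external
merge sort with a single merge pass, which I am swapping in only to show the
space accounting. The name external_sort, the run_lines parameter, and the
one-integer-per-line file format are all assumptions made for illustration.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(input_path, output_path, run_lines=1000):
    """Sort a text file holding one integer per line.

    Returns the peak number of bytes on disk, counting the input file,
    every run file, and the output file together.
    """
    tmpdir = tempfile.mkdtemp()
    run_paths = []

    # Phase 1: read a bounded chunk, sort it in memory, write it as a run.
    with open(input_path) as src:
        while True:
            chunk = sorted(int(line) for line in itertools.islice(src, run_lines))
            if not chunk:
                break
            path = os.path.join(tmpdir, "run%d" % len(run_paths))
            with open(path, "w") as run:
                run.writelines("%d\n" % v for v in chunk)
            run_paths.append(path)

    # Phase 2: a single k-way merge of all runs into the output file.
    # (Assumes the number of runs stays below the open-file limit.)
    runs = [open(p) for p in run_paths]
    try:
        with open(output_path, "w") as out:
            streams = [(int(line) for line in f) for f in runs]
            out.writelines("%d\n" % v for v in heapq.merge(*streams))
    finally:
        for f in runs:
            f.close()

    # Disk use is at its highest here: input, runs, and output all exist.
    peak = sum(os.path.getsize(p) for p in [input_path, output_path] + run_paths)

    for p in run_paths:
        os.remove(p)
    os.rmdir(tmpdir)
    return peak
```

In this sketch the runs together hold one copy of the data and the output
holds another, so the returned peak should come to about three times the size
of the input file, no matter how many runs are produced.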

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
 and someone who knows less will correct me if I'm right."
               --David Palmer (palmer@tybalt.caltech.edu)
