Re: Merge algorithms for large numbers of "tapes"
От | Dann Corbit |
---|---|
Тема | Re: Merge algorithms for large numbers of "tapes" |
Дата | |
Msg-id | D425483C2C5C9F49B5B7A41F8944154757D5F3@postal.corporate.connx.com обсуждение исходный текст |
Ответ на | Merge algorithms for large numbers of "tapes" (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Merge algorithms for large numbers of "tapes"
|
Список | pgsql-hackers |
> -----Original Message----- > From: Stephen Frost [mailto:sfrost@snowman.net] > Sent: Thursday, March 09, 2006 3:49 PM > To: Tom Lane > Cc: Luke Lonergan; Jim C. Nasby; Greg Stark; Dann Corbit; Simon Riggs; > pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes" > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > > "Luke Lonergan" <llonergan@greenplum.com> writes: > > > I would only suggest that we replace the existing algorithm with one > that > > > will work regardless of (reasonable) memory requirements. Perhaps we > can > > > agree that at least 1MB of RAM for external sorting will always be > available > > > and proceed from there? > > > > If you can sort indefinitely large amounts of data with 1MB work_mem, > > go for it. > > It seems you two are talking past each other and I'm at least slightly > confused. So, I'd like to ask for a bit of clarification and perhaps > that will help everyone. > > #1: I'm as much a fan of eliminating unnecessary code as anyone > #2: There have been claims of two-pass improving things 400% > #3: Supposedly two-pass requires on the order of sqrt(total) memory Two pass does not require sqrt(total) memory. This figure is clearly wrong. Two pass will create the count of subfiles proportional to: Subfile_count = original_stream_size/sort_memory_buffer_size The merge pass requires (sizeof record * subfile_count) memory. Example: You have a 7 gigabyte table to sort and you have 100 MB sort buffer. The number of subfiles will be: 7000000000 / 100000000 = 70 files Suppose that a record is 2K wide. The merge pass requires 70*2k = 143,360 bytes of RAM. Suppose that a record is 65535 bytes wide. The merge pass requires 70*65535 = 4,587,450 bytes of RAM. > #4: We have planner statistics to estimate size of total > #5: We have a work_mem limitation for a reason > > So, if we get a huge performance increase, what's wrong with: > if [ sqrt(est(total)) <= work_mem ]; then > two-pass-sort(); > else > tape-sort(); > fi > > ? > > If the performance isn't much different and tape-sort can do it with > less memory then I don't really see any point in removing it. > > If the intent is to remove it and then ask for the default work_mem to > be increased- I doubt going about it this way would work very well. :) > > Thanks, > > Stephen
В списке pgsql-hackers по дате отправления: