Re: Merge algorithms for large numbers of "tapes"
От | Greg Stark |
---|---|
Тема | Re: Merge algorithms for large numbers of "tapes" |
Дата | |
Msg-id | 87acc0mhhz.fsf@stark.xeocode.com обсуждение исходный текст |
Ответ на | Re: Merge algorithms for large numbers of "tapes" ("Jim C. Nasby" <jnasby@pervasive.com>) |
Ответы |
Re: Merge algorithms for large numbers of "tapes"
|
Список | pgsql-hackers |
"Jim C. Nasby" <jnasby@pervasive.com> writes: > On Wed, Mar 08, 2006 at 06:55:59PM -0500, Greg Stark wrote: > > > > "Luke Lonergan" <llonergan@greenplum.com> writes: > > > > > > I am pretty sure from this thread that PostgreSQL is not doing #1, and I > > > > have no idea if it is doing #2. > > > > > > Yep. Even Knuth says that the tape goo is only interesting from a > > > historical perspective and may not be relevant in an era of disk drives. > > > > As the size of the data grows larger the behaviour of hard drives looks more > > and more like tapes. The biggest factor controlling the speed of i/o > > operations is how many seeks are required to complete them. Effectively > > "rewinds" are still the problem it's just that the cost of rewinds becomes > > constant regardless of how long the "tape" is. > > But it will take a whole lot of those rewinds to equal the amount of > time required by an additional pass through the data. I'll venture a > guess that as long as you've got enough memory to still read chunks back > in 8k blocks that it won't be possible for a multi-pass sort to > out-perform a one-pass sort. Well that's clearly a bit overoptimistic. If we believe the random page cost of 4 then having more tapes than you have spindles would impose a penalty equal to having four times as many passes. (And that's *with* the 8k block size. And with the kernel performing pre-fetch already too.) > For that to be of any use, wouldn't you need to use only as many tapes > as spindles/2? Otherwise you're still trying to read and write from the > same set of drives, which means you're probably doing a lot of seeking. > Or do the tape algorithms re-write data as they read it? Well, spindles-1. I was thinking as many tapes as you have spindles *in total*, ie, including the output tape. You only have one output tape for each n-way merge though. -- greg
В списке pgsql-hackers по дате отправления: