Re: Parallel Seq Scan
| From | Jim Nasby |
|---|---|
| Subject | Re: Parallel Seq Scan |
| Date | |
| Msg-id | 54C9486A.6050101@BlueTreble.com |
| In reply to | Re: Parallel Seq Scan (Stephen Frost <sfrost@snowman.net>) |
| Responses | Re: Parallel Seq Scan |
| List | pgsql-hackers |
On 1/28/15 9:56 AM, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Wed, Jan 28, 2015 at 10:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I thought the proposal to chunk on the basis of "each worker processes
>>> one 1GB-sized segment" should work all right. The kernel should see that
>>> as sequential reads of different files, issued by different processes;
>>> and if it can't figure out how to process that efficiently then it's a
>>> very sad excuse for a kernel.
>
> Agreed.
>
>> I agree. But there's only value in doing something like that if we
>> have evidence that it improves anything. Such evidence is presently a
>> bit thin on the ground.
>
> You need an i/o subsystem that's fast enough to keep a single CPU busy,
> otherwise (as you mentioned elsewhere), you're just going to be i/o
> bound and having more processes isn't going to help (and could hurt).
>
> Such i/o systems do exist, but a single RAID5 group over spinning rust
> with a simple filter isn't going to cut it with a modern CPU- we're just
> too darn efficient to end up i/o bound in that case. A more complex
> filter might be able to change it over to being more CPU bound than i/o
> bound and produce the performance improvements you're looking for.

Except we're nowhere near being IO efficient. The vast difference between Postgres IO rates and dd shows this. I suspect that's because we're not giving the OS a list of IO to perform while we're doing our thing, but that's just a guess.

> The caveat to this is if you have multiple i/o *channels* (which it
> looks like you don't in this case) where you can parallelize across
> those channels by having multiple processes involved.

Keep in mind that multiple processes is in no way a requirement for that. Async IO would do that, or even just requesting stuff from the OS before we need it.

> We only support
> multiple i/o channels today with tablespaces and we can't span tables
> across tablespaces. That's a problem when working with large data sets,
> but I'm hopeful that this work will eventually lead to a parallelized
> Append node that operates against a partitioned/inherited table to work
> across multiple tablespaces.

Until we can get a single seqscan close to dd performance, I fear worrying about tablespaces and IO channels is entirely premature.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
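
Editorial illustration (not part of the original message): the idea of "requesting stuff from the OS before we need it" can be sketched in a single process with `posix_fadvise(POSIX_FADV_WILLNEED)`. The snippet below is a minimal sketch in Python, not PostgreSQL code; the function name `scan_with_prefetch`, the `prefetch_blocks` parameter, and the 8 kB block size are assumptions chosen for illustration, and `os.posix_fadvise` is available on Unix-like systems only. Whether such hints actually narrow the gap to dd depends on the kernel and the storage, which is exactly what the thread leaves open.

```python
import os

BLOCK_SIZE = 8192  # assumed block size for illustration (PostgreSQL's default page size)


def scan_with_prefetch(path, prefetch_blocks=32):
    """Read a file sequentially while hinting upcoming blocks to the kernel.

    Hypothetical helper: returns the total number of bytes read.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        window = prefetch_blocks * BLOCK_SIZE
        hinted_up_to = 0  # end offset of the region already hinted
        offset = 0
        total = 0
        while offset < size:
            # Keep the hinted region one window ahead of the read position,
            # so the kernel can start the I/O while we process earlier blocks.
            want = min(size, offset + window)
            if want > hinted_up_to:
                os.posix_fadvise(fd, hinted_up_to, want - hinted_up_to,
                                 os.POSIX_FADV_WILLNEED)
                hinted_up_to = want
            block = os.pread(fd, BLOCK_SIZE, offset)
            if not block:
                break  # file shrank underneath us
            total += len(block)  # stand-in for real per-block work (e.g. a filter)
            offset += len(block)
        return total
    finally:
        os.close(fd)
```

The hint is advisory only: the kernel may ignore it, and a single process stays in control of the scan, which is the point being made about not needing multiple workers to get overlapping I/O.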