Re: Reading data in bulk - help?
From | Magnus Naeslund(w)
Subject | Re: Reading data in bulk - help?
Date |
Msg-id | 51478.129.178.88.66.1063194442.squirrel@mail2.fbab.net
In reply to | Re: Reading data in bulk - help? (Chris Huston <chuston@bangjafwac.com>)
Responses | Re: Reading data in bulk - help?
List | pgsql-performance
Chris Huston said:
> Thanks Josh, that helped. I had gone looking for some kind of cluster
> option but was looking under CREATE DATABASE, CREATE INDEX and
> initlocation - didn't see CLUSTER index ON table.
>
> I ran the CLUSTER, which took about 2 1/2 hours to complete. That
> improved the query performance about 6x - which is great - but it is
> still taking 26 minutes to do what a serial read does in about 2 1/2
> minutes.
>
> At this point I'm OK, because each fetch is taking around 200
> milliseconds from call to the time the data is ready. The processing
> takes 300-600 ms per batch. I've got the fetch and the processing
> running in separate threads, so even if Postgres were running faster it
> wouldn't help this implementation.
>
> However, "iostat" is still reporting an average size per transfer of
> about 10 kB and total throughput of about 1 MB/s. The transfers per
> second went from >200/s to about 80/s. It still seems like it ought to
> be faster.
>
> The system is currently running on a single-processor 500 MHz G4. We're
> likely to move to a two-processor 2 GHz G5 in the next few months. Then
> each block may take only 30-60 milliseconds to complete, and there can
> be two concurrent blocks processing at once.
>
> Sometime before then I need to figure out how to cut the fetch times
> from the current 200 ms to something like 10 ms. There are currently
> 1,628,800 records in the single data table, representing 6817 groups.
> Each group has 2 to 284 records - with 79% having the max of 284 (the
> max grows by 1 every day, although the value may change throughout the
> day). Each record is maybe 1 or 2k, so ideally each batch/group should
> require 284-568k - at 10MB/s - that'd be
>
> RELATED QUESTION: How now do I speed up the following query: "select
> distinct group_id from datatable"? It results in a sequential scan
> of the db. Why doesn't it use the group_id index? I only do this once
> per run, so it's not as critical as the fetch speed, which is done 6817
> times.
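(A side note on that related question, not from the thread itself: PostgreSQL plans `SELECT DISTINCT` over a low-cardinality column as a sequential scan plus sort/unique, so the btree index on `group_id` is not used. A `GROUP BY` form is equivalent and sometimes plans differently, and on later PostgreSQL releases a recursive CTE can emulate a "skip scan" that touches only one index entry per distinct value. The table and column names below are taken from the thread; this is a sketch, not tested against that schema.)

```sql
-- Equivalent GROUP BY form, which may plan differently:
SELECT group_id FROM datatable GROUP BY group_id;

-- "Skip scan" emulation via a recursive CTE (PostgreSQL 8.4 or later):
-- each step jumps to the next distinct group_id through the index,
-- so only ~6817 index probes are needed instead of a full scan.
WITH RECURSIVE distinct_groups AS (
    (SELECT group_id FROM datatable ORDER BY group_id LIMIT 1)
    UNION ALL
    SELECT (SELECT d.group_id
            FROM datatable d
            WHERE d.group_id > g.group_id
            ORDER BY d.group_id
            LIMIT 1)
    FROM distinct_groups g
    WHERE g.group_id IS NOT NULL
)
SELECT group_id FROM distinct_groups WHERE group_id IS NOT NULL;
```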
>
> Thanks for the help!
> - Chris

How are you fetching the data? If you are using cursors, be sure to fetch a substantial bit at a time so that you're not punished by latency. I got a big speedup when I changed my original clueless code to fetch 64 rows in a go instead of only one.

Magnus
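(For reference, the batched-cursor pattern Magnus describes looks like the following at the SQL level; the cursor name and WHERE clause are illustrative, not from the thread.)

```sql
-- Cursors only live inside a transaction (unless declared WITH HOLD).
BEGIN;

DECLARE batch_cur CURSOR FOR
    SELECT * FROM datatable
    WHERE group_id = 42
    ORDER BY group_id;

-- Pull 64 rows per round trip instead of one, amortizing the latency.
FETCH 64 FROM batch_cur;
-- ... repeat FETCH 64 until fewer than 64 rows come back ...

CLOSE batch_cur;
COMMIT;
```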