Re: Simple (hopefully) throughput question?
From | Robert Klemme |
---|---|
Subject | Re: Simple (hopefully) throughput question? |
Date | |
Msg-id | ib1f0e$f73$1@dough.gmane.org |
In reply to | Simple (hopefully) throughput question? (Nick Matheson <Nick.D.Matheson@noaa.gov>) |
List | pgsql-performance |
On 11/03/2010 04:52 PM, Nick Matheson wrote:
> We have an application that needs to do bulk reads of ENTIRE
> Postgres tables very quickly (i.e. select * from table). We have
> observed that such sequential scans run two orders of magnitude slower
> than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is
> due to the storage overhead we have observed in Postgres. In the
> example below, it takes 1 GB to store 350 MB of nominal data. However
> that suggests we would expect to get 35 MB/s bulk read rates.
>
> Our business logic does operations on the resulting data such that
> the output is several orders of magnitude smaller than the input. So
> we had hoped that by putting our business logic into stored procedures
> (and thus drastically reducing the amount of data flowing to the
> client) our throughput would go way up. This did not happen.

Can you disclose what kinds of manipulations you want to do on the
data? I am asking because maybe there is a fancy query (possibly using
windowing functions and / or aggregation functions) that gets you the
speed that you need without transferring the whole data set to the
client.

> So our questions are as follows:
>
> Is there any way using stored procedures (maybe C code that calls
> SPI directly) or some other approach to get close to the expected 35
> MB/s doing these bulk reads? Or is this the price we have to pay for
> using SQL instead of some NoSQL solution. (We actually tried Tokyo
> Cabinet and found it to perform quite well. However it does not measure
> up to Postgres in terms of replication, data interrogation, community
> support, acceptance, etc).

Kind regards

robert
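For illustration, here is a minimal sketch of the kind of server-side reduction Robert is suggesting. The table name readings and its columns (station_id, ts, value) are hypothetical stand-ins, since the original post does not show the actual schema; the point is only that aggregate and window functions let PostgreSQL do the heavy lifting and return a result set far smaller than the raw table:

    -- Aggregate entirely on the server: only one row per station_id
    -- crosses the wire instead of the full table.
    SELECT station_id,
           count(*)   AS n_rows,
           avg(value) AS mean_value,
           max(value) AS max_value
      FROM readings
     GROUP BY station_id;

    -- A window function keeps the detail rows but still computes the
    -- per-group context on the server, avoiding a second client-side pass.
    SELECT station_id, ts, value,
           value - avg(value) OVER (PARTITION BY station_id) AS delta_from_mean
      FROM readings;

Whether either form applies depends on the manipulations Nick actually needs, which is exactly what the question above is probing.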