Re: optimizing daily data storage in Pg
From        | Andy Colson
Subject     | Re: optimizing daily data storage in Pg
Date        |
Msg-id      | 4C499D9B.3050101@squeakycode.net
In reply to | Re: optimizing daily data storage in Pg (P Kishor <punk.kish@gmail.com>)
List        | pgsql-general
On 7/23/2010 12:39 AM, P Kishor wrote:
> On Thu, Jul 22, 2010 at 4:56 PM, Andy Colson <andy@squeakycode.net> wrote:
>> On 7/22/2010 9:41 AM, P Kishor wrote:
>>>
>>> I have been struggling with this for a while now, have even gone down
>>> a few paths but struck out, so I turn now to the community for ideas.
>>> First, the problem: Store six daily variables for ~ 25 years for cells
>>> in a grid.
>>>
>>> * Number of vars = 6
>>> * Number of cells ~ 13 million
>>> * Number of days ~ 9125 (25 * 365)
>>>
>>> Optimize the store for two different kinds of queries:
>>>
>>> Query one: Retrieve the value of a single var for all or a portion of
>>> the cells for a single day. This is analogous to an image where every
>>> pixel is the value of a single var.
>>>
>>> SELECT <var> FROM d WHERE yr = ? AND yday = ?;
>>> SELECT <var> FROM d WHERE yr = ? AND yday = ? AND cell_id IN (?,?,?...);
>>>
>>> Query two: Retrieve values for all the days or a duration of days for
>>> a single var for a single cell. This is like grabbing a column out of
>>> a table in which each row holds all the vars for a single day.
>>>
>>> SELECT <var> FROM d WHERE cell_id = ?;
>>> SELECT <var> FROM d WHERE cell_id IN (?,?,?...);
>>
>> First, I must admit to not reading your entire email.
>
> I am not sure how to respond to your feedback given that you haven't
> read the entire email. Nevertheless, thanks for writing...

Heh, sorry, my point was, you put a lot of information into your email, and I was only going to use one bit of it: the row counts.

>> Second, Query 1 should be fast, regardless of how you lay out the tables.
>
> It is not fast. Right now I have data for about 250,000 cells loaded.
> That comes to circa 92 million rows per year. Performance is pretty
> sucky.

This query should return one record, correct? That should be very fast: PG should be able to find the record in the index within 5 seeks, and then find the data in one seek. Can you post 'explain analyze' for this query?
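For concreteness, the request above could look like the sketch below. It assumes the table and column names from the quoted queries (`d`, `yr`, `yday`, `cell_id`); the composite index and the `tmax` variable name are illustrative assumptions, not something stated in the thread:

```sql
-- Hypothetical composite index matching Query 1's predicates in order.
-- With it, a single-cell, single-day lookup is a few index page reads
-- plus one heap fetch, as described above.
CREATE INDEX d_yr_yday_cell_idx ON d (yr, yday, cell_id);

-- Capture the actual plan and timings to post back to the list
-- ('tmax' stands in for one of the six vars):
EXPLAIN ANALYZE
SELECT tmax FROM d WHERE yr = 2009 AND yday = 120 AND cell_id = 12345;
```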
(Or, it could be the case that I totally misunderstood your data.)

>> Third, Query 2 will return 13M rows? I don't think it matters how you
>> lay out the tables; returning 13M rows is always going to be slow.
>
> Yes, I understand that. In reality I will never get 13 M rows. For
> display purposes, I will probably get around 10,000 to 50,000
> rows. When more rows are needed, it will be to feed a model, so that
> can be offline (without an impatient human being waiting on the other
> end).
>
> Right now, my main problem is that I have either too many rows (~4 B
> rows) in a manageable number of tables (25 tables) or a manageable
> number of rows (~13 M rows) in too many tables (~9000 tables).

Searching by just cell_id is not going to be very selective, and with large result sets I can see this one being slow. As Scott talked about in his response, this one will come down to hardware. Have you dd-tested your hardware?

-Andy
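The "dd test" mentioned above is the usual rough check of sequential disk throughput. A minimal sketch, with illustrative paths and sizes; point the output file at the filesystem that actually holds the Postgres data, and make it large enough to defeat the OS cache:

```shell
# Write a test file in 8 kB blocks (Postgres's page size), then read it back.
# If the file fits in RAM, the read pass just measures the cache, so use a
# count that produces a file larger than physical memory.
dd if=/dev/zero of=/tmp/ddtest bs=8k count=200000 && sync   # write ~1.6 GB
dd if=/tmp/ddtest of=/dev/null bs=8k                        # read it back
rm -f /tmp/ddtest
```

dd reports the throughput of each pass on stderr; comparing those numbers against the drive's rated speed tells you whether the hardware, rather than the schema, is the bottleneck.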