Re: General data warehousing questions
От | Shane Ambler |
---|---|
Тема | Re: General data warehousing questions |
Дата | |
Msg-id | 48EA314D.80609@Sheeky.Biz обсуждение исходный текст |
Ответ на | Re: General data warehousing questions ("Scott Marlowe" <scott.marlowe@gmail.com>) |
Список | pgsql-general |
Scott Marlowe wrote: > On Sun, Oct 5, 2008 at 7:48 PM, Sean Davis <sdavis2@mail.nih.gov> wrote: >> I am looking at the prospect of building a data warehouse of genomic >> sequence data. The machine that produces the data adds about >> 300million rows per month in a central fact table and we will >> generally want the data to be "online". We don't need instantaneous >> queries, but we would be using the data for data mining purposes and >> running some "real-time" queries for reporting and research purposes. >> I have had the pleasure of working on an Netezza box where this type >> of thing is quite standard, but we don't have that access anymore, so >> I'm looking for hints on using postgres in a data warehousing/mining >> environment. Any suggestions on how DDL, loading, backup, indexing, >> or (to a certain extent) hardware? > > I assume you're familiar with stuff like star schemas. > > For loading you might want to look at things like pg_bulkloader, copy, > > For indexing remember that you have partial and fuctional indexes in > postgresql and they can come in quite handy. > > For backup of large changing databases look into PITR. > > As for hardware, you need enough CPU horsepower and memory to handle > however many users you're gonna have running simultaneous queries, but > more important is usually the drive subsystem. Throwing drives, > battery backed cache and a good RAID controller can make a big > difference. Usual RAID-10 is preferred, as writes are much faster. If > you're really squeezed for space and money then you can use RAID-5 but > it has some seriously negative performance implications for parallel > load handling and write speed. > You may also be interested in looking at BioPostgres http://phenomics.cs.ucla.edu/ -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz
В списке pgsql-general по дате отправления: