Re: Fast Inserts and Hardware Questions
От | Gordon A. Runkle |
---|---|
Тема | Re: Fast Inserts and Hardware Questions |
Дата | |
Msg-id | 98p3rq$4of$1@news.tht.net обсуждение исходный текст |
Ответ на | Fast Inserts and Hardware Questions (Orion Henry <orion@trustcommerce.com>) |
Список | pgsql-general |
In article <3AAFC1C0.108A493@trustcommerce.com>, "Orion Henry" <orion@trustcommerce.com> wrote: > I am specing out a database for my company and I need some advice. The > database in mind will consist of one table, with about 300 bytes per > record. The table will be getting two or three million inserts a week > and it would be nice if it could handle a sustained 30 to 50 a second. [snip] > The customers want to be able to query their data whenever and have it > be snappy. How many week's worth of data will you be keeping? How many customers at a time will be querying? > So here is my question: > * Is there an OS that is best suited for postgres. All things being > equal I would like to run this on Linux. > * Is there an architecture best > suited for this. Should I shell out the extra cash for a 64 bit box > over a 32bit one. Quad Xeon or Quad Alpha? Quad Alpha or Quad > UltraSparc? Linux should be fine. 32-bit will also be fine. The real question is how vital is the data. If it's important, you'll want RAID. You'll want 64-bit PCI, too. I'd go with a Tier-1 vendor, such as IBM, Compaq, or Dell (in that order). > * Since most of what I am doing is inserts I will assume that the disk > will be my bottleneck over anything else. Is this correct? * Will the > 7.1 WAL save me, when it comes to insert times? I haven't pounded on 7.1, so I can't say. What I can say is that your RAID will affect this quite a bit. RAID-5 is the cheapest way to go, but also the slowest. IBM offers RAID-5E, which gives about 20% improvement over RAID-5 at the cost of one more disk. Best performance though is with RAID 1+0 or 0+1. As you might have guessed, it's the most expensive, too. If you have any boxen available to test on, I'd suggest doing some runs with simulated data driven from multiple clients. That would give you a good feel for it. If you have any "hefty" boxen, so much the better. Re: your indexes. It looks like the first would best optimize on primary_key, so it looks covered. What is the relationship (if any) between customers and groups? If the third query is being run a lot, clustering on creation_date, customer_id would help. The downside is will probably be the worst for inserts. Which would take us back to running some largish tests using realistic data (you'd be aiming to have roughly the same selectivity in your test data as your real-world app will have). Gordon. -- It doesn't get any easier, you just go faster. -- Greg LeMond
В списке pgsql-general по дате отправления: