Re: Complex database for testing, U.S. Census Tiger/UA
From | cbbrowne@cbbrowne.com
---|---
Subject | Re: Complex database for testing, U.S. Census Tiger/UA
Date |
Msg-id | 20030408185842.CC2013E65C@cbbrowne.com
In reply to | Re: Complex database for testing, U.S. Census Tiger/UA (Dustin Sallings <dustin@spy.net>)
List | pgsql-hackers
Dustin Sallings wrote:
> I think it was my first application I wrote in python which parsed
> the zip files containing these data and shoved it into a postgres system.
> I had multiple clients on four or five computers running nonstop for about
> two weeks to get it all populated.
>
> By the time I was done, and got my first index created, I began to
> run out of disk space. I think I only had about 70GB to work with on the
> RAID array.

But this does not establish that this data represents a meaningful "transactional" load. Based on the sources, which presumably involve unique data, the "transactions" are all touching independent sets of data, and are likely to be totally uninteresting from the perspective of seeing how the system works under /TRANSACTION/ load.

TRANSACTION loading will involve doing updates that actually have some opportunity to trample on one another: multiple transactions concurrently updating a single balance table, multiple transactions concurrently trying to attach links to a table entry, that sort of thing.

I remember a while back when MSFT did an "enterprise scalability day," where they were trumpeting SQL Server performance on "hundreds of millions of transactions." At the time, I was at Sabre, who actually do tens of millions of transactions per day for passenger reservations across lots of airlines. Microsoft was making loud noises to the effect that NT Server was wonderful for "enterprise transaction" work; the guys at work just laughed, because the kind of performance they got involved considerable amounts of 370 assembler to tune vital bits of the systems.

What happened in the "scalability tests" was that Microsoft did much the same thing you did: they had hordes of transactions going through that were, well, basically independent of one another. They could "scale" things up trivially by adding extra boxes. Need to handle 10x the transactions? Well, since they don't actually modify any shared resources, you just need to put in 10x as many servers.

And that's essentially what happens any time TPC-? benchmarks reach the point of irrelevance; it happens every time someone figures out some "hack" that successfully partitions the workload. At that point, they merely need to add a bit of extra hardware, and increasing performance is as easy as adding extra processor boards.

The real world doesn't scale so easily...
--
(concatenate 'string "cbbrowne" "@acm.org") http://cbbrowne.com/info/emacs.html
Send messages calling for fonts not available to the recipient(s). This can (in the case of Zmail) totally disable the user's machine and mail system for up to a whole day in some circumstances.
-- from the Symbolics Guidelines for Sending Mail
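To make the contrast concrete, here is a minimal sketch of the kind of contended workload described above, assuming psycopg2 and a hypothetical `balances` table (neither appears in the original mail): every client updates the same balance row, so the transactions genuinely interfere with one another instead of partitioning trivially across extra servers.

```python
# Illustrative sketch only: many clients updating the *same* balance row,
# so each transaction must wait on the row lock held by the previous one.
# DSN and table name are assumptions, not taken from the original thread.
import threading
import psycopg2

DSN = "dbname=test user=postgres"   # assumed connection string
ACCOUNT_ID = 1                      # every worker hits this one row

def worker(n_updates: int) -> None:
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    for _ in range(n_updates):
        # Each UPDATE takes a row lock on the shared balance row, so
        # concurrent transactions queue behind one another -- unlike
        # independent bulk inserts, this load cannot be split across
        # extra servers without changing its meaning.
        cur.execute(
            "UPDATE balances SET amount = amount + 1 WHERE id = %s",
            (ACCOUNT_ID,),
        )
        conn.commit()
    conn.close()

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Loading unique Census rows, by contrast, touches disjoint data, so adding clients (or servers) scales it almost linearly, which is exactly why it says little about behavior under real transactional contention.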