Re: Massive table (500M rows) update nightmare
From: Greg Smith
Subject: Re: Massive table (500M rows) update nightmare
Date:
Msg-id: 4B477570.4080804@2ndquadrant.com
In reply to: Re: Massive table (500M rows) update nightmare ("Carlo Stonebanks" <stonec.register@sympatico.ca>)
List: pgsql-performance
Carlo Stonebanks wrote:
> This is hosted on a new server the client set up so I am waiting for
> the exact OS and hardware config. PG Version is PostgreSQL 8.3.6,
> compiled by Visual C++ build 1400, OS appears to be Windows 2003 x64
> Server.
>
> More than anything, I am more concerned with the long-term use of the
> system. This particular challenge with the 500M row update is one
> thing, but I am concerned about the exceptional effort required to do
> this. Is it REALLY this exceptional to want to update 500M rows of
> data in this day and age? Or is the fact that we are considering
> dumping and restoring and dropping indexes, etc to do all an early
> warning that we don't have a solution that is scaled to the problem?

It's certainly not common or easy to handle. If someone told me I had to
make that task perform well, and the tools at hand were anything other
than a largish UNIX-ish server with a properly designed disk subsystem,
I'd tell them it's unlikely to work well.

An UPDATE is the most intensive single operation you can do in
PostgreSQL; the full lifecycle of executing it requires:

- Creating a whole new row
- Updating all the indexes to point to the new row
- Marking the original row dead
- Pruning the original row out of the database once it's no longer
  visible anywhere (VACUUM)

As a rule, if you can't fit a large chunk of the indexes involved into
shared_buffers on your system, this is probably going to have terrible
performance. And you're on a platform where that's very hard to do, even
if there's a lot of RAM around.

It sounds like your system spends all its time swapping index blocks in
and out of the database buffer cache. That suggests there's a design
problem either with those indexes (they're too big) or with the
provisioned hardware/software combination (needs more RAM, faster disks,
or a platform where large amounts of RAM work better).

--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com    www.2ndQuadrant.com
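
A minimal sketch of the update lifecycle described above, not taken from
the original thread: the table "demo" is a hypothetical throwaway used
only to show the dead row versions a bulk UPDATE leaves behind until
VACUUM prunes them.

    -- Hypothetical throwaway table to illustrate MVCC update overhead.
    CREATE TABLE demo (id integer PRIMARY KEY, val text);
    INSERT INTO demo SELECT g, 'x' FROM generate_series(1, 100000) g;

    -- Every updated row gets a brand-new version; the old version is
    -- merely marked dead, and the indexes point at the new one.
    UPDATE demo SET val = 'y';

    -- The stats collector reports the dead versions (it updates
    -- asynchronously, so the count may lag for a moment).
    SELECT n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    WHERE relname = 'demo';

    -- Only VACUUM prunes the dead versions and makes the space reusable.
    VACUUM demo;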
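
One rough way to check the shared_buffers point is to compare the total
on-disk size of a table's indexes against the configured buffer cache.
This is a sketch using size functions that exist in 8.3; 'mytable' is a
placeholder for the real table name.

    -- Total on-disk size of all indexes on one table.
    SELECT pg_size_pretty(sum(pg_relation_size(indexrelid))::bigint)
    FROM pg_stat_user_indexes
    WHERE relname = 'mytable';

    -- Compare against the configured buffer cache size.
    SHOW shared_buffers;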