Re: massive INSERT
From: Greg Stark
Subject: Re: massive INSERT
Date:
Msg-id: 873clqer3f.fsf@stark.dyndns.tv
In reply to: massive INSERT (Kurt Overberg <kurt@hotdogrecords.com>)
List: pgsql-sql
Kurt Overberg <kurt@hotdogrecords.com> writes:

> ...I've found this to be faster than running the query, figuring out who needs
> to be removed from the group, who needs to be added and whatnot. The thing that
> I'm worried about is that this table is going to be pretty big (potentially
> millions of rows), and every time I rebuild this table I lose (or at least
> invalidate) all my indexes. Is that the case? Is constantly deleting then
> adding large numbers of rows from a table really bad for performance? I'm
> worried this isn't going to scale well. Anyone know of a better way to do
> this? Thoughts and comments would be appreciated.

From a programming point of view it's usually cleaner to do it the way you're
doing it than to work out the differences and apply the minimal changes, so I
usually go with that.

There are performance implications, though. Every time you update or delete a
record, Postgres leaves behind old garbage tuples that need to be cleaned up.
A big batch job can overflow the space set aside to track them, which means
that instead of a plain VACUUM you might need a VACUUM FULL, or else have to
tune the fsm parameters for your database.

It's not clear from your description whether you're regenerating the entire
table or just a specific group. If you're looping through all the groups and
regenerating every one of them, you should perhaps consider using TRUNCATE
instead of DELETE. TRUNCATE is a bigger hammer and avoids generating any free
space to clean up.

Alternatively, consider running VACUUM ANALYZE after every group update, and
possibly a VACUUM FULL at an hour when you can withstand some downtime.

I'm not sure what you mean by losing your indexes. They'll be just as valid
afterwards as they were before, though they may become less efficient, for
basically the same reasons as the table itself. You may find that a periodic
REINDEX during downtime helps.

If you're updating the majority of the table and can do it during downtime,
you might consider dropping the index while performing the update and
recreating it at the end. But that's optional.
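For the full-rebuild case, here is a minimal sketch of the TRUNCATE variant.
The table, index, and repopulating query are all invented for illustration;
your real schema and rebuild query will differ:

    -- Hypothetical schema; names are placeholders.
    CREATE TABLE membership (
        group_id integer NOT NULL,
        user_id  integer NOT NULL
    );
    CREATE INDEX membership_group_idx ON membership (group_id);

    -- TRUNCATE throws away the rows and their space in one shot, so it
    -- leaves no dead tuples behind for VACUUM, unlike a row-by-row DELETE.
    TRUNCATE TABLE membership;

    -- Repopulate from whatever query computes the current memberships;
    -- this join is only a stand-in.
    INSERT INTO membership (group_id, user_id)
    SELECT g.group_id, u.user_id
    FROM groups g
    JOIN users u ON u.region = g.region;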
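If you stay with per-group DELETE and re-INSERT instead, the maintenance
commands mentioned above would look something like this, again with the
hypothetical table name:

    -- After each group's DELETE/INSERT cycle: reclaim dead tuples and
    -- refresh the planner's statistics.
    VACUUM ANALYZE membership;

    -- During a maintenance window, when some downtime is acceptable:
    -- compact the table and rebuild its bloated indexes from scratch.
    VACUUM FULL membership;
    REINDEX TABLE membership;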
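And a sketch of the drop-and-recreate approach for a bulk update that touches
most of the table, using the same invented names:

    DROP INDEX membership_group_idx;

    -- ... run the big DELETE/INSERT batch here, with no per-row index
    -- maintenance slowing it down ...

    CREATE INDEX membership_group_idx ON membership (group_id);

-- 
greg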