Re: Good way to insert/update when you're not sure of duplicates?
| From | Shridhar Daithankar |
|---|---|
| Subject | Re: Good way to insert/update when you're not sure of duplicates? |
| Date | |
| Msg-id | 3F73E01C.2060109@persistent.co.in |
| In reply to | Re: Good way to insert/update when you're not sure of duplicates? (Richard Huxton <dev@archonet.com>) |
| Responses | Re: Good way to insert/update when you're not sure of duplicates? |
| List | pgsql-general |
Richard Huxton wrote:
> On Thursday 25 September 2003 02:23, Curtis Stanford wrote:
>
>> I'm in a situation where I need to load an ASCII file into a database. No
>> sweat so far, right? The records are indexed by date, and I have a unique
>> index on date. The ASCII file can overlap, meaning it can contain duplicate
>> dates that have been loaded before.
>>
>> I started out with this ingenious idea: just try to insert the record. If I
>> get an error about duplicate keys, do an update and switch to update mode.
>> If I'm updating and get zero rows updated, switch back to insert. Works
>> fine, except this whole thing is in a transaction, and any error aborts the
>> transaction so I can't commit it. So I decided I'd have to try to update
>> every single record and, if zero rows were updated, do the insert. This
>> takes a looooong time. Many hours for just 86000 records or so.
>>
>> Is there a standard way to do this? I can't imagine I'm the only guy who
>> needs to do this kind of thing.
>
> Try inserting a batch of 1024. If you get an error, drop down to 512 and try
> that. Repeat until the batch works or you've reached a size of 1. If the
> batch worked, try the next set of records and repeat. If you reach a batch
> size of 1 with no success, switch to updating and repeat the cycle,
> increasing your batch size as you go.
>
> You might find it quickest to halve the batch size while you are hitting
> problems and to double it while things work. The right balance will depend
> on how many insert vs. update rows you have.

In addition to that, you can try inserting from multiple backends
simultaneously to speed up the whole process.

And I don't like the mode-switching idea the OP gave. I would rather follow
an insert-then-update-on-error approach in a transaction for each record,
and fork, say, 20-40 parallel backends to achieve good speed.

Shridhar
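Richard's halve-on-failure / double-on-success batching can be sketched in pure Python. This is only an illustration of the control flow, not real database code: the "table" here is an in-memory dict keyed by date, and a batch insert raises an error if any key already exists, standing in for a unique-index violation aborting the whole batch. The function names (`insert_batch`, `load`) are invented for this sketch.

```python
# Simulation of the adaptive batch-size strategy from the reply above.
# Assumption: a batch INSERT is all-or-nothing, failing (like a unique-key
# violation) if any date in the batch is already present.

def insert_batch(table, batch):
    """All-or-nothing insert; raises KeyError on the first duplicate date."""
    for date, value in batch:
        if date in table:
            raise KeyError(date)
    for date, value in batch:
        table[date] = value

def load(table, records, max_batch=1024):
    """Load (date, value) records, halving the batch on failure and
    doubling it on success; at batch size 1, fall back to an update."""
    i = 0
    batch_size = max_batch
    while i < len(records):
        batch = records[i:i + batch_size]
        try:
            insert_batch(table, batch)
            i += len(batch)
            batch_size = min(batch_size * 2, max_batch)  # grow while it works
        except KeyError:
            if batch_size > 1:
                batch_size //= 2                         # shrink and retry
            else:
                date, value = records[i]                 # lone duplicate:
                table[date] = value                      # "update" instead
                i += 1
    return table
```

With a real PostgreSQL backend, `insert_batch` would be a multi-row INSERT inside its own transaction, so a failed batch rolls back cleanly before the retry at the smaller size.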