Re: Bulkloading using COPY - ignore duplicates?
From: Lee Kindness
Subject: Re: Bulkloading using COPY - ignore duplicates?
Date:
Msg-id: 15384.49588.776088.349200@elsick.csl.co.uk
In reply to: Re: Bulkloading using COPY - ignore duplicates? (Hannu Krosing <hannu@tm.ee>)
Responses: Re: Bulkloading using COPY - ignore duplicates?
List: pgsql-hackers
Hannu Krosing writes:
> Lee Kindness wrote:
> > The majority of database systems out there handle this situation in
> > one manner or another (MySQL ignores or replaces; Ingres ignores;
> > Oracle ignores or logs; others...). Indeed PostgreSQL currently checks
> > for duplicates in the COPY code but throws an elog(ERROR) rather than
> > ignoring the row, or passing the error back up the call chain.
> I guess postgresql will be able to do it once savepoints get
> implemented.

This is encouraging to hear. I can see how this would make the code
changes relatively minimal and more manageable - the changes to the
current code are simply over my head! Are savepoints relatively high up
on the TODO list, once 7.2 is out the door?

> > My use of PostgreSQL is very time critical, and sadly this issue alone
> > may force an evaluation of Oracle's performance in this respect!
> Can't you clean the duplicates _outside_ postgresql, say
> cat dumpfile | sort | uniq | psql db -c 'copy mytable from stdin'

This is certainly a possibility, however it's just really moving the
processing elsewhere. The combined time is still around the same.
I've/we've done a lot of investigation with approaches like this and
also with techniques assuming the locality of the duplicates (which is
a no-goer). None improve the situation. I'm not going to compare the
time of just using INSERTs rather than COPY...

Thanks for your response,

Lee Kindness.

--
Lee Kindness, Senior Software Engineer, Concept Systems Limited.
http://services.csl.co.uk/ http://www.csl.co.uk/ +44 1315575595
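For anyone trying the external-cleanup route in the meantime, a minimal
sketch of the sort of pipeline Hannu suggests is below. The file name,
table name, delimiter and key position are assumptions, not what we
actually run. Note that a plain sort | uniq only drops rows that are
identical in every column; if the duplicates share only the key column,
a key-aware sort is needed (GNU sort shown):

    # Whole-row duplicates: essentially the pipeline suggested above
    # (hypothetical file and table names).
    sort dumpfile | uniq | psql db -c 'COPY mytable FROM stdin'

    # Duplicate keys only: keep one row per value of the first
    # tab-delimited column. Requires a sort supporting -t/-k/-u.
    sort -t "$(printf '\t')" -k 1,1 -u dumpfile | \
        psql db -c 'COPY mytable FROM stdin'

Either way, COPY itself still aborts outright on the first duplicate
that slips through, which is the behaviour this thread is about.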