Re: Bulkloading using COPY - ignore duplicates?
| От | Hannu Krosing |
|---|---|
| Тема | Re: Bulkloading using COPY - ignore duplicates? |
| Дата | |
| Msg-id | 3C18B445.3C7774D9@tm.ee обсуждение исходный текст |
| Ответ на | Bulkloading using COPY - ignore duplicates? (Lee Kindness <lkindness@csl.co.uk>) |
| Ответы |
Re: Bulkloading using COPY - ignore duplicates?
|
| Список | pgsql-hackers |
Lee Kindness wrote: > > Patrick Welche writes: > > On Mon, Oct 01, 2001 at 03:17:43PM +0100, Lee Kindness wrote: > > > Please, don't get me wrong - I don't want to come across arrogant. I'm > > > simply trying to improve the 'COPY FROM' command in a situation where > > > speed is a critical issue and the data is dirty... And that must be a > > > relatively common scenario. > > Isn't that when you do your bulk copy into into a holding table, then > > clean it up, and then insert into your live system? > > That's what I'm currently doing as a workaround - a SELECT DISTINCT > from a temporary table into the real table with the unique index on > it. However this takes absolute ages - say 5 seconds for the copy > (which is the ballpark figure I aiming toward and can achieve with > Ingres) plus another 30ish seconds for the SELECT DISTINCT. > > The majority of database systems out there handle this situation in > one manner or another (MySQL ignores or replaces; Ingres ignores; > Oracle ignores or logs; others...). Indeed PostgreSQL currently checks > for duplicates in the COPY code but throws an elog(ERROR) rather than > ignoring the row, or passing the error back up the call chain. I guess postgresql will be able to do it once savepoints get implemented. > My use of PostgreSQL is very time critical, and sadly this issue alone > may force an evaluation of Oracle's performance in this respect! Can't you clean the duplicates _outside_ postgresql, say cat dumpfile | sort | uniq | psql db -c 'copy mytable from stdin' with your version of uniq. or perhaps psql db -c 'copy mytable to stdout' >> dumpfile sort dumpfile | uniq | psql db -c 'copy mytable from stdin' if you already have something in mytable. ------------ Hannu
В списке pgsql-hackers по дате отправления: