Re: Bulkloading using COPY - ignore duplicates?
From | Lee Kindness
---|---
Subject | Re: Bulkloading using COPY - ignore duplicates?
Date |
Msg-id | 15290.57850.68004.391312@elsick.csl.co.uk
In response to | Re: Bulkloading using COPY - ignore duplicates? (Peter Eisentraut <peter_e@gmx.net>)
Responses | Bulkloading using COPY - ignore duplicates?; Re: Bulkloading using COPY - ignore duplicates?
List | pgsql-hackers
Peter Eisentraut writes:
> However, it seems to me that COPY ignoring duplicates can easily be
> done by preprocessing the input file.

Or by post-processing, like (error checking cut):

    void import_shots(char *impfile, int lineshoot_id)
    {
     char tab_name[128];
     char tab_temp[128];
     EXEC SQL BEGIN DECLARE SECTION;
     char cmd[256];
     EXEC SQL END DECLARE SECTION;

     frig_file(impfile); /* add the postgres header */

     sprintf(tab_name, "shot_%d", lineshoot_id);
     sprintf(tab_temp, "shot_%d_tmp", lineshoot_id);

     sprintf(cmd, "CREATE TEMPORARY TABLE %s AS SELECT * FROM shot", tab_temp);
     EXEC SQL EXECUTE IMMEDIATE :cmd;
     EXEC SQL COMMIT WORK; /* will not work without commit here! */

     sprintf(cmd, "COPY BINARY %s FROM '%s'", tab_temp, impfile);
     append_page_alloc(cmd, tab_name, impfile, 1);
     EXEC SQL EXECUTE IMMEDIATE :cmd;

     sprintf(cmd, "INSERT INTO %s SELECT DISTINCT ON (shot_time) * FROM %s",
             tab_name, tab_temp);
     EXEC SQL EXECUTE IMMEDIATE :cmd;

     sprintf(cmd, "DROP TABLE %s", tab_temp);
     EXEC SQL EXECUTE IMMEDIATE :cmd;
     EXEC SQL COMMIT WORK;

     remove(impfile);
    }

However this adds significant time to the import operation. Likewise I could loop over the input file first and hunt for duplicates, again with a performance hit.

My main point is that Postgres can easily and quickly check for duplicates during the COPY (as it does currently), and it adds zero execution time to simply ignore these duplicate rows. Obviously this is a useful feature, otherwise Oracle, Ingres and other commercial relational databases wouldn't feature similar functionality.

Yes, in an ideal world the input to COPY should be clean and consistent with the defined indexes. However this is only really the case when COPY is used for database/table backup and restore. It misses the point that a major use of COPY is in speed optimisation on bulk inserts...

Lee.