Re: Best strategy for bulk inserts where some violate unique constraint?
From | Adrian Klaver
Subject | Re: Best strategy for bulk inserts where some violate unique constraint?
Date |
Msg-id | 52798192.7040000@gmail.com
In reply to | Re: Best strategy for bulk inserts where some violate unique constraint? (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Responses | Re: Best strategy for bulk inserts where some violate unique constraint?
List | psycopg
On 11/05/2013 03:31 PM, Daniele Varrazzo wrote:
>
> As the last example do you mean the executemany() example? That is not
> going to be much faster than repeated execute().
>
> The easiest thing you can do is to switch to autocommit=True and do
> repeated execute() with an insert. If one fails you can just ignore the
> IntegrityError and go on.
>
> Just as easily, you can prepare a statement and execute it repeatedly
> using PREPARE/EXECUTE: see
> <http://www.postgresql.org/docs/9.2/static/sql-prepare.html>. There is
> no builtin support for that in psycopg but you can just execute()
> these statements. It may save you something. You can also take a look
> at this example of a PREPAREing cursor:
> <https://gist.github.com/dvarrazzo/3797445>.
>
> However, the fastest way to insert data into Postgres is COPY, see
> <http://initd.org/psycopg/docs/cursor.html#cursor.copy_from>. You will
> have to present your data as a file. I can't remember what happens
> when a record fails the integrity test: I think the others would still
> be inserted, but you will have to check.

It will fail. From
http://www.postgresql.org/docs/9.3/interactive/sql-copy.html:

    COPY stops operation at the first error. This should not lead to
    problems in the event of a COPY TO, but the target table will already
    have received earlier rows in a COPY FROM. These rows will not be
    visible or accessible, but they still occupy disk space. This might
    amount to a considerable amount of wasted disk space if the failure
    happened well into a large copy operation. You might wish to invoke
    VACUUM to recover the wasted space.

I would suggest checking out pg_loader; it is designed to deal with this
scenario.

> A much more robust strategy is to create a temporary table with the
> right schema but without constraints, load the data there using COPY,
> and then move the data to the final table using INSERT INTO ... SELECT
> * FROM temp_table WHERE ... and specify a condition to avoid the
> records that would fail the constraint.
>
> I would go for COPY, it's by far faster than execute[many], even
> including prepare.
>
> -- Daniele

--
Adrian Klaver
adrian.klaver@gmail.com
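
For concreteness, a minimal psycopg2 sketch of the autocommit approach
described above. The connection string and the items(id, val) table with a
unique key on id are assumptions for illustration, not part of the thread:

    import psycopg2

    # Autocommit mode: each execute() commits on its own, so a failed
    # INSERT does not abort a surrounding transaction.
    conn = psycopg2.connect("dbname=test")  # DSN is assumed
    conn.autocommit = True
    cur = conn.cursor()

    rows = [(1, 'a'), (2, 'b'), (1, 'dup')]  # third row violates the key
    for row in rows:
        try:
            cur.execute("INSERT INTO items (id, val) VALUES (%s, %s)", row)
        except psycopg2.IntegrityError:
            pass  # duplicate key: skip this row and carry on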
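
The PREPARE/EXECUTE variant is much the same; since psycopg has no built-in
support, the statements are issued through plain execute(). The statement
name "ins" is illustrative:

    # PREPARE once, then EXECUTE per row; psycopg interpolates the
    # parameters into the EXECUTE statement as usual.
    cur.execute("PREPARE ins (int, text) AS "
                "INSERT INTO items (id, val) VALUES ($1, $2)")
    for row in rows:
        try:
            cur.execute("EXECUTE ins (%s, %s)", row)
        except psycopg2.IntegrityError:
            pass  # constraint violation: skip and continue
    cur.execute("DEALLOCATE ins")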
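
And a sketch of the staging-table strategy from Daniele's last suggestion:
COPY into an unconstrained temp table, then move across only the rows that
would not collide with the unique key. Table and column names are again made
up, and DISTINCT ON is one way to drop duplicates within the batch itself:

    import io
    import psycopg2

    conn = psycopg2.connect("dbname=test")  # DSN is assumed
    cur = conn.cursor()

    # Stage the whole batch with COPY; the temp table has no constraints,
    # so nothing can fail at this step.
    cur.execute("CREATE TEMP TABLE items_stage (id int, val text)")
    buf = io.StringIO("1\ta\n2\tb\n1\tdup\n")  # tab-separated rows
    cur.copy_from(buf, 'items_stage', columns=('id', 'val'))

    # Move only rows whose key is not already in the target table;
    # DISTINCT ON (id) keeps one arbitrary row per duplicated key.
    cur.execute("""
        INSERT INTO items (id, val)
        SELECT DISTINCT ON (id) id, val
        FROM items_stage s
        WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = s.id)
    """)
    conn.commit()

Note that copy_from() still wants the data presented as a file-like object,
as Daniele says; here an in-memory buffer stands in for a real file.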