Re: How to insert a bulk of data with unique-violations very fast
From | Torsten Zühlsdorff
Subject | Re: How to insert a bulk of data with unique-violations very fast
Date |
Msg-id | hungra$vlt$2@news.eternal-september.org
In response to | Re: How to insert a bulk of data with unique-violations very fast ("Pierre C" <lists@peufeu.com>)
Responses | Re: How to insert a bulk of data with unique-violations very fast
List | pgsql-performance
Pierre C wrote:

>> Within the data to import most rows have 20 to 50 duplicates.
>> Sometimes much more, sometimes less.
>
> In that case (source data has lots of redundancy), after importing the
> data chunks in parallel, you can run a first pass of de-duplication on
> the chunks, also in parallel, something like:
>
> CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;
>
> or you could compute some aggregates, counts, etc. Same as before, no
> WAL needed, and you can use all your cores in parallel.
>
> From what you say this should reduce the size of your imported data by
> a lot (and hence the time spent in the non-parallel operation).

Thank you very much for this advice. I've tried it in another project
with similar import problems. It really sped up the import.

Thanks to everyone for your time and help!

Greetings,
Torsten

--
http://www.dddbl.de - a database layer that abstracts working with 8
different database systems, separates queries from applications, and can
automatically evaluate the query results.
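[Editor's note: a minimal sketch of the per-chunk de-duplicating import described in the quoted message, assuming a target table foo with a unique key on id and a per-chunk staging table foo_1; the table names, columns, and file path are placeholders, not taken from the original thread.]

-- Load one chunk into a session-local staging table
-- (no WAL traffic, one such session per chunk can run in parallel).
-- Names and the file path below are hypothetical.
CREATE TEMP TABLE foo_1 (id integer, payload text);
COPY foo_1 FROM '/path/to/chunk_1.csv' CSV;

-- First pass: de-duplicate within the chunk, still in parallel,
-- exactly as in the quoted suggestion.
CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;

-- Final merge: insert only keys not already present, so the unique
-- constraint on foo(id) is never violated.
INSERT INTO foo (id, payload)
SELECT d.id, d.payload
FROM foo_1_dedup d
WHERE NOT EXISTS (SELECT 1 FROM foo f WHERE f.id = d.id);

Only the final merge has to run against the shared table; shrinking each chunk beforehand reduces the time spent in that non-parallel step, which is the point made in the quoted message.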