Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From | Alexander Korotkov |
---|---|
Subject | Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling |
Date | |
Msg-id | CAPpHfdvV8FC67Emeb9XJpULkMOtrJiyC0dGL7FMSyRZ2SLk=5Q@mail.gmail.com |
In reply to | [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling (Alexey Kondratov <kondratov.aleksey@gmail.com>) |
Responses | Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling |
List | pgsql-hackers |
Hi, Alexey!
On Tue, Mar 28, 2017 at 1:54 AM, Alexey Kondratov <kondratov.aleksey@gmail.com> wrote:
Thank you for your responses and valuable comments!
I have written a draft proposal: https://docs.google.com/document/d/1Y4mc_PCvRTjLsae-_fhevYfepv4sxaqwhOo4rlxvK1c/edit
It seems that COPY is currently able to report the first error line and the error type (extra or missing columns, type parse error, etc.). Thus, an approach similar to what Stas wrote should work and, being optimised for a small number of error rows, should not affect COPY performance in that case.
I will be glad to receive any critical remarks and suggestions.
I have the following questions about your proposal.
1. Suppose we have to insert N records
2. We create a subtransaction with these N records
3. An error is raised on the k-th line
4. Then, we can safely insert all lines from the 1st through the (k - 1)-th
5. Report, save to an errors table, or silently drop the k-th line
6. Next, try to insert lines from (k + 1) through N with another subtransaction
7. Repeat until the end of the file
Do you assume that we start a new subtransaction in step 4, since the subtransaction we started in step 2 has been rolled back?
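To make the question concrete, the steps above can be sketched as a toy model in Python. This is not PostgreSQL code: `parse_row` and `copy_skipping_errors` are hypothetical names, a Python list stands in for the table, and the sketch assumes that rows 1 through (k - 1) are re-inserted in a fresh subtransaction after the rollback, which is exactly the point being asked about.

```python
def parse_row(line, n_columns=2):
    """Hypothetical row parser: rejects a wrong column count or a non-integer key."""
    fields = line.split(",")
    if len(fields) != n_columns:
        raise ValueError(f"expected {n_columns} columns, got {len(fields)}")
    return (int(fields[0]), fields[1])


def copy_skipping_errors(lines):
    """Toy model of steps 1-7; returns (table, errors, subtransactions started)."""
    table = []     # rows from subtransactions that committed
    errors = []    # (1-based line number, message) for every skipped line
    subxacts = 0   # number of subtransactions started
    start = 0
    while start < len(lines):
        subxacts += 1            # step 2: try all remaining lines at once
        pending = []
        bad = None
        for i in range(start, len(lines)):
            try:
                pending.append(parse_row(lines[i]))
            except ValueError as exc:
                bad = i          # step 3: error on the k-th line
                errors.append((i + 1, str(exc)))   # step 5: report it
                break
        if bad is None:
            table.extend(pending)    # no error: the subtransaction commits
            break
        # The failed subtransaction is rolled back, so `pending` is discarded.
        # Step 4 (assumed): re-insert the known-good prefix in a new subtransaction.
        if bad > start:
            subxacts += 1
            table.extend(parse_row(line) for line in lines[start:bad])
        start = bad + 1              # step 6: continue after the bad line
    return table, errors, subxacts
```

Under this assumption, each bad line that follows at least one good line costs two extra subtransactions, and the rows before it are processed twice.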
I am planning to use background worker processes for parallel COPY execution. Each process will receive an equal piece of the input file. Since the file is split by size, not by lines, each worker will start importing from the first new line in its piece so that it does not hit a broken line.
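The splitting described here might look roughly like the following Python sketch. `chunk_ranges` is a hypothetical helper, not part of PostgreSQL; it assumes a seekable regular file and that no field value contains an embedded newline (which quoted CSV fields may).

```python
import os


def chunk_ranges(path, n_workers):
    """Split a file into n_workers byte ranges [start, end), each starting at a line start."""
    size = os.path.getsize(path)
    starts = []
    with open(path, "rb") as f:
        for w in range(n_workers):
            target = size * w // n_workers   # raw split point, by size only
            if target == 0:
                starts.append(0)
                continue
            # Step back one byte and read to the end of that line. If the byte
            # before `target` is a newline, `target` is already a line start;
            # otherwise this skips the broken line, which belongs to the
            # previous worker's range.
            f.seek(target - 1)
            f.readline()
            starts.append(f.tell())
    ends = starts[1:] + [size]
    return list(zip(starts, ends))
```

Each worker would then read only the bytes in its own range. A range can come out empty when a single line spans more than one raw piece.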
I think the situation where the backend reads the file directly during COPY is not typical. The more typical case is the psql \copy command. In that case "COPY ... FROM stdin;" is what is actually executed, while psql streams the data.
How can we apply parallel COPY in this case?
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company