Re: Parallel copy
От | Heikki Linnakangas |
---|---|
Тема | Re: Parallel copy |
Дата | |
Msg-id | 60fb6859-60f4-4a5b-1866-73c3cd2f0d16@iki.fi обсуждение исходный текст |
Ответ на | Re: Parallel copy (Amit Kapila <amit.kapila16@gmail.com>) |
Список | pgsql-hackers |
On 03/11/2020 10:59, Amit Kapila wrote: > On Mon, Nov 2, 2020 at 12:40 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> However, the point of parallel copy is to maximize bandwidth. > > Okay, but this first-phase (finding the line boundaries) can anyway > be not done in parallel and we have seen in some of the initial > benchmarking that this initial phase is a small part of work > especially when the table has indexes, constraints, etc. So, I think > it won't matter much if this splitting is done in a single process > or multiple processes. Right, it won't matter performance-wise. That's not my point. The difference is in the complexity. If you don't store the line boundaries in shared memory, you get away with much simpler shared memory structures. > I think something close to that is discussed as you have noticed in > your next email but IIRC, because many people (Andres, Ants, myself > and author) favoured the current approach (single reader and multiple > consumers) we decided to go with that. I feel this patch is very much > in the POC stage due to which the code doesn't look good and as we > move forward we need to see what is the better way to improve it, > maybe one of the ways is to split it as you are suggesting so that it > can be easier to review. Sure. I think the roadmap here is: 1. Split copy.c [1]. Not strictly necessary, but I think it'd make this nice to review and work with. 2. Refactor CopyReadLine(), so that finding the line-endings and the rest of the line-parsing are separated into separate functions. 3. Implement parallel copy. > I think the other important thing which this > patch has not addressed properly is the parallel-safety checks as > pointed by me earlier. There are two things to solve there (a) the > lower-level code (like heap_* APIs, CommandCounterIncrement, xact.c > APIs, etc.) have checks which doesn't allow any writes, we need to see > which of those we can open now (or do some additional work to prevent > from those checks) after some of the work done for parallel-writes in > PG-13[1][2], and (b) in which all cases we can parallel-writes > (parallel copy) is allowed, for example need to identify whether table > or one of its partitions has any constraint/expression which is > parallel-unsafe. Agreed, that needs to be solved. I haven't given it any thought myself. - Heikki [1] https://www.postgresql.org/message-id/8e15b560-f387-7acc-ac90-763986617bfb%40iki.fi
В списке pgsql-hackers по дате отправления: