Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Junwang Zhao
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id CAEG8a3LxnBwNRPRwvmimDvOkPvYL8pB1+rhLBnxjeddFt3MeNw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Список pgsql-hackers
On Fri, Feb 2, 2024 at 2:21 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Feb 02, 2024 at 09:40:56AM +0900, Sutou Kouhei wrote:
> > Thanks. It'll help us.
>
> I have done a review of v10, see v11 attached which is still WIP, with
> the patches for COPY TO and COPY FROM merged together.  Note that I'm
> thinking to merge them into a single commit.
>
> @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
>      bool        convert_selectively;    /* do selective binary conversion? */
>      CopyOnErrorChoice on_error; /* what to do when error happened */
>      List       *convert_select; /* list of column names (can be NIL) */
> +    const        CopyToRoutine *to_routine;    /* callback routines for COPY TO */
>  } CopyFormatOptions;
>
> Adding the routines to the structure for the format options is in my
> opinion incorrect.  The elements of this structure are first processed
> in the option deparsing path, and then we need to use the options to
> guess which routines we need.  A more natural location is cstate
> itself, so as the pointer to the routines is isolated within copyto.c

I agree CopyToRoutine should be placed into CopyToStateData, but
why set it after ProcessCopyOptions, the implementation of
CopyToGetRoutine doesn't make sense if we want to support custom
format in the future.

Seems the refactor of v11 only considered performance but not
*extendable copy format*.

> and copyfrom_internal.h.  My point is: the routines are an
> implementation detail that the centralized copy.c has no need to know
> about.  This also led to a strange separation with
> ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
> the hole in-between.
>
> The separation between cstate and the format-related fields could be
> much better, though I am not sure if it is worth doing as it
> introduces more duplication.  For example, max_fields and raw_fields
> are specific to text and csv, while binary does not care much.
> Perhaps this is just useful to be for custom formats.

I think those can be placed in format specific fields by utilizing the opaque
space, but yeah, this will introduce duplication.

>
> copyapi.h needs more documentation, like what is expected for
> extension developers when using these, what are the arguments, etc.  I
> have added what I had in mind for now.
>
> +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);
>
> CopyReadAttributes and PostpareColumnValue are also callbacks specific
> to text and csv, except that they are used within the per-row
> callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
> It seems to me that it would be less confusing to store pointers to
> them in the routine structures, where the final picture involves not
> having multiple layers of APIs like CopyToCSVStart,
> CopyAttributeOutTextValue, etc.  These *have* to be documented
> properly in copyapi.h, and this is much easier now that cstate stores
> the routine pointers.  That would also make simpler function stacks.
> Note that I have not changed that in the v11 attached.
>
> This business with the extra callbacks required for csv and text is my
> main point of contention, but I'd be OK once the model of the APIs is
> more linear, with everything in Copy{From,To}State.  The changes would
> be rather simple, and I'd be OK to put my hands on it.  Just,
> Sutou-san, would you agree with my last point about these extra
> callbacks?
> --
> Michael

If V7 and V10 have no performance reduction, then I think V6 is also
good with performance, since most of the time goes to CopyToOneRow
and CopyFromOneRow.

I just think we should take the *extendable* into consideration at
the beginning.

--
Regards
Junwang Zhao



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Is this a problem in GenericXLogFinish()?
Следующее
От: Sutou Kouhei
Дата:
Сообщение: Re: Make COPY format extendable: Extract COPY TO format implementations