Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id ZbyJ60Fd7CHt4m0i@paquier.xyz
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Ответы Re: Make COPY format extendable: Extract COPY TO format implementations  (Junwang Zhao <zhjwpku@gmail.com>)
Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Список pgsql-hackers
On Fri, Feb 02, 2024 at 09:40:56AM +0900, Sutou Kouhei wrote:
> Thanks. It'll help us.

I have done a review of v10, see v11 attached which is still WIP, with
the patches for COPY TO and COPY FROM merged together.  Note that I'm
thinking to merge them into a single commit.

@@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
     bool        convert_selectively;    /* do selective binary conversion? */
     CopyOnErrorChoice on_error; /* what to do when error happened */
     List       *convert_select; /* list of column names (can be NIL) */
+    const        CopyToRoutine *to_routine;    /* callback routines for COPY TO */
 } CopyFormatOptions;

Adding the routines to the structure for the format options is in my
opinion incorrect.  The elements of this structure are first processed
in the option deparsing path, and then we need to use the options to
guess which routines we need.  A more natural location is cstate
itself, so as the pointer to the routines is isolated within copyto.c
and copyfrom_internal.h.  My point is: the routines are an
implementation detail that the centralized copy.c has no need to know
about.  This also led to a strange separation with
ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
the hole in-between.

The separation between cstate and the format-related fields could be
much better, though I am not sure if it is worth doing as it
introduces more duplication.  For example, max_fields and raw_fields
are specific to text and csv, while binary does not care much.
Perhaps this is just useful to be for custom formats.

copyapi.h needs more documentation, like what is expected for
extension developers when using these, what are the arguments, etc.  I
have added what I had in mind for now.

+typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);

CopyReadAttributes and PostpareColumnValue are also callbacks specific
to text and csv, except that they are used within the per-row
callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
It seems to me that it would be less confusing to store pointers to
them in the routine structures, where the final picture involves not
having multiple layers of APIs like CopyToCSVStart,
CopyAttributeOutTextValue, etc.  These *have* to be documented
properly in copyapi.h, and this is much easier now that cstate stores
the routine pointers.  That would also make simpler function stacks.
Note that I have not changed that in the v11 attached.

This business with the extra callbacks required for csv and text is my
main point of contention, but I'd be OK once the model of the APIs is
more linear, with everything in Copy{From,To}State.  The changes would
be rather simple, and I'd be OK to put my hands on it.  Just,
Sutou-san, would you agree with my last point about these extra
callbacks?
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Synchronizing slots from primary to standby
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Commitfest 2024-01 is now closed