Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Sutou Kouhei
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id 20240131.141122.279551156957581322.kou@clear-code.com
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
Hi,

In <Zbi1TwPfAvUpKqTd@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 30 Jan 2024 17:37:35 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:

>> We use defel->location for an error message. (We don't need
>> to set location for the default "text" DefElem.)
> 
> Yeah, but you should not need to have this error in the paths that set
> the callback routines in opts_out if the same validation happens a few
> lines before, in copy.c.

Ah, yes. defel->location is used in later patches. For
example, it's used when a COPY handler for the specified
FORMAT isn't found.

>                           I am really worried about the complexities
> this thread is getting into because we are trying to shape the
> callbacks in the most generic way possible based on *two* use cases.
> This is going to be a never-ending discussion.  I'd rather get some
> simple basics, and then we can discuss if tweaking the callbacks is
> really necessary or not.  Even after introducing the pg_proc lookups
> to get custom callbacks.

I understand your concern. Let's introduce minimal callbacks
as the first step. I think that we completed our design
discussion for this feature. We can choose minimal callbacks
based on the discussion.

>         The custom options don't seem like an absolute strong
> requirement for the first shot with the callbacks or even the
> possibility to retrieve the callbacks from a function call.  I mean,
> you could provide some control with SET commands and a few GUCs, at
> least, even if that would be strange.  Manipulations with a list of
> DefElems is the intuitive way to have custom options at query level,
> but we also have to guess the set of callbacks from this list of
> DefElems coming from the query.  You see my point, I am not sure 
> if it would be the best thing to process twice the options, especially
> when it comes to decide if a DefElem should be valid or not depending
> on the callbacks used.  Or we could use a kind of "special" DefElem
> where we could store a set of key:value fed to a callback :)

Interesting. Let's remove custom options support from the
initial minimal callbacks.

>> Can I prepare the v10 patch set as "the v7 patch set" + "the
>> removal of the "if" checks in NextCopyFromRawFields()"?
>> (+ reverting opts_out->{csv_mode,binary} changes in
>> ProcessCopyOptions().)
> 
> Yep, if I got it that would make sense to me.  If you can do that,
> that would help quite a bit.  :)

I've prepared the v10 patch set. Could you try this?

Changes since the v7 patch set:

0001:

* Remove CopyToProcessOption() callback
* Remove CopyToGetFormat() callback
* Revert passing CopyToState to ProcessCopyOptions()
* Revert moving "opts_out->{csv_mode,binary} = true" to
  ProcessCopyOptionFormatTo()
* Change to receive "const char *format" instead "DefElem  *defel"
  by ProcessCopyOptionFormatTo()

0002:

* Remove CopyFromProcessOption() callback
* Remove CopyFromGetFormat() callback
* Change to receive "const char *format" instead "DefElem
  *defel" by ProcessCopyOptionFormatFrom()
* Remove "if (cstate->opts.csv_mode)" branches from
  NextCopyFromRawFields()



FYI: Here are Copy{From,To}Routine in the v10 patch set. I
think that only Copy{From,To}OneRow are minimal callbacks
for the performance gain. But can we keep Copy{From,To}Start
and Copy{From,To}End for consistency? We can remove a few
{csv_mode,binary} conditions by Copy{From,To}{Start,End}. It
doesn't depend on the number of COPY target tuples. So they
will not affect performance.

/* Routines for a COPY FROM format implementation. */
typedef struct CopyFromRoutine
{
    /*
     * Called when COPY FROM is started. This will initialize something and
     * receive a header.
     */
    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);

    /* Copy one row. It returns false if no more tuples. */
    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);

    /* Called when COPY FROM is ended. This will finalize something. */
    void        (*CopyFromEnd) (CopyFromState cstate);
}            CopyFromRoutine;

/* Routines for a COPY TO format implementation. */
typedef struct CopyToRoutine
{
    /* Called when COPY TO is started. This will send a header. */
    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);

    /* Copy one row for COPY TO. */
    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);

    /* Called when COPY TO is ended. This will send a trailer. */
    void        (*CopyToEnd) (CopyToState cstate);
}            CopyToRoutine;




Thanks,
-- 
kou
From f827f1f1632dc330ef5d78141b85df8ca1bce63b Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 31 Jan 2024 13:22:04 +0900
Subject: [PATCH v10 1/2] Extract COPY TO format implementations

This doesn't change the current behavior. This just introduces
CopyToRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

This is for performance. We can remove "if (cstate->opts.csv_mode)"
and "if (!cstate->opts.binary)" branches in CopyOneRowTo() by using
callbacks for each format. It improves performance.
---
 src/backend/commands/copy.c    |   8 +
 src/backend/commands/copyto.c  | 494 +++++++++++++++++++++++----------
 src/include/commands/copy.h    |   6 +-
 src/include/commands/copyapi.h |  35 +++
 4 files changed, 389 insertions(+), 154 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..c88510f8c7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -487,6 +487,8 @@ ProcessCopyOptions(ParseState *pstate,
                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                          errmsg("COPY format \"%s\" not recognized", fmt),
                          parser_errposition(pstate, defel->location)));
+            if (!is_from)
+                ProcessCopyOptionFormatTo(pstate, opts_out, fmt);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
@@ -622,6 +624,12 @@ ProcessCopyOptions(ParseState *pstate,
                             defel->defname),
                      parser_errposition(pstate, defel->location)));
     }
+    if (!format_specified)
+    {
+        /* Set the default format. */
+        if (!is_from)
+            ProcessCopyOptionFormatTo(pstate, opts_out, "text");
+    }
 
     /*
      * Check for incompatible options (must do these two before inserting
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d3dc3fc854..70a28ab44d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -131,6 +131,345 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToRoutine implementation for "text" and "csv". CopyToTextBased*() are
+ * shared by both of "text" and "csv". CopyToText*() are only for "text" and
+ * CopyToCSV*() are only for "csv".
+ *
+ * We can use the same functions for all callbacks by referring
+ * cstate->opts.csv_mode but splitting callbacks to eliminate "if
+ * (cstate->opts.csv_mode)" branches from all callbacks has performance
+ * merit when many tuples are copied. So we use separated callbacks for "text"
+ * and "csv".
+ */
+
+static void
+CopyToTextBasedSendEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+    CopySendEndOfRow(cstate);
+}
+
+typedef void (*CopyAttributeOutHeaderFunction) (CopyToState cstate, char *string);
+
+/*
+ * We can use CopyAttributeOutText() directly but define this for consistency
+ * with CopyAttributeOutCSVHeader(). "static inline" will prevent performance
+ * penalty by this wrapping.
+ */
+static inline void
+CopyAttributeOutTextHeader(CopyToState cstate, char *string)
+{
+    CopyAttributeOutText(cstate, string);
+}
+
+static inline void
+CopyAttributeOutCSVHeader(CopyToState cstate, char *string)
+{
+    CopyAttributeOutCSV(cstate, string, false,
+                        list_length(cstate->attnumlist) == 1);
+}
+
+/*
+ * We don't use this function as a callback directly. We define
+ * CopyToTextStart() and CopyToCSVStart() and use them instead. It's for
+ * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called
+ * only once per COPY TO. So this optimization may be meaningless but done for
+ * consistency with CopyToTextBasedOneRow().
+ *
+ * This must initialize cstate->out_functions for CopyToTextBasedOneRow().
+ */
+static inline void
+CopyToTextBasedStart(CopyToState cstate, TupleDesc tupDesc, CopyAttributeOutHeaderFunction out)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            out(cstate, colname);
+        }
+
+        CopyToTextBasedSendEndOfRow(cstate);
+    }
+}
+
+static void
+CopyToTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    CopyToTextBasedStart(cstate, tupDesc, CopyAttributeOutTextHeader);
+}
+
+static void
+CopyToCSVStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    CopyToTextBasedStart(cstate, tupDesc, CopyAttributeOutCSVHeader);
+}
+
+typedef void (*CopyAttributeOutValueFunction) (CopyToState cstate, char *string, int attnum);
+
+static inline void
+CopyAttributeOutTextValue(CopyToState cstate, char *string, int attnum)
+{
+    CopyAttributeOutText(cstate, string);
+}
+
+static inline void
+CopyAttributeOutCSVValue(CopyToState cstate, char *string, int attnum)
+{
+    CopyAttributeOutCSV(cstate, string,
+                        cstate->opts.force_quote_flags[attnum - 1],
+                        list_length(cstate->attnumlist) == 1);
+}
+
+/*
+ * We don't use this function as a callback directly. We define
+ * CopyToTextOneRow() and CopyToCSVOneRow() and use them instead. It's for
+ * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called
+ * per tuple. So this optimization will be valuable when many tuples are
+ * copied.
+ *
+ * cstate->out_functions must be initialized in CopyToTextBasedStart().
+ */
+static void
+CopyToTextBasedOneRow(CopyToState cstate, TupleTableSlot *slot, CopyAttributeOutValueFunction out)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1], value);
+            out(cstate, string, attnum);
+        }
+    }
+
+    CopyToTextBasedSendEndOfRow(cstate);
+}
+
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextBasedOneRow(cstate, slot, CopyAttributeOutTextValue);
+}
+
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextBasedOneRow(cstate, slot, CopyAttributeOutCSVValue);
+}
+
+static void
+CopyToTextBasedEnd(CopyToState cstate)
+{
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/*
+ * This must initialize cstate->out_functions for CopyToBinaryOneRow().
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    {
+        /* Generate header for a binary copy */
+        int32        tmp;
+
+        /* Signature */
+        CopySendData(cstate, BinarySignature, 11);
+        /* Flags field */
+        tmp = 0;
+        CopySendInt32(cstate, tmp);
+        /* No header extension */
+        tmp = 0;
+        CopySendInt32(cstate, tmp);
+    }
+}
+
+/*
+ * cstate->out_functions must be initialized in CopyToBinaryStart().
+ */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextBased*() are shared with "csv". CopyToText*() are only for "text".
+ */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextStart,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextBasedEnd,
+};
+
+/*
+ * CopyToTextBased*() are shared with "text". CopyToCSV*() are only for "csv".
+ */
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToCSVStart,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextBasedEnd,
+};
+
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/*
+ * Process the FORMAT option for COPY TO.
+ *
+ * 'format' must be "text", "csv" or "binary".
+ */
+void
+ProcessCopyOptionFormatTo(ParseState *pstate,
+                          CopyFormatOptions *opts_out,
+                          const char *format)
+{
+    if (strcmp(format, "text") == 0)
+        opts_out->to_routine = &CopyToRoutineText;
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->to_routine = &CopyToRoutineCSV;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->to_routine = &CopyToRoutineBinary;
+    }
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -198,16 +537,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -242,10 +571,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -748,8 +1073,6 @@ DoCopyTo(CopyToState cstate)
     bool        pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
     bool        fe_copy = (pipe && whereToSendOutput == DestRemote);
     TupleDesc    tupDesc;
-    int            num_phys_attrs;
-    ListCell   *cur;
     uint64        processed;
 
     if (fe_copy)
@@ -759,32 +1082,11 @@ DoCopyTo(CopyToState cstate)
         tupDesc = RelationGetDescr(cstate->rel);
     else
         tupDesc = cstate->queryDesc->tupDesc;
-    num_phys_attrs = tupDesc->natts;
     cstate->opts.null_print_client = cstate->opts.null_print;    /* default */
 
     /* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
     cstate->fe_msgbuf = makeStringInfo();
 
-    /* Get info about the columns we need to process. */
-    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
-        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
-        if (cstate->opts.binary)
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-        else
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-    }
-
     /*
      * Create a temporary memory context that we can reset once per row to
      * recover palloc'd memory.  This avoids any problems with leaks inside
@@ -795,57 +1097,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false,
-                                        list_length(cstate->attnumlist) == 1);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->opts.to_routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -884,13 +1136,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
+    cstate->opts.to_routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -906,71 +1152,15 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    bool        need_delim = false;
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
-    ListCell   *cur;
-    char       *string;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Datum        value = slot->tts_values[attnum - 1];
-        bool        isnull = slot->tts_isnull[attnum - 1];
-
-        if (!cstate->opts.binary)
-        {
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-        }
-
-        if (isnull)
-        {
-            if (!cstate->opts.binary)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-                CopySendInt32(cstate, -1);
-        }
-        else
-        {
-            if (!cstate->opts.binary)
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1],
-                                        list_length(cstate->attnumlist) == 1);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-            else
-            {
-                bytea       *outputbytes;
-
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->opts.to_routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0be..18486a3715 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
 #ifndef COPY_H
 #define COPY_H
 
+#include "commands/copyapi.h"
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
@@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
     bool        convert_selectively;    /* do selective binary conversion? */
     CopyOnErrorChoice on_error; /* what to do when error happened */
     List       *convert_select; /* list of column names (can be NIL) */
+    const        CopyToRoutine *to_routine;    /* callback routines for COPY TO */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 typedef void (*copy_data_dest_cb) (void *data, int len);
@@ -88,6 +89,7 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
                    uint64 *processed);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, List *options);
+extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions *opts_out, const char *format);
 extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause,
                                    const char *filename,
                                    bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List
*options);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 0000000000..2f9ecd0e2b
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,35 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+
+/* This is private in commands/copyto.c */
+typedef struct CopyToStateData *CopyToState;
+
+/* Routines for a COPY TO format implementation. */
+typedef struct CopyToRoutine
+{
+    /* Called when COPY TO is started. This will send a header. */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /* Copy one row for COPY TO. */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* Called when COPY TO is ended. This will send a trailer. */
+    void        (*CopyToEnd) (CopyToState cstate);
+}            CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
-- 
2.43.0

From 9487884f2c0a8976945778821abd850418b6623c Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 31 Jan 2024 13:37:02 +0900
Subject: [PATCH v10 2/2] Extract COPY FROM format implementations

This doesn't change the current behavior. This just introduces
CopyFromRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

This is for performance. We can remove "if (cstate->opts.csv_mode)"
and "if (!cstate->opts.binary)" branches in NextCopyFrom() by using
callbacks for each format. It improves performance.
---
 src/backend/commands/copy.c              |   8 +-
 src/backend/commands/copyfrom.c          | 217 +++++++++---
 src/backend/commands/copyfromparse.c     | 420 +++++++++++++----------
 src/include/commands/copy.h              |   6 +-
 src/include/commands/copyapi.h           |  20 ++
 src/include/commands/copyfrom_internal.h |   4 +
 6 files changed, 447 insertions(+), 228 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c88510f8c7..cd79e614b9 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -487,7 +487,9 @@ ProcessCopyOptions(ParseState *pstate,
                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                          errmsg("COPY format \"%s\" not recognized", fmt),
                          parser_errposition(pstate, defel->location)));
-            if (!is_from)
+            if (is_from)
+                ProcessCopyOptionFormatFrom(pstate, opts_out, fmt);
+            else
                 ProcessCopyOptionFormatTo(pstate, opts_out, fmt);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
@@ -627,7 +629,9 @@ ProcessCopyOptions(ParseState *pstate,
     if (!format_specified)
     {
         /* Set the default format. */
-        if (!is_from)
+        if (is_from)
+            ProcessCopyOptionFormatFrom(pstate, opts_out, "text");
+        else
             ProcessCopyOptionFormatTo(pstate, opts_out, "text");
     }
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..b51096fc0d 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -108,6 +108,171 @@ static char *limit_printout_length(const char *str);
 
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+
+/*
+ * CopyFromRoutine implementations.
+ */
+
+/*
+ * CopyFromRoutine implementation for "text" and "csv". CopyFromTextBased*()
+ * are shared by both of "text" and "csv". CopyFromText*() are only for "text"
+ * and CopyFromCSV*() are only for "csv".
+ *
+ * We can use the same functions for all callbacks by referring
+ * cstate->opts.csv_mode but splitting callbacks to eliminate "if
+ * (cstate->opts.csv_mode)" branches from all callbacks has performance merit
+ * when many tuples are copied. So we use separated callbacks for "text" and
+ * "csv".
+ */
+
+/*
+ * This must initialize cstate->in_functions for CopyFromTextBasedOneRow().
+ */
+static void
+CopyFromTextBasedStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    num_phys_attrs = tupDesc->natts;
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Pick up the required catalog information for each attribute in the
+     * relation, including the input function, the element type (to pass to
+     * the input function).
+     */
+    cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
+    for (int attnum = 1; attnum <= num_phys_attrs; attnum++)
+    {
+        Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1);
+        Oid            in_func_oid;
+
+        /* We don't need info for dropped attributes */
+        if (att->attisdropped)
+            continue;
+
+        /* Fetch the input function and typioparam info */
+        getTypeInputInfo(att->atttypid,
+                         &in_func_oid, &cstate->typioparams[attnum - 1]);
+        fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]);
+    }
+
+    /* create workspace for CopyReadAttributes results */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+static void
+CopyFromTextBasedEnd(CopyFromState cstate)
+{
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */
+
+/*
+ * This must initialize cstate->in_functions for CopyFromBinaryOneRow().
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    num_phys_attrs = tupDesc->natts;
+
+    /*
+     * Pick up the required catalog information for each attribute in the
+     * relation, including the input function, the element type (to pass to
+     * the input function).
+     */
+    cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
+    for (int attnum = 1; attnum <= num_phys_attrs; attnum++)
+    {
+        Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1);
+        Oid            in_func_oid;
+
+        /* We don't need info for dropped attributes */
+        if (att->attisdropped)
+            continue;
+
+        /* Fetch the input function and typioparam info */
+        getTypeBinaryInputInfo(att->atttypid,
+                               &in_func_oid, &cstate->typioparams[attnum - 1]);
+        fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]);
+    }
+
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+}
+
+/*
+ * CopyFromTextBased*() are shared with "csv". CopyFromText*() are only for "text".
+ */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromStart = CopyFromTextBasedStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextBasedEnd,
+};
+
+/*
+ * CopyFromTextBased*() are shared with "text". CopyFromCSV*() are only for "csv".
+ */
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromStart = CopyFromTextBasedStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextBasedEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/*
+ * Process the FORMAT option for COPY FROM.
+ *
+ * 'format' must be "text", "csv" or "binary".
+ */
+void
+ProcessCopyOptionFormatFrom(ParseState *pstate,
+                            CopyFormatOptions *opts_out,
+                            const char *format)
+{
+    if (strcmp(format, "text") == 0)
+        opts_out->from_routine = &CopyFromRoutineText;
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->from_routine = &CopyFromRoutineCSV;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->from_routine = &CopyFromRoutineBinary;
+    }
+}
+
+
 /*
  * error context callback for COPY FROM
  *
@@ -1384,9 +1549,6 @@ BeginCopyFrom(ParseState *pstate,
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
                 num_defaults;
-    FmgrInfo   *in_functions;
-    Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1571,25 +1733,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1608,8 +1751,6 @@ BeginCopyFrom(ParseState *pstate,
      * the input function), and info about defaults and constraints. (Which
      * input function we use depends on text/binary format choice.)
      */
-    in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-    typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
     defmap = (int *) palloc(num_phys_attrs * sizeof(int));
     defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *));
 
@@ -1621,15 +1762,6 @@ BeginCopyFrom(ParseState *pstate,
         if (att->attisdropped)
             continue;
 
-        /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-        else
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
 
@@ -1689,8 +1821,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->bytes_processed = 0;
 
     /* We keep those variables in cstate. */
-    cstate->in_functions = in_functions;
-    cstate->typioparams = typioparams;
     cstate->defmap = defmap;
     cstate->defexprs = defexprs;
     cstate->volatile_defexprs = volatile_defexprs;
@@ -1763,20 +1893,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
-    {
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->opts.from_routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1789,6 +1906,8 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    cstate->opts.from_routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..658d2429a9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -740,8 +740,19 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+typedef int (*CopyReadAttributes) (CopyFromState cstate);
+
 /*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text or csv
+ * mode. CopyReadAttributesText() must be used for text mode and
+ * CopyReadAttributesCSV() for csv mode. This inconvenient is for
+ * optimization. If "if (cstate->opts.csv_mode)" branch is removed, there is
+ * performance merit for COPY FROM with many tuples.
+ *
+ * NextCopyFromRawFields() can be used instead for convenience
+ * use. NextCopyFromRawFields() chooses CopyReadAttributesText() or
+ * CopyReadAttributesCSV() internally.
+ *
  * Return false if no more lines.
  *
  * An internal temporary buffer is returned via 'fields'. It is valid until
@@ -751,8 +762,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static inline bool
+NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, CopyReadAttributes
copy_read_attributes)
 {
     int            fldct;
     bool        done;
@@ -775,11 +786,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
-                fldct = CopyReadAttributesCSV(cstate);
-            else
-                fldct = CopyReadAttributesText(cstate);
-
+            fldct = copy_read_attributes(cstate);
             if (fldct != list_length(cstate->attnumlist))
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -830,16 +837,240 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         return false;
 
     /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
-        fldct = CopyReadAttributesCSV(cstate);
-    else
-        fldct = CopyReadAttributesText(cstate);
+    fldct = copy_read_attributes(cstate);
 
     *fields = cstate->raw_fields;
     *nfields = fldct;
     return true;
 }
 
+/*
+ * See NextCopyFromRawFieldsInternal() for details.
+ */
+bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+{
+    if (cstate->opts.csv_mode)
+        return NextCopyFromRawFieldsInternal(cstate, fields, nfields, CopyReadAttributesCSV);
+    else
+        return NextCopyFromRawFieldsInternal(cstate, fields, nfields, CopyReadAttributesText);
+}
+
+typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);
+
+static inline char *
+PostpareColumnValueText(CopyFromState cstate, char *string, int m)
+{
+    /* do nothing */
+    return string;
+}
+
+static inline char *
+PostpareColumnValueCSV(CopyFromState cstate, char *string, int m)
+{
+    if (string == NULL &&
+        cstate->opts.force_notnull_flags[m])
+    {
+        /*
+         * FORCE_NOT_NULL option is set and column is NULL - convert it to the
+         * NULL string.
+         */
+        string = cstate->opts.null_print;
+    }
+    else if (string != NULL && cstate->opts.force_null_flags[m]
+             && strcmp(string, cstate->opts.null_print) == 0)
+    {
+        /*
+         * FORCE_NULL option is set and column matches the NULL string. It
+         * must have been quoted, or otherwise the string would already have
+         * been set to NULL. Convert it to NULL as specified.
+         */
+        string = NULL;
+    }
+    return string;
+}
+
+/*
+ * We don't use this function as a callback directly. We define
+ * CopyFromTextOneRow() and CopyFromCSVOneRow() and use them instead. It's for
+ * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called
+ * per tuple. So this optimization will be valuable when many tuples are
+ * copied.
+ *
+ * cstate->in_functions must be initialized in CopyFromTextBasedStart().
+ */
+static inline bool
+CopyFromTextBasedOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls, CopyReadAttributes
copy_read_attributes,PostpareColumnValue postpare_column_value)
 
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFieldsInternal(cstate, &field_strings, &fldct, copy_read_attributes))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        string = postpare_column_value(cstate, string, m);
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            cstate->num_errors++;
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+bool
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, CopyReadAttributesText, PostpareColumnValueText);
+}
+
+bool
+CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, CopyReadAttributesCSV, PostpareColumnValueCSV);
+}
+
+/*
+ * cstate->in_functions must be initialized in CopyFromBinaryStart().
+ */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -857,181 +1088,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                cstate->num_errors++;
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    if (!cstate->opts.from_routine->CopyFromOneRow(cstate, econtext, values,
+                                                   nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 18486a3715..799219c9ae 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -75,12 +75,11 @@ typedef struct CopyFormatOptions
     bool        convert_selectively;    /* do selective binary conversion? */
     CopyOnErrorChoice on_error; /* what to do when error happened */
     List       *convert_select; /* list of column names (can be NIL) */
+    const        CopyFromRoutine *from_routine;    /* callback routines for COPY
+                                                 * FROM */
     const        CopyToRoutine *to_routine;    /* callback routines for COPY TO */
 } CopyFormatOptions;
 
-/* This is private in commands/copyfrom.c */
-typedef struct CopyFromStateData *CopyFromState;
-
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 typedef void (*copy_data_dest_cb) (void *data, int len);
 
@@ -89,6 +88,7 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
                    uint64 *processed);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, List *options);
+extern void ProcessCopyOptionFormatFrom(ParseState *pstate, CopyFormatOptions *opts_out, const char *format);
 extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions *opts_out, const char *format);
 extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause,
                                    const char *filename,
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2f9ecd0e2b..38406a8447 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -15,6 +15,26 @@
 #define COPYAPI_H
 
 #include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* This is private in commands/copyfrom.c */
+typedef struct CopyFromStateData *CopyFromState;
+
+/* Routines for a COPY FROM format implementation. */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Called when COPY FROM is started. This will initialize something and
+     * receive a header.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /* Copy one row. It returns false if no more tuples. */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+
+    /* Called when COPY FROM is ended. This will finalize something. */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+}            CopyFromRoutine;
 
 /* This is private in commands/copyto.c */
 typedef struct CopyToStateData *CopyToState;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..096b55011e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -183,4 +183,8 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
-- 
2.43.0


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Smith
Дата:
Сообщение: Re: Synchronizing slots from primary to standby
Следующее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: Improve eviction algorithm in ReorderBuffer