Re: Make COPY format extendable: Extract COPY TO format implementations
| От | Sutou Kouhei |
|---|---|
| Тема | Re: Make COPY format extendable: Extract COPY TO format implementations |
| Дата | |
| Msg-id | 20251117.154013.2019068769091256791.kou@clear-code.com обсуждение исходный текст |
| Ответ на | Re: Make COPY format extendable: Extract COPY TO format implementations (Masahiko Sawada <sawada.mshk@gmail.com>) |
| Список | pgsql-hackers |
Hi,
In <CAD21AoDCEfe0PQhMEx8G1rpS7RrzGCJPobeqm3Mpn2bgbUH9nQ@mail.gmail.com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 14 Nov 2025 12:19:47 -0800,
Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> This thread has involved extensive discussion, and the patch needs to
> be rebased. I'd like to summarize the current status of this patch and
> our discussions. I've attached updated patches that implement the
> whole ideas of this feature to help provide a clearer overall picture.
Thanks!
I'm not sure whether we should include option parsing
feature to this patch's scope or not but I'm OK with this
approach.
Here are my review comments but they are minor
comments. They don't require design change.
0001:
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..6a0a66507ba 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -155,6 +120,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/* text format */
static const CopyToRoutine CopyToRoutineText = {
+ .CopyToEstimateStateSpace = CopyToEstimateStateTextLike,
How about including "Space"
(CopyToEstimateStateSpaceTextLike)?
How about renaming this to
"CopyToTextLikeEstimateStateSpace" because other functions
use "CopyToTextLikeXXX" style.
@@ -171,62 +138,77 @@ static const CopyToRoutine CopyToRoutineCSV = {
...
/* Implementation of the start callback for text and CSV formats */
static void
-CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+CopyToTextLikeStart(CopyToState ccstate, TupleDesc tupDesc)
{
+ CopyToStateTextLike *cstate = (CopyToStateTextLike *) ccstate;
How about always using "cstate" for "CopyToState"? In other
functions in this patch, "CopyToState" is referred "cstate"
or "cctate".
We can use "state" or something for
"CopyToStateTextLike". (0002 uses "state" for
"CopyFromStateBuiltins".)
+ cstate->base.opts.null_print_client = pg_server_to_any(cstate->base.opts.null_print,
+ cstate->base.opts.null_print_len,
+ cstate->file_encoding);
We can use "ccstate->" instead of "cstate->base." in this
function.
@@ -614,6 +603,54 @@ EndCopy(CopyToState cstate)
pfree(cstate);
}
+/*
+ * Allocate COPY TO state data based on the format's EsimateStateSpace
+ * callback.
+ */
+static CopyToState
+create_copyto_state(ParseState *pstate, List *options)
"Esimate" ->
"Estimate"
How about using CamelCase like "CreateCopyToState" for
function name like other functions in this file?
+ Size req_size;
"state_size" may be better.
@@ -1233,16 +1246,16 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
...
static void
-CopyAttributeOutText(CopyToState cstate, const char *string)
+CopyAttributeOutText(CopyToStateTextLike * cstate, const char *string)
{
CopyAttributeOutText(CopyToState cstate, const char *string)
{
CopyToStateTextLike *state = (CopyToStateTextLike *)cstate;
may reduce "cstate->base." and "(CopyToState) cstate" in
this function.
BTW, could you remove a needless space after "*" in
"CopyToStateTextLike * cstate"?
0002:
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..0c51e5ba5e1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -129,6 +131,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
/* text format */
static const CopyFromRoutine CopyFromRoutineText = {
+ .CopyFromEstimateStateSpace = CopyFromBuiltinsEstimateSpace,
How about including "State"
("CopyFromBuiltinsEstimateStateSpace")?
@@ -145,54 +149,129 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
...
+/*
+ * Common routine to initialize CopyFromStateBuiltins data.
+ */
+static void
+initialize_copyfrom_bultins_state(CopyFromStateBuiltins * state, TupleDesc tupDesc)
* How about using CamelCase like other functions in this file?
* How about using the same name as the struct?
InitializeCopyFromStateBuiltins?
@@ -1379,8 +1462,8 @@ CopyFrom(CopyFromState cstate)
/* Add this tuple to the tuple buffer */
CopyMultiInsertInfoStore(&multiInsertInfo,
resultRelInfo, myslot,
- cstate->line_buf.len,
- cstate->cur_lineno);
+ rowinfo.tuplen,
+ rowinfo.lineno);
How about passing "CopyFromRowInfo *" instead of
"rowinfo.tuplen" and "rowinfo.lineno"?
@@ -1512,6 +1595,50 @@ CopyFrom(CopyFromState cstate)
return processed;
}
+static CopyFromState
+create_copyfrom_state(ParseState *pstate, List *options)
How about using CamelCase like other functions in this file?
CreateCopyFromState?
@@ -1138,19 +1156,29 @@ CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
...
+ if (rowinfo)
+ {
+ /*
+ * XXX: We used to use line_buf.len but we don't actually use line_buf
+ * in binary format.
+ */
+ rowinfo->lineno = cstate->base.cur_lineno;
+ rowinfo->tuplen = cstate->line_buf.len;
}
How about always setting "0" to "rowinfo->tuplen" instead of
using "cstate->line_buf.len"?
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index d75a70715a4..30a1d2bff6e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -48,6 +48,12 @@ typedef enum CopyLogVerbosityChoice
...
+typedef struct CopyFromRowInfo
+{
+ uint64 lineno;
+ int tuplen;
+} CopyFromRowInfo;
I don't have a strong opinion nor alternative but I'm not
sure whether this name is suitable or not...
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 3b9982d54b8..c3d2199a0b6 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -99,12 +104,11 @@ typedef struct CopyFromRoutine
* Returns false if there are no more tuples to read.
*/
bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls);
+ Datum *values, bool *nulls, CopyFromRowInfo * rowinfo);
Can we add some docstrings for "rowinfo"?
diff --git a/src/include/commands/copystate.h b/src/include/commands/copystate.h
index 7561083a323..145dccd0f8f 100644
--- a/src/include/commands/copystate.h
+++ b/src/include/commands/copystate.h
@@ -65,4 +67,99 @@ typedef struct CopyToStateData
...
+/*
+ * Represents the different source cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopySource
+{
+ COPY_FILE, /* from file (or a piped program) */
+ COPY_FRONTEND, /* from frontend */
+ COPY_CALLBACK, /* from callback function */
+} CopySource;
Can we use "COPY_SOURCE_" prefix instead of "COPY_" prefix
such as "COPY_SOURCE_FILE"?
0003:
diff --git a/src/backend/commands/copy_custom_format.c b/src/backend/commands/copy_custom_format.c
new file mode 100644
index 00000000000..8bef6e779ac
--- /dev/null
+++ b/src/backend/commands/copy_custom_format.c
+/*
+ * Returns true if the given custom format name is registered.
+ */
+bool
+FindCustomCopyFormat(const char *fmt_name)
+{
+ for (int i = 0; i < CopyCustomFormatAssigned; i++)
+ {
+ if (strcmp(CopyCustomFormatArray[i].fmt_name, fmt_name) == 0)
+ return true;
+ }
+
+ return false;
+}
How about using other word than "Find" here? I expect that
"FindXXX()" returns a found value instead of "whether found
or not" as bool.
CustomCopyFormatExists()?
+bool
+GetCustomCopyToRoutine(const char *fmt_name, const CopyToRoutine **to_routine)
+{
+ for (int i = 0; i < CopyCustomFormatAssigned; i++)
+ {
+ if (strcmp(CopyCustomFormatArray[i].fmt_name, fmt_name) == 0)
+ {
+ *to_routine = CopyCustomFormatArray[i].to_routine;
+ return true;
+ }
+ }
+
+ return false;
+}
How about returning "const CopyToRoutine *" instead of
"bool"? We can use "FindCustomCopyFormat()" whether the
given format name exists or not.
+bool
+GetCustomCopyFromRoutine(const char *fmt_name, const CopyFromRoutine **from_routine)
Ditto.
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 30a1d2bff6e..82f07b05823 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -120,6 +120,12 @@ extern uint64 CopyFrom(CopyFromState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ProcessCopyBuiltinOptions(List *options, CopyFormatOptions *opts_out,
+ bool is_from, List **other_options, ParseState *pstate);
It seems that this change should exist in 0004.
0004:
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index b1b3ae141eb..ea31fa911f9 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1570,3 +1571,32 @@ CreateCopyDestReceiver(void)
return (DestReceiver *) self;
}
+
+static void
+ProcessCopyToOptions(CopyToState cstate, List *options, ParseState *pstate)
+{
+ bool temp_state = false;
+ List *other_options = NIL;
+ CopyFormatOptions *opts;
+
+ if (cstate == NULL)
+ {
+ cstate = create_copyto_state(pstate, options);
+ temp_state = true;
+ }
+
+ opts = &cstate->opts;
+
+ ProcessCopyBuiltinOptions(options, opts, false, &other_options, pstate);
+
+ foreach_node(DefElem, option, options)
options ->
other_options
(This change exists in 0005.)
0006:
diff --git a/src/test/modules/test_copy_custom_format/test_copy_custom_format.c
b/src/test/modules/test_copy_custom_format/test_copy_custom_format.c
new file mode 100644
index 00000000000..a63390e875b
--- /dev/null
+++ b/src/test/modules/test_copy_custom_format/test_copy_custom_format.c
+static Size
+TestCopyToEsimateSpace(void)
+{
+ return sizeof(TestCopyToState);
+}
+
+static Size
+TestCopyFromEsimateSpace(void)
+{
+ return sizeof(TestCopyFromState);
+}
EsimateSpace ->
EstimateStateSpace
+ if (strcmp(option->defname, "common_int") == 0)
+ {
+ int val = defGetInt32(option);
+
+ opt->common_int = val;
+
+ return true;
+ }
+ else if (strcmp(option->defname, "common_bool") == 0)
+ {
+ bool val = defGetBoolean(option);
+
+ opt->common_bool = val;
+
+ return true;
We may not need to use local variables:
opt->common_int = defGetInt32(option);
opt->common_bool = defGetBoolean(option);
> One potential improvement would be adding support for random file
> access in COPY FROM operations. For example, with parquet files, it
> would be much more efficient to read the footer section first since it
> contains metadata, allowing selective reading of necessary file
> sections. The current sequential read API (CopyFromGetData()) requires
> reading all data to access the metadata.
This is outside the scope of this patch but I've created
a custom COPY format implementation for Apache Parquet:
https://github.com/kou/pg-copy-parquet
We need to start parsing from footer as mentioned above. So
the implementation reads all data before it starts parsing:
https://github.com/kou/pg-copy-parquet/blob/7da367ea81d8964f5045fe0b1514a798d4ecbbc7/copy_parquet.cc#L410-L434
If we have random access API, we don't need to read all
data. It improves performance.
Anyway, this is outside the scope of this patch. We can
discuss this in a separated thread after we merge this
patch.
Thanks,
--
kou
В списке pgsql-hackers по дате отправления: