Re: Make COPY format extendable: Extract COPY TO format implementations
От | Andres Freund |
---|---|
Тема | Re: Make COPY format extendable: Extract COPY TO format implementations |
Дата | |
Msg-id | 20240209192705.5qdilvviq3py2voq@awork3.anarazel.de обсуждение исходный текст |
Ответ на | Re: Make COPY format extendable: Extract COPY TO format implementations (Michael Paquier <michael@paquier.xyz>) |
Ответы |
Re: Make COPY format extendable: Extract COPY TO format implementations
(Michael Paquier <michael@paquier.xyz>)
Re: Make COPY format extendable: Extract COPY TO format implementations (Sutou Kouhei <kou@clear-code.com>) |
Список | pgsql-hackers |
Hi, On 2024-02-09 13:21:34 +0900, Michael Paquier wrote: > +static void > +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid, > + FmgrInfo *finfo, Oid *typioparam) > +{ > + Oid func_oid; > + > + getTypeInputInfo(atttypid, &func_oid, typioparam); > + fmgr_info(func_oid, finfo); > +} FWIW, we should really change the copy code to initialize FunctionCallInfoData instead of re-initializing that on every call, realy makes a difference performance wise. > +/* > + * CopyFromTextStart > + * > + * Start of COPY FROM for text/CSV format. > + */ > +static void > +CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc) > +{ > + AttrNumber attr_count; > + > + /* > + * If encoding conversion is needed, we need another buffer to hold the > + * converted input data. Otherwise, we can just point input_buf to the > + * same buffer as raw_buf. > + */ > + if (cstate->need_transcoding) > + { > + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); > + cstate->input_buf_index = cstate->input_buf_len = 0; > + } > + else > + cstate->input_buf = cstate->raw_buf; > + cstate->input_reached_eof = false; > + > + initStringInfo(&cstate->line_buf); Seems kinda odd that we have a supposedly extensible API that then stores all this stuff in the non-extensible CopyFromState. > + /* create workspace for CopyReadAttributes results */ > + attr_count = list_length(cstate->attnumlist); > + cstate->max_fields = attr_count; Why is this here? This seems like generic code, not text format specific. > + cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); > + /* Set read attribute callback */ > + if (cstate->opts.csv_mode) > + cstate->copy_read_attributes = CopyReadAttributesCSV; > + else > + cstate->copy_read_attributes = CopyReadAttributesText; > +} Isn't this precisely repeating the mistake of 2889fd23be56? And, why is this done here? Shouldn't this decision have been made prior to even calling CopyFromTextStart()? > +/* > + * CopyFromTextOneRow > + * > + * Copy one row to a set of `values` and `nulls` for the text and CSV > + * formats. > + */ I'm very doubtful it's a good idea to combine text and CSV here. They have basically no shared parsing code, so what's the point in sending them through one input routine? > +bool > +CopyFromTextOneRow(CopyFromState cstate, > + ExprContext *econtext, > + Datum *values, > + bool *nulls) > +{ > + TupleDesc tupDesc; > + AttrNumber attr_count; > + FmgrInfo *in_functions = cstate->in_functions; > + Oid *typioparams = cstate->typioparams; > + ExprState **defexprs = cstate->defexprs; > + char **field_strings; > + ListCell *cur; > + int fldct; > + int fieldno; > + char *string; > + > + tupDesc = RelationGetDescr(cstate->rel); > + attr_count = list_length(cstate->attnumlist); > + > + /* read raw fields in the next line */ > + if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) > + return false; > + > + /* check for overflowing fields */ > + if (attr_count > 0 && fldct > attr_count) > + ereport(ERROR, > + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), > + errmsg("extra data after last expected column"))); It bothers me that we look to be ending up with different error handling across the various output formats, particularly if they're ending up in extensions. That'll make it harder to evolve this code in the future. > + fieldno = 0; > + > + /* Loop to read the user attributes on the line. */ > + foreach(cur, cstate->attnumlist) > + { > + int attnum = lfirst_int(cur); > + int m = attnum - 1; > + Form_pg_attribute att = TupleDescAttr(tupDesc, m); > + > + if (fieldno >= fldct) > + ereport(ERROR, > + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), > + errmsg("missing data for column \"%s\"", > + NameStr(att->attname)))); > + string = field_strings[fieldno++]; > + > + if (cstate->convert_select_flags && > + !cstate->convert_select_flags[m]) > + { > + /* ignore input field, leaving column as NULL */ > + continue; > + } > + > + cstate->cur_attname = NameStr(att->attname); > + cstate->cur_attval = string; > + > + if (cstate->opts.csv_mode) > + { More unfortunate intermingling of multiple formats in a single routine. > + > + if (cstate->defaults[m]) > + { > + /* > + * The caller must supply econtext and have switched into the > + * per-tuple memory context in it. > + */ > + Assert(econtext != NULL); > + Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); > + > + values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); > + } I don't think it's good that we end up with this code in different copy implementations. Greetings, Andres Freund
В списке pgsql-hackers по дате отправления: