Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id 20240209192705.5qdilvviq3py2voq@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Список pgsql-hackers
Hi,

On 2024-02-09 13:21:34 +0900, Michael Paquier wrote:
> +static void
> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
> +                   FmgrInfo *finfo, Oid *typioparam)
> +{
> +    Oid            func_oid;
> +
> +    getTypeInputInfo(atttypid, &func_oid, typioparam);
> +    fmgr_info(func_oid, finfo);
> +}

FWIW, we should really change the copy code to initialize FunctionCallInfoData
instead of re-initializing that on every call, realy makes a difference
performance wise.


> +/*
> + * CopyFromTextStart
> + *
> + * Start of COPY FROM for text/CSV format.
> + */
> +static void
> +CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
> +{
> +    AttrNumber    attr_count;
> +
> +    /*
> +     * If encoding conversion is needed, we need another buffer to hold the
> +     * converted input data.  Otherwise, we can just point input_buf to the
> +     * same buffer as raw_buf.
> +     */
> +    if (cstate->need_transcoding)
> +    {
> +        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
> +        cstate->input_buf_index = cstate->input_buf_len = 0;
> +    }
> +    else
> +        cstate->input_buf = cstate->raw_buf;
> +    cstate->input_reached_eof = false;
> +
> +    initStringInfo(&cstate->line_buf);

Seems kinda odd that we have a supposedly extensible API that then stores all
this stuff in the non-extensible CopyFromState.


> +    /* create workspace for CopyReadAttributes results */
> +    attr_count = list_length(cstate->attnumlist);
> +    cstate->max_fields = attr_count;

Why is this here? This seems like generic code, not text format specific.


> +    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
> +    /* Set read attribute callback */
> +    if (cstate->opts.csv_mode)
> +        cstate->copy_read_attributes = CopyReadAttributesCSV;
> +    else
> +        cstate->copy_read_attributes = CopyReadAttributesText;
> +}

Isn't this precisely repeating the mistake of 2889fd23be56?

And, why is this done here? Shouldn't this decision have been made prior to
even calling CopyFromTextStart()?

> +/*
> + * CopyFromTextOneRow
> + *
> + * Copy one row to a set of `values` and `nulls` for the text and CSV
> + * formats.
> + */

I'm very doubtful it's a good idea to combine text and CSV here. They have
basically no shared parsing code, so what's the point in sending them through
one input routine?


> +bool
> +CopyFromTextOneRow(CopyFromState cstate,
> +                   ExprContext *econtext,
> +                   Datum *values,
> +                   bool *nulls)
> +{
> +    TupleDesc    tupDesc;
> +    AttrNumber    attr_count;
> +    FmgrInfo   *in_functions = cstate->in_functions;
> +    Oid           *typioparams = cstate->typioparams;
> +    ExprState **defexprs = cstate->defexprs;
> +    char      **field_strings;
> +    ListCell   *cur;
> +    int            fldct;
> +    int            fieldno;
> +    char       *string;
> +
> +    tupDesc = RelationGetDescr(cstate->rel);
> +    attr_count = list_length(cstate->attnumlist);
> +
> +    /* read raw fields in the next line */
> +    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
> +        return false;
> +
> +    /* check for overflowing fields */
> +    if (attr_count > 0 && fldct > attr_count)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> +                 errmsg("extra data after last expected column")));

It bothers me that we look to be ending up with different error handling
across the various output formats, particularly if they're ending up in
extensions. That'll make it harder to evolve this code in the future.


> +    fieldno = 0;
> +
> +    /* Loop to read the user attributes on the line. */
> +    foreach(cur, cstate->attnumlist)
> +    {
> +        int            attnum = lfirst_int(cur);
> +        int            m = attnum - 1;
> +        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
> +
> +        if (fieldno >= fldct)
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> +                     errmsg("missing data for column \"%s\"",
> +                            NameStr(att->attname))));
> +        string = field_strings[fieldno++];
> +
> +        if (cstate->convert_select_flags &&
> +            !cstate->convert_select_flags[m])
> +        {
> +            /* ignore input field, leaving column as NULL */
> +            continue;
> +        }
> +
> +        cstate->cur_attname = NameStr(att->attname);
> +        cstate->cur_attval = string;
> +
> +        if (cstate->opts.csv_mode)
> +        {

More unfortunate intermingling of multiple formats in a single routine.


> +
> +        if (cstate->defaults[m])
> +        {
> +            /*
> +             * The caller must supply econtext and have switched into the
> +             * per-tuple memory context in it.
> +             */
> +            Assert(econtext != NULL);
> +            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
> +
> +            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
> +        }

I don't think it's good that we end up with this code in different copy
implementations.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mats Kindahl
Дата:
Сообщение: Re: glibc qsort() vulnerability
Следующее
От: Mats Kindahl
Дата:
Сообщение: Re: glibc qsort() vulnerability