Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id 20240206014125.qofww7ew3dx3v3uk@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
Hi,

On 2024-02-06 10:01:36 +0900, Michael Paquier wrote:
> On Mon, Feb 05, 2024 at 10:21:18AM -0800, Andres Freund wrote:
> > Have you benchmarked the performance effects of 2889fd23be5 ? I'd not at all
> > be surprised if it lead to a measurable performance regression.
>
> Yes, I was looking at runtimes and some profiles around CopyOneRowTo()
> to see the effects that this has yesterday.  The principal point of
> contention is CopyOneRowTo() where the callback is called once per
> attribute, so more attributes stress it more.

Right.


> If you have concerns about that, I'm OK to revert, I'm not wedded to
> this level of control.  Note that I've actually seen *better*
> runtimes.

I'm somewhat worried that handling the different formats at that level will
make it harder to improve copy performance - it's quite attrociously slow
right now. The more we reduce the per-row/field overhead, the more the
dispatch overhead will matter.



> [1]: https://www.postgresql.org/message-id/Zbr6piWuVHDtFFOl@paquier.xyz
>
> > I think callbacks for individual attributes is the wrong approach - the
> > dispatch needs to happen at a higher level, otherwise there are too many
> > indirect function calls.
>
> Hmm.  Do you have concerns about v13 posted on [2] then?

As is I'm indeed not a fan. It imo doesn't make sense to have an indirect
dispatch for *both* ->copy_attribute_out *and* ->CopyToOneRow. After all, when
in ->CopyToOneRow for text, we could know that we need to call
CopyAttributeOutText etc.


> If yes, then I'd assume that this shuts down the whole thread or that it
> needs a completely different approach, because we will multiply indirect
> function calls that can control how data is generated for each row, which is
> the original case that Sutou-san wanted to tackle.

I think it could be rescued fairly easily - remove the dispatch via
->copy_attribute_out().  To avoid duplicating code you could use a static
inline function that's used with constant arguments by both csv and text mode.

I think it might also be worth ensuring that future patches can move branches
like
    if (cstate->encoding_embeds_ascii)
    if (cstate->need_transcoding)
into the choice of per-row callback.


> The End, Start and In/OutFunc callbacks are called only once per query, so
> these don't matter AFAIU.

Right.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Make COPY format extendable: Extract COPY TO format implementations
Следующее
От: Andres Freund
Дата:
Сообщение: confusing / inefficient "need_transcoding" handling in copy