Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id ZcGcQq3Y6FeX1z4e@paquier.xyz
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Andres Freund <andres@anarazel.de>)
Ответы Re: Make COPY format extendable: Extract COPY TO format implementations  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Mon, Feb 05, 2024 at 05:41:25PM -0800, Andres Freund wrote:
> On 2024-02-06 10:01:36 +0900, Michael Paquier wrote:
>> If you have concerns about that, I'm OK to revert, I'm not wedded to
>> this level of control.  Note that I've actually seen *better*
>> runtimes.
>
> I'm somewhat worried that handling the different formats at that level will
> make it harder to improve copy performance - it's quite attrociously slow
> right now. The more we reduce the per-row/field overhead, the more the
> dispatch overhead will matter.

Yep.  That's the hard part when it comes to design these callbacks.
We don't want something too high level because this leads to more code
duplication churns when someone wants to plug in its own routine set,
and we don't want to be at a too low level because of the indirect
calls as you said.  I'd like to think that the current CopyFromOneRow
offers a good balance here, avoiding the "if" branch with the binary
and non-binary paths.

>> Hmm.  Do you have concerns about v13 posted on [2] then?
>
> As is I'm indeed not a fan. It imo doesn't make sense to have an indirect
> dispatch for *both* ->copy_attribute_out *and* ->CopyToOneRow. After all, when
> in ->CopyToOneRow for text, we could know that we need to call
> CopyAttributeOutText etc.

Right.

>> If yes, then I'd assume that this shuts down the whole thread or that it
>> needs a completely different approach, because we will multiply indirect
>> function calls that can control how data is generated for each row, which is
>> the original case that Sutou-san wanted to tackle.
>
> I think it could be rescued fairly easily - remove the dispatch via
> ->copy_attribute_out().  To avoid duplicating code you could use a static
> inline function that's used with constant arguments by both csv and text mode.

Hmm.  So you basically mean to tweak the beginning of
CopyToTextOneRow() and CopyToTextStart() so as copy_attribute_out is
saved in a local variable outside of cstate and we'd save the "if"
checked for each attribute.  If I got that right, it would mean
something like the v13-0002 attached, on top of the v13-0001 of
upthread.  Is that what you meant?

> I think it might also be worth ensuring that future patches can move branches
> like
>     if (cstate->encoding_embeds_ascii)
>     if (cstate->need_transcoding)
> into the choice of per-row callback.

Yeah, I'm still not sure how much we should split CopyToStateData in
the initial patch set.  I'd like to think that the best result would
be to have in the state data an opaque (void *) that points to a
structure that can be set for each format, so as there is a clean
split between which variable gets set and used where (same remark
applies to COPY FROM with its raw_fields, raw_fields, for example).
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: confusing / inefficient "need_transcoding" handling in copy
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: [PoC] Improve dead tuple storage for lazy vacuum