Re: Emitting JSON to file using COPY TO

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: Emitting JSON to file using COPY TO
Дата
Msg-id CAKFQuwZ2e6QanxpoUarMva0mp1UJvT9bOJRwhkZ8w_E=K-s8ww@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Emitting JSON to file using COPY TO  (Joe Conway <mail@joeconway.com>)
Ответы Re: Emitting JSON to file using COPY TO  (Joe Conway <mail@joeconway.com>)
Список pgsql-hackers
On Thursday, December 7, 2023, Joe Conway <mail@joeconway.com> wrote:
On 12/7/23 08:35, Daniel Verite wrote:
        Joe Conway wrote:

The attached should fix the CopyOut response to say one column. I.e. it ought to look something like:

Spending more time with the doc I came to the opinion that in this bit
of the protocol, in CopyOutResponse (B)
...
Int16
The number of columns in the data to be copied (denoted N below).
...

this number must be the number of columns in the source.
That is for COPY table(a,b,c)   the number is 3, independently
on whether the result is formatted in text, cvs, json or binary.

I think that changing it for json can reasonably be interpreted
as a protocol break and we should not do it.

The fact that this value does not help parsing the CopyData
messages that come next is not a new issue. A reader that
doesn't know the field separator and whether it's text or csv
cannot parse these messages into fields anyway.
But just knowing how much columns there are in the original
data might be useful by itself and we don't want to break that.

Ok, that sounds reasonable to me -- I will revert that change.

The other question for me is, in the CopyData message, this
bit:
" Messages sent from the backend will always correspond to single data rows"

ISTM that considering that the "[" starting the json array is a
"data row" is a stretch.
That might be interpreted as a protocol break, depending
on how strict the interpretation is.

If we really think that is a problem I can see about changing it to this format for json array:

8<------------------
copy
(
  with ss(f1, f2) as
  (
    select 1, g.i from generate_series(1, 3) g(i)
  )
  select ss from ss
) to stdout (format json, force_array);
[{"ss":{"f1":1,"f2":1}}
,{"ss":{"f1":1,"f2":2}}
,{"ss":{"f1":1,"f2":3}}]
8<------------------

Is this acceptable to everyone?

Or maybe this is preferred?
8<------------------
[{"ss":{"f1":1,"f2":1}},
 {"ss":{"f1":1,"f2":2}},
 {"ss":{"f1":1,"f2":3}}]
8<------------------

Or as long as we are painting the shed, maybe this?
8<------------------
[{"ss":{"f1":1,"f2":1}},
{"ss":{"f1":1,"f2":2}},
{"ss":{"f1":1,"f2":3}}]
8<------------------

Those are all the same breakage though - if truly interpreted as data rows the protocol is basically written such that the array format is not supportable and only the lines format can be used.  Hence my “format 0 doesn’t work” comment for array output and we should explicitly add format 2 where we explicitly decouple lines of output from rows of data.  That said, it would seem in practice format 0 already decouples them and so the current choice of the brackets on their own lines is acceptable.

I’d prefer to keep them on their own line.

I also don’t know why you introduced another level of object nesting here.  That seems quite undesirable.

David J.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Daniel Gustafsson
Дата:
Сообщение: Re: initdb caching during tests
Следующее
От: Aleksander Alekseev
Дата:
Сообщение: Re: Proposal to add page headers to SLRU pages