Re: Emitting JSON to file using COPY TO

Поиск
Список
Период
Сортировка
От Sehrope Sarkuni
Тема Re: Emitting JSON to file using COPY TO
Дата
Msg-id CAH7T-arYm4FF2pPWDwz_6Fnp8Jj2vMNOfEd2B54HkFOPNX_kGw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Emitting JSON to file using COPY TO  (Andrew Dunstan <andrew@dunslane.net>)
Список pgsql-hackers
On Wed, Dec 6, 2023 at 4:03 PM Andrew Dunstan <andrew@dunslane.net> wrote:
> The output size difference does say that maybe we should pay some
> attention to the nearby request to not always label every field.
> Perhaps there should be an option for each row to transform to
> a JSON array rather than an object?

I doubt it. People who want this are likely to want pretty much what
this patch is providing, not something they would have to transform in
order to get it. If they want space-efficient data they won't really be
wanting JSON. Maybe they want Protocol Buffers or something in that vein.

For arrays v.s. objects, it's not just about data size. There are plenty of situations where a JSON array is superior to an object (e.g. duplicate column names). Lines of JSON arrays of strings is pretty much CSV with JSON escaping rules and a pair of wrapping brackets. It's common for tabular data in node.js environments as you don't need a separate CSV parser.

Each one has its place and a default of the row_to_json(...) representation of the row still makes sense. But if the user has the option of outputting a single json/jsonb field for each row without an object or array wrapper, then it's possible to support all of these use cases as the user can explicitly pick whatever envelope makes sense:

-- Lines of JSON arrays:
COPY (SELECT json_build_array('test-' || a, b) FROM generate_series(1, 3) a, generate_series(5,6) b) TO STDOUT WITH (FORMAT JSON, SOME_OPTION_TO_DISABLE_ENVELOPE);
["test-1", 5]
["test-2", 5]
["test-3", 5]
["test-1", 6]
["test-2", 6]
["test-3", 6]

-- Lines of JSON strings:
COPY (SELECT to_json('test-' || x) FROM generate_series(1, 5) x) TO STDOUT WITH (FORMAT JSON, SOME_OPTION_TO_DISABLE_ENVELOPE);
"test-1"
"test-2"
"test-3"
"test-4"
"test-5"

I'm not sure how I feel about the behavior being automatic if it's a single top level json / jsonb field rather than requiring the explicit option. It's probably what a user would want but it also feels odd to change the output wrapper automatically based on the fields in the response. If it is automatic and the user wants the additional envelope, the option always exists to wrap it further in another: json_build_object('some_field", my_field_i_want_wrapped)

The duplicate field names would be a good test case too. I haven't gone through this patch but I'm guessing it doesn't filter out duplicates so the behavior would match up with row_to_json(...), i.e. duplicates are preserved:

=> SELECT row_to_json(t.*) FROM (SELECT 1 AS a, 2 AS a) t;
  row_to_json  
---------------
 {"a":1,"a":2}

If so, that's a good test case to add as however that's handled should be deterministic.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Joe Conway
Дата:
Сообщение: Re: Emitting JSON to file using COPY TO
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: Emitting JSON to file using COPY TO