Re: Speeding up COPY TO for uuids and arrays

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Speeding up COPY TO for uuids and arrays
Дата
Msg-id ZdwhaBuEcV6P7G1Q@paquier.xyz
обсуждение исходный текст
Ответ на re: Speeding up COPY TO for uuids and arrays  (Ranier Vilela <ranier.vf@gmail.com>)
Ответы Re: Speeding up COPY TO for uuids and arrays  (Ranier Vilela <ranier.vf@gmail.com>)
Список pgsql-hackers
On Thu, Feb 22, 2024 at 04:42:37PM -0300, Ranier Vilela wrote:
> Can you share exactly script used to create a table?

Stressing the internals of array_out() for the area of the patch is
not that difficult, as we want to quote each element that's returned
in output.

The trick is to have the following to stress the second quoting loop a
maximum:
- a high number of rows.
- a high number of items in the arrays.
- a *minimum* number of characters in each element of the array, with
characters that require quoting.

The best test case I can think of to demonstrate the patch would be
something like that (adjust rows and elts as you see fit):
-- Number of rows
\set rows 6
-- Number of elements
\set elts 4
create table tab as
  with data as (
    select array_agg(a) as array
      from (
        select '{'::text
          from generate_series(1, :elts) as int(a)) as index(a))
  select data.array from data, generate_series(1,:rows);

Then I get:
       array
-------------------
 {"{","{","{","{"}
 {"{","{","{","{"}
 {"{","{","{","{"}
 {"{","{","{","{"}
 {"{","{","{","{"}
 {"{","{","{","{"}
(6 rows)

With "\set rows 100000" and "\set elts 10000", giving 100MB of data
with 100k rows with 10k elements each, I get for HEAD when data is in
shared buffers:
=# copy tab to '/dev/null';
COPY 100000
Time: 48620.927 ms (00:48.621)
And with v3:
=# copy tab to '/dev/null';
COPY 100000
Time: 47993.183 ms (00:47.993)

Profiles don't fundamentally change much, array_out() gets a 30.76% ->
29.72% in self runtime, with what looks like a limited impact to me.

With 1k rows and 1M elements, COPY TO gets reduced from 54338.436 ms
to 54129.978 ms, and a 29.51% -> 29.12% increase (looks like noise).

Perhaps I've hit some noise while running this set of tests, but the
impact of the proposed patch looks very limited to me.  If you have a
better set of tests and/or ideas, feel free of course.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: Can we include capturing logs of pgdata/pg_upgrade_output.d/*/log in buildfarm
Следующее
От: Andrei Lepikhov
Дата:
Сообщение: Re: "type with xxxx does not exist" when doing ExecMemoize()