Re: Use COPY for populating all pgbench tables

Поиск
Список
Период
Сортировка
От Gregory Smith
Тема Re: Use COPY for populating all pgbench tables
Дата
Msg-id CAHLJuCW4gapAYZMc2vuy9mL2jdhNj3SLL9ef9U+stD=8EXfV=g@mail.gmail.com
обсуждение исходный текст
Ответ на Use COPY for populating all pgbench tables  ("Tristan Partin" <tristan@neon.tech>)
Ответы Re: Use COPY for populating all pgbench tables  ("Tristan Partin" <tristan@neon.tech>)
Re: Use COPY for populating all pgbench tables  (Gurjeet Singh <gurjeet@singh.im>)
Список pgsql-hackers
On Tue, May 23, 2023 at 1:33 PM Tristan Partin <tristan@neon.tech> wrote:
We (Neon) have noticed that pgbench can be quite slow to populate data
in regard to higher latency connections. Higher scale factors exacerbate
this problem. Some employees work on a different continent than the
databases they might be benchmarking. By moving pgbench to use COPY for
populating all tables, we can reduce some of the time pgbench takes for
this particular step.

When latency is continent size high, pgbench should be run with server-side table generation instead of using COPY at all, for any table.  The default COPY based pgbench generation is only intended for testing where the client and server are very close on the network.

Unfortunately there's no simple command line option to change just that one thing about how pgbench runs.  You have to construct a command line that documents each and every step you want instead.  You probably just want this form:

    $ pgbench -i -I dtGvp -s 500

That's server-side table generation with all the usual steps.  I use this instead of COPY in pgbench-tools so much now, basically whenever I'm talking to a cloud system, that I have a simple 0/1 config option to switch between the modes, and this long weird one is the default now.

Try that out, and once you see the numbers my bet is you'll see extending which tables get COPY isn't needed by your use case anymore.  Basically, if you are close enough to use COPY instead of server-side generation, you are close enough that every table besides accounts will not add up to enough time to worry about optimizing the little ones.

--
Greg Smith  greg.smith@crunchydata.com
Director of Open Source Strategy
 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Gregory Smith
Дата:
Сообщение: Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15
Следующее
От: Alexander Pyhalov
Дата:
Сообщение: Re: Partial aggregates pushdown