Re: pg_upgrade failing for 200+ million Large Objects
От | Robins Tharakan |
---|---|
Тема | Re: pg_upgrade failing for 200+ million Large Objects |
Дата | |
Msg-id | CAEP4nAxbrDM5b4g-+pPskuLmZWphZiuGGV7qpB7aAZ08ZMZSjw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: pg_upgrade failing for 200+ million Large Objects (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
On Thu, 28 Dec 2023 at 01:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robins Tharakan <tharakan@gmail.com> writes:
> Applying all 4 patches, I also see good performance improvement.
> With more Large Objects, although pg_dump improved significantly,
> pg_restore is now comfortably an order of magnitude faster.
Yeah. The key thing here is that pg_dump can only parallelize
the data transfer, while (with 0004) pg_restore can parallelize
large object creation and owner-setting as well as data transfer.
I don't see any simple way to improve that on the dump side,
but I'm not sure we need to. Zillions of empty objects is not
really the use case to worry about. I suspect that a more realistic
case with moderate amounts of data in the blobs would make pg_dump
look better.
Thanks for elaborating, and yes pg_dump times do reflect that
expectation.The first test involved a fixed number (32k) of
Large Objects (LOs) with varying sizes - I chose that number
intentionally since this was being tested on a 32vCPU instance
and the patch employs 1k batches.
We again see that pg_restore is an order of magnitude faster.
LO Size (bytes) restore-HEAD restore-patched improvement (Nx)
1 24.182 1.4 17x
10 24.741 1.5 17x
100 24.574 1.6 15x
1,000 25.314 1.7 15x
10,000 25.644 1.7 15x
100,000 50.046 4.3 12x
1,000,000 281.549 30.0 9x
pg_dump also sees improvements. Really small sized LOs
see a decent ~20% improvement which grows considerably as LOs
get bigger (beyond ~10-100kb).
LO Size (bytes) dump-HEAD dump-patched improvement (%)
1 12.9 10.7 18%
10 12.9 10.4 19%
100 12.8 10.3 20%
1,000 13.0 10.3 21%
10,000 14.2 10.3 27%
100,000 32.8 11.5 65%
1,000,000 211.8 23.6 89%
To test pg_restore scaling, 1 Million LOs (100kb each)
were created and pg_restore times tested for increasing
concurrency (on a 192vCPU instance). We see major speedup
upto -j64 and the best time was at -j96, after which
performance decreases slowly - see attached image.
Concurrency pg_restore-patched
384 75.87
352 75.63
320 72.11
288 70.05
256 70.98
224 66.98
192 63.04
160 61.37
128 58.82
96 58.55
64 60.46
32 77.29
16 115.51
8 203.48
4 366.33
Test details:
- Command used to generate SQL - create 1k LOs of 1kb each
- echo "SELECT lo_from_bytea(0, '\x` printf 'ff%.0s' {1..1000}`') FROM generate_series(1,1000);" > /tmp/tempdel
- Verify the LO size: select pg_column_size(lo_get(oid));
- Only GUC changed: max_connections=1000 (for the last test)
-
Robins Tharakan
Amazon Web Services
We again see that pg_restore is an order of magnitude faster.
LO Size (bytes) restore-HEAD restore-patched improvement (Nx)
1 24.182 1.4 17x
10 24.741 1.5 17x
100 24.574 1.6 15x
1,000 25.314 1.7 15x
10,000 25.644 1.7 15x
100,000 50.046 4.3 12x
1,000,000 281.549 30.0 9x
pg_dump also sees improvements. Really small sized LOs
see a decent ~20% improvement which grows considerably as LOs
get bigger (beyond ~10-100kb).
LO Size (bytes) dump-HEAD dump-patched improvement (%)
1 12.9 10.7 18%
10 12.9 10.4 19%
100 12.8 10.3 20%
1,000 13.0 10.3 21%
10,000 14.2 10.3 27%
100,000 32.8 11.5 65%
1,000,000 211.8 23.6 89%
To test pg_restore scaling, 1 Million LOs (100kb each)
were created and pg_restore times tested for increasing
concurrency (on a 192vCPU instance). We see major speedup
upto -j64 and the best time was at -j96, after which
performance decreases slowly - see attached image.
Concurrency pg_restore-patched
384 75.87
352 75.63
320 72.11
288 70.05
256 70.98
224 66.98
192 63.04
160 61.37
128 58.82
96 58.55
64 60.46
32 77.29
16 115.51
8 203.48
4 366.33
Test details:
- Command used to generate SQL - create 1k LOs of 1kb each
- echo "SELECT lo_from_bytea(0, '\x` printf 'ff%.0s' {1..1000}`') FROM generate_series(1,1000);" > /tmp/tempdel
- Verify the LO size: select pg_column_size(lo_get(oid));
- Only GUC changed: max_connections=1000 (for the last test)
-
Robins Tharakan
Amazon Web Services
Вложения
В списке pgsql-hackers по дате отправления: