Обсуждение: pg_dump from v13 is slow

Поиск
Список
Период
Сортировка

pg_dump from v13 is slow

От
Justin Pryzby
Дата:
I ran into this while running pg_upgrade from beta2 to beta3:
$ time sudo -u postgres sh -c 'cd /var/lib/pgsql; /usr/pgsql-13/bin/pg_upgrade -b /usr/pgsql-13b2/bin/ -d ./13b2/data
-D./13/data --link'
 
    real    94m18.335s

This instances has many table partitions, and the production instance uses
tablespaces.  Some of our tables are wide.  This VM is not idle, but does not
account for being 20x slower.

pg_dump -v --section=pre-data ts |wc
    1846659 4697507 59575253
    real    39m8.524s

Compare v12 and v13:

|$ /usr/pgsql-12/bin/initdb -D 12
|$ /usr/pgsql-12/bin/postgres -D 12 -c shared_buffers=256MB -c max_locks_per_transaction=128 -c port=5678 -c
unix_socket_directories=/tmp&
|$ psql -h /tmp -p 5678 postgres </srv/cdrperfbackup/ts/2020-08-10/pg_dumpall-g 
|$ time pg_restore /srv/cdrperfbackup/ts/2020-08-10/pg_dump-section\=pre-data -d postgres -h /tmp -p 5678
--no-tablespaces# --clean --if-exist
 
|    real    4m56.627s
|$ time pg_dump --section=pre-data postgres -h /tmp -p 5678 |wc
|    1823612 4504584 58379810
|    real    1m4.452s

|/usr/pgsql-13/bin/initdb -D 13
|/usr/pgsql-13/bin/postgres -D 13 -c shared_buffers=256MB -c max_locks_per_transaction=128 -c port=5678 -c
unix_socket_directories=/tmp&
|psql -h /tmp -p 5678 postgres </srv/cdrperfbackup/ts/2020-08-10/pg_dumpall-g 
|time pg_restore /srv/cdrperfbackup/ts/2020-08-10/pg_dump-section\=pre-data -d postgres -h /tmp -p 5678
--no-tablespaces# --clean --if-exist 
 
|    real    6m49.964s
|$ time pg_dump --section=pre-data postgres -h /tmp -p 5678 |wc
|    1823612 4504584 58379813
|    real    19m42.918s

I'm trying to narrow this down, but I'd be very happy for suggestions.

-- 
Justin



Re: pg_dump from v13 is slow

От
Alvaro Herrera
Дата:
On 2020-Aug-13, Justin Pryzby wrote:

> I'm trying to narrow this down, but I'd be very happy for suggestions.

Maybe you can time "pg_dump --binary-upgrade" to see if the slowness
comes from there.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump from v13 is slow

От
Andres Freund
Дата:
Hi,

On 2020-08-13 17:48:23 -0500, Justin Pryzby wrote:
> I'm trying to narrow this down, but I'd be very happy for suggestions.

Would be worth knowing how much of the time pgbench is 100% CPU
utilized, and how much of the time it is basically waiting for server
side queries and largely idle.

If it's close to 100% busy for a significant part of that time, it'd be
useful to get a perf profile. If it's largely queries to the server that
are the issue, logging those would be relevant.

Greetings,

Andres Freund



Re: pg_dump from v13 is slow

От
Justin Pryzby
Дата:
I can reproduce the issue with generated data:

pryzbyj=# SELECT format('create table t%s(i int)', i) FROM generate_series(1,9999)i;
\set ECHO errors
\set QUIET
\gexec

$ time pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613 |wc   
 110015  240049 1577087
real    0m50.445s

$ time pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5612 |wc
 110015  240049 1577084
real    0m11.203s

On Thu, Aug 13, 2020 at 04:30:14PM -0700, Andres Freund wrote:
> Would be worth knowing how much of the time pgbench is 100% CPU
> utilized, and how much of the time it is basically waiting for server
> side queries and largely idle.

Good question - I guess you mean pg_dump.

$ command time -v pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5612 |wc   
        Command being timed: "pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5612"
        User time (seconds): 0.65
        System time (seconds): 0.52
        Percent of CPU this job got: 9%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.85

$ command time -v pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613 |wc
        Command being timed: "pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613"
        User time (seconds): 0.79
        System time (seconds): 0.49
        Percent of CPU this job got: 2%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:48.51

So v13 was 4.5x slower and it seems to be all on the server side.

I looked queries like this:
time strace -ts999 -e sendto pg_dump  --section=pre-data -d pryzbyj -h /tmp -p 5613 2>strace-13-3 |wc
cut -c1-66 strace-13-3 |sort |uniq |less

Most of the time is spent on these three queries:

|12:58:11 sendto(3,
"Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn
|...
|12:58:30 sendto(3,
"Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn

|12:58:32 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN
pol.polroles= '{0}' TH
 
|...
|12:58:47 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN
pol.polroles= '{0}' TH
 

|12:58:49 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p
WHEREpr.prrelid
 
|...
|12:59:01 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p
WHEREpr.prrelid
 

Compare with v12:

|12:57:58 sendto(3,
"Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn
|...
|12:58:03 sendto(3,
"Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn

|12:58:05 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN
pol.polroles= '{0}' TH
 
|...
|12:58:07 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN
pol.polroles= '{0}' TH
 

|12:58:09 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p
WHEREpr.prrelid
 
|...
|12:58:11 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p
WHEREpr.prrelid
 

The first query regressed the worst.

$ psql -h /tmp -Ap 5612 pryzbyj
psql (13beta3, server 12.4)
pryzbyj=# explain analyze SELECT
a.attnum,a.attname,a.atttypmod,a.attstattarget,a.attstorage,t.typstorage,a.attnotnull,a.atthasdef,a.attisdropped,a.attlen,a.attalign,a.attislocal,pg_catalog.format_type(t.oid,
a.atttypmod)AS atttypname,a.attgenerated,CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval ELSE
nullEND AS attmissingval,a.attidentity,pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || '
'|| pg_catalog.quote_literal(option_value) FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name),
E',   ') AS attfdwoptions,CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS
attcollation,array_to_string(a.attoptions,', ') AS attoptions FROM pg_catalog.pg_attribute a LEFT JOIN
pg_catalog.pg_typet ON a.atttypid = t.oid WHERE a.attrelid = '191444'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2
ORDERBY a.attnum;
 
QUERY PLAN
Nested Loop Left Join  (cost=0.58..16.72 rows=1 width=217) (actual time=0.205..0.209 rows=1 loops=1)
  ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.29..8.31 rows=1 width=189) (actual
time=0.030..0.032rows=1 loops=1)
 
        Index Cond: ((attrelid = '191444'::oid) AND (attnum > '0'::smallint))
  ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.29..8.30 rows=1 width=9) (actual time=0.011..0.011
rows=1loops=1)
 
        Index Cond: (oid = a.atttypid)
  SubPlan 1
    ->  Sort  (cost=0.09..0.09 rows=3 width=64) (actual time=0.119..0.119 rows=0 loops=1)
          Sort Key: pg_options_to_table.option_name
          Sort Method: quicksort  Memory: 25kB
          ->  Function Scan on pg_options_to_table  (cost=0.00..0.06 rows=3 width=64) (actual time=0.010..0.010 rows=0
loops=1)
Planning Time: 1.702 ms
Execution Time: 0.422 ms

$ psql -h /tmp -Ap 5613 pryzbyj
psql (13beta3)
pryzbyj=# explain analyze SELECT
a.attnum,a.attname,a.atttypmod,a.attstattarget,a.attstorage,t.typstorage,a.attnotnull,a.atthasdef,a.attisdropped,a.attlen,a.attalign,a.attislocal,pg_catalog.format_type(t.oid,
a.atttypmod)AS atttypname,a.attgenerated,CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval ELSE
nullEND AS attmissingval,a.attidentity,pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || '
'|| pg_catalog.quote_literal(option_value) FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name),
E',   ') AS attfdwoptions,CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS
attcollation,array_to_string(a.attoptions,', ') AS attoptions FROM pg_catalog.pg_attribute a LEFT JOIN
pg_catalog.pg_typet ON a.atttypid = t.oid WHERE a.attrelid = '164518'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2
ORDERBY a.attnum;
 
QUERY PLAN
Nested Loop Left Join  (cost=0.58..16.72 rows=1 width=217) (actual time=0.134..0.139 rows=1 loops=1)
  ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.29..8.31 rows=1 width=189) (actual
time=0.028..0.030rows=1 loops=1)
 
        Index Cond: ((attrelid = '164518'::oid) AND (attnum > '0'::smallint))
  ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.29..8.30 rows=1 width=9) (actual time=0.008..0.008
rows=1loops=1)
 
        Index Cond: (oid = a.atttypid)
  SubPlan 1
    ->  Sort  (cost=0.09..0.09 rows=3 width=64) (actual time=0.065..0.065 rows=0 loops=1)
          Sort Key: pg_options_to_table.option_name
          Sort Method: quicksort  Memory: 25kB
          ->  Function Scan on pg_options_to_table  (cost=0.00..0.06 rows=3 width=64) (actual time=0.005..0.005 rows=0
loops=1)
Planning Time: 1.457 ms
Execution Time: 0.431 ms

I don't know if it's any issue, but I found that pg12 can process "null
statements" almost 2x as fast:

$ time for a in `seq 1 9999`; do echo ';'; done |psql -h /tmp -p 5613 postgres 
    real    0m0.745s
$ time for a in `seq 1 9999`; do echo ';'; done |psql -h /tmp -p 5612 postgres 
    real    0m0.444s

-- 
Justin



Re: pg_dump from v13 is slow

От
Alvaro Herrera
Дата:
Hmm, I wonder if you're comparing an assert-enabled pg13 build to a
non-assert-enabled pg12 build, or something like that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump from v13 is slow

От
Justin Pryzby
Дата:
On Thu, Aug 13, 2020 at 08:53:46PM -0400, Alvaro Herrera wrote:
> Hmm, I wonder if you're comparing an assert-enabled pg13 build to a
> non-assert-enabled pg12 build, or something like that.

Great question - I thought of it myself but then forgot to look..

$ rpm -q postgresql1{2,3}-server
postgresql12-server-12.4-1PGDG.rhel7.x86_64
postgresql13-server-13-beta3_1PGDG.rhel7.x86_64

$ /usr/pgsql-12/bin/pg_config |grep -o cassert || echo not found
not found
$ /usr/pgsql-13/bin/pg_config |grep -o cassert || echo not found
cassert

It looks like the beta packages are compiled with cassert, which makes sense.

Thanks and sorry for noise.

-- 
Justin



Re: pg_dump from v13 is slow

От
Tom Lane
Дата:
Justin Pryzby <pryzby@telsasoft.com> writes:
> I can reproduce the issue with generated data:
> pryzbyj=# SELECT format('create table t%s(i int)', i) FROM generate_series(1,9999)i;

Hm, I tried this case and didn't really detect much runtime difference
between v12 and HEAD.

> I don't know if it's any issue, but I found that pg12 can process "null
> statements" almost 2x as fast:

Now I'm suspicious that you're comparing an assert-enabled v13 build
to a non-assert-enabled v12 build.  Check the output of
"pg_config --configure" from each installation to see if they're
configured alike.

            regards, tom lane