Discussion: Oh, this is embarrassing: init file logic is still broken
Chasing a problem identified by my Salesforce colleagues led me to the conclusion that my commit f3b5565dd ("Use a safer method for determining whether relcache init file is stale") is rather borked. It causes pg_trigger_tgrelid_tgname_index to be omitted from the relcache init file, because that index is not used by any syscache. I had been aware of that actually, but considered it a minor issue. It's not so minor though, because RelationCacheInitializePhase3 marks that index as nailed for performance reasons, and includes it in NUM_CRITICAL_LOCAL_INDEXES. That means that load_relcache_init_file *always* decides that the init file is busted and silently(!) ignores it. So we're taking a nontrivial hit in backend startup speed as of the last set of minor releases.

Clearly, my shortcut of defining the init file contents as exactly the set of syscache-supporting rels was too short. I think the cleanest fix is to create a wrapper function in relcache.c that encapsulates the knowledge that pg_trigger_tgrelid_tgname_index is also to be included in the init file.

But to prevent this type of problem from recurring, we need some defenses against the wrapper getting out of sync with reality elsewhere. I suggest that:

1. write_relcache_init_file ought to bitch if it concludes that a nailed relation is not to be written to the init file. Probably an Assert is enough.

2. load_relcache_init_file ought to complain if it takes the "wrong number of nailed relations" exit path. Here, it seems like there might be a chance of the path being taken validly, for example if a minor release were to decide to nail a rel that wasn't nailed before, or vice versa. So what I'm thinking about here is an elog(WARNING) complaint, which would make logic bugs like this pretty visible in development builds. If we ever did make a change like that in a minor release, people might get a one-time WARNING when upgrading, but I think that would be acceptable.

Thoughts, better ideas?

			regards, tom lane
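A minimal sketch of the kind of wrapper being proposed (the function name RelationIdBelongsInInitFile is illustrative only; it assumes a RelationSupportsSysCache()-style helper from f3b5565dd and the TriggerRelidNameIndexId macro from catalog/indexing.h):

	/* Illustrative sketch for relcache.c -- not the committed fix. */
	#include "postgres.h"

	#include "catalog/indexing.h"
	#include "utils/syscache.h"

	/*
	 * Single point of truth for "does this relation belong in the relcache
	 * init file?", so that write_relcache_init_file and
	 * load_relcache_init_file cannot drift out of sync again.
	 */
	static bool
	RelationIdBelongsInInitFile(Oid relationId)
	{
		/*
		 * pg_trigger_tgrelid_tgname_index is nailed by
		 * RelationCacheInitializePhase3 for performance, but no syscache
		 * uses it, so it has to be listed here explicitly.
		 */
		if (relationId == TriggerRelidNameIndexId)
			return true;

		/* Everything else in the init file backs a syscache. */
		return RelationSupportsSysCache(relationId);
	}

Defense #1 would then be roughly an Assert(!relation->rd_isnailed || RelationIdBelongsInInitFile(RelationGetRelid(relation))) in write_relcache_init_file, and defense #2 an elog(WARNING, ...) at the point where load_relcache_init_file currently takes the silent "wrong number of nailed relations" exit.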
On 06/23/2015 04:44 PM, Tom Lane wrote:
> Chasing a problem identified by my Salesforce colleagues led me to the
> conclusion that my commit f3b5565dd ("Use a safer method for determining
> whether relcache init file is stale") is rather borked. It causes
> pg_trigger_tgrelid_tgname_index to be omitted from the relcache init file,
> because that index is not used by any syscache. I had been aware of that
> actually, but considered it a minor issue. It's not so minor though,
> because RelationCacheInitializePhase3 marks that index as nailed for
> performance reasons, and includes it in NUM_CRITICAL_LOCAL_INDEXES.
> That means that load_relcache_init_file *always* decides that the init
> file is busted and silently(!) ignores it. So we're taking a nontrivial
> hit in backend startup speed as of the last set of minor releases.

OK, this is pretty bad in its real performance effects. On a workload which is dominated by new connection creation, we've lost about 17% throughput.

To test it, I ran pgbench -s 100 -j 2 -c 6 -r -C -S -T 1200 against a database which fits in shared_buffers on two different m3.large instances on AWS (across the network, not on unix sockets). A typical run on 9.3.6 looks like this:

scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 252322
tps = 210.267219 (including connections establishing)
tps = 31958.233736 (excluding connections establishing)
statement latencies in milliseconds:
	0.002515	\set naccounts 100000 * :scale
	0.000963	\setrandom aid 1 :naccounts
	19.042859	SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

Whereas a typical run on 9.3.9 looks like this:

scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 208180
tps = 173.482259 (including connections establishing)
tps = 31092.866153 (excluding connections establishing)
statement latencies in milliseconds:
	0.002518	\set naccounts 100000 * :scale
	0.000988	\setrandom aid 1 :naccounts
	23.076961	SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

Numbers are pretty consistent on four runs each on two different instances (+/- 4%), so I don't think this is Amazon variability we're seeing. I think the syscache invalidation is really costing us 17%. :-(

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
> On 06/23/2015 04:44 PM, Tom Lane wrote:
>> Chasing a problem identified by my Salesforce colleagues led me to the
>> conclusion that my commit f3b5565dd ("Use a safer method for determining
>> whether relcache init file is stale") is rather borked. It causes
>> pg_trigger_tgrelid_tgname_index to be omitted from the relcache init file,
>> because that index is not used by any syscache. I had been aware of that
>> actually, but considered it a minor issue. It's not so minor though,
>> because RelationCacheInitializePhase3 marks that index as nailed for
>> performance reasons, and includes it in NUM_CRITICAL_LOCAL_INDEXES.
>> That means that load_relcache_init_file *always* decides that the init
>> file is busted and silently(!) ignores it. So we're taking a nontrivial
>> hit in backend startup speed as of the last set of minor releases.
>
> OK, this is pretty bad in its real performance effects. On a workload
> which is dominated by new connection creation, we've lost about 17%
> throughput.

Are we going to release 9.4.5 etc. soon?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes:
>>> That means that load_relcache_init_file *always* decides that the init
>>> file is busted and silently(!) ignores it. So we're taking a nontrivial
>>> hit in backend startup speed as of the last set of minor releases.

>> OK, this is pretty bad in its real performance effects. On a workload
>> which is dominated by new connection creation, we've lost about 17%
>> throughput.

> Are we going to release 9.4.5 etc. soon?

I can't see doing a release just for this. If we were due for releases anyway, sure, but we've considerably overstressed our poor packagers of late. Previous discussion was to the effect that we'd anticipate another set of releases in a month or so, after some more multixact fixes have landed.

			regards, tom lane
On 06/25/2015 08:12 AM, Tom Lane wrote:
> Tatsuo Ishii <ishii@postgresql.org> writes:
>>>> That means that load_relcache_init_file *always* decides that the init
>>>> file is busted and silently(!) ignores it. So we're taking a nontrivial
>>>> hit in backend startup speed as of the last set of minor releases.
>
>>> OK, this is pretty bad in its real performance effects. On a workload
>>> which is dominated by new connection creation, we've lost about 17%
>>> throughput.
>
>> Are we going to release 9.4.5 etc. soon?
>
> I can't see doing a release just for this. If we were due for releases
> anyway, sure, but we've considerably overstressed our poor packagers of
> late. Previous discussion was to the effect that we'd anticipate another
> set of releases in a month or so, after some more multixact fixes have
> landed.

FWIW, I know users who will not update because of this regression. Consider applications which regularly need to spin up 200 new connections in 90 seconds due to usage peaks.

On the other hand, do I want to do another update release right away? No.

Hard place, meet rock.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Wed, Jun 24, 2015 at 2:52 PM, Josh Berkus <josh@agliodbs.com> wrote:
> OK, this is pretty bad in its real performance effects. On a workload
> which is dominated by new connection creation, we've lost about 17%
> throughput.

Mistakes happen, but this is the kind of regression that automated performance testing could have caught. That's another area of testing that needs considerable improvement IMV.

-- 
Peter Geoghegan
On 06/25/2015 10:47 AM, Peter Geoghegan wrote:
> On Wed, Jun 24, 2015 at 2:52 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> OK, this is pretty bad in its real performance effects. On a workload
>> which is dominated by new connection creation, we've lost about 17%
>> throughput.
>
> Mistakes happen, but this is the kind of regression that automated
> performance testing could have caught. That's another area of testing
> that needs considerable improvement IMV.

Well, I have a pgbench test script for it now. If only I had some place to check it in ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> On 06/25/2015 08:12 AM, Tom Lane wrote:
>> I can't see doing a release just for this. If we were due for releases
>> anyway, sure, but we've considerably overstressed our poor packagers of
>> late. Previous discussion was to the effect that we'd anticipate another
>> set of releases in a month or so, after some more multixact fixes have
>> landed.

> FWIW, I know users who will not update because of this regression.
> Consider applications which regularly need to spin up 200 new
> connections in 90 seconds due to usage peaks.

TBH, if your app is critically dependent on backend startup time, You're Doing It Wrong, because that means you're pissing away significant performance by not using a connection pooler. So I don't have a huge amount of sympathy.

			regards, tom lane
On 06/25/2015 11:04 AM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 06/25/2015 08:12 AM, Tom Lane wrote:
>>> I can't see doing a release just for this. If we were due for releases
>>> anyway, sure, but we've considerably overstressed our poor packagers of
>>> late. Previous discussion was to the effect that we'd anticipate another
>>> set of releases in a month or so, after some more multixact fixes have
>>> landed.
>
>> FWIW, I know users who will not update because of this regression.
>> Consider applications which regularly need to spin up 200 new
>> connections in 90 seconds due to usage peaks.
>
> TBH, if your app is critically dependent on backend startup time,
> You're Doing It Wrong, because that means you're pissing away significant
> performance by not using a connection pooler. So I don't have a huge
> amount of sympathy.

Most of these apps are using a connection pooler. But usage is very spiky, and one doesn't want to keep around 300 idle connections all the time for the 5 minutes when you need them. Indeed, pgbouncer is configured to spin down connections after a configured amount of idle time, as are most connection poolers.

Regardless, I don't think it's a good idea to release again sooner than planned. It is a good reason to look at making connection creation time a regular test, though; you can be sure I'll be testing 9.4.5 and 9.5a!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 6/25/15 12:51 PM, Josh Berkus wrote:
> On 06/25/2015 10:47 AM, Peter Geoghegan wrote:
>> On Wed, Jun 24, 2015 at 2:52 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> OK, this is pretty bad in its real performance effects. On a workload
>>> which is dominated by new connection creation, we've lost about 17%
>>> throughput.
>>
>> Mistakes happen, but this is the kind of regression that automated
>> performance testing could have caught. That's another area of testing
>> that needs considerable improvement IMV.
>
> Well, I have a pgbench test script for it now. If only I had some place
> to check it in ...

Well, there is https://github.com/slux/PostgreSQL-Performance-Farm; but it hasn't been touched in 5 years. :(

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
F.Y.I. here are the results I got on my laptop (Ubuntu 14, i7-4600U, 16GB mem, 512GB SSD). Unlike Josh, I used Unix domain sockets.

In summary (tps including connections establishing):

9.4.3:                     943.439840
9.4.4:                     429.953953
9.4-stable as of June 30:  929.804491

So compared with 9.4.3, 9.4.4 is 54% slower, and 9.4-stable is 1.4% slower. I think the difference between 9.4.3 and 9.4-stable is just measurement error, but the difference between 9.4.3 and 9.4.4 is real. The good thing is that 9.4-stable fixes the issue as expected. Thanks Tom!

Below are the details.

9.4.3:

t-ishii@localhost: pgbench -p 5434 -s 100 -j 2 -c 6 -r -C -S -T 1200 test
Scale option ignored, using pgbench_branches table count = 100
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 1132130
latency average: 6.360 ms
tps = 943.439840 (including connections establishing)
tps = 64573.165915 (excluding connections establishing)
statement latencies in milliseconds:
	0.002809	\set naccounts 100000 * :scale
	0.001427	\setrandom aid 1 :naccounts
	4.254619	SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

9.4.4:

t-ishii@localhost: pgbench -p 5433 -s 100 -j 2 -c 6 -r -C -S -T 1200 test
Scale option ignored, using pgbench_branches table count = 100
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 515946
latency average: 13.955 ms
tps = 429.953953 (including connections establishing)
tps = 56235.664740 (excluding connections establishing)
statement latencies in milliseconds:
	0.003311	\set naccounts 100000 * :scale
	0.001653	\setrandom aid 1 :naccounts
	9.320204	SELECT abalance FROM pgbench_accounts WHERE aid = :aid

9.4-stable:

t-ishii@localhost: pgbench -p 5435 -s 100 -j 2 -c 6 -r -C -S -T 1200 test
Scale option ignored, using pgbench_branches table count = 100
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 1115767
latency average: 6.453 ms
tps = 929.804491 (including connections establishing)
tps = 64676.863921 (excluding connections establishing)
statement latencies in milliseconds:
	0.002803	\set naccounts 100000 * :scale
	0.001395	\setrandom aid 1 :naccounts
	4.316686	SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

> OK, this is pretty bad in its real performance effects. On a workload
> which is dominated by new connection creation, we've lost about 17%
> throughput.
>
> To test it, I ran pgbench -s 100 -j 2 -c 6 -r -C -S -T 1200 against a
> database which fits in shared_buffers on two different m3.large
> instances on AWS (across the network, not on unix sockets). A typical
> run on 9.3.6 looks like this:
>
> scaling factor: 100
> query mode: simple
> number of clients: 6
> number of threads: 2
> duration: 1200 s
> number of transactions actually processed: 252322
> tps = 210.267219 (including connections establishing)
> tps = 31958.233736 (excluding connections establishing)
> statement latencies in milliseconds:
> 	0.002515	\set naccounts 100000 * :scale
> 	0.000963	\setrandom aid 1 :naccounts
> 	19.042859	SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
>
> Whereas a typical run on 9.3.9 looks like this:
>
> scaling factor: 100
> query mode: simple
> number of clients: 6
> number of threads: 2
> duration: 1200 s
> number of transactions actually processed: 208180
> tps = 173.482259 (including connections establishing)
> tps = 31092.866153 (excluding connections establishing)
> statement latencies in milliseconds:
> 	0.002518	\set naccounts 100000 * :scale
> 	0.000988	\setrandom aid 1 :naccounts
> 	23.076961	SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
>
> Numbers are pretty consistent on four runs each on two different
> instances (+/- 4%), so I don't think this is Amazon variability we're
> seeing. I think the syscache invalidation is really costing us 17%. :-(
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com