Discussion: High COMMIT times


High COMMIT times

From
Don Seiler
Date:
Good morning,

This week we've noticed that we're starting to see spikes where COMMITs are taking much longer than usual. Sometimes, quite a few seconds to finish. After a few minutes they disappear but then return seemingly at random. This becomes visible to the app and end user as a big stall in activity.

The checkpoints are still running on their full 5-minute checkpoint_timeout schedule (the logs all say "checkpoint starting: time"), and I'm not seeing any warnings about them occurring too frequently.
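For reference, the cumulative checkpoint counters can be pulled from pg_stat_bgwriter (just a sketch; the numbers are totals since the last stats reset):

-- Checkpoint activity since the last stats reset (PostgreSQL 12).
-- A fast-growing checkpoints_req would mean checkpoints are being forced
-- by max_wal_size rather than firing on checkpoint_timeout.
SELECT checkpoints_timed,
       checkpoints_req,
       checkpoint_write_time,  -- total ms spent writing dirty buffers
       checkpoint_sync_time,   -- total ms spent in fsync at checkpoint end
       buffers_checkpoint,
       buffers_backend_fsync,
       stats_reset
FROM pg_stat_bgwriter;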

This is PostgreSQL 12.4 on Ubuntu 18.04, all running in MS Azure (*not* managed by them). 

# select version();
                                                             version
---------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 12.4 (Ubuntu 12.4-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit


I have the stats_temp_directory in a tmpfs mount. I do have pg_wal on the same premium SSD storage volume as the data directory. Normally I would know to separate these but I was told with the cloud storage that it's all virtualized anyway, plus storage IOPS are determined by disk size so having a smaller volume just for pg_wal would hurt me in this case. The kind folks in the PG community Slack suggested just having one large premium cloud storage mount for the data directory and leave pg_wal inside because this virtualization removes any guarantee of true separation.
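For completeness, the relevant settings as the server sees them (a sketch; add whatever other parameters are of interest):

-- Current values of the settings mentioned above.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('stats_temp_directory', 'data_directory',
               'checkpoint_timeout', 'max_wal_size',
               'checkpoint_completion_target', 'synchronous_commit');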

I'm wondering if others have experience running self-managed PG in a cloud setting (especially if in MS Azure) and what they might have seen/done in cases like this.

Thanks,
Don.

--
Don Seiler
www.seiler.us

Re: High COMMIT times

From
Joshua Drake
Date:
I have the stats_temp_directory in a tmpfs mount. I do have pg_wal on the same premium SSD storage volume as the data directory. Normally I would know to separate these but I was told with the cloud storage that it's all virtualized anyway, plus storage IOPS are determined by disk size so having a smaller volume just for pg_wal would hurt me in this case. The kind folks in the PG community Slack suggested just having one large premium cloud storage mount for 
the data directory and leave pg_wal inside because this virtualization removes any guarantee of true separation.

It is true that the I/O is virtualized, but that does not mean that separate volumes won't help. In cloud storage you are granted specific IOPS/MB/s per volume. Separating pg_wal onto its own volume mount will take pressure off of page writes and allow the WAL to be written within its own volume's I/O allocation.

JD

Re: High COMMIT times

From
Don Seiler
Date:
On Wed, Jan 6, 2021 at 10:51 AM Joshua Drake <jd@commandprompt.com> wrote:
I have the stats_temp_directory in a tmpfs mount. I do have pg_wal on the same premium SSD storage volume as the data directory. Normally I would know to separate these but I was told with the cloud storage that it's all virtualized anyway, plus storage IOPS are determined by disk size so having a smaller volume just for pg_wal would hurt me in this case. The kind folks in the PG community Slack suggested just having one large premium cloud storage mount for 
the data directory and leave pg_wal inside because this virtualization removes any guarantee of true separation.

It is true that the I/O is virtualized, but that does not mean that separate volumes won't help. In cloud storage you are granted specific IOPS/MB/s per volume. Separating pg_wal onto its own volume mount will take pressure off of page writes and allow the WAL to be written within its own volume's I/O allocation.

Looking at the Azure portal metrics, we are nowhere close to the advertised maximum IOPS or MB/s throughput (under half of the maximum IOPS and under a quarter of the maximum MB/s), so there must be some other bottleneck in play. The IOPS limit for this VM size is even higher, so that shouldn't be it.

If I were to size a separate volume for just WAL, I would think 64GB, but the Azure storage IOPS for a volume that small are obviously much lower. On this particular DB host we're currently on a 2.0T P40 disk that is supposed to give 7500 IOPS and 250MB/s [1] (but again, Azure's own usage graphs show us nowhere near those limits). A smaller volume like 64GB would be provisioned at only 240 IOPS in this example, which doesn't seem like a lot. Until you get to a terabyte, there's a real drop-off in what Azure storage offers.
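For what it's worth, the WAL write rate itself can be sampled from within the database to help size such a volume (a rough sketch in psql; pg_stat_wal only exists in later releases):

-- Measure roughly how much WAL is written over a 60-second window.
SELECT pg_current_wal_lsn() AS lsn_start \gset
SELECT pg_sleep(60);
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'lsn_start'))
       AS wal_written_last_60s;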

I'd be interested to hear what others might have configured on their write-heavy cloud databases.


Don.

--
Don Seiler
www.seiler.us

Re: High COMMIT times

From
Joshua Drake
Date:



Looking at the Azure portal metrics, we are nowhere close to the advertised maximum IOPS or MB/s throughput (under half of the maximum IOPS and under a quarter of the maximum MB/s), so there must be some other bottleneck in play. The IOPS limit for this VM size is even higher, so that shouldn't be it.

If I were to size a separate volume for just WAL, I would think 64GB, but the Azure storage IOPS for a volume that small are obviously much lower. On this particular DB host we're currently on a 2.0T P40 disk that is supposed to give 7500 IOPS and 250MB/s [1] (but again, Azure's own usage graphs show us nowhere near those limits). A smaller volume like 64GB would be provisioned at only 240 IOPS in this example, which doesn't seem like a lot. Until you get to a terabyte, there's a real drop-off in what Azure storage offers.


Based on those metrics, I would start looking at other things. For example, I once (years ago) experienced commit delays because the kernel cache on Linux was getting overrun. Do you have any metrics on the system as a whole? Perhaps running sar every few minutes will help you identify a correlation?

JD

 
I'd be interested to hear what others might have configured on their write-heavy cloud databases.


Don.

--
Don Seiler
www.seiler.us

Re: High COMMIT times

From
Don Seiler
Date:
Azure VMs do have their own IOPS limits that increase with increasing VM "size". In this current case our VM size puts that VM IOPS limit well above anything the disks are rated at, so it shouldn't be a bottleneck.

On Wed, Jan 6, 2021 at 1:15 PM Kenneth Marshall <ktm@rice.edu> wrote:
On Wed, Jan 06, 2021 at 12:06:27PM -0600, Don Seiler wrote:
> On Wed, Jan 6, 2021 at 10:51 AM Joshua Drake <jd@commandprompt.com> wrote:
>
> Looking at the Azure portal metric, we are nowhere close to the advertised
> maximum IOPS or MB/s throughput (under half of the maximum IOPS and under a
> quarter of the MB/s maximum). So there must be some other bottleneck in
> play. The IOPS limit on this VM size is even higher so that shouldn't be it.
>

Hi Don,

I may just be re-stating common knowledge, but the available IOPS would
be constrained by how tightly coupled the storage is to the CPU. Even a
small increase in latency can limit the maximum IOPS unless you can issue
multiple relatively independent queries at once. I don't know the details
of how Azure implements their storage tiers.

Regards,
Ken


--
Don Seiler
www.seiler.us

Re: High COMMIT times

From
Laurenz Albe
Date:
On Wed, 2021-01-06 at 10:19 -0600, Don Seiler wrote:
> This week we've noticed that we're starting to see spikes where COMMITs are taking much longer than usual.
>  Sometimes, quite a few seconds to finish.
>
> This is PostgreSQL 12.4 on Ubuntu 18.04, all running in MS Azure (*not* managed by them).

Unless you are using WITH HOLD cursors on large result sets, this is very likely
I/O overload.  Use tools like "sar", "vmstat" and "iostat" to monitor your I/O load.
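If track_io_timing is enabled, the database itself also keeps per-database I/O timings, which can be lined up against what those tools report (just a sketch):

-- Cumulative time (in ms) backends have spent on data file reads/writes,
-- per database; only populated when track_io_timing = on.
SELECT datname,
       blk_read_time,
       blk_write_time,
       xact_commit,
       stats_reset
FROM pg_stat_database
WHERE datname NOT LIKE 'template%';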

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: High COMMIT times

From
Craig Jackson
Date:
We had a similar situation recently and saw high commit times caused by unindexed foreign key columns when deleting data with large tables involved. You might check whether any new foreign key constraints have been added recently, or whether any foreign key indexes have inadvertently been removed. Indexing the foreign keys resolved our issue.
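A catalog query along these lines can flag foreign keys whose referencing columns are not the leading columns of any index (just a heuristic sketch; adjust to taste):

-- Foreign key constraints with no index whose leading columns cover
-- the referencing columns (heuristic: order-insensitive prefix match).
SELECT c.conrelid::regclass AS referencing_table,
       c.conname            AS constraint_name,
       array_agg(a.attname ORDER BY k.ord) AS fk_columns
FROM pg_constraint c
JOIN LATERAL unnest(c.conkey) WITH ORDINALITY AS k(attnum, ord) ON true
JOIN pg_attribute a
  ON a.attrelid = c.conrelid AND a.attnum = k.attnum
WHERE c.contype = 'f'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          AND (string_to_array(i.indkey::text, ' ')::smallint[])[1:array_length(c.conkey, 1)]
              @> c.conkey
      )
GROUP BY c.conrelid, c.conname
ORDER BY 1;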

Regards,

Craig

On Wed, Jan 6, 2021 at 9:19 AM Don Seiler <don@seiler.us> wrote:
Good morning,

This week we've noticed that we're starting to see spikes where COMMITs are taking much longer than usual. Sometimes, quite a few seconds to finish. After a few minutes they disappear but then return seemingly at random. This becomes visible to the app and end user as a big stall in activity.

The checkpoints are still running on their full 5-minute checkpoint_timeout schedule (the logs all say "checkpoint starting: time"), and I'm not seeing any warnings about them occurring too frequently.

This is PostgreSQL 12.4 on Ubuntu 18.04, all running in MS Azure (*not* managed by them). 

# select version();
                                                             version
---------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 12.4 (Ubuntu 12.4-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit


I have the stats_temp_directory in a tmpfs mount. I do have pg_wal on the same premium SSD storage volume as the data directory. Normally I would know to separate these but I was told with the cloud storage that it's all virtualized anyway, plus storage IOPS are determined by disk size so having a smaller volume just for pg_wal would hurt me in this case. The kind folks in the PG community Slack suggested just having one large premium cloud storage mount for the data directory and leave pg_wal inside because this virtualization removes any guarantee of true separation.

I'm wondering if others have experience running self-managed PG in a cloud setting (especially if in MS Azure) and what they might have seen/done in cases like this.

Thanks,
Don.

--
Don Seiler
www.seiler.us


--
Craig 


Re: High COMMIT times

From
Don Seiler
Date:
On Thu, Jan 7, 2021 at 11:50 AM Craig Jackson <craig.jackson@broadcom.com> wrote:
We had a similar situation recently and saw high commit times caused by unindexed foreign key columns when deleting data with large tables involved. You might check whether any new foreign key constraints have been added recently, or whether any foreign key indexes have inadvertently been removed. Indexing the foreign keys resolved our issue.

Interesting, I'll run a check for any. Thanks!

Don.

--
Don Seiler
www.seiler.us

Re: High COMMIT times

From
Laurenz Albe
Date:
On Thu, 2021-01-07 at 10:49 -0700, Craig Jackson wrote:
> We had a similar situation recently and saw high commit times that were caused
>  by having unindexed foreign key columns when deleting data with large tables involved.
> You might check to see if any new foreign key constraints have been added
>  recently or if any foreign key indexes may have inadvertently been removed.
>  Indexing the foreign keys resolved our issue. 

Were these deferred foreign key constraints?

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: High COMMIT times

From
José Arthur Benetasso Villanova
Date:
On Thu, Jan 7, 2021 at 3:03 PM Don Seiler <don@seiler.us> wrote:
On Thu, Jan 7, 2021 at 11:50 AM Craig Jackson <craig.jackson@broadcom.com> wrote:
We had a similar situation recently and saw high commit times caused by unindexed foreign key columns when deleting data with large tables involved. You might check whether any new foreign key constraints have been added recently, or whether any foreign key indexes have inadvertently been removed. Indexing the foreign keys resolved our issue.

Interesting, I'll run a check for any. Thanks!

Don.

--
Don Seiler
www.seiler.us

Do you have standby databases and synchronous_commit = 'remote_apply'?
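A quick way to check both from psql (just a sketch):

-- Commit setting and whether any standby is attached synchronously.
SHOW synchronous_commit;
SELECT application_name, state, sync_state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;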
--
José Arthur Benetasso Villanova

Re: High COMMIT times

From
Don Seiler
Date:
On Fri, Jan 8, 2021 at 11:52 AM José Arthur Benetasso Villanova <jose.arthur@gmail.com> wrote:
Do you have standby databases and synchronous_commit = 'remote_apply'?

We have standby databases, but synchronous_commit is "on".

I found a pair of unindexed foreign key child tables, but the app never runs deletes on those and they are very small. I did mention it to the app team as a best practice in the meantime, but I don't think it plays a part in our current issue.

Don.

--
Don Seiler
www.seiler.us

Re: High COMMIT times

From
Craig Jackson
Date:
Yes, these were deferrable foreign key constraints. 

On Fri, Jan 8, 2021 at 2:05 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2021-01-07 at 10:49 -0700, Craig Jackson wrote:
> We had a similar situation recently and saw high commit times that were caused
>  by having unindexed foreign key columns when deleting data with large tables involved.
> You might check to see if any new foreign key constraints have been added
>  recently or if any foreign key indexes may have inadvertently been removed.
>  Indexing the foreign keys resolved our issue.

Were these deferred foreign key constraints?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



--
Craig 


Re: High COMMIT times

From
Jeff Janes
Date:


On Wed, Jan 6, 2021 at 11:19 AM Don Seiler <don@seiler.us> wrote:
Good morning,

This week we've noticed that we're starting to see spikes where COMMITs are taking much longer than usual. Sometimes, quite a few seconds to finish. After a few minutes they disappear but then return seemingly at random. This becomes visible to the app and end user as a big stall in activity.
 
How are you monitoring the COMMIT times?  What do you generally see in pg_stat_activity.wait_event during the spikes/stalls?
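For instance, something like this sampled during a stall (just a sketch) would show what the sessions are actually blocked on:

-- Snapshot of current waits; run a few times while COMMITs are slow.
SELECT wait_event_type, wait_event, count(*) AS sessions
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY wait_event_type, wait_event
ORDER BY sessions DESC;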

Cheers,

Jeff

Re: High COMMIT times

From
Don Seiler
Date:
On Sat, Jan 9, 2021 at 2:07 PM Jeff Janes <jeff.janes@gmail.com> wrote:

How are you monitoring the COMMIT times?  What do you generally see in pg_stat_activity.wait_event during the spikes/stalls?

Right now we just observe the COMMIT duration posted in the postgresql log (we log anything over 100ms).

One other thing that I shamefully forgot to mention: when we see these slow COMMITs in the log, they coincide with a connection storm (Cat 5 hurricane) from our apps, where connections will go from ~200 to ~1200. This will probably disgust many, but our PG server's max_connections is set to 2000, and we have a set of pgbouncers in front of it with a total max_db_connections of 1600. I know many of you think this defeats the whole purpose of having pgbouncer, and I agree. I've been trying to explain as much, and that even with 32 CPUs on this DB host we probably shouldn't expect to support more than 100-200 active connections, let alone 1600. I'm still pushing our app server instances (which also use their own JDBC (Hikari) connection pool and *then* go through pgbouncer) to lower their min/max connection settings; it's counterintuitive at first glance, but hopefully everyone sees the bigger picture.

One nagging question I have is whether the slow COMMIT is triggering the connection storm (e.g. the app sees a slow response or timeout on a current connection and fires off a new connection in its place), or vice versa. We're planning to deploy new, more performant cloud storage (Azure Ultra Disk) just for the WAL, but I'm hesitant to call it a silver bullet while we still have this insane connection management strategy in place.

Curious to know what others think (please pull no punches) and if others have been in a similar scenario with anecdotes to share.

Thanks,
Don.

--
Don Seiler
www.seiler.us

Re: High COMMIT times

From
Craig Jackson
Date:
How far apart are the min/max connection settings on your application connection pool? We had a similar issue with connection storms in the past on Oracle. One thing we did to minimize the storms was make sure there was not a wide gap between the min/max, say no more than a 5-10 connection difference between min/max.

Regards,

Craig 

On Sun, Jan 10, 2021 at 5:42 PM Don Seiler <don@seiler.us> wrote:
On Sat, Jan 9, 2021 at 2:07 PM Jeff Janes <jeff.janes@gmail.com> wrote:

How are you monitoring the COMMIT times?  What do you generally see in pg_stat_activity.wait_event during the spikes/stalls?

Right now we just observe the COMMIT duration posted in the postgresql log (we log anything over 100ms).

One other thing that I shamefully forgot to mention: when we see these slow COMMITs in the log, they coincide with a connection storm (Cat 5 hurricane) from our apps, where connections will go from ~200 to ~1200. This will probably disgust many, but our PG server's max_connections is set to 2000, and we have a set of pgbouncers in front of it with a total max_db_connections of 1600. I know many of you think this defeats the whole purpose of having pgbouncer, and I agree. I've been trying to explain as much, and that even with 32 CPUs on this DB host we probably shouldn't expect to support more than 100-200 active connections, let alone 1600. I'm still pushing our app server instances (which also use their own JDBC (Hikari) connection pool and *then* go through pgbouncer) to lower their min/max connection settings; it's counterintuitive at first glance, but hopefully everyone sees the bigger picture.

One nagging question I have is whether the slow COMMIT is triggering the connection storm (e.g. the app sees a slow response or timeout on a current connection and fires off a new connection in its place), or vice versa. We're planning to deploy new, more performant cloud storage (Azure Ultra Disk) just for the WAL, but I'm hesitant to call it a silver bullet while we still have this insane connection management strategy in place.

Curious to know what others think (please pull no punches) and if others have been in a similar scenario with anecdotes to share.

Thanks,
Don.

--
Don Seiler
www.seiler.us


--
Craig 


Re: High COMMIT times

From
Don Seiler
Date:
On Mon, Jan 11, 2021 at 9:06 AM Craig Jackson <craig.jackson@broadcom.com> wrote:
How far apart are the min/max connection settings on your application connection pool? We had a similar issue with connection storms in the past on Oracle. One thing we did to minimize the storms was make sure there was not a wide gap between the min/max, say no more than a 5-10 connection difference between min/max.

App instances are configured by default for 10 connections in their Hikari connection pool, with no separate minimum setting, so it's always 10. Some have begun setting minimumIdle to 5. I'm now pushing them to lower minimumIdle to something like 2, and, if they are scaled out to multiple app instances, to consider lowering their max from 10 to 5.

From a spike this morning (names have been changed, but the other data is real):

At 06:03:15, service foo had 2 active sessions and 127 idle sessions. At 06:03:30 (the next tick in Grafana), it had 5 active sessions but 364 idle sessions. The foo_db database overall had 9 active and 337 idle sessions at 06:03:15, and then 5 active and 788 idle in the next tick. So a flood of new connections was created within that 15-second interval (probably all within the same second) and more or less abandoned.
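For anyone who wants to reproduce this kind of breakdown directly, pg_stat_activity can be sampled (just a sketch):

-- Connection counts per database, application and state; sample this every
-- few seconds while a storm is in progress.
SELECT datname, application_name, state, count(*)
FROM pg_stat_activity
GROUP BY datname, application_name, state
ORDER BY count(*) DESC;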
 
Don.
--
Don Seiler
www.seiler.us

Re: High COMMIT times

From
Don Seiler
Date:

Today another DBA suggested that perhaps increasing wal_buffers might be an option. Currently our wal_buffers is 32MB. Total system memory is 128GB, FYI.

I'm curious whether wal_buffers being full and/or undersized would present itself as higher COMMIT times like we are observing. We're considering trying wal_buffers=64MB as a start. I don't see any actual upper limit defined in the documentation, but I'm assuming there's such a thing as "too high" here; I'm curious what that would be and what negative side effects would indicate it.

From the documentation:

On systems with high log output, XLogFlush requests might not occur often enough to prevent XLogInsertRecord from having to do writes. On such systems one should increase the number of WAL buffers by modifying the wal_buffers parameter. When full_page_writes is set and the system is very busy, setting wal_buffers higher will help smooth response times during the period immediately following each checkpoint.

We do have full_page_writes=on as well, so this sounds worth exploring.

We also currently have wal_compression=off. We have plenty of CPU headroom and so are considering enabling that first (since it doesn't require a DB restart like changing wal_buffers does). I imagine enabling wal_compression would save some IO.
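Concretely, the changes would look something like this (just a sketch; wal_compression takes effect on a reload, while wal_buffers needs a restart):

-- Compress full-page images in WAL; applies after a config reload.
ALTER SYSTEM SET wal_compression = on;
SELECT pg_reload_conf();

-- Raise the WAL buffer size; only takes effect after a postmaster restart.
ALTER SYSTEM SET wal_buffers = '64MB';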

I'm interested to hear what you folks think about either or both of these suggestions.

Thanks,
Don.

--
Don Seiler
www.seiler.us