Обсуждение: Adding locks statistics
Hi hackers, Please find attached a patch to add a new view (namely pg_stat_lock) that provides lock statistics. It’s output is like the following: postgres=# select * from pg_stat_lock; locktype | requests | waits | timeouts | deadlock_timeouts | deadlocks | fastpath | stats_reset ------------------+----------+-------+----------+-------------------+-----------+----------+------------------------------- relation | 612775 | 1 | 0 | 0 | 0 | 531115 | 2025-08-01 09:18:26.476275+00 extend | 3128 | 0 | 0 | 0 | 0 | 0 | 2025-08-01 09:18:26.476275+00 frozenid | 11 | 0 | 0 | 0 | 0 | 0 | 2025-08-01 09:18:26.476275+00 page | 1 | 0 | 0 | 0 | 0 | 0 | 2025-08-01 09:18:26.476275+00 tuple | 3613 | 0 | 0 | 0 | 0 | 0 | 2025-08-01 09:18:26.476275+00 transactionid | 6130 | 0 | 0 | 0 | 0 | 0 | 2025-08-01 09:18:26.476275+00 virtualxid | 15390 | 0 | 0 | 0 | 0 | 15390 | 2025-08-01 09:18:26.476275+00 spectoken | 12 | 0 | 0 | 0 | 0 | 0 | 2025-08-01 09:18:26.476275+00 object | 8393 | 0 | 0 | 0 | 0 | 0 | 2025-08-01 09:18:26.476275+00 userlock | 0 | 0 | 0 | 0 | 0 | 0 | 2025-08-01 09:18:26.476275+00 advisory | 44 | 0 | 0 | 0 | 0 | 0 | 2025-08-01 09:18:26.476275+00 applytransaction | 0 | 0 | 0 | 0 | 0 | 0 | 2025-08-01 09:18:26.476275+00 It means that it provides historical trends of locks usage per lock type. It can be used for example for: 1. checking if "waits" is close to "requests". Then it means you usually have to wait before acquiring the lock, which means you may have a concurrency issue. 2. lock_timeout and deadlock_timeout tuning (lock_timeout is visible only in the logs if log_min_error_statement is set appropriately). 3. checking the "requests"/"fastpath" ratio to see if "max_locks_per_transaction" needs tuning (see c4d5cb71d2). If any points need more details, it might be a good idea to start sampling pg_locks. The patch is made of 2 sub-patches: 0001 - It adds a new stat kind PGSTAT_KIND_LOCK for the lock statistics. This new statistic kind is a fixed one because its key is the lock type so that we know its size is LOCKTAG_LAST_TYPE + 1. This statistic kind records the following counters: - requests: Number of requests for this lock type. - waits: Number of times requests for this lock type had to wait. - timeouts: Number of times requests for this lock type had to wait longer than lock_timeout. - deadlock_timeouts: Number of times requests for this lock type had to wait longer than deadlock_timeout. - deadlocks: Number of times a deadlock occurred on this lock type. - fastpath: Number of times this lock type was taken via fast path. No extra details is added (like the ones, i.e relation oid, database oid, we can find in pg_locks). The idea is to provide an idea on what the locking behaviour looks like. Those new counters are incremented outside of the wait events code path, as suggested in [1]. There are no major design choices, it relies on the current statistics machinery. 0002 - It adds the pg_stat_lock view It also adds documentation and some tests. Remarks: - maybe we could add some metrics related to the lock duration (we have some hints thanks to the timeout ounters though) - if this is merged, a next step could be to record those metrics per backend [1]: https://www.postgresql.org/message-id/CA%2BTgmobptuUWo7X5zcQrWKh22qeAn4eL%2B%3Dwtb8_ajCOR%2B7_tcw%40mail.gmail.com Looking forward to your feedback, Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Вложения
Great idea. +1. Here is a quick overall review to get things started.
Meta:
patch did not apply via "git apply". Also has carriage returns (e.g. DOS file), and some errant whitespace. Seems to pass pgindent, though.
Name:
I think the name would read better as pg_stat_locks, especially as it returns multiple rows.
Docs: seem good. Needs a section on how to reset via
SELECT pg_stat_reset_shared('lock');
Also plural better here ('locks')
Code:
+ * Copyright (c) 2021-2025, PostgreSQL Global Development Group
If a new file, we can simply say 2025?
+ LWLock locks[LOCKTAG_LAST_TYPE + 1];
+ PgStat_Lock stats;
Adding a lock to the system that counts locks! :) (just found amusing, not a comment)
-#define PGSTAT_KIND_SLRU 11
-#define PGSTAT_KIND_WAL 12
+#define PGSTAT_KIND_LOCK 11
+#define PGSTAT_KIND_SLRU 12
+#define PGSTAT_KIND_WAL 13
Why not just add LOCK as #13?
What about the overhead of maintaining this system? I know it's overall very lightweight, but from my testing, the relation locktype in particular is very, very busy. Busier than I realized until I saw it in this useful view!
Meta:
patch did not apply via "git apply". Also has carriage returns (e.g. DOS file), and some errant whitespace. Seems to pass pgindent, though.
Name:
I think the name would read better as pg_stat_locks, especially as it returns multiple rows.
Docs: seem good. Needs a section on how to reset via
SELECT pg_stat_reset_shared('lock');
Also plural better here ('locks')
Code:
+ * Copyright (c) 2021-2025, PostgreSQL Global Development Group
If a new file, we can simply say 2025?
+ LWLock locks[LOCKTAG_LAST_TYPE + 1];
+ PgStat_Lock stats;
Adding a lock to the system that counts locks! :) (just found amusing, not a comment)
-#define PGSTAT_KIND_SLRU 11
-#define PGSTAT_KIND_WAL 12
+#define PGSTAT_KIND_LOCK 11
+#define PGSTAT_KIND_SLRU 12
+#define PGSTAT_KIND_WAL 13
Why not just add LOCK as #13?
What about the overhead of maintaining this system? I know it's overall very lightweight, but from my testing, the relation locktype in particular is very, very busy. Busier than I realized until I saw it in this useful view!
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
On Mon, 2025-08-11 at 13:54 -0400, Greg Sabino Mullane wrote: > Great idea. +1. Here is a quick overall review to get things started. Can you describe your use case? I'd like to understand whether this is useful for users, hackers, or both. Regards, Jeff Davis
On Mon, Aug 11, 2025 at 02:53:58PM -0700, Jeff Davis wrote: > On Mon, 2025-08-11 at 13:54 -0400, Greg Sabino Mullane wrote: > > Great idea. +1. Here is a quick overall review to get things started. > > Can you describe your use case? I'd like to understand whether this is > useful for users, hackers, or both. This is a DBA feature, so the questions I'd ask myself are basically: - Is there any decision-making where these numbers would help? These decisions would shape in tweaking the configuration of the server or the application to as we move from a "bad" number trend to a "good" number trend. - What would be good numbers? In this case, most likely a threshold reached over a certain period of time. - Would these new stats overlap with similar statistics gathered in the system, creating duplication and bloat in the pgstats for no real gain? Looking at the patch, the data gathered is this, and I don't think that we have metrics gathered in the system to get an idea of the contention in this area, for timeouts and deadlocks: + PgStat_Counter requests; + PgStat_Counter waits; + PgStat_Counter timeouts; + PgStat_Counter deadlock_timeouts; + PgStat_Counter deadlocks; + PgStat_Counter fastpath; Isn't the "deadlock_timeout" one an overlap between timeout and deadlock? Having three counters to track two concepts seems strange to me. Regarding the fastpath locking, you'd want to optimize the fastpath to be close to the number of requests. If you had these numbers, one thing that I could see a DBA do is tune max_locks_per_transaction, which is what influences the per-backend limit of fast-path locks, with FastPathLockGroupsPerBackend. Using static counters for the pending data is going to be necessary when these are called in critical sections, which is I guess why you've done it this way, right? -- Michael
Вложения
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Aug 11, 2025 at 02:53:58PM -0700, Jeff Davis wrote: >> Can you describe your use case? I'd like to understand whether this is >> useful for users, hackers, or both. > This is a DBA feature, so the questions I'd ask myself are basically: > - Is there any decision-making where these numbers would help? These > decisions would shape in tweaking the configuration of the server or > the application to as we move from a "bad" number trend to a "good" > number trend. > - What would be good numbers? In this case, most likely a threshold > reached over a certain period of time. > - Would these new stats overlap with similar statistics gathered in > the system, creating duplication and bloat in the pgstats for no real > gain? I'm also wondering why slicing the numbers in this particular way (i.e., aggregating by locktype) is a helpful way to look at the data. Maybe it's just what you want, but that's not obvious to me. regards, tom lane
Hi, On Mon, Aug 11, 2025 at 01:54:38PM -0400, Greg Sabino Mullane wrote: > Great idea. +1. Here is a quick overall review to get things started. Thanks for looking at it! > Meta: > patch did not apply via "git apply". Also has carriage returns (e.g. DOS > file), and some errant whitespace. Interesting, I'm not seeing those issues on my side when applying with "git am". > Name: > I think the name would read better as pg_stat_locks, especially as it > returns multiple rows. I considered pg_stat_locks, but chose the singular form to be consistent with pg_stat_database, pg_stat_subscription, and friends. > Docs: seem good. Needs a section on how to reset via > SELECT pg_stat_reset_shared('lock'); That's already included in v1: " @@ -5055,6 +5200,12 @@ description | Waiting for a newly initialized WAL file to reach durable storage <structname>pg_stat_io</structname> view. </para> </listitem> + <listitem> + <para> + <literal>lock</literal>: Reset all the counters shown in the + <structname>pg_stat_lock</structname> view. + </para> + </listitem> " Do you think that this needs any additions or changes? > Also plural better here ('locks') I think having pg_stat_<XXX> and XXX being the same option in " pg_stat_reset_shared()" makes sense. It's how it has been done so far, that's why I went with 'lock'. > Code: > > + * Copyright (c) 2021-2025, PostgreSQL Global Development Group > > If a new file, we can simply say 2025? I'm not sure that matters that much, see [1] and we can look at some files added in 2025 for examples: src/backend/storage/aio/aio_io.c src/backend/access/nbtree/nbtpreprocesskeys.c where 2025 is not "alone". > + LWLock locks[LOCKTAG_LAST_TYPE + 1]; > + PgStat_Lock stats; > > Adding a lock to the system that counts locks! :) (just found amusing, not > a comment) ;-) > -#define PGSTAT_KIND_SLRU 11 > -#define PGSTAT_KIND_WAL 12 > +#define PGSTAT_KIND_LOCK 11 > +#define PGSTAT_KIND_SLRU 12 > +#define PGSTAT_KIND_WAL 13 > > Why not just add LOCK as #13? Because it looks like that they are ordered by alphabetical order. > What about the overhead of maintaining this system? I know it's overall > very lightweight, but from my testing, the relation locktype in particular > is very, very busy. The counter increments do call a function generated that way: " #define PGSTAT_COUNT_LOCK_FUNC(stat) \ void \ CppConcat(pgstat_count_lock_,stat)(uint8 locktag_type) \ { \ Assert(locktag_type <= LOCKTAG_LAST_TYPE); \ PendingLockStats.stats[locktag_type].stat++; \ have_lockstats = true; \ pgstat_report_fixed = true; \ } " So, it's pretty lightweight as you said (given that PendingLockStats is not shared and just local to the backend that increments the counter). There could be some contention when the pending stats are flushed but that's the same for all the stats kinds. We could do better, though, and avoid the function calls by creating macros instead of functions. That would mean PendingLockStats to be visible to the outside world but: - that's not the direction that has been taken recently (for example, see the "kept here to avoid exposing PendingBackendStats to the outside world" comment in pgstat_backend.c). - I'm not sure that's worth for this particular case and code paths. That said, if you (and/or others) have strong concerns about performance penalties, I could study this area more in depth. > Busier than I realized until I saw it in this useful > view! Thanks! Did you observe some noticeable performance penalties? [1]: https://www.postgresql.org/message-id/202501211750.xf5j6thdkppy%40alvherre.pgsql Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Aug 12, 2025 at 08:44:58AM +0900, Michael Paquier wrote: > On Mon, Aug 11, 2025 at 02:53:58PM -0700, Jeff Davis wrote: > > On Mon, 2025-08-11 at 13:54 -0400, Greg Sabino Mullane wrote: > > > Great idea. +1. Here is a quick overall review to get things started. > > > > Can you describe your use case? I'd like to understand whether this is > > useful for users, hackers, or both. Thanks for looking at it! I provided a few examples in the initial email ([1]): " It can be used for example for: 1. checking if "waits" is close to "requests". Then it means you usually have to wait before acquiring the lock, which means you may have a concurrency issue. 2. lock_timeout and deadlock_timeout tuning (lock_timeout is visible only in the logs if log_min_error_statement is set appropriately). 3. checking the "requests"/"fastpath" ratio to see if "max_locks_per_transaction" needs tuning (see c4d5cb71d2). " Do these seem like useful use cases? > - Is there any decision-making where these numbers would help? These > decisions would shape in tweaking the configuration of the server or > the application to as we move from a "bad" number trend to a "good" > number trend. Right, I think that could help for lock_timeout and deadlock_timeout tuning. > - What would be good numbers? In this case, most likely a threshold > reached over a certain period of time. Also I think it's more a matter of ratio: waits/requests and requests/fastpath for example. > - Would these new stats overlap with similar statistics gathered in > the system, creating duplication and bloat in the pgstats for no real > gain? I don't think there is currently stats that overlap with those. > Looking at the patch, the data gathered is this, and I don't think > that we have metrics gathered in the system to get an idea of the > contention in this area, for timeouts and deadlocks: > + PgStat_Counter requests; > + PgStat_Counter waits; > + PgStat_Counter timeouts; > + PgStat_Counter deadlock_timeouts; > + PgStat_Counter deadlocks; > + PgStat_Counter fastpath; > > Isn't the "deadlock_timeout" one an overlap between timeout and > deadlock? I don't think so because: - deadlock_timeout and lock_timeout are 2 distincts GUCs - you could reach the deadlock_timeout without actually producing a deadlock > Regarding the fastpath locking, you'd want to optimize the fastpath to > be close to the number of requests. If you had these numbers, one > thing that I could see a DBA do is tune max_locks_per_transaction, > which is what influences the per-backend limit of fast-path locks, > with FastPathLockGroupsPerBackend. Exactly. > Using static counters for the pending data is going to be necessary > when these are called in critical sections, which is I guess why > you've done it this way, right? Yes and there is no need to over complicate this stuff as the max size is known: LOCKTAG_LAST_TYPE + 1. [1]: https://www.postgresql.org/message-id/aIyNxBWFCybgBZBS%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Mon, Aug 11, 2025 at 07:49:45PM -0400, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > > On Mon, Aug 11, 2025 at 02:53:58PM -0700, Jeff Davis wrote: > >> Can you describe your use case? I'd like to understand whether this is > >> useful for users, hackers, or both. > > > This is a DBA feature, so the questions I'd ask myself are basically: > > - Is there any decision-making where these numbers would help? These > > decisions would shape in tweaking the configuration of the server or > > the application to as we move from a "bad" number trend to a "good" > > number trend. > > - What would be good numbers? In this case, most likely a threshold > > reached over a certain period of time. > > - Would these new stats overlap with similar statistics gathered in > > the system, creating duplication and bloat in the pgstats for no real > > gain? > > I'm also wondering why slicing the numbers in this particular way > (i.e., aggregating by locktype) is a helpful way to look at the data. > Maybe it's just what you want, but that's not obvious to me. Thanks for providing your thoughts! I thought it was more natural to aggregate by locktype because: - I think that matches how they are categorized in the doc (from a "wait event" point of view i.e "Wait Events of Type Lock"). - It provides a natural drill-down path, spot issues by locktype in the stats and then query pg_locks for specific objects when needed. Does that make sense to you? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Aug 12, 2025 at 5:32 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
I considered pg_stat_locks, but chose the singular form to be consistent with
pg_stat_database, pg_stat_subscription, and friends.
Counter-examples: pg_stat_statements, pg_stat_subscription_stats. Our names are not consistent. :) It just feels a better fit to have a plural name for a table tracking aggregates of multiple types of objects.
(Ignore my git complaint, was obviously half-asleep when I wrote that).
> Docs: seem good. Needs a section on how to reset via
> SELECT pg_stat_reset_shared('lock');
I meant something closer to the actual description of the view. Having it buried in the pg_stat_reset_shared section is not intuitive for people looking up the view in the docs.
Because it looks like that they are ordered by alphabetical order.
That makes sense.
- I'm not sure that's worth for this particular case and code paths.
Will let others opine on that.
Thanks! Did you observe some noticeable performance penalties?
No, but I did not give it any particularly heavy testing. More of a idle thought when pulling up a brand new system and seeing the thousands of locks for the tiniest bit of database action.
Cheers,
Greg
On Tue, 2025-08-12 at 09:37 +0000, Bertrand Drouvot wrote: > It can be used for example for: > > 1. checking if "waits" is close to "requests". Then it means you > usually have to > wait before acquiring the lock, which means you may have a > concurrency issue. > > 2. lock_timeout and deadlock_timeout tuning (lock_timeout is visible > only in the > logs if log_min_error_statement is set appropriately). > > 3. checking the "requests"/"fastpath" ratio to see if > "max_locks_per_transaction" > needs tuning (see c4d5cb71d2). > " > > Do these seem like useful use cases? Those seem plausibly useful, but I don't recall needing that exact information myself, and I wanted to hear more from others. For instance, a view could be helpful to diagnose concurrency issues, but I think that's worth discussing in more detail to see what kinds of issues it can help with and how it complements other approaches. I suspect when we get into the details, different people (or different situations) would want slightly different information out of that view. Regards, Jeff Davis
Hi, On Tue, Aug 12, 2025 at 08:54:07AM -0400, Greg Sabino Mullane wrote: > On Tue, Aug 12, 2025 at 5:32 AM Bertrand Drouvot < > bertranddrouvot.pg@gmail.com> wrote: > > > Docs: seem good. Needs a section on how to reset via > > > SELECT pg_stat_reset_shared('lock'); > > > > I meant something closer to the actual description of the view. Having it > buried in the pg_stat_reset_shared section is not intuitive for people > looking up the view in the docs. I agree that we could add this extra information in the view documentation. But none of pg_stat_archiver, pg_stat_bgwriter, pg_stat_checkpointer, pg_stat_io, pg_stat_slru and pg_stat_wal have done so. The only exception is pg_stat_recovery_prefetch and for a good reason as not all of its fields are reset. While I think that's probably a good idea, I think it's worth a dedicated thread (so that the discussion takes into account the other views mentioned above). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Aug 12, 2025 at 08:42:48AM -0700, Jeff Davis wrote: > On Tue, 2025-08-12 at 09:37 +0000, Bertrand Drouvot wrote: > > It can be used for example for: > > > > 1. checking if "waits" is close to "requests". Then it means you > > usually have to > > wait before acquiring the lock, which means you may have a > > concurrency issue. > > > > 2. lock_timeout and deadlock_timeout tuning (lock_timeout is visible > > only in the > > logs if log_min_error_statement is set appropriately). > > > > 3. checking the "requests"/"fastpath" ratio to see if > > "max_locks_per_transaction" > > needs tuning (see c4d5cb71d2). > > " > > > > Do these seem like useful use cases? > > Those seem plausibly useful, Thanks for sharing your thoughts about the use cases above. > and I wanted to hear more from others. > For instance, a view could be helpful to diagnose concurrency issues, > but I think that's worth discussing in more detail to see what kinds of > issues it can help with and how it complements other approaches. I > suspect when we get into the details, different people (or different > situations) would want slightly different information out of that view. Fully agree, getting input from others definitely helps validate the approach and see if the current design needs some changes. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com