Discussion: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
From: Benoit Tigeot
Date: Tue, 12 Aug 2025 16:54:10 +0200
Hello,

This is my first patch, one I have wanted to submit for nearly two years. It adds a new `calls_aborted` counter to `pg_stat_statements` that tracks queries terminated due to statement timeouts, user cancellations, or other errors during execution. Unlike the existing `calls` counter, which only increments on successful completion, `calls_aborted` provides visibility into failed query attempts, enabling better monitoring of query reliability and timeout issues without needing to examine logs. The counter is incremented in the `ExecutorRun` `PG_FINALLY` block to capture queries that start execution but fail to complete.

I work mostly as a web developer in environments with sensitive data. Leveraging `pg_stat_statements` for cancelled and timed-out queries instead of logs could be very beneficial for developers who want quick access to an overview of problematic queries without having access to all logs and more sensitive data (slow queries, auto_explain, etc.).

Status: For discussion
Branch: master
Testing: Compiles successfully and passes all regression tests, including new tests for this feature

Performance impact: Minimal. The patch only adds:
- one additional int64 field to the counters structure;
- a hash table lookup and counter increment in the `PG_FINALLY` block (only for aborted queries);
- no impact on successful query execution paths.

Implementation rationale: The patch tracks aborted queries specifically in `ExecutorRun`'s `PG_FINALLY` block because:
1. Execution focus: queries that fail during parsing/planning are fundamentally different from queries that start executing but get cancelled.
2. Practical monitoring: operations teams probably care most about queries that consume resources (start execution) but fail to complete. This follows my earlier need for more "dev oriented" columns.
3. Consistent semantics: it matches the existing `calls` counter, which tracks execution completions, even if this new counter is quite "alone" in that it is not linked to other incremented columns.

The approach only increments `calls_aborted` for existing entries (it doesn't create new ones, relying on entries created at parse time), since queries that would be tracked are already identified during earlier query phases.

Anticipated questions and responses (thanks to Michael Paquier and Tomas Vondra for early discussion on this patch 🙌):

A. Why not use a PGSS_ABORT kind? pgssStoreKind is designed for successful operations (plans, executions) or for storing "setups" with PGSS_INVALID. For me, aborted calls don't have meaningful timing/resource statistics to store in the arrays, and it would require extending ALL the timing/buffer/WAL arrays for incomplete operations.

B. Could we collect additional stats about aborted runs (buffer/WAL usage, etc.)? I chose not to do this because it would require modifying execMain.c and dealing with interrupted queries to gather partial information. It seems quite ambitious for a first patch. I see `calls_aborted` more as a probe to identify problematic queries for further investigation in the logs or at the application level.

C. Why not track earlier phases? I deliberately focused `calls_aborted` on executor-level failures rather than earlier phases (parser/analyzer/rewriter/planner) because they serve different operational purposes. Earlier-phase failures are typically development-time errors (syntax mistakes, missing tables, type mismatches) that don't provide actionable operational insights. Executor aborts represent runtime operational issues (query timeouts, cancellations, resource exhaustion, lock conflicts, etc.) that indicate performance degradation or capacity problems requiring attention. This design keeps the feature focused on what matters for production monitoring: distinguishing between queries that "worked before but now fail operationally" versus "never worked due to code bugs". The implementation is also cleaner, avoiding the complexity of hooking multiple subsystems and classifying different error types, but of course I may be wrong. ;)

Previous discussions:
- https://www.postgresql.org/message-id/20171112223906.dqtl3kk3pd7vn6yc%40alap3.anarazel.de
- https://www.postgresql.org/message-id/4671.1510537167%40sss.pgh.pa.us
- https://wiki.postgresql.org/wiki/PGConf.dev_2025_Developer_Unconference "Could we count up timeouts?"

Thanks

__________
Benoit Tigeot
benoit.tigeot@gmail.com
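[For illustration, the hook change being described has roughly the following shape. This is a minimal sketch, not the attached patch verbatim; pgss_store_aborted() is a hypothetical helper standing in for the patch's lookup-and-increment of calls_aborted on an existing hash entry, and the surrounding structure follows pg_stat_statements' existing ExecutorRun hook.]

static void
pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count)
{
    bool        completed = false;

    exec_nested_level++;
    PG_TRY();
    {
        if (prev_ExecutorRun)
            prev_ExecutorRun(queryDesc, direction, count);
        else
            standard_ExecutorRun(queryDesc, direction, count);
        completed = true;       /* reached only on normal completion */
    }
    PG_FINALLY();
    {
        exec_nested_level--;
        /* On timeout/cancel/error, bump calls_aborted for an existing entry. */
        if (!completed && queryDesc->plannedstmt->queryId != UINT64CONST(0))
            pgss_store_aborted(queryDesc->plannedstmt->queryId);    /* hypothetical */
    }
    PG_END_TRY();
}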
Attachments
Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
From: Michael Paquier
Date: Thu, 14 Aug 2025 10:18
On Tue, Aug 12, 2025 at 04:54:10PM +0200, Benoit Tigeot wrote:
> I deliberately focused `calls_aborted` on executor-level failures
> rather than earlier phases (parser/analyzer/rewriter/planner) because
> they serve different operational purposes. Earlier-phase failures are
> typically development-time errors (syntax mistakes, missing tables,
> type mismatches) that don't provide actionable operational insights.
> Executor aborts represent runtime operational issues (query timeouts,
> cancellations, resource exhaustion, lock conflicts, etc.) that
> indicate performance degradation or capacity problems requiring
> attention. This design keeps the feature focused on what matters for
> production monitoring: distinguishing between queries that "worked
> before but now fail operationally" versus "never worked due to code
> bugs". The implementation is also cleaner, avoiding the complexity of
> hooking multiple subsystems and classifying different error types but
> of course I may be wrong. ;)

That seems kind of limited to me in scope. The executor is only one part of the system. I would have considered using an xact callback when a transaction is aborted if I were to do a patch like the one you are proposing, to know how many times a transaction is failing at a specific phase, because you should know the latest query_id in this case to be able to put a counter update in the correct slot (right?).

+-- aborted calls tracking
+SELECT pg_sleep(0.5);
+ pg_sleep
+----------
+
+(1 row)

Using hardcoded sleep times for deterministic tests is never a good idea. On fast machines, they eat time for nothing. And if not written correctly, they may not achieve their goal on slow machines because the sleep threshold may be reached before the custom action is taken. If you want to force a failure, you should just use SQL that you know will fail at execution time (based on what your implementation expects).
--
Michael
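[For reference, a deterministic executor-time failure of the kind suggested here, and which Benoit adopts downthread, needs no timing at all. A sketch, with a hypothetical table name; calls_aborted is the column the patch proposes:]

-- Fails during execution, deterministically: the unique-index check
-- happens at ExecInsert time, not at parse or plan time.
CREATE TABLE abort_test (id int PRIMARY KEY);
INSERT INTO abort_test VALUES (1);
INSERT INTO abort_test VALUES (1);  -- ERROR: duplicate key value

-- Both INSERTs normalize to the same entry, so with the patch applied
-- we would expect calls = 1 and calls_aborted = 1 here.
SELECT calls, calls_aborted FROM pg_stat_statements
  WHERE query LIKE 'INSERT INTO abort_test%';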
Attachments
Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
From: Benoit Tigeot
Date: Thu, 14 Aug 2025 11:04:27 +0200
Thanks Michael

On Thu, Aug 14, 2025 at 10:18 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
> That seems kind of limited to me in scope. The executor is only one
> part of the system. I would have considered using an xact callback
> when a transaction is aborted if I were to do a patch like the one you
> are proposing, to know how many times a transaction is failing at a
> specific phase, because you should know the latest query_id in this
> case to be able to put a counter update in the correct slot (right?).

I will make a v5 patch with this approach. It should also address your second comment about using `pg_sleep` in the test. I used that approach because it felt closest to a real timeout issue in production, more like integration testing. I will try to change that in the new patch.

__________
Benoit Tigeot
Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
From: Julien Rouhaud
Date:
On Thu, Aug 14, 2025 at 11:04:27AM +0200, Benoit Tigeot wrote:
> Thanks Michael
>
> On Thu, Aug 14, 2025 at 10:18 AM, Michael Paquier <michael@paquier.xyz> wrote:
> >
> > That seems kind of limited to me in scope. The executor is only one
> > part of the system. I would have considered using an xact callback
> > when a transaction is aborted if I were to do a patch like the one you
> > are proposing, to know how many times a transaction is failing at a
> > specific phase, because you should know the latest query_id in this
> > case to be able to put a counter update in the correct slot (right?).
>
> I will make a v5 patch with this approach.

I'm wondering how useful that counter alone will be. Since pg_stat_statements is already quite large, wouldn't it be better to have another extension that tracks something like (queryid, sqlstate), with maybe the number of errors and the oldest and newest timestamps, or something like that? You would have way more information to diagnose various problems, and you can optionally use pg_stat_statements if you need the query text from there (vs. e.g. the logs).
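[To make the shape of that idea concrete: such an extension's view might look roughly like the sketch below. Everything here is hypothetical (the view name, columns, and the extension itself); only the SQLSTATE codes and the join to pg_stat_statements are real.]

-- Hypothetical view for a separate error-tracking extension: one row per
-- (queryid, sqlstate), joinable to pg_stat_statements for the query text.
SELECT e.queryid,
       e.sqlstate,       -- e.g. '57014' = query_canceled, '40P01' = deadlock_detected
       e.error_count,
       e.first_error,    -- oldest occurrence
       e.last_error,     -- newest occurrence
       s.query
FROM pg_stat_query_errors e    -- hypothetical view name
     LEFT JOIN pg_stat_statements s USING (queryid);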
Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
From: Benoit Tigeot

> On 14 Aug 2025, at 10:18, Michael Paquier <michael@paquier.xyz> wrote:
>
> That seems kind of limited to me in scope. The executor is only one
> part of the system. I would have considered using an xact callback
> when a transaction is aborted if I were to do a patch like the one you
> are proposing, to know how many times a transaction is failing at a
> specific phase, because you should know the latest query_id in this
> case to be able to put a counter update in the correct slot (right?).
>
> +-- aborted calls tracking
> +SELECT pg_sleep(0.5);
> + pg_sleep
> +----------
> +
> +(1 row)
>
> Using hardcoded sleep times for deterministic tests is never a good
> idea. On fast machines, they eat time for nothing. And if not
> written correctly, they may not achieve their goal on slow machines
> because the sleep threshold may be reached before the custom action is
> taken. If you want to force a failure, you should just use SQL that
> you know will fail at execution time (based on what your implementation
> expects).

Hello,

Thanks for the review.

On scope and callbacks:
- I prototyped a version with RegisterXactCallback and RegisterSubXactCallback, but I'm not very happy with the result. Same limitation as v4: we only have a stable identity (queryId) after parse/analyze. The xact/subxact callbacks fire with no statement context, so you still need to know "which statement was active" at abort time.
- To deal with that, I keep a small backend-local tracker of the current statement's key (userid, dbid, queryId, toplevel) when we enter the executor, mark it completed on success, and bump calls_aborted in the callbacks only if active && !completed. (A sketch follows below.)
- This reliably captures executor-time failures and timeouts. It cannot attribute errors that happen before post-parse (syntax/name resolution), because queryId isn't set yet. That's the hard boundary. Planner failures aren't attributed in this version either, because the tracker is set in ExecutorRun.

On the value of callbacks:
- It seems callbacks alone are insufficient (no per-statement context).
- If we want to preserve pgss semantics (only report statements that reached post-parse and have a queryId), the v4 approach is simpler and safer; I could look into planner error handling in a next patch version.
- If we also want to count very-early errors, that implies raw-text-keyed entries (no queryId), which changes pg_stat_statements' semantics (normalization, PII risk, cardinality). That should probably be a separate extension, as Julien Rouhaud suggested: https://www.postgresql.org/message-id/aJ2ov4KYM2vX4uAs%40jrouhaud. I am not sure I would be able to make this; I also need something that will be supported by managed PostgreSQL providers with extension restrictions.

Happy to adjust if you see a better way to thread the "latest queryId" into the abort path without the local tracker.

On pre-creating entries earlier:
- I considered creating entries from raw text to count very-early errors, but I've avoided it: pgss is keyed on the normalized parse tree (queryId). Pre-hashing raw text weakens normalization, risks storing sensitive literals before jumbling, and adds lock churn in hot paths.

Tests:
- Agreed on sleep flakiness. I switched to deterministic executor-time failures: a duplicate key (primary key violation) and a check constraint violation. Both trip calls_aborted without timing assumptions.

Also, are we okay tracking statements that never reach post-parse (i.e., no queryId)? That would effectively key on raw text, which changes pg_stat_statements' semantics.

Benoit
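[A sketch of the backend-local tracker described above, with illustrative names rather than the attached v5 verbatim; RegisterXactCallback and XACT_EVENT_ABORT are the real APIs from access/xact.h, while pgss_bump_aborted() is a hypothetical helper.]

#include "access/xact.h"        /* RegisterXactCallback, XactEvent */

/* Backend-local record of the statement currently in the executor. */
typedef struct CurrentStmt
{
    Oid         userid;
    Oid         dbid;
    uint64      queryId;
    bool        toplevel;
    bool        active;         /* set on ExecutorRun entry */
    bool        completed;      /* set when ExecutorRun returns normally */
} CurrentStmt;

static CurrentStmt cur_stmt;    /* one per backend; no locking needed */

static void
pgss_xact_callback(XactEvent event, void *arg)
{
    /* Only attribute the abort if a statement started but never finished. */
    if (event == XACT_EVENT_ABORT && cur_stmt.active && !cur_stmt.completed)
        pgss_bump_aborted(cur_stmt.userid, cur_stmt.dbid,
                          cur_stmt.queryId, cur_stmt.toplevel);  /* hypothetical */
    cur_stmt.active = false;
}

/* In _PG_init(): RegisterXactCallback(pgss_xact_callback, NULL); */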
Attachments
Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
From: Sami Imseih
Date: Mon, 18 Aug 2025 15:10:42 -0500
Thanks for starting this thread! I do think it will be valuable to have an idea of how many calls were aborted, but I am not so sure about pg_stat_statements being the correct vehicle for this information. The pg_stat_statements view deals with stats that are updated when a statement completes execution, so I think it is quite confusing to add this metric to pg_stat_statements.

> I'm wondering how useful that counter alone will be. Since
> pg_stat_statements is already quite large,

I agree, but I think it's time to start thinking about splitting pg_stat_statements into multiple functions. This was recently discussed [0].

> wouldn't it be better to have
> another extension that tracks something like (queryid, sqlstate),

I think this can be a part of the pg_stat_statements extension, but as a separate view, i.e. pg_stat_statements_aborted (or something) which tracks this info.

> > part of the system. I would have considered using an xact callback
> > when a transaction is aborted if I were to do a patch like the one you
> > are proposing, to know how many times a transaction is failing at a
> > specific phase,

My 2c on the implementation of such statistics: we need to count when execution starts and when execution ends. The difference between those two values gives us the aborted calls count. When we rely on the transaction callbacks, we assume we reach that hook (which occurs during normal transaction cleanup), but I am thinking of other cases, which are quite common, in which the application times out and causes the connection to drop. By counting at ExecutorStart and ExecutorEnd, as described above, we can cover those cases too. (A sketch of the derivation follows below.)

Unfortunately, I think doing it this way will not be very performant with the way pg_s_s currently works, as it means we would need to take an additional spinlock at executor start to do the counting. I do think that if we move pg_s_s to using pluggable stats [1], we have an opportunity to do something like this.

(FWIW, I also think it would be good to have calls_started and calls_completed counters at a higher level too, as in pg_stat_database.)

[0] https://www.postgresql.org/message-id/aHdAQskQCjGWOdfi%40paquier.xyz
[1] https://www.postgresql.org/message-id/aKF0V-T8-XAxj47T%40paquier.xyz

--
Sami
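[The arithmetic Sami describes could be derived at read time rather than stored. A sketch with hypothetical column names; note that in-flight executions would transiently show up in the difference until they complete:]

-- Hypothetical: with exec_started bumped in the ExecutorStart hook and
-- exec_completed bumped in the ExecutorEnd hook, aborted calls fall out
-- as the difference.  Unlike an xact callback, this also counts
-- executions lost to dropped connections.
SELECT queryid,
       exec_started - exec_completed AS calls_aborted
FROM pg_stat_statements;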
Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
From: Michael Paquier
Date: Tue, 19 Aug 2025 09:20:23 +0900
On Mon, Aug 18, 2025 at 03:10:42PM -0500, Sami Imseih wrote:
> I agree, but I think it's time to start thinking about splitting
> pg_stat_statements into multiple functions. This was recently
> discussed [0]

Yes. No idea about other people, but I'm putting a stop to new additions until we've split the SQL-facing interface a bit, removing some bloat from the main pgss view. So says the guy who just bumped pgss to 1.13 a couple of weeks ago, so it sounds a bit hypocritical.
--
Michael
Attachments
Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
From: Andres Freund
Date: Tue, 19 Aug 2025 15:46
Hi,

On 2025-08-19 09:20:23 +0900, Michael Paquier wrote:
> On Mon, Aug 18, 2025 at 03:10:42PM -0500, Sami Imseih wrote:
> > I agree, but I think it's time to start thinking about splitting
> > pg_stat_statements into multiple functions. This was recently
> > discussed [0]
>
> Yes. No idea about other people, but I'm putting a stop to new additions
> until we've split the SQL-facing interface a bit, removing some bloat
> from the main pgss view. So says the guy who just bumped pgss to
> 1.13 a couple of weeks ago, so it sounds a bit hypocritical.

I think the problem isn't mainly the SQL view; it's that all the additions have given pg_stat_statements so much overhead that it's practically unusable for any busy workload. It used to be an issue only on really big servers, but there's been so much crap stuffed into pgss that it's trivially reproducible on a laptop.

I think it's pretty insane to do things like variance computation while holding a spinlock, for every friggin query execution. The spinlocked section is ~185 instructions for me, with plenty of high-latency instructions like divisions.

It's so slow that it has measurable impact for single-threaded readonly pgbench. Which does friggin btree lookups.

Greetings,

Andres Freund
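[For context, the per-execution update Andres is pointing at is the running mean/variance (Welford-style) maintenance that pg_stat_statements performs while holding the entry's spinlock. Paraphrased from pgss_store(), not an exact copy, it looks roughly like this, including the division he mentions:]

SpinLockAcquire(&entry->mutex);

entry->counters.calls[kind] += 1;
entry->counters.total_time[kind] += total_time;

if (entry->counters.calls[kind] == 1)
{
    entry->counters.mean_time[kind] = total_time;
    entry->counters.sum_var_time[kind] = 0.0;
}
else
{
    /*
     * Welford's online algorithm: a floating-point division and a
     * multiplication per execution, all inside the spinlocked section.
     */
    double      old_mean = entry->counters.mean_time[kind];

    entry->counters.mean_time[kind] +=
        (total_time - old_mean) / entry->counters.calls[kind];
    entry->counters.sum_var_time[kind] +=
        (total_time - old_mean) * (total_time - entry->counters.mean_time[kind]);
}

SpinLockRelease(&entry->mutex);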
Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
From: Benoit Tigeot

> On 19 Aug 2025, at 15:46, Andres Freund <andres@anarazel.de> wrote:
>
> I think it's pretty insane to do things like variance computation while
> holding a spinlock, for every friggin query execution. The spinlocked section
> is ~185 instructions for me, with plenty of high-latency instructions like
> divisions.
>
> It's so slow that it has measurable impact for single-threaded readonly
> pgbench. Which does friggin btree lookups.

Do you think it could be an option to first add regression benchmarks, and then consider migrating the most time-consuming columns to another extension? I am not sure whether moving existing results out of pgss is an option.

---
Benoit