Discussion: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations


pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations

From
Benoit Tigeot
Date:
Hello

This is my first patch, one that I have wanted to submit for nearly two years.
It adds a new `calls_aborted` counter to `pg_stat_statements` that
tracks queries terminated by statement timeouts, user cancellations,
or other errors during execution. Unlike the existing `calls` counter,
which increments only on successful completion, `calls_aborted`
provides visibility into failed query attempts, enabling better
monitoring of query reliability and timeout issues without needing to
examine logs. The counter is incremented in the `ExecutorRun`
`PG_FINALLY` block to capture queries that start execution but fail to
complete.
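In sketch form, that shape could look like the following. This is
illustrative rather than the patch itself: it assumes the ExecutorRun
hook signature on current master, and pgss_increment_aborted() is a
hypothetical helper, sketched later in this mail.

#include "postgres.h"
#include "executor/executor.h"

static ExecutorRun_hook_type prev_ExecutorRun = NULL;

static void
pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count)
{
    bool        completed = false;

    PG_TRY();
    {
        if (prev_ExecutorRun)
            prev_ExecutorRun(queryDesc, direction, count);
        else
            standard_ExecutorRun(queryDesc, direction, count);
        completed = true;       /* reached only when execution succeeds */
    }
    PG_FINALLY();
    {
        /*
         * Timeouts, cancellations and other execution errors arrive here
         * with completed still false; bump the counter for the existing
         * entry only.
         */
        if (!completed && queryDesc->plannedstmt->queryId != UINT64CONST(0))
            pgss_increment_aborted(queryDesc->plannedstmt->queryId);
    }
    PG_END_TRY();
}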

I work mostly as a web developer in environments with sensitive data.
Leveraging `pg_stat_statements` for cancelled and timed-out queries,
instead of the logs, could be very beneficial for developers who want
a quick overview of problematic queries without having access to all
the logs and their more sensitive data (slow queries, auto_explain
output, etc.).

Status: For discussion
Branch: master
Testing: Compiles successfully and passes all regression tests,
including new tests for this feature

Performance impact:
Minimal. The patch only adds:
- One additional int64 field to the counters structure
- A hash table lookup and counter increment in the `PG_FINALLY` block
(only for aborted queries)
There is no impact on successful query execution paths.

Implementation rationale:
The patch tracks aborted queries specifically in `ExecutorRun`'s
`PG_FINALLY` block because:

1. Execution focus: queries that fail during parsing/planning are
fundamentally different from queries that start executing but get
cancelled.
2. Practical monitoring: operations teams probably care most about
queries that consume resources (start execution) but fail to complete.
This follows from my earlier need for more "dev-oriented" columns.
3. Consistent semantics: it matches the existing `calls` counter, which
tracks execution completions, even if this new counter stands somewhat
alone in that it is not tied to the other incremented columns.

The approach increments `calls_aborted` only for existing entries (it
does not create new ones, relying on entries already created at parse
time), since the queries being tracked have already been identified
during earlier query phases. A lookup-only update along these lines is
sketched below.
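(Hypothetical helper with illustrative names, reusing pgss's existing
shared hash and per-entry spinlock; it assumes the usual
pg_stat_statements statics such as pgss, pgss_hash and nesting_level.)

static void
pgss_increment_aborted(uint64 queryId)
{
    pgssHashKey key;
    pgssEntry  *entry;

    key.userid = GetUserId();
    key.dbid = MyDatabaseId;
    key.queryid = queryId;
    key.toplevel = (nesting_level == 0);

    /* a shared lock is enough: we never create an entry on this path */
    LWLockAcquire(pgss->lock, LW_SHARED);
    entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
    if (entry)
    {
        SpinLockAcquire(&entry->mutex);
        entry->counters.calls_aborted++;
        SpinLockRelease(&entry->mutex);
    }
    LWLockRelease(pgss->lock);
}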

Anticipated questions and responses
(Thanks to Michael Paquier and Tomas Vondra for early discussion on
this patch 🙌)

A. Why not use PGSS_ABORT kind?

pgssStoreKind is designed for successful operations (plans,
executions) or for storing "setups" with PGSS_INVALID. In my view,
aborted calls don't have meaningful timing/resource statistics to
store in the arrays, and supporting them would require extending ALL
of the timing/buffer/WAL arrays for incomplete operations.

B. Could we collect additional stats about aborted runs (buffer/WAL
usage, etc.)?

I chose not to do this because it would require modifying execMain.c
and dealing with interrupted queries to gather partial information,
which seems too ambitious for a first patch. I see `calls_aborted`
more as a probe to identify problematic queries for further
investigation in the logs or at the application level.

C. Why not track earlier phases?

I deliberately focused `calls_aborted` on executor-level failures
rather than earlier phases (parser/analyzer/rewriter/planner) because
they serve different operational purposes. Earlier phase failures are
typically development-time errors (syntax mistakes, missing tables,
type mismatches) that don't provide actionable operational insights.
Executor aborts represent runtime operational issues (query timeouts,
cancellations, resource exhaustion, lock conflicts, etc.) that
indicate performance degradation or capacity problems requiring
attention. This design keeps the feature focused on what matters for
production monitoring: distinguishing between queries that "worked
before but now fail operationally" versus "never worked due to code
bugs." The implementation is also cleaner, avoiding the complexity of
hooking multiple subsystems and classifying different error types but
of course I may be wrong. ;)

Previous discussions:
- https://www.postgresql.org/message-id/20171112223906.dqtl3kk3pd7vn6yc%40alap3.anarazel.de
- https://www.postgresql.org/message-id/4671.1510537167%40sss.pgh.pa.us
- https://wiki.postgresql.org/wiki/PGConf.dev_2025_Developer_Unconference
"Could we count up timeouts?"

Thanks
__________
Benoit Tigeot
benoit.tigeot@gmail.com


Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations

From
Michael Paquier
Date:
On Tue, Aug 12, 2025 at 04:54:10PM +0200, Benoit Tigeot wrote:
> I deliberately focused `calls_aborted` on executor-level failures
> rather than earlier phases (parser/analyzer/rewriter/planner) because
> they serve different operational purposes. Earlier phase failures are
> typically development-time errors (syntax mistakes, missing tables,
> type mismatches) that don't provide actionable operational insights.
> Executor aborts represent runtime operational issues (query timeouts,
> cancellations, resource exhaustion, lock conflicts, etc.) that
> indicate performance degradation or capacity problems requiring
> attention. This design keeps the feature focused on what matters for
> production monitoring: distinguishing between queries that "worked
> before but now fail operationally" versus "never worked due to code
> bugs." The implementation is also cleaner, avoiding the complexity of
> hooking multiple subsystems and classifying different error types but
> of course I may be wrong. ;)

That seems kind of limited to me in scope.  The executor is only one
part of the system.  I would have considered using an xact callback
when a transaction is aborted if I were to do a patch like the one you
are proposing, to know how many times a transaction is failing at a
specific phase, because you should know the latest query_id in this
case to be able to put a counter update in the correct slot (right?).
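Roughly, and purely as an illustration (hypothetical names, reusing
the increment-helper idea from upthread), something like:

/* remember the latest queryId as statements run, then attribute the
 * abort to it from the transaction callback */
static uint64 last_queryid = UINT64CONST(0);

static void
pgss_xact_callback(XactEvent event, void *arg)
{
    if (event == XACT_EVENT_ABORT && last_queryid != UINT64CONST(0))
        pgss_increment_aborted(last_queryid);
}

/* in _PG_init() */
RegisterXactCallback(pgss_xact_callback, NULL);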

+-- aborted calls tracking
+SELECT pg_sleep(0.5);
+ pg_sleep
+----------
+
+(1 row)

Using hardcoded sleep times for deterministic tests is never a good
idea.  On fast machines, they eat time for nothing.  And if not
written correctly, they may not achieve their goal on slow machines
because the sleep threshold may be reached before the custom action is
taken.  If you want to force a failure, you should just use SQL that
you know would fail at execution time (based on what your
implementation expects).
--
Michael

Вложения

Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations

From
Benoit Tigeot
Date:
Thanks, Michael

On Thu, Aug 14, 2025 at 10:18 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> That seems kind of limited to me in scope.  The executor is only one
> part of the system.  I would have considered using an xact callback
> when a transaction is aborted if I were to do a patch like the one you
> are proposing, to know how many times a transaction is failing at a
> specific phase, because you should know the latest query_id in this
> case to be able to put a counter update in the correct slot (right?).

I will make a v5 patch with this approach. It should also address your
second comment about using `pg_sleep` in the test. I used that approach
because it seemed the closest thing to a real timeout issue in
production, more like an integration test. I will try to change that
in the new patch.
__________
Benoit Tigeot



Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations

From
Julien Rouhaud
Date:
On Thu, Aug 14, 2025 at 11:04:27AM +0200, Benoit Tigeot wrote:
> Thanks Michael
>
> On Thu, Aug 14, 2025 at 10:18 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > That seems kind of limited to me in scope.  The executor is only one
> > part of the system.  I would have considered using an xact callback
> > when a transaction is aborted if I were to do a patch like the one you
> > are proposing, to know how many times a transaction is failing at a
> > specific phase, because you should know the latest query_id in this
> > case to be able to put a counter update in the correct slot (right?).
>
> I will make a v5 patch with this approach.

I'm wondering how useful that counter alone will be.  Since
pg_stat_statements is already quite large, wouldn't it be better to have
another extension that tracks something like (queryid, sqlstate), with
maybe the number of errors and the oldest and newest timestamps, or
something like that?  You would have way more information to diagnose
various problems, and could optionally use pg_stat_statements if you
need the query text from there (vs. e.g. the logs).
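Roughly, an entry shape like this (purely illustrative):

#include "postgres.h"
#include "utils/timestamp.h"

typedef struct QueryErrorKey
{
    uint64      queryid;
    int         sqlstate;       /* e.g. ERRCODE_QUERY_CANCELED */
} QueryErrorKey;

typedef struct QueryErrorEntry
{
    QueryErrorKey key;
    int64       error_count;
    TimestampTz first_seen;
    TimestampTz last_seen;
} QueryErrorEntry;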

Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations

From
Benoit Tigeot
Date:
> On 14 Aug 2025, at 10:18, Michael Paquier <michael@paquier.xyz> wrote:
>
> That seems kind of limited to me in scope.  The executor is only one
> part of the system.  I would have considered using an xact callback
> when a transaction is aborted if I were to do a patch like the one you
> are proposing, to know how many times a transaction is failing at a
> specific phase, because you should know the latest query_id in this
> case to be able to put a counter update in the correct slot (right?).
>
> +-- aborted calls tracking
> +SELECT pg_sleep(0.5);
> + pg_sleep
> +----------
> +
> +(1 row)
>
> Using hardcoded sleep times for deterministic tests is never a good
> idea.  On fast machines, they eat time for nothing.  And if not
> written correctly, they may not achieve their goal on slow machines
> because the sleep threshold may be reached before the custom action is
> taken.  If you want to force a failure, you should just use SQL that
> you know would fail at execution time (based on what your
> implementation expects).

Hello,

Thanks for the review.

On scope and callbacks
- I prototyped a version with RegisterXactCallback and
RegisterSubXactCallback, but I'm not very happy with the result. Same
limitation as v4: we only have a stable identity (queryId) after
parse/analyze. The xact/subxact callbacks fire with no statement
context, so you still need to know "which statement was active" at
abort time.
- To deal with that, I keep a small backend-local tracker of the
current statement's key (userid, dbid, queryId, toplevel) when we enter
the executor, mark it completed on success, and bump calls_aborted in
the callbacks only if active && !completed; a sketch follows this list.
- This reliably captures executor-time failures and timeouts. It cannot
attribute errors that happen before post-parse (syntax/name
resolution), because queryId isn't set yet. That's the hard boundary.
Planner failures aren't attributed in this version either, because the
tracker is still set in ExecutorRun.
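In sketch form (illustrative names, not the actual v5 code):

typedef struct pgssActiveStmt
{
    Oid         userid;
    Oid         dbid;
    uint64      queryid;
    bool        toplevel;
    bool        active;         /* set when we enter the executor */
    bool        completed;      /* set when execution finishes normally */
} pgssActiveStmt;

static pgssActiveStmt cur_stmt;

static void
pgss_xact_callback(XactEvent event, void *arg)
{
    /* attribute the abort only to a statement that started but never
     * completed; pgss_increment_aborted() as sketched upthread */
    if (event == XACT_EVENT_ABORT && cur_stmt.active && !cur_stmt.completed)
        pgss_increment_aborted(cur_stmt.queryid);
    if (event == XACT_EVENT_ABORT || event == XACT_EVENT_COMMIT)
        cur_stmt.active = false;
}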

On the value of callbacks
- It seems callbacks alone are insufficient (no per-statement context).
- If we want to preserve pgss semantics (only report statements that
reached post-parse and have a queryId), the v4 approach is simpler and
safer; I could look into planner error handling in a later patch
version.
- If we also want to count very-early errors, that implies raw-text
keyed entries (no queryId), which changes pg_stat_statements' semantics
(normalization, PII risk, cardinality). That should probably be a
separate extension, as Julien Rouhaud suggested:
https://www.postgresql.org/message-id/aJ2ov4KYM2vX4uAs%40jrouhaud. I am
not sure I would be able to build that, and I also need something that
will be supported by managed PostgreSQL services with their extension
restrictions.

Happy to adjust if you see a better way to thread "latest queryId" into
the abort path without the local tracker.

On pre-creating entries earlier
- I considered creating entries from raw text to count very-early
errors, but I've avoided it: pgss is keyed on the normalized parse tree
(queryId). Pre-hashing raw text weakens normalization, risks storing
sensitive literals before jumbling, and adds lock churn in hot paths.

Tests
- Agreed on the sleep flakiness. I switched to deterministic
executor-time failures: a duplicate key (primary key violation) and a
check constraint violation. Both trip calls_aborted without timing
assumptions.

Also, are we okay with tracking statements that never reach post-parse
(i.e., no queryId)? That would effectively key on raw text, which
changes pg_stat_statements' semantics.

Benoit



Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations

From
Sami Imseih
Date:
Thanks for starting this thread! I do think it will be valuable to have
an idea of how many calls were aborted. But I am not so sure about
pg_stat_statements being the correct vehicle for this information.

The pg_stat_statements view deals with stats that are updated when a
statement completes execution, so I think it is quite confusing to add
this metric to pg_stat_statements.

> I'm wondering how useful that counter alone will be.  Since
> pg_stat_statements is already quite large,

I agree, but I think it's time to start thinking about splitting
pg_stat_statements into multiple functions. This was recently
discussed [0]

> wouldn't it be better to have
> another extension that tracks something like (queryid, sqlstate),

I think this can be part of the pg_stat_statements extension, but as
a separate view, i.e. pg_stat_statements_aborted (or something), which
tracks this info.

> > part of the system.  I would have considered using an xact callback
> > when a transaction is aborted if I were to do a patch like the one you
> > are proposing, to know how many times a transaction is failing at a
> > specific phase,

My 2c: for the implementation of such statistics, we need to count
when execution starts and when execution ends. The difference between
those two values gives us the aborted calls count.

When we rely on the transaction callbacks, we assume we reach that
hook (which will occur during normal transaction cleanup), but I am
thinking of other cases, which are quite common, in which the
application times out and causes the connection to drop. By counting
at ExecutorStart and ExecutorEnd, as described above, we can cover
those cases as well.
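In sketch form (illustrative names; these would be called from the
executor start and end hooks respectively):

static void
count_execution_start(volatile pgssEntry *e)
{
    SpinLockAcquire(&e->mutex); /* the additional spinlock mentioned below */
    e->counters.calls_started++;
    SpinLockRelease(&e->mutex);
}

static void
count_execution_end(volatile pgssEntry *e)
{
    SpinLockAcquire(&e->mutex);
    e->counters.calls_completed++;
    SpinLockRelease(&e->mutex);
}

/* at read time: aborted = calls_started - calls_completed, which also
 * covers connections that drop mid-execution and never reach a hook */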

Unfortunately, I think this will not be very performant to do the way
I describe it above with the way pg_s_s currently works, as it means we
will need to take an additional spinlock at executor start to count
this.

I do think that if we move pg_s_s to using pluggable stats [1], we
have an opportunity to do something like this.

(FWIW, I also think it would be good to have calls_started and
calls_completed counters at a higher level too, as in
pg_stat_database.)

[0] https://www.postgresql.org/message-id/aHdAQskQCjGWOdfi%40paquier.xyz
[1] https://www.postgresql.org/message-id/aKF0V-T8-XAxj47T%40paquier.xyz


--
Sami



Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations

From
Michael Paquier
Date:
On Mon, Aug 18, 2025 at 03:10:42PM -0500, Sami Imseih wrote:
> I agree, but I think it's time to start thinking about splitting
> pg_stat_statements into multiple functions. This was recently
> discussed [0]

Yes, I don't know about other people, but I'm putting a stop to new
additions until we've split the SQL-facing interface a bit, removing
some bloat from the main pgss view.  So says the guy who just bumped
pgss to 1.13 a couple of weeks ago, so that may sound a bit ironic.
--
Michael


Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations

From
Andres Freund
Date:
Hi,

On 2025-08-19 09:20:23 +0900, Michael Paquier wrote:
> On Mon, Aug 18, 2025 at 03:10:42PM -0500, Sami Imseih wrote:
> > I agree, but I think it's time to start thinking about splitting
> > pg_stat_statements into multiple functions. This was recently
> > discussed [0]
> 
> Yes, I don't know about other people, but I'm putting a stop to new
> additions until we've split the SQL-facing interface a bit, removing
> some bloat from the main pgss view.  So says the guy who just bumped
> pgss to 1.13 a couple of weeks ago, so that may sound a bit ironic.

I think the problem isn't mainly the SQL view, it's that all the additions
made pg_stat_statements have so much overhead that it's practically unusable
for any busy workload. It used to be only an issue on really big servers, but
there's been so much crap stuffed into pgss that it's trivially reproducible
on a laptop.

I think it's pretty insane to do things like variance computation while
holding a spinlock, for every friggin query execution. The spinlock'ed section
is ~185 instructions for me, with plenty of high-latency instructions
like divisions.
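
For reference, a simplified rendition of that spinlock'ed section
(field names follow pgss's Counters struct, but this is not a verbatim
copy of pg_stat_statements.c):

static void
pgss_update_timing(pgssEntry *entry, pgssStoreKind kind, double total_time)
{
    volatile pgssEntry *e = (volatile pgssEntry *) entry;

    SpinLockAcquire(&e->mutex);
    e->counters.calls[kind] += 1;
    {
        double      old_mean = e->counters.mean_time[kind];

        /* Welford's running mean/variance, including a floating-point
         * division on every execution, all while the spinlock is held */
        e->counters.mean_time[kind] +=
            (total_time - old_mean) / e->counters.calls[kind];
        e->counters.sum_var_time[kind] +=
            (total_time - old_mean) *
            (total_time - e->counters.mean_time[kind]);
    }
    SpinLockRelease(&e->mutex);
}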

It's so slow that it has a measurable impact on single-threaded
read-only pgbench. Which does friggin btree lookups.

Greetings,

Andres Freund

Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations

From
Benoit Tigeot
Date:
> On 19 Aug 2025, at 15:46, Andres Freund <andres@anarazel.de> wrote:
>
> I think it's pretty insane to do things like variance computation while
> holding a spinlock, for every friggin query execution. The spinlock'ed section
> is ~185 instructions for me, with plenty of high-latency instructions
> like divisions.
>
> It's so slow that it has a measurable impact on single-threaded
> read-only pgbench. Which does friggin btree lookups.

Do you think it could be an option to first add regression benchmarks,
and then consider migrating the most time-consuming columns
to another extension?

I am not sure whether moving results out of pgss would be an option.

---
Benoit