Thread: Proposal: Extending the PostgreSQL Protocol with Command Metadata

Proposal: Extending the PostgreSQL Protocol with Command Metadata

From
Kir Shatrov
Date:
Hi hackers,

I'd like to propose a backward-compatible extension to the PostgreSQL Frontend/Backend Protocol that would allow servers to send command metadata back to clients alongside query results.

## Background & Motivation

Coming from a MySQL background, I've found PostgreSQL's lack of session state tracking capabilities to be a significant gap. MySQL's session state tracking allows servers to piggyback metadata on query responses -- things like CPU/IO time consumed, server utilization metrics, tenant quotas, etc.

This capability enables powerful client-side patterns:
- Immediate backoff based on server utilization metadata
- Real-time tenant quota enforcement without separate polling systems
- Query performance tracking without additional round-trips
- Custom application metrics embedded in the protocol flow

## High-Level Proposal

The extension consists of four key components:

1. Client opt-in via a new parameter (`_pq_.send_command_metadata='on'`) on StartupMessage
2. Server-side metadata injection through C API or a SQL function as a demo (`pg_add_command_metadata(key, value)`)
3. New protocol message (`CommandMetadata`, type 'm') sent before `CommandComplete`
4. Client API extensions in libpq to consume the metadata

The goal is to keep backward compatibility while allowing flexibility in the interface, with zero additional cost if you have not opted in.
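
As a rough illustration of how these pieces would fit together from the client's point of view (the `PQcommandMetadata()` accessor below is purely hypothetical and only stands in for the libpq extensions in point 4; the exact message formats are in the spec [0]):

```c
/*
 * Hypothetical client-side sketch.  PQcommandMetadata() does not exist in
 * libpq today; it is a placeholder for the proposed client API extensions.
 * How the _pq_.send_command_metadata startup parameter would be exposed in
 * the connection string is also left open here.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=app");   /* opt-in detail omitted */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return EXIT_FAILURE;
    }

    /*
     * The server injects metadata (e.g. via the demo SQL function
     * pg_add_command_metadata('cpu_ms', ...)) and sends it in a
     * CommandMetadata ('m') message just before CommandComplete.
     */
    res = PQexec(conn, "SELECT count(*) FROM orders");

    printf("rows: %s, cpu_ms: %s\n",
           PQgetvalue(res, 0, 0),
           PQcommandMetadata(res, "cpu_ms"));   /* hypothetical accessor */

    PQclear(res);
    PQfinish(conn);
    return EXIT_SUCCESS;
}
```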

## Unlocking Greater Extensions

Extensions leveraging this metadata protocol would make it possible to build, with very little overhead, things that are not possible today:

- Multi-tenant applications tracking per-query resource consumption (buffer hits/misses, CPU/IO time)
- Connection poolers making routing decisions based on server load
- Applications implementing custom rate limiting based on DB metrics
- Performance monitoring without separate instrumentation queries
- Distributed tracing integration at the protocol level

## Why not session variables?

You could argue this could be achieved by reading session variables after each query, but that quickly becomes impractical: relying on session variables breaks connection multiplexing (e.g. PgBouncer), and hitting the server with yet another query when it's already busy only increases the number of incoming queries it needs to handle.
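
To make the comparison concrete, the pattern argued against looks roughly like the sketch below (the `myapp.last_query_stats` variable name is made up for illustration):

```c
#include <stdio.h>
#include <libpq-fe.h>

/*
 * The session-variable pattern argued against above: every application
 * query is followed by a second round-trip just to read back a variable
 * that the server side populated.  Both statements must also land on the
 * same backend, which transaction-level pooling (PgBouncer) does not
 * guarantee.  The variable name is illustrative only.
 */
static void
run_with_stats_poll(PGconn *conn)
{
    PGresult   *res;

    res = PQexec(conn, "UPDATE orders SET status = 'paid' WHERE id = 42");
    PQclear(res);

    /* The extra query per statement -- exactly the overhead described above. */
    res = PQexec(conn, "SELECT current_setting('myapp.last_query_stats', true)");
    printf("stats: %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);
}
```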

## Asking for feedback

Is there interest in this type of protocol extension?
What concerns do you have about protocol extensions in general?
Would you prefer a different approach (e.g., separate protocol messages vs. extending existing ones)?

I have a complete technical specification with detailed protocol message formats and API designs on GitHub [0].

Looking forward to your feedback and suggestions. Thanks!

Regards
Kir

[0] https://github.com/kirs/notes-public/blob/main/postgres-command-metadata.md

Re: Proposal: Extending the PostgreSQL Protocol with Command Metadata

From
Aleksander Alekseev
Date:
Hi,

> I'd like to propose a backward-compatible extension to the PostgreSQL Frontend/Backend Protocol that would allow
> servers to send command metadata back to clients alongside query results.
>
> ## Background & Motivation
>
> Coming from a MySQL background, I've found PostgreSQL's lack of session state tracking capabilities to be a
> significant gap. MySQL's session state tracking allows servers to piggyback metadata on query responses -- things like
> CPU/IO time consumed, server utilization metrics, tenant quotas, etc.
>
> This capability enables powerful client-side patterns:
> - Immediate backoff based on server utilization metadata
> - Real-time tenant quota enforcement without separate polling systems
> - Query performance tracking without additional round-trips
> - Custom application metrics embedded in the protocol flow
>
> ## High-Level Proposal
>
> The extension consists of four key components:
>
> 1. Client opt-in via a new parameter (`_pq_.send_command_metadata='on'`) on StartupMessage
> 2. Server-side metadata injection through C API or a SQL function as a demo (`pg_add_command_metadata(key, value)`)
> 3. New protocol message (`CommandMetadata`, type 'm') sent before `CommandComplete`
> 4. Client API extensions in libpq to consume the metadata
>
> The goal is to keep backward compatibility while allowing flexibility in the interface, with zero additional cost if
> you have not opted in.
>
> ## Unlocking Greater Extensions
>
> Extensions leveraging this metadata protocol would make it possible to build, with very little overhead, things that
> are not possible today:
>
> - Multi-tenant applications tracking per-query resource consumption (buffer hits/misses, CPU/IO time)
> - Connection poolers making routing decisions based on server load
> - Applications implementing custom rate limiting based on DB metrics
> - Performance monitoring without separate instrumentation queries
> - Distributed tracing integration at the protocol level
>
> ## Why not session variables?
>
> You could argue this could be achieved by reading session variables after each query, but that quickly becomes
> impractical: relying on session variables breaks connection multiplexing (e.g. PgBouncer), and hitting the server with
> yet another query when it's already busy only increases the number of incoming queries it needs to handle.
>
> ## Asking for feedback
>
> Is there interest in this type of protocol extension?
> What concerns do you have about protocol extensions in general?
> Would you prefer a different approach (e.g., separate protocol messages vs. extending existing ones)?
>
> I have a complete technical specification with detailed protocol message formats and API designs on Github [0].
>
> Looking forward to your feedback and suggestions. Thanks!

Here are my two cents.

I'm neither for nor against the idea. However, I can imagine it
unnecessarily increasing our code complexity, while the benefits
seem questionable, to me at least. On top of that it's probably worth
noting that the PostgreSQL protocol itself is a de-facto industry
standard used not only by PostgreSQL. If we can avoid changing it
(much), we'd better do so.

IMO our typical approach would be to make the core extensible enough
that what you want could be implemented as an extension, and also to
document the corresponding API and cover it with tests properly.
Perhaps we could include the corresponding extension in /contrib/, as
a reference implementation.

The advantage of this approach is that it decomposes the task into
separate steps each of which can be discussed, implemented, reviewed
and tested separately.

--
Best regards,
Aleksander Alekseev



Re: Proposal: Extending the PostgreSQL Protocol with Command Metadata

From
Aleksander Alekseev
Date:
Hi,

> - Multi-tenant applications tracking per-query resource consumption (buffer hits/misses, CPU/IO time)

BTW speaking of buffer hits/misses, you can already do this with
EXPLAIN ANALYZE.
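
For completeness, a rough libpq sketch of that route (the table name is made up; note that the plan text replaces the query's own result set, and the buffer counters come from the BUFFERS option):

```c
#include <stdio.h>
#include <libpq-fe.h>

/*
 * Sketch: buffer hit/miss counters via EXPLAIN (ANALYZE, BUFFERS).  The plan
 * text is returned instead of the query's result set, so this is a separate
 * instrumentation query rather than piggybacked metadata.
 */
static void
print_buffer_stats(PGconn *conn)
{
    PGresult   *res = PQexec(conn,
                             "EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM orders");

    for (int i = 0; i < PQntuples(res); i++)
        puts(PQgetvalue(res, i, 0));   /* e.g. "Buffers: shared hit=12 read=3" */

    PQclear(res);
}
```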

-- 
Best regards,
Aleksander Alekseev



Re: Proposal: Extending the PostgreSQL Protocol with Command Metadata

From
Jacob Champion
Date:
On Sun, Aug 17, 2025 at 5:25 PM Kir Shatrov <sigkirs@gmail.com> wrote:
> Is there interest in this type of protocol extension?

There have been previous requests for baking custom trace IDs and WAL
information into the protocol; the wiki is down right now but I recall
some discussion at maybe the 2024 unconference?

> What concerns do you have about protocol extensions in general?
> Would you prefer a different approach (e.g., separate protocol messages vs. extending existing ones)?

FYI, the community is in the process of breaking up logjams in the
protocol definition. There is some not-insignificant disagreement on
how protocol extensions should be built and maintained, but there is
also recent progress on that front, so I just want to prep you for
that. :D

I haven't looked into the proposal in great detail, but one thing that
caught my eye was (what I consider to be) a layering violation, where
the client advertises its interest in metadata at the protocol level
and then specifies the messages at the SQL level. I know the SQL
function you provided is for illustration and test purposes, but it
still highlights the question of "how do we decide what's interesting
(and to whom)?"

I'd imagine that question will be a little bit difficult for
intermediaries such as pgbouncer to navigate, especially if the client
and proxy have different-but-overlapping interests. (In fact I wonder
if this is going to once again poke at the lack of end-to-end vs
hop-by-hop protocol extensions.) Unless the plan is to push everything
that could possibly be interesting into every response?

Thanks!
--Jacob



Re: Proposal: Extending the PostgreSQL Protocol with Command Metadata

From
Jacob Champion
Date:
On Mon, Aug 18, 2025 at 9:05 AM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> There have been previous requests for baking custom trace IDs and WAL
> information into the protocol; the wiki is down right now but I recall
> some discussion at maybe the 2024 unconference?

Wiki's back up:
https://wiki.postgresql.org/wiki/PGConf.dev_2024_Developer_Unconference#Protocol_Enhancements

--Jacob



Re: Proposal: Extending the PostgreSQL Protocol with Command Metadata

From
Andy Fan
Date:
Kir Shatrov <sigkirs@gmail.com> writes:

Hi,

> ## Unlocking Greater Extensions
>
> Extensions leveraging this metadata protocol would make it possible to build, with very little overhead, things that
> are not possible today:
>
> - Multi-tenant applications tracking per-query resource consumption (buffer hits/misses, CPU/IO time)
> - Connection poolers making routing decisions based on server load
> - Performance monitoring without separate instrumentation queries
> - Applications implementing custom rate limiting based on DB metrics
> - Distributed tracing integration at the protocol level
>
> ## Why not session variables?
>
> You could argue this could be achieved by reading session variables after each query, but that quickly becomes
> impractical: relying on session variables breaks connection multiplexing (e.g. PgBouncer), and hitting the server with
> yet another query when it's already busy only increases the number of incoming queries it needs to handle.

Does pg_stat_statements [1] work for you? I think some differences
between pg_stat_statements and command-level metrics are:

1. pg_stat_statements is aggregated data, while command-level metrics are
raw data for each statement.
2. The number of entries in pg_stat_statements is limited by
pg_stat_statements.max.

But IMO we need some analysis or aggregated data on these metrics rather
than the raw data. Could you explain more about how these command-level
metrics would be used to make decisions?

[1]
https://www.postgresql.org/docs/current/pgstatstatements.html#PGSTATSTATEMENTS-PG-STAT-STATEMENTS

-- 
Best Regards
Andy Fan




Re: Proposal: Extending the PostgreSQL Protocol with Command Metadata

From
Julien Rouhaud
Date:
Hi,

FWIW I'm +1 on the feature idea; I think it would really allow way better
metrics processing.

On Tue, Aug 19, 2025 at 07:55:10AM +0800, Andy Fan wrote:
> Kir Shatrov <sigkirs@gmail.com> writes:
>
> Hi,
>
> > ## Unlocking Greater Extensions
> >
> > Extensions leveraging this metadata protocol would make it possible to build, with very little overhead, things
> > that are not possible today:
> >
> > - Multi-tenant applications tracking per-query resource consumption (buffer hits/misses, CPU/IO time)
> > - Connection poolers making routing decisions based on server load
> > - Performance monitoring without separate instrumentation queries
> > - Applications implementing custom rate limiting based on DB metrics
> > - Distributed tracing integration at the protocol level
> >
> > ## Why not session variables?
> >
> > You could argue this could be achieved by reading session variables after each query, but that quickly becomes
> > impractical: relying on session variables breaks connection multiplexing (e.g. PgBouncer), and hitting the server
> > with yet another query when it's already busy only increases the number of incoming queries it needs to handle.
>
> Does pg_stat_statements [1] work for you? I think some differences
> between pg_stat_statements and command-level metrics are:
>
> 1. pg_stat_statements is aggregated data, while command-level metrics are
> raw data for each statement.
> 2. The number of entries in pg_stat_statements is limited by
> pg_stat_statements.max.
>
> But IMO we need some analysis or aggregated data on these metrics rather
> than the raw data. Could you explain more about how these command-level
> metrics would be used to make decisions?

I don't think that pg_stat_statements really allows the same features (for
instance you can't extract the latest execution data since it's aggregated),
but also it has noticeable overhead and really doesn't scale well.  I know that
there are discussions about storing the metrics using custom pgstats, and while
it would likely scale better, it doesn't mean that we're going to have an
unlimited number of entries available.  There will still be entry evictions,
and therefore lost metrics, and since it will still be aggregated there will
also still be limited precision/granularity.



Re: Proposal: Extending the PostgreSQL Protocol with Command Metadata

From
Jelte Fennema-Nio
Date:
On Mon, 18 Aug 2025 at 02:25, Kir Shatrov <sigkirs@gmail.com> wrote:
> I'd like to propose a backward-compatible extension to the PostgreSQL Frontend/Backend Protocol that would allow
> servers to send command metadata back to clients alongside query results.

I think at least one of your goals for the feature makes sense:
allowing you to debug the performance of the query that you have just
run. Currently that's only possible by relying on EXPLAIN (which does
not return the query results) or auto_explain, which requires access
to the server logs. This would allow these stats to be added to
distributed trace spans etc. by an application.

A few thoughts on the actual proposal:
1. I think we'd want the result to be JSON so it's easily parseable by
applications
2. I've discussed a similar idea in the past at a conference: allow
continuous (partial) EXPLAIN ANALYZE plans to be sent by the server to
the client at a specific interval during query execution, so that a
client can use that info to show some kind of progress bar. I think it
would be good to keep that use case in mind when designing a protocol
extension like this, so that we could later extend it.
3. I think you could create a POC for this in a Postgres extension
without changing the protocol, by piggybacking on GUC_REPORT and the
ParameterStatus message: create an extension that adds a GUC with the
GUC_REPORT flag and sets that GUC to a JSON string with the information
that you want at the end of query execution. Then I think you basically
get the functionality that you want (a rough sketch follows at the end
of this message).
4. I agree with Jacob that if we want people to be able to choose the
exact information that they want in these messages, then that would
need to be done at the protocol level. That way proxies could filter
out the things that the client didn't request but that they themselves
did.
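
For illustration, a rough and untested sketch of the extension described in point 3 could look like the following (the GUC name `command_metadata.last` and the payload are made up; JSON escaping, error handling and configuration are omitted):

```c
/*
 * Rough, untested sketch of the GUC_REPORT-based POC described in point 3.
 * The GUC name ("command_metadata.last") and the JSON payload are made up.
 */
#include "postgres.h"

#include "executor/executor.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static char *last_command_metadata = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;

static void
metadata_ExecutorEnd(QueryDesc *queryDesc)
{
    char        buf[128];

    /* Example payload: rows processed by the finished statement. */
    snprintf(buf, sizeof(buf), "{\"rows\": " UINT64_FORMAT "}",
             queryDesc->estate->es_processed);

    /*
     * Because the GUC is flagged GUC_REPORT, changing its value makes the
     * server queue a ParameterStatus message for the client.
     */
    SetConfigOption("command_metadata.last", buf, PGC_USERSET, PGC_S_SESSION);

    if (prev_ExecutorEnd)
        prev_ExecutorEnd(queryDesc);
    else
        standard_ExecutorEnd(queryDesc);
}

void
_PG_init(void)
{
    DefineCustomStringVariable("command_metadata.last",
                               "Metadata about the last completed command.",
                               NULL,
                               &last_command_metadata,
                               "",
                               PGC_USERSET,
                               GUC_REPORT,
                               NULL, NULL, NULL);

    prev_ExecutorEnd = ExecutorEnd_hook;
    ExecutorEnd_hook = metadata_ExecutorEnd;
}
```

On the client side, libpq already keeps track of ParameterStatus messages, so the latest value should be readable via PQparameterStatus(conn, "command_metadata.last") without any extra round-trip.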