Обсуждение: BackendKeyData is mandatory?

Поиск
Список
Период
Сортировка

BackendKeyData is mandatory?

От
Tatsuo Ishii
Дата:
In the Frontend/Backend protocol, it is explained that after
successful authentication following messages can be sent from backend
to frontend[1]:

BackendKeyData
ParameterStatus
ReadyForQuery
ErrorResponse
NoticeResponse

My question is, BackendKeyData is mandatory or not. Currently
Pgpool-II raises a fatal error if BackendKeyData is not sent before
ReadyForQuery arrives. This is because without the message, frontend
cannot send a CancelRequest message later on, as there's no secret
key.

I heard that some "PostgreSQL compatible" servers do not send
BackendKeyData message to frontend. I wonder if this is a protocol
violation.

[1] https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-START-UP

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: BackendKeyData is mandatory?

От
Tom Lane
Дата:
Tatsuo Ishii <ishii@postgresql.org> writes:
> In the Frontend/Backend protocol, it is explained that after
> successful authentication following messages can be sent from backend
> to frontend[1]:

> BackendKeyData
> ParameterStatus
> ReadyForQuery
> ErrorResponse
> NoticeResponse

> My question is, BackendKeyData is mandatory or not. Currently
> Pgpool-II raises a fatal error if BackendKeyData is not sent before
> ReadyForQuery arrives. This is because without the message, frontend
> cannot send a CancelRequest message later on, as there's no secret
> key.

As you say, without BackendKeyData it's impossible to send a query
cancel, so we expect the server will always send that.

> I heard that some "PostgreSQL compatible" servers do not send
> BackendKeyData message to frontend. I wonder if this is a protocol
> violation.

I'd say so.  Maybe whoever that is doesn't care to support query
cancel.  They're within their rights to do that I guess, but
Pgpool-II does not have to support the case.  (A less incompatible
way of not supporting query cancel is to send dummy BackendKeyData
values and then just ignore cancel requests.  So I don't see that
you need to do anything towards this goal, if it is a goal and
not merely a broken implementation.)

            regards, tom lane



Re: BackendKeyData is mandatory?

От
"David G. Johnston"
Дата:
On Monday, June 16, 2025, Tatsuo Ishii <ishii@postgresql.org> wrote:

My question is, BackendKeyData is mandatory or not. Currently
Pgpool-II raises a fatal error if BackendKeyData is not sent before
ReadyForQuery arrives. This is because without the message, frontend
cannot send a CancelRequest message later on, as there's no secret
key.

I wouldn’t expect a proxy to make a judgement here; but to simply forward what does show up and otherwise stay silent.  If there is proxy layer code needed to deal with its absence ignoring the cancel attempt with a log warning would be sufficient.  Otherwise, the user has made their choices and this is an optional feature in practice (though resorting to pg_cancel_query make be required for truly hung processes).

David J.

Re: BackendKeyData is mandatory?

От
Tatsuo Ishii
Дата:
>> My question is, BackendKeyData is mandatory or not. Currently
>> Pgpool-II raises a fatal error if BackendKeyData is not sent before
>> ReadyForQuery arrives. This is because without the message, frontend
>> cannot send a CancelRequest message later on, as there's no secret
>> key.
> 
> As you say, without BackendKeyData it's impossible to send a query
> cancel, so we expect the server will always send that.

That's my understanding too.

>> I heard that some "PostgreSQL compatible" servers do not send
>> BackendKeyData message to frontend. I wonder if this is a protocol
>> violation.
> 
> I'd say so.  Maybe whoever that is doesn't care to support query
> cancel.  They're within their rights to do that I guess, but
> Pgpool-II does not have to support the case.  (A less incompatible
> way of not supporting query cancel is to send dummy BackendKeyData
> values and then just ignore cancel requests.  So I don't see that
> you need to do anything towards this goal, if it is a goal and
> not merely a broken implementation.)

Agreed.
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: BackendKeyData is mandatory?

От
Peter Eisentraut
Дата:
On 17.06.25 03:10, Tatsuo Ishii wrote:
> My question is, BackendKeyData is mandatory or not. Currently
> Pgpool-II raises a fatal error if BackendKeyData is not sent before
> ReadyForQuery arrives. This is because without the message, frontend
> cannot send a CancelRequest message later on, as there's no secret
> key.

I think that's fine, if the server does not want to support query 
cancellation.  The current protocol description certainly does not 
support the idea that it is a hard error *not* to send BackendKeyData.

It's also worth thinking about the new protocol 3.2 longer key data.  A 
paranoid server might choose to send key data only if protocol >=3.2 is 
chosen and not if a lower, notionally less secure version is chosen.




Re: BackendKeyData is mandatory?

От
Tatsuo Ishii
Дата:
> I think that's fine, if the server does not want to support query
> cancellation.  The current protocol description certainly does not
> support the idea that it is a hard error *not* to send BackendKeyData.

Isn't it scary if the server does not allow a query cancel?  For
example, if the server charge you per query duration and if you
accidentally send a long running query, the only escape exit is the
query cancellation.

> It's also worth thinking about the new protocol 3.2 longer key data.
> A paranoid server might choose to send key data only if protocol >=3.2
> is chosen and not if a lower, notionally less secure version is
> chosen.

I would say the server does wrong a decision. I think even if the key
is not long, it's still useful than nothing.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: BackendKeyData is mandatory?

От
Heikki Linnakangas
Дата:
On 19/06/2025 13:03, Tatsuo Ishii wrote:
>> I think that's fine, if the server does not want to support query
>> cancellation.  The current protocol description certainly does not
>> support the idea that it is a hard error *not* to send BackendKeyData.
> 
> Isn't it scary if the server does not allow a query cancel?  For
> example, if the server charge you per query duration and if you
> accidentally send a long running query, the only escape exit is the
> query cancellation.

Or disconnect. Or pg_cancel_backend().

You can also easily have a stray psql session where the user has gone 
out for lunch. Or an application that doesn't have a timeout. Many other 
scenarios like that.

>> It's also worth thinking about the new protocol 3.2 longer key data.
>> A paranoid server might choose to send key data only if protocol >=3.2
>> is chosen and not if a lower, notionally less secure version is
>> chosen.
> 
> I would say the server does wrong a decision. I think even if the key
> is not long, it's still useful than nothing.

I tend to agree, but people have different priorities. It's also 
reasonable that you'd want to only support long cancellation keys. Or 
maybe you have a proxy that doesn't implement query cancellation, or 
only supports it with long keys because it embeds routing information in 
the key, or something like that.

FWIW my reading of the protocol docs is that BackendKeyData is optional. 
If I was writing a client today, I would accept it missing. But of 
course all PostgreSQL versions today do send it, and I wouldn't be 
surprised if there are clients out there that get confused if they don't 
see it.

- Heikki




Re: BackendKeyData is mandatory?

От
Tatsuo Ishii
Дата:
> Or disconnect.

You cannot disconnect without canceling the query at least using psql.
You can kill psql to disconnect but it's possible that the backend
keeps on running the query.

> Or pg_cancel_backend().

In order to issue pg_cancel_backend() the user needs to know the
backend pid which was supposed to be provided by a BackendKeyData
message. Of course you could search and find the backend pid from
other source, but I think it's less user friendly.

>> I would say the server does wrong a decision. I think even if the key
>> is not long, it's still useful than nothing.
> 
> I tend to agree, but people have different priorities. It's also
> reasonable that you'd want to only support long cancellation keys. Or
> maybe you have a proxy that doesn't implement query cancellation, or
> only supports it with long keys because it embeds routing information
> in the key, or something like that.

Agreed. All PostgreSQL "compatible" servers have their own
priority. In Pgpool-II case, the priority is compatibility with
PostgreSQL (I am not saying Pgpool-II is 100% compatible with
PostgreSQL as of today, but it's a implementation limitation which I
want to eliminate someday).

> FWIW my reading of the protocol docs is that BackendKeyData is
> optional.

If majority of developers think so, do we want to update the protocol
docs?  For me the docs is not clear enough.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Thu, 19 Jun 2025 at 13:51, Tatsuo Ishii <ishii@postgresql.org> wrote:
> > FWIW my reading of the protocol docs is that BackendKeyData is
> > optional.
>
> If majority of developers think so, do we want to update the protocol
> docs?  For me the docs is not clear enough.

I think the docs currently definitely suggests that BackendKeyData
will always be sent (emphasis mine):

> After having received AuthenticationOk, the frontend must wait for further messages from the server. In this phase a
backendprocess is being started, and the frontend is just an interested bystander. It is still possible for the startup
attemptto fail (ErrorResponse) or the server to decline support for the requested minor protocol version
(NegotiateProtocolVersion),*but in the normal case the backend will send some ParameterStatus messages, BackendKeyData,
andfinally ReadyForQuery.* 

I'd be surprised if many clients handle it correctly if it is not
sent. Looking quickly at the code for pgbouncer and libpq for PG17
(and lower) they definitely don't. They won't throw an error, but
instead of doing nothing when the user tries to cancel a query they
will instead send a cancel message with all zeros to the server. Since
PG18 libpq handles a cancel call nicely as a no-op, when no cancel key
was received.

I agree that it would be good to explicitly call out that the server
not sending BackendKeyData is an option if we don't want the server to
require sending this message.



Re: BackendKeyData is mandatory?

От
"David G. Johnston"
Дата:
On Thursday, June 19, 2025, Tatsuo Ishii <ishii@postgresql.org> wrote:
> FWIW my reading of the protocol docs is that BackendKeyData is
> optional.

If majority of developers think so, do we want to update the protocol
docs?  For me the docs is not clear enough.

At this point why does it matter what the docs says?  You know what exists in reality and you can either change or not.

That said, the documentation makes it clear that absent an error the server shall send:

2 or more (some) ParameterStatus messages
Followed by
1 BackendKeyData message
Followed by
1 ReadyForQuery message

So the protocol is violated if the BackendKeyData is absent.  I don’t see why we should change that since we adhere to it and it’s our protocol; but also its absence is non-fatal to the typical operation of the server…

David J.

Re: BackendKeyData is mandatory?

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> At this point why does it matter what the docs says?

Yeah, I cannot get excited about changing this.  The current protocol
spec accurately documents what we do.  It is not incumbent on us to
document what non-Postgres implementations of this protocol could
hypothetically do --- especially when we're just guessing in a vacuum
as to whether this second-hand-reported behavior is intentional or
a bug.

There's an argument based on the ancient principle of "be conservative
in what you send and liberal in what you accept" that Pgpool ought
to survive not getting BackendKeyData, or at least not complain until
such time as it's asked to send a query cancel.  But that principle
is not about what it says in the protocol spec.

            regards, tom lane



Re: BackendKeyData is mandatory?

От
Jacob Champion
Дата:
On Thu, Jun 19, 2025 at 5:12 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> I'd be surprised if many clients handle it correctly if it is not
> sent. Looking quickly at the code for pgbouncer and libpq for PG17
> (and lower) they definitely don't. They won't throw an error, but
> instead of doing nothing when the user tries to cancel a query they
> will instead send a cancel message with all zeros to the server. Since
> PG18 libpq handles a cancel call nicely as a no-op, when no cancel key
> was received.

If anyone today is relying on "backend-key-less" connection, this is
potentially a breaking change. For example, psycopg2 now complains:

    psycopg2.OperationalError: can't get cancellation key

whereas before, it would be able to connect, and sending a cancel
would do something. (Whether or not that "something" was useful was up
to the server implementation, I guess.)

--Jacob



Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Mon, 23 Jun 2025 at 18:02, Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> If anyone today is relying on "backend-key-less" connection, this is
> potentially a breaking change. For example, psycopg2 now complains:
>
>     psycopg2.OperationalError: can't get cancellation key


It's not super clear what you're referring to with "this" in your
first sentence.

To be clear, I'm not saying we should start throwing errors for things
in libpq that weren't errors before. But more as: I don't expect many
client authors to have considered the scenario of no BackendKeyData.
Because either an author believes the BackendKeyData is optional, in
which case they shouldn't send a CancelRequest for those connections.
Or they think it is required, in which case throwing an error seems
like the sensible thing to do. And the implementations in PgBouncer
and PG17 is neither, i.e. it won't throw an error, but instead of
doing nothing it will do something clearly useless.



Re: BackendKeyData is mandatory?

От
Jacob Champion
Дата:
On Mon, Jun 23, 2025 at 9:24 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> On Mon, 23 Jun 2025 at 18:02, Jacob Champion
> <jacob.champion@enterprisedb.com> wrote:
> > If anyone today is relying on "backend-key-less" connection, this is
> > potentially a breaking change. For example, psycopg2 now complains:
> >
> >     psycopg2.OperationalError: can't get cancellation key
>
> It's not super clear what you're referring to with "this" in your
> first sentence.

The change in behavior in 18, if the server doesn't send a cancellation key.

> To be clear, I'm not saying we should start throwing errors for things
> in libpq that weren't errors before.

But that is _exactly_ what we've started doing now, in 18. We return
NULL instead of an "empty" cancellation object, and at least one
existing client fails because of it. Whether that's a problem in
practice is, I guess, part of the open question of this thread.

> But more as: I don't expect many
> client authors to have considered the scenario of no BackendKeyData.

I agree.

> Because either an author believes the BackendKeyData is optional, in
> which case they shouldn't send a CancelRequest for those connections.
> Or they think it is required, in which case throwing an error seems
> like the sensible thing to do. And the implementations in PgBouncer
> and PG17 is neither, i.e. it won't throw an error, but instead of
> doing nothing it will do something clearly useless.

From reading this thread, I'm not convinced that's "clear". I wouldn't
have chosen the existing behavior, for sure, but any existing servers
that don't send a key must be doing _something_ with that cancel
request, right? Even if it's just ignored?

Do we know which implementations aren't sending keys?

--Jacob



Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Mon, 23 Jun 2025 at 18:42, Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> > To be clear, I'm not saying we should start throwing errors for things
> > in libpq that weren't errors before.
>
> But that is _exactly_ what we've started doing now, in 18. We return
> NULL instead of an "empty" cancellation object, and at least one
> existing client fails because of it.

Ugh, I seemed to have totally misread that code when I wrote my
initial note about the PG18 behaviour. Indeed PQgetCancel returning
NULL obviously means that an error occurred. Also in the
PQcancelCreate case, we're even setting the error message ourselves.
(I did search for the error message you mentioned, but did so in
psycopg (aka psycopg3), not psycopg2)

> Whether that's a problem in
> practice is, I guess, part of the open question of this thread.

I'd love to hear what database actually has this problem. Without a
specific (and non-experimental) database being impacted by this, I'm
not entirely convinced that we need to do anything.

If we do think it's worth it for libpq to continue to handle such
systems, then I'd say we change the behaviour to only throw an error
at a later point: When actually initiating the cancellation. Because
it seems pretty useful to know that their attempt to cancel a query
could not be serviced (just like in case of network issues). However,
throwing an error in PQgetCancel or PQcancelCreate seems premature,
because some clients (e.g. psycopg2) call PQgetCancel right after
initializing the Postgres connection, and will thus fail the
connection attempt. Effectively meaning that that client with libpq
PG18 won't be able to connect to the type of server in question.

> From reading this thread, I'm not convinced that's "clear". I wouldn't
> have chosen the existing behavior, for sure, but any existing servers
> that don't send a key must be doing _something_ with that cancel
> request, right? Even if it's just ignored?

I mean if the only thing a server can do is ignore it, ISTM that it's
clearly useless to send it anyway. Sending nothing seems a much better
choice in that case.

> Do we know which implementations aren't sending keys?

Nope, that's totally unclear. It would be very nice knowing which
database this is, and if it's at all a production system.



Re: BackendKeyData is mandatory?

От
Tom Lane
Дата:
Jelte Fennema-Nio <postgres@jeltef.nl> writes:
> On Mon, 23 Jun 2025 at 18:42, Jacob Champion
> <jacob.champion@enterprisedb.com> wrote:
>> From reading this thread, I'm not convinced that's "clear". I wouldn't
>> have chosen the existing behavior, for sure, but any existing servers
>> that don't send a key must be doing _something_ with that cancel
>> request, right? Even if it's just ignored?

> I mean if the only thing a server can do is ignore it, ISTM that it's
> clearly useless to send it anyway. Sending nothing seems a much better
> choice in that case.

It could be that the server has some independent way of knowing which
session to cancel.  (As a reductio-ad-absurdum case, maybe it only
supports one session.)

>> Do we know which implementations aren't sending keys?

> Nope, that's totally unclear. It would be very nice knowing which
> database this is, and if it's at all a production system.

Yeah, I'm very hesitant to spend any effort here without having
a more concrete use-case.

            regards, tom lane



Re: BackendKeyData is mandatory?

От
Tatsuo Ishii
Дата:
>>> Do we know which implementations aren't sending keys?
> 
>> Nope, that's totally unclear. It would be very nice knowing which
>> database this is, and if it's at all a production system.
> 
> Yeah, I'm very hesitant to spend any effort here without having
> a more concrete use-case.

One example is Amazon RDS Proxy.
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-proxy.html

See the section:
"Additional limitations for RDS for PostgreSQL"

I haven't tried Amazon RDS Proxy myself but I heard about it in a
conversation with a Pgpool-II user. He tried to use Pgpool-II with
Amazon RDS Proxy and failed.

https://github.com/pgpool/pgpool2/issues/111

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Tue, 24 Jun 2025 at 01:44, Tatsuo Ishii <ishii@postgresql.org> wrote:
> One example is Amazon RDS Proxy.

Okay, that sounds widely used enough to continue that we should
probably change the new PG18 behaviour of PQgetCancel and
PQcancelCreate like I suggested. Failing all psycopg2 connection
attempts against AWS its proxy service doesn't seem like something
we'd want to do.



Re: BackendKeyData is mandatory?

От
Jacob Champion
Дата:
On Tue, Jun 24, 2025 at 1:36 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> Okay, that sounds widely used enough to continue that we should
> probably change the new PG18 behaviour of PQgetCancel and
> PQcancelCreate like I suggested. Failing all psycopg2 connection
> attempts against AWS its proxy service doesn't seem like something
> we'd want to do.

So that's
1) return an (empty) cancellation object even if the server has not
sent a key, and
2) error out when trying to cancel with an empty object?

That sounds reasonable to me.

Are there any reading along who want us to continue sending an
all-zeroes CancelRequest if the server has not sent a key? Personally,
I don't feel a need to push for that without evidence that it's
actually used, and both RDS Proxy and Cockroach [1] seem to fall in
the "don't support cancellation at all" bucket.

Thanks!
--Jacob

[1] https://github.com/cockroachdb/cockroach/issues/32973



Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Tue, 24 Jun 2025 at 17:07, Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> So that's
> 1) return an (empty) cancellation object even if the server has not
> sent a key, and
> 2) error out when trying to cancel with an empty object?

Yes (and empty being non-NULL obviously)

> That sounds reasonable to me.

Alright, let's do that then. I could probably write a patch for that tomorrow.



Re: BackendKeyData is mandatory?

От
Tom Lane
Дата:
Jacob Champion <jacob.champion@enterprisedb.com> writes:
> So that's
> 1) return an (empty) cancellation object even if the server has not
> sent a key, and
> 2) error out when trying to cancel with an empty object?

> That sounds reasonable to me.

+1.

> Are there any reading along who want us to continue sending an
> all-zeroes CancelRequest if the server has not sent a key?

We might have to consider doing so if evidence emerges of a server
that depends on us doing that, but right now we have no such evidence.

On the whole, reporting an error seems like a better user experience
than silently sending a cancel we know won't work.  But we have to
delay the error until a cancel is actually attempted.

            regards, tom lane



Re: BackendKeyData is mandatory?

От
"Jelte Fennema-Nio"
Дата:
On Tue Jun 24, 2025 at 5:07 PM CEST, Jacob Champion wrote:
> So that's
> 1) return an (empty) cancellation object even if the server has not
> sent a key, and
> 2) error out when trying to cancel with an empty object?
>
> That sounds reasonable to me.

Attached is an attempt at implementing the above. I did not test it
against these systems though.

I also added small commit that checks for the 256 byte key length limit
described in the protocol documentation. This thread made me remember
that we talked about that during PGConf.dev.

Вложения

Re: BackendKeyData is mandatory?

От
Jacob Champion
Дата:
On Wed, Jun 25, 2025 at 12:12 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> Attached is an attempt at implementing the above. I did not test it
> against these systems though.

With 0001, psycopg2 appears to function again when talking to a server
that doesn't send a cancellation key, so that's good.

It looks like Heikki has an open item for this, so I'll defer to him
for copyediting preferences around the comments and commit messages. I
do have a problem with this part of the new documentation:

+        [...] If no
+        BackendKeyData is sent by the server, then that means that the backend
+        does not support canceling queries using the CancelRequest messages.

I don't think we really know that it means that, yet. I'd prefer something like

    Beginning with PostgreSQL 18, libpq assumes that servers do not
support query cancellation if they do not send BackendKeyData.

in the hope that either anyone affected will complain at us to revert,
or it'll become the de facto meaning after some time.

> I also added small commit that checks for the 256 byte key length limit
> described in the protocol documentation. This thread made me remember
> that we talked about that during PGConf.dev.

0002 seems to result in a connection hang if the server sends a bigger
key. I think this may be a latent bug in the original patch -- if
getBackendKeyData() fails, no effort is made to clean up state or
manage the connection buffer. We just... return.

Also, what should we do if the server sends a zero-length key?

Thanks,
--Jacob



Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Mon, 30 Jun 2025 at 19:59, Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
>
> On Wed, Jun 25, 2025 at 12:12 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> > Attached is an attempt at implementing the above. I did not test it
> > against these systems though.
>
> With 0001, psycopg2 appears to function again when talking to a server
> that doesn't send a cancellation key, so that's good.

Great!

> It looks like Heikki has an open item for this, so I'll defer to him

Oh... Sorry for the confusion. I added that open item to the list (so
it would not be missed), and I added Heikki as the owner because he
committed the original patch. I checked now, and since Heikki hasn't
sent an email to hackers since June 10th, I'm not sure that was the
correct decision. So feel free to take ownership of the open item. I
changed it to "Unknown" for now (I did the same for some of the other
protocol docs related patches).

> for copyediting preferences around the comments and commit messages. I
> do have a problem with this part of the new documentation:
>
> +        [...] If no
> +        BackendKeyData is sent by the server, then that means that the backend
> +        does not support canceling queries using the CancelRequest messages.
>
> I don't think we really know that it means that, yet. I'd prefer something like
>
>     Beginning with PostgreSQL 18, libpq assumes that servers do not
> support query cancellation if they do not send BackendKeyData.
>
> in the hope that either anyone affected will complain at us to revert,
> or it'll become the de facto meaning after some time.

I understand your intent, but I don't like that the protocol docs
would be saying anything about libpq specific behaviour. I'd like to
be explicit about protocol behaviour, without considering the specific
client. That we weren't before is exactly how we got into the current
mess (maybe those IETF are on to something with their MUST/SHOULD/etc
stuff).

So I still prefer my wording over yours, especially since no-one has
been able to think of any reasonable way a production system could
utilize an "all zeros" cancellation key (except for having only one
connection, which automatically makes it non-production afaict). But I
do understand your concern, so how about this wording:

Since protocol version 3.2, if the server sent no BackendKeyData, then
that means that the backend does not support canceling queries using
the CancelRequest messages. In protocol versions before 3.2 the
behaviour is undefined if the client receives no BackendKeyData.

That way we define the behavior as "sensible" in 3.2, while still
allowing for the rare case that someone somewhere relied on the "all
zeros" cancel message being sent in libpq for PG17. And if it's a big
enough problem, we could then still change libpq to continue sending
"all zeros" if the server used protocol 3.0.

> 0002 seems to result in a connection hang if the server sends a bigger
> key. I think this may be a latent bug in the original patch -- if
> getBackendKeyData() fails, no effort is made to clean up state or
> manage the connection buffer. We just... return.

Hmm... Yeah, seems like that needs a bit more work then.

> Also, what should we do if the server sends a zero-length key?

Given that that only applies to protocol 3.2, I'd like to define that
strictly. How about simply not allowing that and throwing an error in
libpq in that case? e.g. by defining the secret so that it should be
at minimum 4 and at most 256 bytes?



Re: BackendKeyData is mandatory?

От
Jacob Champion
Дата:
On Mon, Jun 30, 2025 at 11:44 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> > It looks like Heikki has an open item for this, so I'll defer to him
>
> Oh... Sorry for the confusion. I added that open item to the list (so
> it would not be missed), and I added Heikki as the owner because he
> committed the original patch. I checked now, and since Heikki hasn't
> sent an email to hackers since June 10th, I'm not sure that was the
> correct decision.

I don't know that it's necessarily forbidden to add an open item on
someone else's behalf, but I think they should definitely be aware of
it, if you do. :D

> So feel free to take ownership of the open item. I
> changed it to "Unknown" for now (I did the same for some of the other
> protocol docs related patches).

I'll avoid taking it over for now, mostly so that I'm not imposing a
different set of requirements on the follow-up as compared to the
originally committed patch. But I can pick it up if Heikki's not able
to.

> I understand your intent, but I don't like that the protocol docs
> would be saying anything about libpq specific behaviour. I'd like to
> be explicit about protocol behaviour, without considering the specific
> client. That we weren't before is exactly how we got into the current
> mess (maybe those IETF are on to something with their MUST/SHOULD/etc
> stuff).

Underspecification is how you get into this sort of mess, and that's
already done. So hiding the specific change behind a veil of "can't
say libpq" inside the Postgres docs doesn't make sense to me,
personally -- when other people say "Postgres wire-compatible",
they're talking about us.

Plus, modern IETF specs are _very_ good at mentioning when problems in
widely-deployed implementations have led to a protocol decision.
(Calling them out by name, in the permanent RFC, would probably be
unwise in many situations. But we don't have any reason to avoid
calling ourselves out in our own docs.)

> But I
> do understand your concern, so how about this wording:
>
> Since protocol version 3.2, if the server sent no BackendKeyData, then
> that means that the backend does not support canceling queries using
> the CancelRequest messages. In protocol versions before 3.2 the
> behaviour is undefined if the client receives no BackendKeyData.
>
> That way we define the behavior as "sensible" in 3.2, while still
> allowing for the rare case that someone somewhere relied on the "all
> zeros" cancel message being sent in libpq for PG17. And if it's a big
> enough problem, we could then still change libpq to continue sending
> "all zeros" if the server used protocol 3.0.

1. I find this less useful to implementers. Implementers, I think,
want to know what libpq is going to do in reality.
2. Without a way to enforce or test this behavior, I'm not excited
about tying the 3.2 protocol definition to a change that we still
might revert for 3.0. Maybe that is the least bad way forward, but I
would want more committers than just me to buy into that agreement
first.

> > Also, what should we do if the server sends a zero-length key?
>
> Given that that only applies to protocol 3.2,

It applies to 3.0 too. (There is no longer any code in the client that
locks the length of the key to four bytes.) This applies to PG18 and
onwards.

> I'd like to define that
> strictly. How about simply not allowing that and throwing an error in
> libpq in that case? e.g. by defining the secret so that it should be
> at minimum 4 and at most 256 bytes?

No objection here.

--Jacob



Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Tue, 1 Jul 2025 at 00:23, Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> when other people say "Postgres wire-compatible",
> they're talking about us.

Many people instead are talking about JDBC (Java) or the Npgsql (C#).
These are both much better implementations of the protocol than libpq
is IMO (e.g. libpq still cannot request binary text based per column,
and it couldn't send a Close message until recently). Pretending that
libpq is the "golden standard" for our protocol just seems plain wrong
to me. For instance libpq even supports receiving BackendKeyData many
more times after startup, due to implementation laziness. Do we want
to start documenting that as being part of the protocol?

> Plus, modern IETF specs are _very_ good at mentioning when problems in
> widely-deployed implementations have led to a protocol decision.
> (Calling them out by name, in the permanent RFC, would probably be
> unwise in many situations. But we don't have any reason to avoid
> calling ourselves out in our own docs.)

I think that's fair, and libpq can be used to give context about why
the protocol is defined the way it is. But it shouldn't be the
definition of the protocol itself.

> 1. I find this less useful to implementers. Implementers, I think,
> want to know what libpq is going to do in reality.

Okay, so how about we change it to this (first to sentences are the same)

Since protocol version 3.2, if the server sent no BackendKeyData, then
that means that the backend does not support canceling queries using
the CancelRequest messages. In protocol versions before 3.2 the
behaviour is undefined if the client receives no BackendKeyData.
Up until the libpq shipped in PostgreSQL 17, it would send a
CancelRequest with all zeros to 3.0 connections that did not send a
BackendKeyData message. Since the libpq shipped with PostgreSQL 18 it
does not send any CancelRequest at all for such connections anymore,
thus aligning with the behaviour for protocol 3.2. Throwing an error
as a client when not receiving BackendKeyData on a 3.0 connection is
not recommended, because certain server implementations are known not
to send it.

> 2. Without a way to enforce or test this behavior, I'm not excited
> about tying the 3.2 protocol definition to a change that we still
> might revert for 3.0. Maybe that is the least bad way forward, but I
> would want more committers than just me to buy into that agreement
> first.

If we want to revert the behaviour for 3.0 that would be purely
because of some existing software somehow relying on this all zeros
CancelRequest. If that existing software then wants to start
supporting 3.2 (or more likely a later protocol version), then they
would need to start sending a BackendKeyData with all zeros themselves
first to get the same client behaviour. That seems like an easy enough
workaround for the server implementers, and most importantly one that
does not impact any old servers (because those won't have implemented
3.2).

> > > Also, what should we do if the server sends a zero-length key?
> >
> > Given that that only applies to protocol 3.2,
>
> It applies to 3.0 too. (There is no longer any code in the client that
> locks the length of the key to four bytes.) This applies to PG18 and
> onwards.

That seems like a bug in libpq that we do not disallow that for 3.0.
However, in practice even without changing that, this would still only
apply to 3.2, because servers should not dare to send different length
BackendKeyData for a 3.0 connection. Even if modern libpq handles this
correctly on 3.0 connections, previous versions would still throw an
error, and many other clients will fail too.



Re: BackendKeyData is mandatory?

От
Jacob Champion
Дата:
On Tue, Jul 1, 2025 at 1:15 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> Pretending that
> libpq is the "golden standard" for our protocol just seems plain wrong
> to me.

Not what I said. I'm saying that if a server implementation claims
Postgres compatibility but fails to talk to deployed versions of libpq
in practice, people will roll their eyes, even if other client
implementations work. The most widely deployed implementation tends to
be favored independently of technical quality.

> For instance libpq even supports receiving BackendKeyData many
> more times after startup, due to implementation laziness. Do we want
> to start documenting that as being part of the protocol?

It's not up to me whether we do or not. Since the protocol is
underspecified and these corner cases are untested, what really
matters is how many people depend on the underspecified behavior. (I
point to the immediately preceding thread as evidence.)

> > 2. Without a way to enforce or test this behavior, I'm not excited
> > about tying the 3.2 protocol definition to a change that we still
> > might revert for 3.0. Maybe that is the least bad way forward, but I
> > would want more committers than just me to buy into that agreement
> > first.
>
> If we want to revert the behaviour for 3.0 that would be purely
> because of some existing software somehow relying on this all zeros
> CancelRequest. If that existing software then wants to start
> supporting 3.2 (or more likely a later protocol version), then they
> would need to start sending a BackendKeyData with all zeros themselves
> first to get the same client behaviour. That seems like an easy enough
> workaround for the server implementers, and most importantly one that
> does not impact any old servers (because those won't have implemented
> 3.2).

But again: we do not enforce or test this behavior, so if the revert
happens for 3.0 later, we all have to watch like hawks to make sure
that 3.2 is not affected. I need more buy-in for that from people who
are not me.

> > > > Also, what should we do if the server sends a zero-length key?
> > >
> > > Given that that only applies to protocol 3.2,
> >
> > It applies to 3.0 too. (There is no longer any code in the client that
> > locks the length of the key to four bytes.) This applies to PG18 and
> > onwards.
>
> That seems like a bug in libpq that we do not disallow that for 3.0.

Well... okay. I'm reasoning based on what's committed.

Right now, libpq's interpretation of 3.2 is more akin to an
advertisement that it supports variable-length cancel keys from the
server -- and crucially, it now supports variable-length keys for 3.0
too. (This weirdness makes more sense when you consider that the patch
started life as a _pq_ parameter, not a version bump.)

> However, in practice even without changing that, this would still only
> apply to 3.2, because servers should not dare to send different length
> BackendKeyData for a 3.0 connection. Even if modern libpq handles this
> correctly on 3.0 connections, previous versions would still throw an
> error, and many other clients will fail too.

Personally, I think it's more likely that any new server
implementations with alternative cancellation requirements will start
to silently couple against the new 3.0 behavior. I don't believe for a
minute that third parties "would not dare" to do literally anything
that works in practice. We're not the spec police and no one is going
to ask us for permission; if we want other implementations to behave a
certain way then we have to enforce it in the code.

--Jacob



Re: BackendKeyData is mandatory?

От
"Jelte Fennema-Nio"
Дата:
On Tue, 1 Jul 2025 at 18:50, Jacob Champion <jacob.champion@enterprisedb.com> wrote:
>
> Not what I said. I'm saying that if a server implementation claims
> Postgres compatibility but fails to talk to deployed versions of libpq
> in practice, people will roll their eyes, even if other client
> implementations work. The most widely deployed implementation tends to
> be favored independently of technical quality.

I think the confusion comes from the fact that I understood you to say:
"People don't care about the protocol, if libpq allows it, then it's
good enough." But now I think you might have meant: "If libpq doesn't
allow it, even though the spec suggets it should, then people will still
blame the server implemantion".

> It's not up to me whether we do or not. Since the protocol is
> underspecified and these corner cases are untested, what really
> matters is how many people depend on the underspecified behavior. (I
> point to the immediately preceding thread as evidence.)

Agreed. But what I was trying to say was that you need more than just
libpq to behave in an unspecified way. You need some critical mass of
clients to behave in a similar enough unspecified way. i.e. if JDBC
cannot connect to your custom postgres server because you went out of
spec, then users will start complaining to you that your server is
broken.

Sidenote: I checked just now. JDBC did send the all zeros message too
before it added 3.2 cancel support. But afaict the 3.2 support
introduced an issue where it will call castNonNull on a null pointer if
used on a server that does not send BackendKeyData. CC-ing Dave Cramer
because that sounds like a bug his employer would probably like fixed
because that would impact the RDS Proxy.

> But again: we do not enforce or test this behavior, 

I'd love to have your protocol test suite to be able to add
automated tests for this ;)

> so if the revert> happens for 3.0 later, we all have to watch like hawks to make sure
> that 3.2 is not affected. I need more buy-in for that from people who
> are not me.

Okay, but to be clear: You do agree with this approach? (assuming others
will agree too)

> Well... okay. I'm reasoning based on what's committed.

Attached is a v2 patchset that addresses this, as well as all the other
changes previously discussed. I tested that the behaviour is as intended
by modifying the PG sources locally.

> Personally, I think it's more likely that any new server
> implementations with alternative cancellation requirements will start
> to silently couple against the new 3.0 behavior. I don't believe for a
> minute that third parties "would not dare" to do literally anything
> that works in practice.

Maybe in some years, yes. But it seems rather unlikely that people would
start doing that now, because that would mean that old clients that
request 3.0 would suddenly not be able to connect because they get a
"malformed" BackendKeyData message.

> if we want other implementations to behave a
> certain way then we have to enforce it in the code.

Agreed

P.S. Heiki assigned himself to the open items I created now.

Вложения

Re: BackendKeyData is mandatory?

От
Jacob Champion
Дата:
On Wed, Jul 2, 2025 at 3:18 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> I think you might have meant: "If libpq doesn't
> allow it, even though the spec suggets it should, then people will still
> blame the server implemantion".

Yes. Same for clients which disallow corner cases that the spec is
silent on but libpq allows.

> > It's not up to me whether we do or not. Since the protocol is
> > underspecified and these corner cases are untested, what really
> > matters is how many people depend on the underspecified behavior. (I
> > point to the immediately preceding thread as evidence.)
>
> Agreed. But what I was trying to say was that you need more than just
> libpq to behave in an unspecified way. You need some critical mass of
> clients to behave in a similar enough unspecified way.

Sure, but I believe very strongly in people's ability to find and
depend on nonstandard behavior in popular protocols. :) It's a
law-of-large-numbers thing.

> Sidenote: I checked just now. JDBC did send the all zeros message too
> before it added 3.2 cancel support.

Okay, wait -- JDBC was _copying_ our weird behavior? Why? Does
something depend it in the wild?

> > But again: we do not enforce or test this behavior,
>
> I'd love to have your protocol test suite to be able to add
> automated tests for this ;)

Agreed.

> > so if the revert> happens for 3.0 later, we all have to watch like hawks to make sure
> > that 3.2 is not affected. I need more buy-in for that from people who
> > are not me.
>
> Okay, but to be clear: You do agree with this approach? (assuming others
> will agree too)

At the moment, yes. (We might as well signal a protocol change with
our protocol bump. :D)

> > Well... okay. I'm reasoning based on what's committed.
>
> Attached is a v2 patchset that addresses this, as well as all the other
> changes previously discussed. I tested that the behaviour is as intended
> by modifying the PG sources locally.

I will hold off on detailed review until Heikki gives an opinion on
the design (or we get closer to the end of the month), to avoid making
busy work for you -- but I will say that I think you need to prove
that the new `failure:` case in getBackendKeyData() is safe, because I
don't think any of the other failure modes behave that way inside
pqParseInput3().

> > Personally, I think it's more likely that any new server
> > implementations with alternative cancellation requirements will start
> > to silently couple against the new 3.0 behavior. I don't believe for a
> > minute that third parties "would not dare" to do literally anything
> > that works in practice.
>
> Maybe in some years, yes. But it seems rather unlikely that people would
> start doing that now, because that would mean that old clients that
> request 3.0 would suddenly not be able to connect because they get a
> "malformed" BackendKeyData message.

Maybe. Or maybe "Our new server supports 3.0, but only with PG18
onwards for some reason. ChatGPT isn't sure why it's different. We're
super popular now."

--Jacob



Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Thu, 3 Jul 2025 at 02:03, Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> Okay, wait -- JDBC was _copying_ our weird behavior? Why? Does
> something depend it in the wild?

I'm pretty sure there was no intent behind this, but it's simply
because our weird behaviour is the simplest from the client
implementation side with a fixed 4-byte key length:
1. Store two ints in a struct/class, these default to 0.
2. When you receive a BackendKeyData during startup update the two ints
3. When client wants to cancel, send a CancelRequest with the two ints.

So if step 2 is missed, then you send this all zero message. PgBouncer
has the same behaviour.



Re: BackendKeyData is mandatory?

От
Heikki Linnakangas
Дата:
Thanks for digging into this!

On 03/07/2025 09:13, Jelte Fennema-Nio wrote:
> On Thu Jul 3, 2025 at 2:03 AM CEST, Jacob Champion wrote:
>> On Wed, Jul 2, 2025 at 3:18 PM Jelte Fennema-Nio <postgres@jeltef.nl> 
>> wrote:
>> I will hold off on detailed review until Heikki gives an opinion on
>> the design (or we get closer to the end of the month), to avoid making
>> busy work for you -- but I will say that I think you need to prove
>> that the new `failure:` case in getBackendKeyData() is safe, because I
>> don't think any of the other failure modes behave that way inside
>> pqParseInput3().
> 
> I changed it slightly now to align with the handleSyncLoss function its
> implementation.

To recap, there is a concrete problem with psycopg2 and libpq v18: if 
the server doesn't end BackendKeyData, the connection fails with "can't 
get cancellation key" error. With previous versions, it worked, and if 
you actually try to cancel, it will send a bogus cancellation message 
with all-zeros key and pid.

psycopg3 works better. It only calls PQgetCancel() when actually 
cancelling. So you can connect, and only if try to actually cancel, you 
get a "couldn't create cancel object" error. That seems quite reasonable.

I think agree with the changes to PQgetCancel()/PQcancel(). It fixes 
psycopg2. I don't see any nicer way to signal that cancellation is not 
available with PQgetCancel(). New applications should use to 
PQcancelCreate(), where we have more options.

I'm not quite sold on the change to PQcancelCreate(). The current 
behavior seems nicer: if cancellation is not available because the 
server didn't send a cancellation key, PQcancelCreate() returns a 
(cancel) connection object that's in a failed state with an error 
message explaining what's wrong. The client can choose to continue 
without cancellation capability, or bail out.

Are there any known drivers that require the change to PQcancelCreate()?

BTW, if the server doesn't send a BackendKeyData, PQbackendPID() returns 
0. Not much we can do about that.

- Heikki




Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Tue, 29 Jul 2025 at 22:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I'm not quite sold on the change to PQcancelCreate(). The current
> behavior seems nicer: if cancellation is not available because the
> server didn't send a cancellation key, PQcancelCreate() returns a
> (cancel) connection object that's in a failed state with an error
> message explaining what's wrong. The client can choose to continue
> without cancellation capability, or bail out.
>
> Are there any known drivers that require the change to PQcancelCreate()?

I scoured github search[1] and it seems that (sadly) the only two
libraries (on github) that actually use PQcancelCreate are psycopg3
and ruby-pg. Both of those only create the object when they actually
want to cancel something (not when they use it). So I agree that the
change in bahaviour in PG18 for this function seems fine, and probably
is desirable. So feel free to commit my previous patch without the
changes to PQcancelCreate.


[1]:
https://github.com/search?q=PQcancelCreate+NOT+path%3A%2F%5C%2Fpsycopg%5C%2Fpq%5C%2F%2F+NOT+path%3A%2F%5C%2Fpsycopg_c%5C%2Fpq%5C%2F%2F+NOT+path%3A%2Ffe-cancel.c%24%2F+NOT+path%3A%2Flibpq-fe.h%24%2F+NOT+path%3A%2Fbindings%5Cw*.rs%24%2F+NOT+path%3A%2Flibpq2.sgml%24%2F+NOT+path%3A%2Fexports.txt%24%2F+NOT+path%3A%2Ffe-connect.c%24%2F++NOT+path%3A%2F_pq_ctypes.pyi%24%2F+NOT+path%3A%2F%5Esrc%5C%2Finclude%5C%2Flibpq%5C%2F%2F+NOT+path%3A%2F%5Esrc%5C%2Ffe_utils%5C%2F%2F+NOT+path%3A%2F%5C%2Ftest%5C%2Fisolation%5C%2F%2F+NOT+path%3A%2F%5C%2Ftest%5C%2Fmodules%5C%2Flibpq_pipeline%5C%2F%2F&type=code&p=2



Re: BackendKeyData is mandatory?

От
Heikki Linnakangas
Дата:
On 30/07/2025 00:20, Jelte Fennema-Nio wrote:
> On Tue, 29 Jul 2025 at 22:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I'm not quite sold on the change to PQcancelCreate(). The current
>> behavior seems nicer: if cancellation is not available because the
>> server didn't send a cancellation key, PQcancelCreate() returns a
>> (cancel) connection object that's in a failed state with an error
>> message explaining what's wrong. The client can choose to continue
>> without cancellation capability, or bail out.
>>
>> Are there any known drivers that require the change to PQcancelCreate()?
> 
> I scoured github search[1] and it seems that (sadly) the only two
> libraries (on github) that actually use PQcancelCreate are psycopg3
> and ruby-pg. Both of those only create the object when they actually
> want to cancel something (not when they use it). So I agree that the
> change in bahaviour in PG18 for this function seems fine, and probably
> is desirable. So feel free to commit my previous patch without the
> changes to PQcancelCreate.

Ok, thanks for checking! Committed the PGgetCancel() part.

I felt that explaining historical libpq behavior was a bit too much 
detail in the protocol docs, so I cut it down to just this:

 > The PostgreSQL server will always send this message, but some third 
party backend implementations of the protocol that don't support query 
cancellation are known not to.

Perhaps it would indeed be good to specify more strictly that no 
cancellation key means that the server doesn't support cancellation, but 
it feels like a separate change. And if we do that, perhaps we should 
document that retroactively for protocol version 3.0 too. And if it's 
not too invasive, even backpatch something for older libpq versions. 
Arguably it's a bug to send an all-zeros CancelRequest.

If we wanted to truly support the missing cancel key, here's something 
else to consider: psql's error message isn't very nice when you hit 
CTRL-C and there's no cancel key:

postgres=# select pg_sleep(10);
^CCould not send cancel request: PQcancel() -- no cancellation key received

That's pretty low level. And there's no newline, so if you hit CTRL-C 
again, it looks like this:

postgres=# select pg_sleep(10);
^CCould not send cancel request: PQcancel() -- no cancellation key 
received^CCould not send cancel request: PQcancel() -- no cancellation 
key received

It would be nice to improve all that, but it needs more work.

(I will now start looking at your second patch, 
v3-0002-libpq-Be-strict-about-accept-cancel-key-lengths.patch)

- Heikki




Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Fri, 1 Aug 2025 at 18:12, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> That's pretty low level. And there's no newline

Maybe we should change the message to something like:

"server does not support cancelling queries"

Then at least it's a bit clearer to users what's going on. The newline
issue seems like more work to fix.



Re: BackendKeyData is mandatory?

От
Heikki Linnakangas
Дата:
(reviewing patch 2 now)

On 03/07/2025 09:13, Jelte Fennema-Nio wrote:
> On Thu Jul 3, 2025 at 2:03 AM CEST, Jacob Champion wrote:
>> On Wed, Jul 2, 2025 at 3:18 PM Jelte Fennema-Nio <postgres@jeltef.nl> 
>> wrote:
>> I will hold off on detailed review until Heikki gives an opinion on
>> the design (or we get closer to the end of the month), to avoid making
>> busy work for you -- but I will say that I think you need to prove
>> that the new `failure:` case in getBackendKeyData() is safe, because I
>> don't think any of the other failure modes behave that way inside
>> pqParseInput3().
> 
> I changed it slightly now to align with the handleSyncLoss function its
> implementation.

That seems good, except maybe the copy-pasted comments could be adjusted 
a bit. But I wonder why you added this:

--- a/src/interfaces/libpq/fe-connect.c 

+++ b/src/interfaces/libpq/fe-connect.c 

@@ -4322,6 +4322,9 @@ keep_going: 
       \
  /* We will come back to here until there is 

                                 if (PQisBusy(conn)) 

                                         return PGRES_POLLING_READING; 

  

+                               if (conn->status == CONNECTION_BAD) 

+                                       goto error_return; 

+ 

                                 res = PQgetResult(conn); 

  

                                 /*

That was not necessary for handleSyncLoss() to work, or for any other 
errors. If an error has occurred, PQgetResult() returns an error result, 
which is handled here.

It's not necessarily a bad idea. It saves some effort, as PQgetResult() 
doesn't need to construct the result object, which we will just 
immediately free again. But we have been doing fine without it.

- Heikki




Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Fri, 8 Aug 2025 at 00:03, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> That was not necessary for handleSyncLoss() to work, or for any other
> errors. If an error has occurred, PQgetResult() returns an error result,
> which is handled here.

You're right. I think I simply forgot to remove that in v3 (it was
necessary for v2). I'd say let's remove that check and keep the error
path closer to the behavior in other places.

Feel free to tweak/remove the copy-pasted comments as you see fit.



Re: BackendKeyData is mandatory?

От
Heikki Linnakangas
Дата:
On 08/08/2025 09:44, Jelte Fennema-Nio wrote:
> On Fri, 8 Aug 2025 at 00:03, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> That was not necessary for handleSyncLoss() to work, or for any other
>> errors. If an error has occurred, PQgetResult() returns an error result,
>> which is handled here.
> 
> You're right. I think I simply forgot to remove that in v3 (it was
> necessary for v2). I'd say let's remove that check and keep the error
> path closer to the behavior in other places.

Ok.

I noticed that there's a similar, existing case in getNotify(), where 
libpq just hangs if an allocation fails. To simulate that, apply this 
change and use LISTEN/NOTIFY:

--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -1587,7 +1587,7 @@ getNotify(PGconn *conn)
      if (pqGets(&conn->workBuffer, conn))
          return EOF;
      /* must save name while getting extra string */
-    svname = strdup(conn->workBuffer.data);
+    svname = NULL;
      if (!svname)
          return EOF;
      if (pqGets(&conn->workBuffer, conn))

Returning EOF means "not enough data", which is wrong here just like in 
getBackendKeyData().

I'm not sure how to best fix that. If we can't process a Notify message 
because of out of memory, what should we do?

a) silently drop the Notify messsage.
b) report an error on the next query
c) drop the connection with the error.

You went with c) in getBackendKeyData(), which makes sense since that 
happens when establishing a connection. But Notify messages can be 
received at any time. b) is appealing, but I'm a little worried if we 
can manage the state correctly. Returning an error implies that the 
transaction is aborted, but since this error is generated internally in 
libpq, the transaction is not affected.

I spotted one more case in pqSaveParameterStatus(): if the malloc() 
there fails, it just silently skips storing the parameter, so that a 
later call to PQparameterStatus() will not find it. That also seems bad.

- Heikki




Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Fri, 8 Aug 2025 at 10:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I'm not sure how to best fix that. If we can't process a Notify message
> because of out of memory, what should we do?
>
> a) silently drop the Notify messsage.
> b) report an error on the next query
> c) drop the connection with the error.
>
> You went with c) in getBackendKeyData(), which makes sense since that
> happens when establishing a connection. But Notify messages can be
> received at any time. b) is appealing, but I'm a little worried if we
> can manage the state correctly. Returning an error implies that the
> transaction is aborted, but since this error is generated internally in
> libpq, the transaction is not affected.
>
> I spotted one more case in pqSaveParameterStatus(): if the malloc()
> there fails, it just silently skips storing the parameter, so that a
> later call to PQparameterStatus() will not find it. That also seems bad.

Nice finds. I think c) would be a very defensible position even for
these two. Sure these two messages might not be critical to always
receive correctly for all clients, but for some they might. And even
for the ones where it's not critical, if malloc fails for these
allocations, it's likely a next malloc for some other purpose will
also fail. Relieving memory pressure by cleaning the connection seems
a pretty sensible thing to do in such a case.

On Fri, 8 Aug 2025 at 10:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 08/08/2025 09:44, Jelte Fennema-Nio wrote:
> > On Fri, 8 Aug 2025 at 00:03, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >> That was not necessary for handleSyncLoss() to work, or for any other
> >> errors. If an error has occurred, PQgetResult() returns an error result,
> >> which is handled here.
> >
> > You're right. I think I simply forgot to remove that in v3 (it was
> > necessary for v2). I'd say let's remove that check and keep the error
> > path closer to the behavior in other places.
>
> Ok.
>
> I noticed that there's a similar, existing case in getNotify(), where
> libpq just hangs if an allocation fails. To simulate that, apply this
> change and use LISTEN/NOTIFY:
>
> --- a/src/interfaces/libpq/fe-protocol3.c
> +++ b/src/interfaces/libpq/fe-protocol3.c
> @@ -1587,7 +1587,7 @@ getNotify(PGconn *conn)
>         if (pqGets(&conn->workBuffer, conn))
>                 return EOF;
>         /* must save name while getting extra string */
> -       svname = strdup(conn->workBuffer.data);
> +       svname = NULL;
>         if (!svname)
>                 return EOF;
>         if (pqGets(&conn->workBuffer, conn))
>
> Returning EOF means "not enough data", which is wrong here just like in
> getBackendKeyData().
>
> I'm not sure how to best fix that. If we can't process a Notify message
> because of out of memory, what should we do?
>
> a) silently drop the Notify messsage.
> b) report an error on the next query
> c) drop the connection with the error.
>
> You went with c) in getBackendKeyData(), which makes sense since that
> happens when establishing a connection. But Notify messages can be
> received at any time. b) is appealing, but I'm a little worried if we
> can manage the state correctly. Returning an error implies that the
> transaction is aborted, but since this error is generated internally in
> libpq, the transaction is not affected.
>
> I spotted one more case in pqSaveParameterStatus(): if the malloc()
> there fails, it just silently skips storing the parameter, so that a
> later call to PQparameterStatus() will not find it. That also seems bad.
>
> - Heikki
>



Re: BackendKeyData is mandatory?

От
Heikki Linnakangas
Дата:
On 08/08/2025 11:49, Jelte Fennema-Nio wrote:
> On Fri, 8 Aug 2025 at 10:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I'm not sure how to best fix that. If we can't process a Notify message
>> because of out of memory, what should we do?
>>
>> a) silently drop the Notify messsage.
>> b) report an error on the next query
>> c) drop the connection with the error.
>>
>> You went with c) in getBackendKeyData(), which makes sense since that
>> happens when establishing a connection. But Notify messages can be
>> received at any time. b) is appealing, but I'm a little worried if we
>> can manage the state correctly. Returning an error implies that the
>> transaction is aborted, but since this error is generated internally in
>> libpq, the transaction is not affected.
>>
>> I spotted one more case in pqSaveParameterStatus(): if the malloc()
>> there fails, it just silently skips storing the parameter, so that a
>> later call to PQparameterStatus() will not find it. That also seems bad.
> 
> Nice finds. I think c) would be a very defensible position even for
> these two. Sure these two messages might not be critical to always
> receive correctly for all clients, but for some they might. And even
> for the ones where it's not critical, if malloc fails for these
> allocations, it's likely a next malloc for some other purpose will
> also fail. Relieving memory pressure by cleaning the connection seems
> a pretty sensible thing to do in such a case.

Makes sense.

Digging a little deeper still, there are many more cases. These 
functions like getNotify(), getBackendKeyData(), getParameter() should 
never return EOF. If they do, it's a sign of a protocol violation, and 
we should drop the connection.

Those EOF returns are a remnants of handling protocol version 2. 
Messages in protocol version 2 didn't have a length field, and we relied 
on the functions to return EOF if there wasn't enough data in the 
buffer, and come back later with more data. But nowadays, we always read 
the whole message as indicated by the length field into memory before 
calling them. So EOF means the message length didn't agree with the 
fields within the message.

- Heikki




Re: BackendKeyData is mandatory?

От
Tom Lane
Дата:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> I noticed that there's a similar, existing case in getNotify(), where 
> libpq just hangs if an allocation fails. To simulate that, apply this 
> change and use LISTEN/NOTIFY:

> --- a/src/interfaces/libpq/fe-protocol3.c
> +++ b/src/interfaces/libpq/fe-protocol3.c
> @@ -1587,7 +1587,7 @@ getNotify(PGconn *conn)
>       if (pqGets(&conn->workBuffer, conn))
>           return EOF;
>       /* must save name while getting extra string */
> -    svname = strdup(conn->workBuffer.data);
> +    svname = NULL;
>       if (!svname)
>           return EOF;
>       if (pqGets(&conn->workBuffer, conn))

> Returning EOF means "not enough data", which is wrong here just like in 
> getBackendKeyData().

The implication of "return EOF" is "try again later", which seems like
about the best thing we can do.  If memory serves, other places that
construct query results do likewise.

            regards, tom lane



Re: BackendKeyData is mandatory?

От
Heikki Linnakangas
Дата:
On 08/08/2025 17:01, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> I noticed that there's a similar, existing case in getNotify(), where
>> libpq just hangs if an allocation fails. To simulate that, apply this
>> change and use LISTEN/NOTIFY:
> 
>> --- a/src/interfaces/libpq/fe-protocol3.c
>> +++ b/src/interfaces/libpq/fe-protocol3.c
>> @@ -1587,7 +1587,7 @@ getNotify(PGconn *conn)
>>        if (pqGets(&conn->workBuffer, conn))
>>            return EOF;
>>        /* must save name while getting extra string */
>> -    svname = strdup(conn->workBuffer.data);
>> +    svname = NULL;
>>        if (!svname)
>>            return EOF;
>>        if (pqGets(&conn->workBuffer, conn))
> 
>> Returning EOF means "not enough data", which is wrong here just like in
>> getBackendKeyData().
> 
> The implication of "return EOF" is "try again later", which seems like
> about the best thing we can do.

We could:

a) report an error,
b) silently discard the async notification and move one, or
c) disconnect.

There's another malloc() call in the same getNotify() function, and if 
that fails, we silently discard the notification and move on (i.e. 
option b). So it's inconsistent.

The current behavior is "hang until more data is received from the 
server, then try again". I think any of the other options would be 
better. There's no guarantee that more data will ever arrive, the 
connection might be used just to wait for the notification.

> If memory serves, other places that construct query results do
> likewise.
It's a mix. Most functions that are related to queries, e.g. 
getRowDescriptions(), construct an error result object. If we run out of 
space in allocating the input buffer in pqCheckInBufferSpace(), we 
disconnect.

Another weird error handling: functions like pqGetInt() and pqGets() 
don't check that you don't read past the end of the message. They do 
check that they don't read past the input buffer, but the input buffer 
might contain another message after the current one, and the functions 
will merrily read past the message boundary. We do check for that later, 
report the "message contents do not agree with length ..." message, and 
resync. But it's weird that we don't check for that earlier already.

- Heikki




Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Fri, 8 Aug 2025 at 17:50, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I think any of the other options would be
> better. There's no guarantee that more data will ever arrive, the
> connection might be used just to wait for the notification.

Yeah, I think that's the important thing to realize here. The "try
again later" only makes sense if we need more data to try again. If we
don't then we now start waiting on data that might never come.



Re: BackendKeyData is mandatory?

От
Heikki Linnakangas
Дата:
On 11/08/2025 09:58, Jelte Fennema-Nio wrote:
> On Fri, 8 Aug 2025 at 17:50, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I think any of the other options would be
>> better. There's no guarantee that more data will ever arrive, the
>> connection might be used just to wait for the notification.
> 
> Yeah, I think that's the important thing to realize here. The "try
> again later" only makes sense if we need more data to try again. If we
> don't then we now start waiting on data that might never come.

Here's a new set of patches, to disconnect on OOM instead of hanging or 
silently discarding messages:

v4-0001-libpq-Don-t-hang-on-out-of-memory.patch

This is a very minimal fix for the "hang on OOM" issue. It changes the 
behavior to silently skip over the message, i.e. discard the async 
notification, or continue without cancellation key.

I think we should backpatch this.

v4-0002-libpq-Handle-OOMs-by-disconnecting-instead-of-sil.patch

This changes the behavior of all of the problematic places where we 
silently discarded things on OOM to disconnect instead. That is, when 
processing a Notify, BackendKeyData, or ParameterStatus message.

Now that I look at this again, we should probably forget about patch #1 
and commit and backpatch this straight away. It's a little more changes 
than patch #1, but not that much.

Note that there are still places where we discard things on OOM, like 
when parsing an error 'E' message. Those are a little iffy too, but 
they're less problematic because you still get an error, and the error 
is clearly associated with a query, unlike these Notify, BackendData and 
ParameterStatus messages.

v4-0003-libpq-Be-strict-about-cancel-key-lengths.patch

Your patch to add more checks, rebased over the first two.

- Heikki




Re: BackendKeyData is mandatory?

От
Heikki Linnakangas
Дата:
On 11/08/2025 09:58, Jelte Fennema-Nio wrote:
> On Fri, 8 Aug 2025 at 17:50, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I think any of the other options would be
>> better. There's no guarantee that more data will ever arrive, the
>> connection might be used just to wait for the notification.
> 
> Yeah, I think that's the important thing to realize here. The "try
> again later" only makes sense if we need more data to try again. If we
> don't then we now start waiting on data that might never come.

Here's a new set of patches, to disconnect on OOM instead of hanging or 
silently discarding messages:

v4-0001-libpq-Don-t-hang-on-out-of-memory.patch

This is a very minimal fix for the "hang on OOM" issue. It changes the 
behavior to silently skip over the message, i.e. discard the async 
notification, or continue without cancellation key.

I think we should backpatch this.

v4-0002-libpq-Handle-OOMs-by-disconnecting-instead-of-sil.patch

This changes the behavior of all of the problematic places where we 
silently discarded things on OOM to disconnect instead. That is, when 
processing a Notify, BackendKeyData, or ParameterStatus message.

Now that I look at this again, we should probably forget about patch #1 
and commit and backpatch this straight away. It's a little more changes 
than patch #1, but not that much.

Note that there are still places where we discard things on OOM, like 
when parsing an error 'E' message. Those are a little iffy too, but 
they're less problematic because you still get an error, and the error 
is clearly associated with a query, unlike these Notify, BackendData and 
ParameterStatus messages.

v4-0003-libpq-Be-strict-about-cancel-key-lengths.patch

Your patch to add more checks, rebased over the first two.

- Heikki

Вложения

Re: BackendKeyData is mandatory?

От
Jelte Fennema-Nio
Дата:
On Thu, 14 Aug 2025 at 15:10, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Here's a new set of patches, to disconnect on OOM instead of hanging or
> silently discarding messages:

Code looks good. Som small nitpicks though.

This change seems unnecessary, i.e. free(NULL) is a no-op

-        free(svname);
+        if (svname)
+            free(svname);


Small wording suggestion, maybe change this:

The caller has already saved the error message in conn->errorMessage.

to

The caller should have already saved the error message in conn->errorMessage.

or even

The caller should have already saved the error message using
libpq_append_conn_error.



Re: BackendKeyData is mandatory?

От
Heikki Linnakangas
Дата:
On 19/08/2025 23:49, Jelte Fennema-Nio wrote:
> On Thu, 14 Aug 2025 at 15:10, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Here's a new set of patches, to disconnect on OOM instead of hanging or
>> silently discarding messages:
> 
> Code looks good. Som small nitpicks though.
> 
> This change seems unnecessary, i.e. free(NULL) is a no-op
> 
> -        free(svname);
> +        if (svname)
> +            free(svname);

And even more so, this is unreachable when svname == NULL. Thanks!

> Small wording suggestion, maybe change this:
> 
> The caller has already saved the error message in conn->errorMessage.
> 
> to
> 
> The caller should have already saved the error message in conn->errorMessage.
> 
> or even
> 
> The caller should have already saved the error message using
> libpq_append_conn_error.

I kept it as is, because we use similar wording in a few other places. 
Some places do write the message directly in conn->errorMessage without 
using libpq_append_conn_error.

Pushed and backpatched to v18. I feel the OOM handling commit would be 
appropriate to backpatch further, but since it's pretty intricate code 
and we haven't gotten any complaints from the field, I only backpatched 
it to v18 for now. We can backpatch it further later if needed.

Thanks for the original patch and the review!

- Heikki