Обсуждение: Named Prepared statement problems and possible solutions

Поиск
Список
Период
Сортировка

Named Prepared statement problems and possible solutions

От
Dave Cramer
Дата:
Greetings,

At pgcon last week I was speaking to some people about the problem we have with connection pools and named prepared statements.

For context pgjdbc (and others) use un-named statements and then switch to named statements after using the statement N (default 5) times. In session mode this is not a problem. When the connection is closed by the application the pools generally issue "DISCARD ALL" and close all prepared statements. The next time the connection is opened the statement is prepared and all works as it should.

However one of the more interesting use cases for pgbouncer is to use "TRANSACTION MODE" to manage idle sessions. In transaction mode the connection is returned to the pool after each transaction. There are usage patterns in large applications where clients have client pools and subsequently have large numbers of connections open. Sometimes in the thousands, unfortunately many of these are idle connections. Using transaction mode reduces the number of real connections to the database in many cases by orders of magnitude.

Unfortunately this is incompatible with named prepared statements. From the client's point of view they have one session and named prepared statements are session objects. From one transaction to the next the physical connection can change along with the attached prepared statements.

The idea that was discussed is when we prepare the statement we cache it in a statement cache and return a queryid much like the queryid used in pg_stat_statements.  Instead of executing the statement name we would execute the queryid. 

If the queryid did not exist, attempting to execute it would cause an error and cause the running transaction to fail. Retrieving the statement from the query cache would have to happen before the attempt to execute it and return an error to the client subsequently the client could re-prepare the statement and execute. This would have to happen in such a way as to not cause the transaction to fail.

The one other idea that was proposed was to cache the statements in the client. However this does nothing to address the issue of managing idle connections.

Regards,
Dave Cramer

Re: Named Prepared statement problems and possible solutions

От
Konstantin Knizhnik
Дата:


On 07.06.2023 10:48 PM, Dave Cramer wrote:
Greetings,

At pgcon last week I was speaking to some people about the problem we have with connection pools and named prepared statements.

For context pgjdbc (and others) use un-named statements and then switch to named statements after using the statement N (default 5) times. In session mode this is not a problem. When the connection is closed by the application the pools generally issue "DISCARD ALL" and close all prepared statements. The next time the connection is opened the statement is prepared and all works as it should.

However one of the more interesting use cases for pgbouncer is to use "TRANSACTION MODE" to manage idle sessions. In transaction mode the connection is returned to the pool after each transaction. There are usage patterns in large applications where clients have client pools and subsequently have large numbers of connections open. Sometimes in the thousands, unfortunately many of these are idle connections. Using transaction mode reduces the number of real connections to the database in many cases by orders of magnitude.

Unfortunately this is incompatible with named prepared statements. From the client's point of view they have one session and named prepared statements are session objects. From one transaction to the next the physical connection can change along with the attached prepared statements.

The idea that was discussed is when we prepare the statement we cache it in a statement cache and return a queryid much like the queryid used in pg_stat_statements.  Instead of executing the statement name we would execute the queryid. 

If the queryid did not exist, attempting to execute it would cause an error and cause the running transaction to fail. Retrieving the statement from the query cache would have to happen before the attempt to execute it and return an error to the client subsequently the client could re-prepare the statement and execute. This would have to happen in such a way as to not cause the transaction to fail.

The one other idea that was proposed was to cache the statements in the client. However this does nothing to address the issue of managing idle connections.

Regards,
Dave Cramer


There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.

Re: Named Prepared statement problems and possible solutions

От
Dave Cramer
Дата:
Hi Konstantin,

Yes, I ran into Euler at pgcon and he mentioned this. I intend to test it. I'd still like to see my proposal in the server. 

Dave Cramer


On Thu, 8 Jun 2023 at 02:15, Konstantin Knizhnik <knizhnik@garret.ru> wrote:


On 07.06.2023 10:48 PM, Dave Cramer wrote:
Greetings,

At pgcon last week I was speaking to some people about the problem we have with connection pools and named prepared statements.

For context pgjdbc (and others) use un-named statements and then switch to named statements after using the statement N (default 5) times. In session mode this is not a problem. When the connection is closed by the application the pools generally issue "DISCARD ALL" and close all prepared statements. The next time the connection is opened the statement is prepared and all works as it should.

However one of the more interesting use cases for pgbouncer is to use "TRANSACTION MODE" to manage idle sessions. In transaction mode the connection is returned to the pool after each transaction. There are usage patterns in large applications where clients have client pools and subsequently have large numbers of connections open. Sometimes in the thousands, unfortunately many of these are idle connections. Using transaction mode reduces the number of real connections to the database in many cases by orders of magnitude.

Unfortunately this is incompatible with named prepared statements. From the client's point of view they have one session and named prepared statements are session objects. From one transaction to the next the physical connection can change along with the attached prepared statements.

The idea that was discussed is when we prepare the statement we cache it in a statement cache and return a queryid much like the queryid used in pg_stat_statements.  Instead of executing the statement name we would execute the queryid. 

If the queryid did not exist, attempting to execute it would cause an error and cause the running transaction to fail. Retrieving the statement from the query cache would have to happen before the attempt to execute it and return an error to the client subsequently the client could re-prepare the statement and execute. This would have to happen in such a way as to not cause the transaction to fail.

The one other idea that was proposed was to cache the statements in the client. However this does nothing to address the issue of managing idle connections.

Regards,
Dave Cramer


There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.

Re: Named Prepared statement problems and possible solutions

От
Jan Wieck
Дата:
On 6/8/23 02:15, Konstantin Knizhnik wrote:

> There is a PR with support of prepared statement support to pgbouncer:
> https://github.com/pgbouncer/pgbouncer/pull/845
> any feedback, reviews and suggestions are welcome.

I was about to say that the support would have to come from the pooler 
as it is possible to have multiple applications in different languages 
connecting to the same pool(s).

I can certainly give this a try, possibly over the weekend. I have a 
TPC-C that can use prepared statements plus pause/resume. That might be 
a good stress for it.


Best Regards, Jan



Re: Named Prepared statement problems and possible solutions

От
Dave Cramer
Дата:


On Thu, Jun 8, 2023 at 8:43 AM Jan Wieck <jan@wi3ck.info> wrote:
On 6/8/23 02:15, Konstantin Knizhnik wrote:

> There is a PR with support of prepared statement support to pgbouncer:
> https://github.com/pgbouncer/pgbouncer/pull/845
> any feedback, reviews and suggestions are welcome.

I was about to say that the support would have to come from the pooler
as it is possible to have multiple applications in different languages
connecting to the same pool(s).

Why from the pooler? If it were done at the server every client could use it?

Dave

--
Dave Cramer

Re: Named Prepared statement problems and possible solutions

От
Konstantin Knizhnik
Дата:


On 08.06.2023 3:43 PM, Jan Wieck wrote:
On 6/8/23 02:15, Konstantin Knizhnik wrote:

There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.

I was about to say that the support would have to come from the pooler as it is possible to have multiple applications in different languages connecting to the same pool(s)

Ideally, support should be provided by both sides: only pooler knows mapping between clients and postgres backends and only server knows
which queries require session semantic and which not (in principle it is possible to make connection pooler to determine it, but it is very non-trivial).
.

I can certainly give this a try, possibly over the weekend. I have a TPC-C that can use prepared statements plus pause/resume. That might be a good stress for it.


By the way, I have done some small benchmarking of different connection poolers for Postgres.
Benchmark was very simple: I just create small pgbench database with scale 10 and then
run read-only queries with 100 clients:

pgbench -c 100 -P 10 -T 100 -S -M prepared postgres

Number of connections to the database was limited in an all pooler
configurations to 10. I have tested only transaction mode. If pooler supports prepared statements, I have also tested them.
Just for reference I also include results with direct connection to Postgres.
All benchamrking was done at my notebook, so it is not quite representative scenario.

 
Direct:
Connections  Prepared     TPS
10           yes       135507
10           no         73218
100          yes        79042
100          no         59245

Pooler: (100 client connections, 10 server connections, transaction mode)
Pooler         Prepared     TPS
pgbouncer      no          65029
pgbouncer-ps   no          65570
pgbouncer-ps   yes         65825
odyssey        yes         18351
odyssey        no          21299
pgagrol        no          29673
pgcat          no          23247

Re: Named Prepared statement problems and possible solutions

От
Jan Wieck
Дата:
On 6/8/23 09:21, Dave Cramer wrote:
> 
> 
> On Thu, Jun 8, 2023 at 8:43 AM Jan Wieck <jan@wi3ck.info 
> <mailto:jan@wi3ck.info>> wrote:
> 
>     On 6/8/23 02:15, Konstantin Knizhnik wrote:
> 
>      > There is a PR with support of prepared statement support to
>     pgbouncer:
>      > https://github.com/pgbouncer/pgbouncer/pull/845
>     <https://github.com/pgbouncer/pgbouncer/pull/845>
>      > any feedback, reviews and suggestions are welcome.
> 
>     I was about to say that the support would have to come from the pooler
>     as it is possible to have multiple applications in different languages
>     connecting to the same pool(s).
> 
> 
> Why from the pooler? If it were done at the server every client could 
> use it?

The server doesn't know about all the clients of the pooler, does it? It 
has no way of telling if/when a client disconnects from the pooler.


Jan



Re: Named Prepared statement problems and possible solutions

От
Jan Wieck
Дата:
On 6/8/23 09:53, Jan Wieck wrote:
> On 6/8/23 09:21, Dave Cramer wrote:
> The server doesn't know about all the clients of the pooler, does it? It
> has no way of telling if/when a client disconnects from the pooler.

Another problem that complicates doing it in the server is that the 
information require to (re-)prepare a statement in a backend that 
currently doesn't have it needs to be kept in shared memory. This 
includes the query string itself. Doing that without shared memory in a 
pooler that is multi-threaded or based on async-IO is much simpler and 
allows for easy ballooning.


Jan




Re: Named Prepared statement problems and possible solutions

От
Dave Cramer
Дата:


On Thu, 8 Jun 2023 at 09:53, Jan Wieck <jan@wi3ck.info> wrote:
On 6/8/23 09:21, Dave Cramer wrote:
>
>
> On Thu, Jun 8, 2023 at 8:43 AM Jan Wieck <jan@wi3ck.info
> <mailto:jan@wi3ck.info>> wrote:
>
>     On 6/8/23 02:15, Konstantin Knizhnik wrote:
>
>      > There is a PR with support of prepared statement support to
>     pgbouncer:
>      > https://github.com/pgbouncer/pgbouncer/pull/845
>     <https://github.com/pgbouncer/pgbouncer/pull/845>
>      > any feedback, reviews and suggestions are welcome.
>
>     I was about to say that the support would have to come from the pooler
>     as it is possible to have multiple applications in different languages
>     connecting to the same pool(s).
>
>
> Why from the pooler? If it were done at the server every client could
> use it?

The server doesn't know about all the clients of the pooler, does it? It
has no way of telling if/when a client disconnects from the pooler.

Why does it have to know if the client disconnects ? It just keeps a cache of prepared statements. 
In large apps it is very likely there will be another client wanting to use the statement

Dave 

Re: Named Prepared statement problems and possible solutions

От
Dave Cramer
Дата:




On Thu, 8 Jun 2023 at 10:31, Jan Wieck <jan@wi3ck.info> wrote:
On 6/8/23 09:53, Jan Wieck wrote:
> On 6/8/23 09:21, Dave Cramer wrote:
> The server doesn't know about all the clients of the pooler, does it? It
> has no way of telling if/when a client disconnects from the pooler.

Another problem that complicates doing it in the server is that the
information require to (re-)prepare a statement in a backend that
currently doesn't have it needs to be kept in shared memory. This
includes the query string itself. Doing that without shared memory in a
pooler that is multi-threaded or based on async-IO is much simpler and
allows for easy ballooning.


I don't expect the server to re-prepare the statement. If the server responds with "statement doesn't exist" the client would send a prepare.

Dave

Re: Named Prepared statement problems and possible solutions

От
Jan Wieck
Дата:
On 6/8/23 10:56, Dave Cramer wrote:
> 
> 
> 
> 
> On Thu, 8 Jun 2023 at 10:31, Jan Wieck <jan@wi3ck.info 
> <mailto:jan@wi3ck.info>> wrote:
> 
>     On 6/8/23 09:53, Jan Wieck wrote:
>      > On 6/8/23 09:21, Dave Cramer wrote:
>      > The server doesn't know about all the clients of the pooler, does
>     it? It
>      > has no way of telling if/when a client disconnects from the pooler.
> 
>     Another problem that complicates doing it in the server is that the
>     information require to (re-)prepare a statement in a backend that
>     currently doesn't have it needs to be kept in shared memory. This
>     includes the query string itself. Doing that without shared memory in a
>     pooler that is multi-threaded or based on async-IO is much simpler and
>     allows for easy ballooning.
> 
> 
> I don't expect the server to re-prepare the statement. If the server 
> responds with "statement doesn't exist" the client would send a prepare.

Are you proposing a new libpq protocol version?


Jan



Re: Named Prepared statement problems and possible solutions

От
Dave Cramer
Дата:


On Thu, 8 Jun 2023 at 11:15, Jan Wieck <jan@wi3ck.info> wrote:
On 6/8/23 10:56, Dave Cramer wrote:
>
>
>
>
> On Thu, 8 Jun 2023 at 10:31, Jan Wieck <jan@wi3ck.info
> <mailto:jan@wi3ck.info>> wrote:
>
>     On 6/8/23 09:53, Jan Wieck wrote:
>      > On 6/8/23 09:21, Dave Cramer wrote:
>      > The server doesn't know about all the clients of the pooler, does
>     it? It
>      > has no way of telling if/when a client disconnects from the pooler.
>
>     Another problem that complicates doing it in the server is that the
>     information require to (re-)prepare a statement in a backend that
>     currently doesn't have it needs to be kept in shared memory. This
>     includes the query string itself. Doing that without shared memory in a
>     pooler that is multi-threaded or based on async-IO is much simpler and
>     allows for easy ballooning.
>
>
> I don't expect the server to re-prepare the statement. If the server
> responds with "statement doesn't exist" the client would send a prepare.

Are you proposing a new libpq protocol version?

I believe we would need to add this to the protocol, yes.

Dave 


Jan

Re: Named Prepared statement problems and possible solutions

От
Konstantin Knizhnik
Дата:


On 08.06.2023 6:18 PM, Dave Cramer wrote:


On Thu, 8 Jun 2023 at 11:15, Jan Wieck <jan@wi3ck.info> wrote:
On 6/8/23 10:56, Dave Cramer wrote:
>
>
>
>
> On Thu, 8 Jun 2023 at 10:31, Jan Wieck <jan@wi3ck.info
> <mailto:jan@wi3ck.info>> wrote:
>
>     On 6/8/23 09:53, Jan Wieck wrote:
>      > On 6/8/23 09:21, Dave Cramer wrote:
>      > The server doesn't know about all the clients of the pooler, does
>     it? It
>      > has no way of telling if/when a client disconnects from the pooler.
>
>     Another problem that complicates doing it in the server is that the
>     information require to (re-)prepare a statement in a backend that
>     currently doesn't have it needs to be kept in shared memory. This
>     includes the query string itself. Doing that without shared memory in a
>     pooler that is multi-threaded or based on async-IO is much simpler and
>     allows for easy ballooning.
>
>
> I don't expect the server to re-prepare the statement. If the server
> responds with "statement doesn't exist" the client would send a prepare.

Are you proposing a new libpq protocol version?

I believe we would need to add this to the protocol, yes.


So it will be responsibility of client to remember text of prepared query to be able to resend it when statement doesn't exists at server?
IMHO very strange decision. Why not to handle it in connection pooler (doesn't matter - external or embedded)?

Re: Named Prepared statement problems and possible solutions

От
Dave Cramer
Дата:

On Thu, 8 Jun 2023 at 11:22, Konstantin Knizhnik <knizhnik@garret.ru> wrote:


On 08.06.2023 6:18 PM, Dave Cramer wrote:


On Thu, 8 Jun 2023 at 11:15, Jan Wieck <jan@wi3ck.info> wrote:
On 6/8/23 10:56, Dave Cramer wrote:
>
>
>
>
> On Thu, 8 Jun 2023 at 10:31, Jan Wieck <jan@wi3ck.info
> <mailto:jan@wi3ck.info>> wrote:
>
>     On 6/8/23 09:53, Jan Wieck wrote:
>      > On 6/8/23 09:21, Dave Cramer wrote:
>      > The server doesn't know about all the clients of the pooler, does
>     it? It
>      > has no way of telling if/when a client disconnects from the pooler.
>
>     Another problem that complicates doing it in the server is that the
>     information require to (re-)prepare a statement in a backend that
>     currently doesn't have it needs to be kept in shared memory. This
>     includes the query string itself. Doing that without shared memory in a
>     pooler that is multi-threaded or based on async-IO is much simpler and
>     allows for easy ballooning.
>
>
> I don't expect the server to re-prepare the statement. If the server
> responds with "statement doesn't exist" the client would send a prepare.

Are you proposing a new libpq protocol version?

I believe we would need to add this to the protocol, yes.


So it will be responsibility of client to remember text of prepared query to be able to resend it when statement doesn't exists at server?
IMHO very strange decision. Why not to handle it in connection pooler (doesn't matter - external or embedded)?

I may be myopic but in the JDBC world and I assume others we have a `PreparedStatement` object which has the text of the query.
The text is readily available to us.

Also again from the JDBC point of view we have use un-named statements normally and then name them after 5 uses so we already have embedded logic on how to deal with PreparedStatements

Dave
 

Re: Named Prepared statement problems and possible solutions

От
Jan Wieck
Дата:
On 6/8/23 13:31, Dave Cramer wrote:
> 
> On Thu, 8 Jun 2023 at 11:22, Konstantin Knizhnik <knizhnik@garret.ru 
> <mailto:knizhnik@garret.ru>> wrote:
> 

>     So it will be responsibility of client to remember text of prepared
>     query to be able to resend it when statement doesn't exists at server?
>     IMHO very strange decision. Why not to handle it in connection
>     pooler (doesn't matter - external or embedded)?
> 
> 
> I may be myopic but in the JDBC world and I assume others we have a 
> `PreparedStatement` object which has the text of the query.
> The text is readily available to us.
> 
> Also again from the JDBC point of view we have use un-named statements 
> normally and then name them after 5 uses so we already have embedded 
> logic on how to deal with PreparedStatements

The entire problem only surfaces when using a connection pool of one 
sort or another. Without one the session is persistent to the client.

At some point I created a "functional" proof of concept for a connection 
pool that did a mapping of the client side name to a pool managed server 
side name. It kept track of which query was known by a server. It kept a 
hashtable of poolname+username+query MD5 sums. On each prepare request 
it would look up if that query is known, add a query-client reference in 
another hashtable and so on. On a Bind/Exec message it would check that 
the server has the query prepared and issue a P message if not. What was 
missing was to keep track of no longer needed queries and deallocate them.

As said, it was a POC. Since it was implemented in Tcl it performed 
miserable, but I got it to the point of being able to pause & resume and 
the whole thing did work with prepared statements on the transaction 
level. So it was a full functioning POC.

What makes this design appealing to me is that it is completely 
transparent to every existing client that uses the extended query 
protocol for server side prepared statements.


Jan




Re: Named Prepared statement problems and possible solutions

От
Dave Cramer
Дата:




On Thu, 8 Jun 2023 at 15:49, Jan Wieck <jan@wi3ck.info> wrote:
On 6/8/23 13:31, Dave Cramer wrote:
>
> On Thu, 8 Jun 2023 at 11:22, Konstantin Knizhnik <knizhnik@garret.ru
> <mailto:knizhnik@garret.ru>> wrote:
>

>     So it will be responsibility of client to remember text of prepared
>     query to be able to resend it when statement doesn't exists at server?
>     IMHO very strange decision. Why not to handle it in connection
>     pooler (doesn't matter - external or embedded)?
>
>
> I may be myopic but in the JDBC world and I assume others we have a
> `PreparedStatement` object which has the text of the query.
> The text is readily available to us.
>
> Also again from the JDBC point of view we have use un-named statements
> normally and then name them after 5 uses so we already have embedded
> logic on how to deal with PreparedStatements

The entire problem only surfaces when using a connection pool of one
sort or another. Without one the session is persistent to the client.

At some point I created a "functional" proof of concept for a connection
pool that did a mapping of the client side name to a pool managed server
side name. It kept track of which query was known by a server. It kept a
hashtable of poolname+username+query MD5 sums. On each prepare request
it would look up if that query is known, add a query-client reference in
another hashtable and so on. On a Bind/Exec message it would check that
the server has the query prepared and issue a P message if not. What was
missing was to keep track of no longer needed queries and deallocate them.

As said, it was a POC. Since it was implemented in Tcl it performed
miserable, but I got it to the point of being able to pause & resume and
the whole thing did work with prepared statements on the transaction
level. So it was a full functioning POC.

What makes this design appealing to me is that it is completely
transparent to every existing client that uses the extended query
protocol for server side prepared statements.


Dave 


Jan

Re: Named Prepared statement problems and possible solutions

От
Jan Wieck
Дата:
On 6/8/23 15:57, Dave Cramer wrote:
> 
> Apparently this is coming in pgbouncer Support of prepared statements by 
> knizhnik · Pull Request #845 · pgbouncer/pgbouncer (github.com) 
> <https://github.com/pgbouncer/pgbouncer/pull/845>

I am quite interested in that patch. Considering how pgbouncer works 
internally I am very curious.


Jan