Обсуждение: PostgreSQL protocol 3 JDBC drivers, sub-protocols, and latency

Поиск
Список
Период
Сортировка

PostgreSQL protocol 3 JDBC drivers, sub-protocols, and latency

От
Stevo Slavić
Дата:
Hello pgsql-jdbc users,

When executing same queries from psql and from application over WAN
with high latency it was easier to notice there is big difference in
query execution times. Where pgsql would execute statements in latency
+ couple of more ms, application using latest 9.1 jdbc driver would
take n * latency + couple of more ms. After turning on the logging
level (see [1] for connection params) I've noticed multiple backend
protocol messages being exchanged between the database and the driver,
each of them taking latency + couple more ms. Reading through protocol
specs (see [2]) I've found out about "extended query" sub-protocol,
which adds for every query multiple frontend and backend messages. It
seems newer jdbc drivers which support extended query sub-protocol are
using it by default. This sub-protocol is advertised that "it might
allow improvements in performance or functionality", but with high
latency environment in fact it performs much worse since query
execution time is (n-2) * latency bigger compared to simple query
protocol, where n is number of messages extended query sub-protocol
uses for each query. psql seems to be using (by default) "simple
query" protocol with only one frontend and one backend message.

Since I can not eliminate latency, I'd like to force pgsql-jdbc driver
to use simple query sub-protocol. I couldn't find direct info in
pgsql-jdcb driver docs on how to force simple query sub-protocol.
Connecting info in [3] I've found out that if I put protocolVersion to
2 that simple query sub-protocol gets used and query times are on par
with psql. Version 3 of postgres frontend/backend protocol added
support for extended query sub-protocol, so it isn't available in
protocol version 2.

Is there a connection parameter in pgsql-jdbc which would configure
jdbc driver to force usage of simple query sub-protocol while still
using protocol version 3?
Are there any (significant) down-sides in using protocol version 2
instead of protocol version 3?

Thanks in advance!

Regards,
Stevo


[1] pgsql-jdbc driver specific connection parameters
http://jdbc.postgresql.org/documentation/80/connect.html#connection-parameters
[2] Postgres Frontend/Backend protocol
http://www.postgresql.org/docs/9.1/static/protocol-overview.html
[3] Summary of changes in Postgres Frontend/Backend protocol between
version 2 and 3
http://www.postgresql.org/docs/9.1/static/protocol-changes.html

Re: PostgreSQL protocol 3 JDBC drivers, sub-protocols, and latency

От
Oliver Jowett
Дата:
On 19 November 2011 00:21, Stevo Slavić <sslavic@gmail.com> wrote:

> It seems newer jdbc drivers which support extended query sub-protocol are using it by default.

Not "newer" drivers - it's been doing that since 8.0, which is very
old now. What driver versions are you comparing to?

> This sub-protocol is advertised that "it might
> allow improvements in performance or functionality", but with high
> latency environment in fact it performs much worse since query
> execution time is (n-2) * latency bigger compared to simple query
> protocol, where n is number of messages extended query sub-protocol
> uses for each query. psql seems to be using (by default) "simple
> query" protocol with only one frontend and one backend message.

There's no reason the extended query protocol needs to be worse. in
general the driver tries to send many messages without waiting for
responses, so while there are several logical messages involved, there
are few round trips. Basically you will be seeing one round trip per
Sync or Flush message. How many of those do you see? I know I tried to
minimize the chattiness of the original implementation exactly to
avoid problems with high-latency connections, but it may have got
worse over time since it's not a case that many people appear to be
using. The original implementation would end up doing something like
this most of the time:

send Parse, Describe, Execute, Sync
wait for results

which is no worse than the simple query protocol for latency.

> Are there any (significant) down-sides in using protocol version 2
> instead of protocol version 3?

Yes, you'll lose all sorts of random features, using v2 should be a last resort.

Oliver

Re: PostgreSQL protocol 3 JDBC drivers, sub-protocols, and latency

От
Stevo Slavić
Дата:
Hello Oliver,

Thanks for replying!

Yes, I'm aware that newer is not that new, I guess it's ever since v3
protocol was introduced, with varying behavior.

Here is the log output from 9.1-901 JDBC 4 driver with v3
frontend/backend protocol for a very simple query and 60ms latency
(simulated in one way only, from db back to db client):

19:59:40.057 (1) simple execute,
handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@7f72455,
maxRows=0, fetchSize=0, flags=17
19:59:40.057 (1)  FE=> Parse(stmt=null,query="select
nextval('hibernate_sequence')",oids={})
19:59:40.057 (1)  FE=> Bind(stmt=null,portal=null)
19:59:40.057 (1)  FE=> Describe(portal=null)
19:59:40.057 (1)  FE=> Execute(portal=null,limit=0)
19:59:40.057 (1)  FE=> Sync
19:59:40.120 (1)  <=BE ParseComplete [null]
19:59:40.180 (1)  <=BE BindComplete [null]
19:59:40.240 (1)  <=BE RowDescription(1)
19:59:40.300 (1)  <=BE DataRow
19:59:40.300 (1)  <=BE CommandStatus(SELECT 1)
19:59:40.360 (1)  <=BE ReadyForQuery(I)

Maybe simulated latency is serializing things which should be
occurring concurrently. But wouldn't that happen for DataRow and
CommandStatus backend message too?

As soon as I configured protocolVersion=2, I got 5 times better
performance, but another issue popped-up, defaults for parameters seem
to be different in v2 compared to v3, but found a quick workaround.

Regards,
Stevo.

On Fri, Nov 18, 2011 at 7:26 AM, Oliver Jowett <oliver@opencloud.com> wrote:
> On 19 November 2011 00:21, Stevo Slavić <sslavic@gmail.com> wrote:
>
>> It seems newer jdbc drivers which support extended query sub-protocol are using it by default.
>
> Not "newer" drivers - it's been doing that since 8.0, which is very
> old now. What driver versions are you comparing to?
>
>> This sub-protocol is advertised that "it might
>> allow improvements in performance or functionality", but with high
>> latency environment in fact it performs much worse since query
>> execution time is (n-2) * latency bigger compared to simple query
>> protocol, where n is number of messages extended query sub-protocol
>> uses for each query. psql seems to be using (by default) "simple
>> query" protocol with only one frontend and one backend message.
>
> There's no reason the extended query protocol needs to be worse. in
> general the driver tries to send many messages without waiting for
> responses, so while there are several logical messages involved, there
> are few round trips. Basically you will be seeing one round trip per
> Sync or Flush message. How many of those do you see? I know I tried to
> minimize the chattiness of the original implementation exactly to
> avoid problems with high-latency connections, but it may have got
> worse over time since it's not a case that many people appear to be
> using. The original implementation would end up doing something like
> this most of the time:
>
> send Parse, Describe, Execute, Sync
> wait for results
>
> which is no worse than the simple query protocol for latency.
>
>> Are there any (significant) down-sides in using protocol version 2
>> instead of protocol version 3?
>
> Yes, you'll lose all sorts of random features, using v2 should be a last resort.
>
> Oliver
>

Re: PostgreSQL protocol 3 JDBC drivers, sub-protocols, and latency

От
Stevo Slavić
Дата:
All is good, false alarm, simulation of latency was wrong, now after
applying proper latency (using
http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
) with v3 protocol log for same query looks like:

19:39:31.392 (1) simple execute,
handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@15dfd77,
maxRows=0, fetchSize=0, fl
ags=17
19:39:31.393 (1)  FE=> Parse(stmt=null,query="select
nextval('hibernate_sequence')",oids={})
19:39:31.394 (1)  FE=> Bind(stmt=null,portal=null)
19:39:31.396 (1)  FE=> Describe(portal=null)
19:39:31.397 (1)  FE=> Execute(portal=null,limit=0)
19:39:31.398 (1)  FE=> Sync
19:39:31.460 (1)  <=BE ParseComplete [null]
19:39:31.460 (1)  <=BE BindComplete [null]
19:39:31.461 (1)  <=BE RowDescription(1)
19:39:31.462 (1)  <=BE DataRow
19:39:31.463 (1)  <=BE CommandStatus(SELECT)
19:39:31.464 (1)  <=BE ReadyForQuery(I)

Thanks again for great support!

Regards,
Stevo.

On Fri, Nov 18, 2011 at 9:00 AM, Stevo Slavić <sslavic@gmail.com> wrote:
> Hello Oliver,
>
> Thanks for replying!
>
> Yes, I'm aware that newer is not that new, I guess it's ever since v3
> protocol was introduced, with varying behavior.
>
> Here is the log output from 9.1-901 JDBC 4 driver with v3
> frontend/backend protocol for a very simple query and 60ms latency
> (simulated in one way only, from db back to db client):
>
> 19:59:40.057 (1) simple execute,
> handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@7f72455,
> maxRows=0, fetchSize=0, flags=17
> 19:59:40.057 (1)  FE=> Parse(stmt=null,query="select
> nextval('hibernate_sequence')",oids={})
> 19:59:40.057 (1)  FE=> Bind(stmt=null,portal=null)
> 19:59:40.057 (1)  FE=> Describe(portal=null)
> 19:59:40.057 (1)  FE=> Execute(portal=null,limit=0)
> 19:59:40.057 (1)  FE=> Sync
> 19:59:40.120 (1)  <=BE ParseComplete [null]
> 19:59:40.180 (1)  <=BE BindComplete [null]
> 19:59:40.240 (1)  <=BE RowDescription(1)
> 19:59:40.300 (1)  <=BE DataRow
> 19:59:40.300 (1)  <=BE CommandStatus(SELECT 1)
> 19:59:40.360 (1)  <=BE ReadyForQuery(I)
>
> Maybe simulated latency is serializing things which should be
> occurring concurrently. But wouldn't that happen for DataRow and
> CommandStatus backend message too?
>
> As soon as I configured protocolVersion=2, I got 5 times better
> performance, but another issue popped-up, defaults for parameters seem
> to be different in v2 compared to v3, but found a quick workaround.
>
> Regards,
> Stevo.
>
> On Fri, Nov 18, 2011 at 7:26 AM, Oliver Jowett <oliver@opencloud.com> wrote:
>> On 19 November 2011 00:21, Stevo Slavić <sslavic@gmail.com> wrote:
>>
>>> It seems newer jdbc drivers which support extended query sub-protocol are using it by default.
>>
>> Not "newer" drivers - it's been doing that since 8.0, which is very
>> old now. What driver versions are you comparing to?
>>
>>> This sub-protocol is advertised that "it might
>>> allow improvements in performance or functionality", but with high
>>> latency environment in fact it performs much worse since query
>>> execution time is (n-2) * latency bigger compared to simple query
>>> protocol, where n is number of messages extended query sub-protocol
>>> uses for each query. psql seems to be using (by default) "simple
>>> query" protocol with only one frontend and one backend message.
>>
>> There's no reason the extended query protocol needs to be worse. in
>> general the driver tries to send many messages without waiting for
>> responses, so while there are several logical messages involved, there
>> are few round trips. Basically you will be seeing one round trip per
>> Sync or Flush message. How many of those do you see? I know I tried to
>> minimize the chattiness of the original implementation exactly to
>> avoid problems with high-latency connections, but it may have got
>> worse over time since it's not a case that many people appear to be
>> using. The original implementation would end up doing something like
>> this most of the time:
>>
>> send Parse, Describe, Execute, Sync
>> wait for results
>>
>> which is no worse than the simple query protocol for latency.
>>
>>> Are there any (significant) down-sides in using protocol version 2
>>> instead of protocol version 3?
>>
>> Yes, you'll lose all sorts of random features, using v2 should be a last resort.
>>
>> Oliver
>>
>