Обсуждение: PostgreSQL protocol 3 JDBC drivers, sub-protocols, and latency
Hello pgsql-jdbc users, When executing same queries from psql and from application over WAN with high latency it was easier to notice there is big difference in query execution times. Where pgsql would execute statements in latency + couple of more ms, application using latest 9.1 jdbc driver would take n * latency + couple of more ms. After turning on the logging level (see [1] for connection params) I've noticed multiple backend protocol messages being exchanged between the database and the driver, each of them taking latency + couple more ms. Reading through protocol specs (see [2]) I've found out about "extended query" sub-protocol, which adds for every query multiple frontend and backend messages. It seems newer jdbc drivers which support extended query sub-protocol are using it by default. This sub-protocol is advertised that "it might allow improvements in performance or functionality", but with high latency environment in fact it performs much worse since query execution time is (n-2) * latency bigger compared to simple query protocol, where n is number of messages extended query sub-protocol uses for each query. psql seems to be using (by default) "simple query" protocol with only one frontend and one backend message. Since I can not eliminate latency, I'd like to force pgsql-jdbc driver to use simple query sub-protocol. I couldn't find direct info in pgsql-jdcb driver docs on how to force simple query sub-protocol. Connecting info in [3] I've found out that if I put protocolVersion to 2 that simple query sub-protocol gets used and query times are on par with psql. Version 3 of postgres frontend/backend protocol added support for extended query sub-protocol, so it isn't available in protocol version 2. Is there a connection parameter in pgsql-jdbc which would configure jdbc driver to force usage of simple query sub-protocol while still using protocol version 3? Are there any (significant) down-sides in using protocol version 2 instead of protocol version 3? Thanks in advance! Regards, Stevo [1] pgsql-jdbc driver specific connection parameters http://jdbc.postgresql.org/documentation/80/connect.html#connection-parameters [2] Postgres Frontend/Backend protocol http://www.postgresql.org/docs/9.1/static/protocol-overview.html [3] Summary of changes in Postgres Frontend/Backend protocol between version 2 and 3 http://www.postgresql.org/docs/9.1/static/protocol-changes.html
On 19 November 2011 00:21, Stevo Slavić <sslavic@gmail.com> wrote: > It seems newer jdbc drivers which support extended query sub-protocol are using it by default. Not "newer" drivers - it's been doing that since 8.0, which is very old now. What driver versions are you comparing to? > This sub-protocol is advertised that "it might > allow improvements in performance or functionality", but with high > latency environment in fact it performs much worse since query > execution time is (n-2) * latency bigger compared to simple query > protocol, where n is number of messages extended query sub-protocol > uses for each query. psql seems to be using (by default) "simple > query" protocol with only one frontend and one backend message. There's no reason the extended query protocol needs to be worse. in general the driver tries to send many messages without waiting for responses, so while there are several logical messages involved, there are few round trips. Basically you will be seeing one round trip per Sync or Flush message. How many of those do you see? I know I tried to minimize the chattiness of the original implementation exactly to avoid problems with high-latency connections, but it may have got worse over time since it's not a case that many people appear to be using. The original implementation would end up doing something like this most of the time: send Parse, Describe, Execute, Sync wait for results which is no worse than the simple query protocol for latency. > Are there any (significant) down-sides in using protocol version 2 > instead of protocol version 3? Yes, you'll lose all sorts of random features, using v2 should be a last resort. Oliver
Hello Oliver, Thanks for replying! Yes, I'm aware that newer is not that new, I guess it's ever since v3 protocol was introduced, with varying behavior. Here is the log output from 9.1-901 JDBC 4 driver with v3 frontend/backend protocol for a very simple query and 60ms latency (simulated in one way only, from db back to db client): 19:59:40.057 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@7f72455, maxRows=0, fetchSize=0, flags=17 19:59:40.057 (1) FE=> Parse(stmt=null,query="select nextval('hibernate_sequence')",oids={}) 19:59:40.057 (1) FE=> Bind(stmt=null,portal=null) 19:59:40.057 (1) FE=> Describe(portal=null) 19:59:40.057 (1) FE=> Execute(portal=null,limit=0) 19:59:40.057 (1) FE=> Sync 19:59:40.120 (1) <=BE ParseComplete [null] 19:59:40.180 (1) <=BE BindComplete [null] 19:59:40.240 (1) <=BE RowDescription(1) 19:59:40.300 (1) <=BE DataRow 19:59:40.300 (1) <=BE CommandStatus(SELECT 1) 19:59:40.360 (1) <=BE ReadyForQuery(I) Maybe simulated latency is serializing things which should be occurring concurrently. But wouldn't that happen for DataRow and CommandStatus backend message too? As soon as I configured protocolVersion=2, I got 5 times better performance, but another issue popped-up, defaults for parameters seem to be different in v2 compared to v3, but found a quick workaround. Regards, Stevo. On Fri, Nov 18, 2011 at 7:26 AM, Oliver Jowett <oliver@opencloud.com> wrote: > On 19 November 2011 00:21, Stevo Slavić <sslavic@gmail.com> wrote: > >> It seems newer jdbc drivers which support extended query sub-protocol are using it by default. > > Not "newer" drivers - it's been doing that since 8.0, which is very > old now. What driver versions are you comparing to? > >> This sub-protocol is advertised that "it might >> allow improvements in performance or functionality", but with high >> latency environment in fact it performs much worse since query >> execution time is (n-2) * latency bigger compared to simple query >> protocol, where n is number of messages extended query sub-protocol >> uses for each query. psql seems to be using (by default) "simple >> query" protocol with only one frontend and one backend message. > > There's no reason the extended query protocol needs to be worse. in > general the driver tries to send many messages without waiting for > responses, so while there are several logical messages involved, there > are few round trips. Basically you will be seeing one round trip per > Sync or Flush message. How many of those do you see? I know I tried to > minimize the chattiness of the original implementation exactly to > avoid problems with high-latency connections, but it may have got > worse over time since it's not a case that many people appear to be > using. The original implementation would end up doing something like > this most of the time: > > send Parse, Describe, Execute, Sync > wait for results > > which is no worse than the simple query protocol for latency. > >> Are there any (significant) down-sides in using protocol version 2 >> instead of protocol version 3? > > Yes, you'll lose all sorts of random features, using v2 should be a last resort. > > Oliver >
All is good, false alarm, simulation of latency was wrong, now after applying proper latency (using http://www.linuxfoundation.org/collaborate/workgroups/networking/netem ) with v3 protocol log for same query looks like: 19:39:31.392 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@15dfd77, maxRows=0, fetchSize=0, fl ags=17 19:39:31.393 (1) FE=> Parse(stmt=null,query="select nextval('hibernate_sequence')",oids={}) 19:39:31.394 (1) FE=> Bind(stmt=null,portal=null) 19:39:31.396 (1) FE=> Describe(portal=null) 19:39:31.397 (1) FE=> Execute(portal=null,limit=0) 19:39:31.398 (1) FE=> Sync 19:39:31.460 (1) <=BE ParseComplete [null] 19:39:31.460 (1) <=BE BindComplete [null] 19:39:31.461 (1) <=BE RowDescription(1) 19:39:31.462 (1) <=BE DataRow 19:39:31.463 (1) <=BE CommandStatus(SELECT) 19:39:31.464 (1) <=BE ReadyForQuery(I) Thanks again for great support! Regards, Stevo. On Fri, Nov 18, 2011 at 9:00 AM, Stevo Slavić <sslavic@gmail.com> wrote: > Hello Oliver, > > Thanks for replying! > > Yes, I'm aware that newer is not that new, I guess it's ever since v3 > protocol was introduced, with varying behavior. > > Here is the log output from 9.1-901 JDBC 4 driver with v3 > frontend/backend protocol for a very simple query and 60ms latency > (simulated in one way only, from db back to db client): > > 19:59:40.057 (1) simple execute, > handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@7f72455, > maxRows=0, fetchSize=0, flags=17 > 19:59:40.057 (1) FE=> Parse(stmt=null,query="select > nextval('hibernate_sequence')",oids={}) > 19:59:40.057 (1) FE=> Bind(stmt=null,portal=null) > 19:59:40.057 (1) FE=> Describe(portal=null) > 19:59:40.057 (1) FE=> Execute(portal=null,limit=0) > 19:59:40.057 (1) FE=> Sync > 19:59:40.120 (1) <=BE ParseComplete [null] > 19:59:40.180 (1) <=BE BindComplete [null] > 19:59:40.240 (1) <=BE RowDescription(1) > 19:59:40.300 (1) <=BE DataRow > 19:59:40.300 (1) <=BE CommandStatus(SELECT 1) > 19:59:40.360 (1) <=BE ReadyForQuery(I) > > Maybe simulated latency is serializing things which should be > occurring concurrently. But wouldn't that happen for DataRow and > CommandStatus backend message too? > > As soon as I configured protocolVersion=2, I got 5 times better > performance, but another issue popped-up, defaults for parameters seem > to be different in v2 compared to v3, but found a quick workaround. > > Regards, > Stevo. > > On Fri, Nov 18, 2011 at 7:26 AM, Oliver Jowett <oliver@opencloud.com> wrote: >> On 19 November 2011 00:21, Stevo Slavić <sslavic@gmail.com> wrote: >> >>> It seems newer jdbc drivers which support extended query sub-protocol are using it by default. >> >> Not "newer" drivers - it's been doing that since 8.0, which is very >> old now. What driver versions are you comparing to? >> >>> This sub-protocol is advertised that "it might >>> allow improvements in performance or functionality", but with high >>> latency environment in fact it performs much worse since query >>> execution time is (n-2) * latency bigger compared to simple query >>> protocol, where n is number of messages extended query sub-protocol >>> uses for each query. psql seems to be using (by default) "simple >>> query" protocol with only one frontend and one backend message. >> >> There's no reason the extended query protocol needs to be worse. in >> general the driver tries to send many messages without waiting for >> responses, so while there are several logical messages involved, there >> are few round trips. Basically you will be seeing one round trip per >> Sync or Flush message. How many of those do you see? I know I tried to >> minimize the chattiness of the original implementation exactly to >> avoid problems with high-latency connections, but it may have got >> worse over time since it's not a case that many people appear to be >> using. The original implementation would end up doing something like >> this most of the time: >> >> send Parse, Describe, Execute, Sync >> wait for results >> >> which is no worse than the simple query protocol for latency. >> >>> Are there any (significant) down-sides in using protocol version 2 >>> instead of protocol version 3? >> >> Yes, you'll lose all sorts of random features, using v2 should be a last resort. >> >> Oliver >> >