Обсуждение: binary tuple receiving patch v2

Поиск
Список
Период
Сортировка

binary tuple receiving patch v2

От
Mikko Tiihonen
Дата:
Hi,

This patch includes the code from my first try with subject
"ResultSet.getBytes() of bytea speedup batch". It also adds support for
receiving date,timestamp,timestamptz,time and timetz types using binary
encoding in addition to the bytea that my first patch supported.

My quick benchmark for binary encoded time types gave the following
results (with postgresql 8.1.5 compiled in two different date handling
modes):

intdatetime=off
txt: speed 1298.70 (max memory: 9.4MB)
bin: speed 4953.26 (max memory: 2.5MB) - over 250% faster

intdatetime=on
txt: speed 1449.28 (max memory: 9.4MB)
bin: speed 5882.35 (max memory: 2.5MB) - over 300% faster

And as my previous mail showed the binary encoding for bytea gave
20-400% speedup depending on the column size.

------

Changes since previous patch:

- added time/timetz/timestamp/timestamptz/date support
- binary encoding used only with v3 protocol
- new "binaryEncoding=0/1" connection property for controlling the
  feature - currently defaults to 1
- possibility to control binary encoding per oid type
- fixed sending Bind request when not all result set fields were
  binary

What the patch does:

After executing a prepared statement the first time the result set
field information (Field[]) is cached into the prepared query object.
Subsequent executions of the prepared statements use the field
types to decide when to request binary encoding from the server.
The caching seems safe because the Field objects are immutable.

-----

Still open questions before the patch can be integrated - any help
with answering these is appreciated:

- is there any better way to obtain the result set field information
  in QueryExecutor? (org/postgresql/jdbc2/AbstractJdbc2Statement.java)
  * could the result set type information be obtained during the
    prepare so that they would be available even on the first execute?

- in which conditions should a prepared statement drop the cached
  fields? for example: can the result set fields ever change after
  the statement has been prepared?

- How big fuzzyness should be supported by the driver. If a column is
  of type date and ResulSet.getTimestamp is called, should it work?
  * I currently support only getString on the time fields
  * I can make the getTime/getDate/getTimestamp interchangeable if
    requested. It would even be possible to use getLong for time
    fields which the text encoding never supported

- Must the different SQL time formats in java have different extreme
  small/large value handling than what postgresql internally uses?

- if caching of Field information is allowed (at least in some
  circumstances), could the driver be optimised to only request a
  Describe on the first excution of a prepared statement (or once
  during prepare)?

-Mikko

Вложения

Re: binary tuple receiving patch v2

От
Kris Jurka
Дата:

On Sun, 19 Nov 2006, Mikko Tiihonen wrote:

I have not yet looked at this patch, but hopefully I can answer some of
the questions below.

> - is there any better way to obtain the result set field information
>  in QueryExecutor? (org/postgresql/jdbc2/AbstractJdbc2Statement.java)
>  * could the result set type information be obtained during the
>    prepare so that they would be available even on the first execute?

Getting information on the prepare before execution is possible, but would
require an extra network roundtrip.  There has been a lot of speculation
about the impact of such a change, but I don't believe anyone has
measured it.  The other complication is that we often want/need to do a
Bind before Describe so we know what's coming back.  Consider "SELECT ?",
calling setInt will make the describe come back with an int type.  Without
that I think it may error with an unresolved type.  So you might need to
do:  Parse/Bind/Describe/Sync, process results of Describe,
Bind/Execute/Sync while all that happens right now in one network trip.

> - in which conditions should a prepared statement drop the cached
>  fields? for example: can the result set fields ever change after
>  the statement has been prepared?

PreparedStatement ps = conn.prepareStatement("SELECT ?");
ps.setInt(1,1);
ResultSet rs = ps.executeQuery();
ps.setLong(1,1);
rs = ps.executeQuery();

Here the returned type has changed.  In this case the driver recognized
that the input types have changed and reparses the query behind the
scenes.  I'm not sure where you are operating so you might be alright or
you might need extra work to handle this case.

> - How big fuzzyness should be supported by the driver. If a column is
>  of type date and ResulSet.getTimestamp is called, should it work?
>  * I currently support only getString on the time fields
>  * I can make the getTime/getDate/getTimestamp interchangeable if
>    requested. It would even be possible to use getLong for time
>    fields which the text encoding never supported

Yes, getTimestamp on a date should work.  Consult the JDBC spec for
details of all the conversions.  In the JDBC 3 spec, appendix B table B-6
shows all the required conversions.

> - Must the different SQL time formats in java have different extreme
>  small/large value handling than what postgresql internally uses?

Don't know what you mean.

> - if caching of Field information is allowed (at least in some
>  circumstances), could the driver be optimised to only request a
>  Describe on the first excution of a prepared statement (or once
>  during prepare)?
>

Well it does depend on the Bind parameters, but yes doing it repeatedly is
often a waste of time.


Hopefully that helped and I'll be able to look at the patch this weekend.

Kris Jurka