Обсуждение: Re: [JDBC] Prepared statement performance...

Поиск
Список
Период
Сортировка

Re: [JDBC] Prepared statement performance...

От
Barry Lind
Дата:
Aaron,

It is a bit more complex than just looking for a ; since a ; is a valid
character in a quoted string.  But as Bruce has mentioned in a followup,
psql does it so it is doable.

I think the real question here is when does it make sense to use server
side prepared statements.  In the little bit of testing I have done, I
would say the answer is rarely.  You need many factors to come into
place for it to make sense to use server side prepared statements:

1) The statement needs to be big and complex such that there is
significant overhead in the parsing and planning stages of execution.
The query that I test with is about 3K in size and joins about 10
different tables and includes a union.  In this case there is a
significant overhead involved in both parsing and planning.   However
for a simple query that affects only one or two tables there is little
overhead.

2) The statement object needs to be reused multiple times.  Using a
server prepared statement requires at a minimum three sql statements to
be executed to do the work of one original statement:
select foo from bar;
becomes
prepare <name> as select foo from bar;
execute <name>;
deallocate <name>;

We can do the first two together in one roundtrip to the server, but the
last one requires a separate roundtrip.  So if you are only using a the
statement object/query only once then using server side prepared
statements will make performance worse than not.  The vast majority of
statement objects are created, executed once and then closed.  I don't
think it makes sence to turn on a feature that will make the overall
performance for most users worse.  There is very little jdbc code that I
have seen that creates a statement, call execute multiple times on that
same statement object and then finally closes it.

This is the first version of the feature.  Improvements will come with
implementation feedback and I welcome all feedback.  But there are
reasons it is implemented the way it is.  In the long term what I would
really like to see is this functionality done at the BE/FE protocol
level instead of at the sql level as I have seen other databases do.

Someone has suggestted a jdbc url arguement to enable the functionality
by default and that is probably a good idea if we learn from real use
that it makes sense to have all statements use this.  But I am not
convinced (given my reasons above) that this really makes sense.

thanks,
--Barry

Aaron Mulder wrote:
> Dave, Barry, et al.:
>
>
>>Unless of course we were to write a
>>sql parser in the jdbc driver so that we could predetermine which
>>statements should use the prepare mechanism or not.
>
>
>     That's a bit of an exaggeration -- all you need to check in the
> JDBC driver is whether there's an unescaped ; to see if there's more than
> one statement, right?  It wouldn't be unreasonable to default to enabling
> the feature unless there's a ; in the statement (or unless there's a ;
> with anything but whitespace afterward, if you like) -- it wouldn't even
> be a disaster to forget escaping and just decide based on a ; in the
> statement period -- few enough statements have an escaped ; that the
> suboptimal performance in that case won't matter.
>     I'm a little concerned about how this feature interacts with
> connection pooling as currently implemented -- it probably needs to be
> reset to a known state every time a connection is returned to the pool.
> Which is OK for the PostgreSQL implementation (which you'll note is not
> supposed to be used in an app server environment), but impossible for all
> the app servers out there which won't take driver-specific steps when
> connections are returned to their generic pools.  Furthermore, with the
> default behavior set to "disabled", you're reducing performance (you
> quoted 60%?) in the application server environment, when there will never
> be multiple statements executed in the same call, but there will
> frequently be multiple identical statements executed on the same
> PreparedStatement, but there's no available way to reset the flag, short
> of writing a wrapper driver to stand between the app server and the real
> driver.  On the other hand, if you set the default to a global "enabled",
> it breaks DBVisualizer and other development tools which currently allow
> multiple statements by default.
>     IMHO, it would be better to have the feature enabled for queries
> where it's appropriate, and disabled where not, and let the programmer
> specify a global override where they absolutely must.  That way
> DB-specific actions are only required in truly exceptional cases, and both
> the app servers and the tools behave properly by default.
>     If you absolutely object, I think we should add a property to the
> Driver and the ConnectionPoolDataSource so that in an app server
> environment you can set the property to enabled at configuration time.
> But again, I would prefer smarter default behavior.
>
> Aaron
>
> On 27 Sep 2002, Dave Cramer wrote:
>
>>Peter,
>>
>>I have to agree with Barry here. Since it is possible to send two (or
>>more ) statements to the backend, the programmer needs some mechanism
>>for enabling/disabling this feature. Unless of course we were to write a
>>sql parser in the jdbc driver so that we could predetermine which
>>statements should use the prepare mechanism or not.
>>
>>I often use prepared statements instead of simple statements even when
>>only doing one select. The difference being that I can use setXXX to set
>>the parameters in the statment. Which is particularly important for
>>setDate, setTime, setTimestamp
>>
>>And you would be amazed at the unique (irregular) coding practices which
>>exist out there, so having the default do something sane is a GOOD
>>THING.
>>
>>Dave
>>
>>
>>On Fri, 2002-09-27 at 02:52, Peter Kovacs wrote:
>>
>>>Barry,
>>>
>>>It's great to have this functionality.
>>>
>>>However, I am surprised about the need/requirement to explicitely turning on
>>>this feature in the JDBC driver. As you explain, the reason for this
>>>behaviour is to support the use of multiple statements with the
>>>prepareStatement method. But this is supporting incorrect us of the JDBC
>>>interface, since the JDBC API says that the parameter of the
>>>prepareStatement method is one(!) SQL statetment. On the other hand, it is
>>>clear that the intended purpose of the prepareStatement method is to use
>>>prepared statment -- exclusively and not by default. For statements executed
>>>only once, the designers of the API made available the method
>>>"createStatment".
>>>
>>>So the buttom line is that you have an irregular behaviour in support of an
>>>irregular coding practice. I understand that there is "legacy" code that
>>>needs to use the prepareStatement method with multiple SQL statements. But
>>>since such practice was already incorrect at the time of its inception, the
>>>right way would be to allow a transition from bad to good by requiring the
>>>"sinners" to explicitly use some kind of
>>>"org.postgresql.PGStatement.setUseServerPrepare(false)" tweaking in their
>>>code, and let new coders use standard code.
>>>
>>>Of course, you're running the show, so you will do it as you deem right, but
>>>IMHO the implementation as you described it is clearly not reasonable.
>>>
>>>Cheers,
>>>
>>>Peter
>>>
>>>----- Original Message -----
>>>From: "Barry Lind" <barry@xythos.com>
>>>To: "Dmitry Tkach" <dmitry@openratings.com>
>>>Cc: <pgsql-general@postgresql.org>; <pgsql-jdbc@postgresql.org>
>>>Sent: Thursday, September 26, 2002 10:56 PM
>>>Subject: Re: [JDBC] [GENERAL] Prepared statement performance...
>>>
>>>
>>>
>>>>Dimitry,
>>>>
>>>>I have some good news for you.  All of this functionality is in 7.3 of
>>>>the server and the current development build of the jdbc driver.  Of
>>>>course 7.3 isn't quite production yet, but beta 2 should be out this week.
>>>>
>>>>Your first option is what has been implemented.  In 7.3 the server now
>>>>supports 'prepare <name> as <sql>' and 'execute <name>' (and variants
>>>>for passing bind variables).  The jdbc driver also now has a method on
>>>>org.postgresql.PGStatement.setUseServerPrepare(true) to enable the the
>>>>use of server side prepared statements (so you can cast your Statement
>>>>object to an org.postgresql.PGStatement object and enable the
>>>>functionality for a specific statement).
>>>>
>>>>This hasn't yet made its way into the jdbc documentation for 7.3, but
>>>>this will be documented before 7.3 goes production.
>>>>
>>>>This implementation does still require that you make some changes to
>>>>your app in order to enable it but that is by design for the following
>>>>reasons:
>>>>    1) In order to be useful you need to reuse a statement object
>>>>multiple times.  Most code generally will just create a statement and
>>>>use it only once, and since more work is being done to prepare the
>>>>statement that would have a negative performance impact for the majority
>>>>of statements being executed in normal code.
>>>>    2) Prepare functionality will not work for all types of sql that you
>>>>can execute through the jdbc driver.  For example you can quite easily
>>>>do the following:  conn.prepareStatement("update foo set a = ?; insert
>>>>into bar values (?,?,?);");
>>>>By issuing both sql statements in one call only one network roundtrip is
>>>>needed to execute both statements providing what is likely a more
>>>>performant application.  However this doesn't work with prepare because:
>>>>prepare foo as update foo set a = ?; insert into bar values(?,?,?);   -
>>>>is one prepared statement and one regular statement which will cause
>>>>very strange things to happen.
>>>>
>>>>It is for the reasons above that the developer needs to explicitly turn
>>>>on the use of server side prepared statements when it makes sense to do
>>>
>>>so.
>>>
>>>>thanks,
>>>>--Barry
>>>>
>>>>PS.  I would really appreciate it if you could test this new
>>>>functionality.  As far as I know I am the only one who has used it.  But
>>>>for one of my frequently used and complex sql statements the time to
>>>>execute dropped by 60% using server side prepared statements.
>>>>
>>>>
>>>>
>>>>Dmitry Tkach wrote:
>>>> > Hi, everybody.
>>>> >
>>>> > I am running into huge performance problems, due to JDBC not being able
>>>> > to cache query plans :-(
>>>> > My java program runs the (set of about 5) identical statements with
>>>> > different parameters for about 30 million times...
>>>> > What I am talking about below has to do with JDBC, but is not really
>>>> > specific to it - it seems to me, that the backend itself
>>>> > could be improved to better handle this kind of scenario, when the same
>>>> > statement is executed many times in the same session.
>>>> >
>>>> > It takes about 3 weeks(!) to run, and it looks like about *half* of
>>>
>>>that
>>>
>>>> > time is spent by the query planner,
>>>> > creating and recreating the query plans every time I ran the damn
>>>> > statement....
>>>> >
>>>> > I am looking into implementing some kind of a solution, that would let
>>>> > me work around that problem...
>>>> > So far, I only see two possibilities:
>>>> >
>>>> > - a general solution, that would involve extending postgres SQL gramma
>>>> > to include a 'prepare' statement
>>>> > - or am ugly work around, that would involve moving all my sql
>>>> > statements into stored procedures, and have
>>>> >   those cache the query plans inside...
>>>> >
>>>> > The second solution is not only ugly (because it requires the
>>>> > application code to be changed and to have a specialized
>>>> > stored procedure for every query), but also requires some additional
>>>> > hacks (to overcome the hard limit on the number of
>>>> > function arguments and the inability for functions to return tuples) -
>>>> > the only way I imagine this can be made to work is
>>>> > to glue all the arguments together into a text string, and have the
>>>> > stored procedure parse it back, execute the query, then
>>>> > glue the resulting tuple(s) into another text string, return it, and
>>>> > have the application (or, perhaps, JDBC layer) parse it back
>>>> > into columns...
>>>> >
>>>> > I was wonderring if anybody has any better ideas how this can be made
>>>
>>>to
>>>
>>>> > work (I am looking for a solution that would
>>>> > minimize changes to the existing JDBC applications that use
>>>> > PreparedStatements)?
>>>> >
>>>> > If the maintainers of the involved code are interested, I would be
>>>> > willing to implement and contribute the solution we come up with ...
>>>> > (I figure, nobody would really be interested in getting that second
>>>> > solution I mentioned into the mainstream :-), but, if we are
>>>> > able to come up with something more general and less ugly, perhaps, I
>>>
>>>am
>>>
>>>> > not the only one who would be able to contribute from
>>>> > it)...
>>>> >
>>>> > For example, I believe, it should not be too complicated to implement
>>>> > that first possibility I described above...
>>>> > The way I see it would involve adding two statements to postgres SQL
>>>> > syntax:
>>>> >
>>>> > prepare <name> as <sql statement>
>>>> > and
>>>> > execute <name> with (<parameter list>)
>>>> >
>>>> > For example:
>>>> >
>>>> > prepare mystatement as select * from mytable where id = $1 and name
>>>
>>>like
>>>
>>>> > $2;
>>>> >
>>>> > and then
>>>> >
>>>> > execute mystatement with (1, 'Dima');
>>>> > execute mystatement with (2, 'John');
>>>> >
>>>> > etc....
>>>> >
>>>> > The JDBC driver would then send the 'prepare' command to the backend in
>>>> > Connection.prepareStatement (), and
>>>> > use the 'execute' in PreparedStatement.execute ();
>>>> >
>>>> > One potential problem with implementation I see here is that the query
>>>> > planner wants to know the argument types ahead of time...
>>>> > I guess, I could get around that by making all the arguments 'text',
>>>
>>>and
>>>
>>>> > having them casted into the right types when the
>>>> > statement is actually executed.
>>>> >
>>>> > There is, probably a need to also have some kind of a 'close' command
>>>
>>>to
>>>
>>>> > throw away the prepared query plans... or we could just
>>>> > make them last forever until, say, the end of transaction (or, perhaps,
>>>> > the end of the session?)...
>>>> >
>>>> > If there is anyone interested in discussing various possibilities, and
>>>> > getting this implemented one way or another,
>>>> >  I would like to hear from you!
>>>> >
>>>> > Thanks!
>>>> >
>>>> > Dima.
>>>> >
>>>> >
>>>> >
>>>> > ---------------------------(end of
>>>
>>>broadcast)---------------------------
>>>
>>>> > TIP 2: you can get off all lists at once with the unregister command
>>>> >    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>>>> >
>>>>
>>>>
>>>>
>>>>---------------------------(end of broadcast)---------------------------
>>>>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>>>>
>>>
>>>
>>>
>>>---------------------------(end of broadcast)---------------------------
>>>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>>>
>>>
>>
>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 5: Have you checked our extensive FAQ?
>>
>>http://www.postgresql.org/users-lounge/docs/faq.html
>>
>
>
>


Re: [JDBC] Prepared statement performance...

От
Neil Conway
Дата:
Barry Lind <barry@xythos.com> writes:
> It is a bit more complex than just looking for a ; since a ; is a
> valid character in a quoted string.  But as Bruce has mentioned in a
> followup, psql does it so it is doable.
>
> I think the real question here is when does it make sense to use
> server side prepared statements.  In the little bit of testing I have
> done, I would say the answer is rarely.  You need many factors to come
> into place for it to make sense to use server side prepared statements:
>
> 1) The statement needs to be big and complex such that there is
> significant overhead in the parsing and planning stages of
> execution. The query that I test with is about 3K in size and joins
> about 10 different tables and includes a union.  In this case there is
> a significant overhead involved in both parsing and planning.
> However for a simple query that affects only one or two tables there
> is little overhead.
>
> 2) The statement object needs to be reused multiple times.  Using a
> server prepared statement requires at a minimum three sql statements
> to be executed to do the work of one original statement:
> select foo from bar;
> becomes
> prepare <name> as select foo from bar;
> execute <name>;
> deallocate <name>;

Note that DEALLOCATE is not really necessary -- prepared statements
are flushed from memory when the backend exits (I suppose if you're
using a connection pool, however, you should still explicitely
DEALLOCATE prepared statements when you're done with them).

> We can do the first two together in one roundtrip to the server, but
> the last one requires a separate roundtrip.  So if you are only using
> a the statement object/query only once then using server side prepared
> statements will make performance worse than not.  The vast majority of
> statement objects are created, executed once and then closed.  I don't
> think it makes sence to turn on a feature that will make the overall
> performance for most users worse.  There is very little jdbc code that
> I have seen that creates a statement, call execute multiple times on
> that same statement object and then finally closes it.
>
> This is the first version of the feature.  Improvements will come with
> implementation feedback and I welcome all feedback.

As do I (in reference to the backend part of the feature).

> In the long term what I would really like to see is this
> functionality done at the BE/FE protocol level instead of at the sql
> level as I have seen other databases do.

Yeah, that would be good. However, it requires a protocol change,
which wasn't feasible during the 7.3 development cycle. However, it
looks like there will be an FE/BE protocol change for 7.4, so that
might be a good opportunity to add protocol-level support.

>
> Someone has suggestted a jdbc url arguement to enable the
> functionality by default and that is probably a good idea if we learn
> from real use that it makes sense to have all statements use this.
> But I am not convinced (given my reasons above) that this really makes
> sense.

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC

Re: [JDBC] Prepared statement performance...

От
Barry Lind
Дата:

Dmitry Tkach wrote:

> What is this BE/FE thing?

It is the protocol a client like jdbc uses to talk to the postgres server.

BE/FE = Back End / Front End protocol

--Barry



Re: [JDBC] Prepared statement performance...

От
Barry Lind
Дата:

Dmitry Tkach wrote:
>
> I have to disagree with that too, based on my experience...
> In that application I mentioned in the topic that initiated this
> discussion, the statement, that I have to execute about 30 million times
> looks like
>
> "select * from foo where primary_key=bar"
>
> What could possibly be simpler than that? :-)
> I was surprised to find out however, that in the end the backend turns
> out to have *more* time parsing and planning that query then actually
> executing it :-(
>

I don't disagree that this is a simple query.  But remember that since
the implementation of server side prepare is through the sql interface
with each call to execute this statement you still need to issue the
following sql statement:  execute <name>(bar);  which will still need to
invoke the parser.  Using server side prepared statements doesn't
eliminate all parses.  It isn't true that you can just take the original
  execute time of a statement and assume that with server side prepared
statements the total time will be the same as that, it will be more.

Executing something 30million times will certainly make even a little
improvement for a simple statement like this noticable, but 30million
executes of the same statement certainly isn't the norm.  And I would
suggest it is the type of environment where serverside prepared
statements are the most useful.  I hope you can try the new
functionality and provide us all with real numbers as to how much it helps.

thanks,
--Barry