Обсуждение: Odd query execution behavior with extended protocol

Поиск
Список
Период
Сортировка

Odd query execution behavior with extended protocol

От
Shay Rojansky
Дата:
<div dir="ltr"><div style="font-size:12.8px">Hi hackers, some odd behavior has been reported with Npgsql and I'm sure
youcan help.</div><div style="font-size:12.8px"><br /></div><div style="font-size:12.8px">Npgsql supports sending
multipleSQL statements in a single packet via the extended protocol. This works fine, but when the second query SELECTs
avalue modified by the first's UPDATE, I'm getting a result as if the UPDATE hasn't yet occurred.</div><div
style="font-size:12.8px"><br/></div><div style="font-size:12.8px">The exact messages send by Npgsql are:</div><div
style="font-size:12.8px"><br/></div><div style="font-size:12.8px">Parse (UPDATE data SET name='foo' WHERE id=1),
statement=unnamed</div><divstyle="font-size:12.8px">Describe (statement=unnamed)</div><div
style="font-size:12.8px">Bind(statement=unnamed, portal=MQ0)</div><div style="font-size:12.8px">Parse (SELECT * FROM
dataWHERE id=1), statement=unnamed</div><div style="font-size:12.8px">Describe (statement=unnamed)</div><div
style="font-size:12.8px">Bind(statement=unnamed, portal=MQ1)</div><div style="font-size:12.8px">Execute
(portal=MQ0)</div><divstyle="font-size:12.8px">Close (portal=MQ0)</div><div style="font-size:12.8px">Execute
(portal=MQ1)</div><divstyle="font-size:12.8px">Close (portal=MQ1)</div><div style="font-size:12.8px">Sync</div><div
style="font-size:12.8px"><br/></div><div style="font-size:12.8px">Instead of returning the expected 'foo' value set in
thefirst command's UPDATE, I'm getting whatever value was previously there.</div><div style="font-size:12.8px">Note
thatthis happen regardless of whether a transaction is already set and of the isolation level.</div><div
style="font-size:12.8px"><br/></div><div style="font-size:12.8px">Is this the expected behavior, have I misunderstood
theprotocol specs?</div><div style="font-size:12.8px"><br /></div><div style="font-size:12.8px">Thanks for your help,
andplease let me know if you need any more info.</div><div style="font-size:12.8px"><br /></div><div
style="font-size:12.8px">Shay</div></div>

Re: Odd query execution behavior with extended protocol

От
"Charles Clavadetscher"
Дата:
Hello

> Npgsql supports sending multiple SQL statements in a single packet via the extended protocol. This works fine, but
whenthe second query SELECTs a value modified by the first's UPDATE, I'm getting a result as if the  
> UPDATE hasn't yet occurred.

Looks like the first updating statement is not committed, assuming that the two statements run in different
transactions.

> The exact messages send by Npgsql are:
>
> Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed
> Describe (statement=unnamed)
> Bind (statement=unnamed, portal=MQ0)
> Parse (SELECT * FROM data WHERE id=1), statement=unnamed
> Describe (statement=unnamed)
> Bind (statement=unnamed, portal=MQ1)
> Execute (portal=MQ0)
> Close (portal=MQ0)
> Execute (portal=MQ1)
> Close (portal=MQ1)
> Sync

I never used Npgsql so I don't know if there is something missing there. Would you need an explicit commit before
closingMQ0? 
Also I am not in clear what "statement=unnamed" means, but it is used twice. Is it possible that the update is
overwrittenwith select before it executes? 

Just some thoughts, as I said I know nothing of Npgsql.

BTW: Do you see the change after update in your DB if you look into it with another tool (e.g. psql)?

Charles





Re: Odd query execution behavior with extended protocol

От
Shay Rojansky
Дата:
> Npgsql supports sending multiple SQL statements in a single packet via the extended protocol. This works fine, but when the second query SELECTs a value modified by the first's UPDATE, I'm getting a result as if the
> UPDATE hasn't yet occurred.

Looks like the first updating statement is not committed, assuming that the two statements run in different transactions.

I did try to prefix the message chain with an explicit transaction BEGIN (with the several different isolation levels) without a difference in behavior.

> The exact messages send by Npgsql are:
>
> Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed
> Describe (statement=unnamed)
> Bind (statement=unnamed, portal=MQ0)
> Parse (SELECT * FROM data WHERE id=1), statement=unnamed
> Describe (statement=unnamed)
> Bind (statement=unnamed, portal=MQ1)
> Execute (portal=MQ0)
> Close (portal=MQ0)
> Execute (portal=MQ1)
> Close (portal=MQ1)
> Sync

I never used Npgsql so I don't know if there is something missing there. Would you need an explicit commit before closing MQ0?

I guess this is exactly my question to PostgreSQL... But unless I'm misunderstanding the transaction semantics I shouldn't need to commit the first UPDATE in order to see its effect in the second SELECT...

Also I am not in clear what "statement=unnamed" means, but it is used twice. Is it possible that the update is overwritten with select before it executes?

statement=unnamed means that the destination statement is the unnamed prepared statement (as described in http://www.postgresql.org/docs/current/static/protocol-message-formats.html). Right after the Parse I bind the unnamed statement which I just parsed to cursor MQ0. In other words, Npgsql first parses the two queries and binds them to portals MQ0 and MQ1, and only then executes both portals

BTW: Do you see the change after update in your DB if you look into it with another tool (e.g. psql)?

That's a good suggestion, I'll try to check it out, thanks!

Re: Odd query execution behavior with extended protocol

От
Andres Freund
Дата:
On October 4, 2015 2:50:10 PM GMT+02:00, Shay Rojansky <roji@roji.org> wrote:
>>
>> > Npgsql supports sending multiple SQL statements in a single packet
>via
>> the extended protocol. This works fine, but when the second query
>SELECTs a
>> value modified by the first's UPDATE, I'm getting a result as if the
>> > UPDATE hasn't yet occurred.
>>
>> Looks like the first updating statement is not committed, assuming
>that
>> the two statements run in different transactions.
>>
>
>I did try to prefix the message chain with an explicit transaction
>BEGIN
>(with the several different isolation levels) without a difference in
>behavior.
>
>> The exact messages send by Npgsql are:
>> >
>> > Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed
>> > Describe (statement=unnamed)
>> > Bind (statement=unnamed, portal=MQ0)
>> > Parse (SELECT * FROM data WHERE id=1), statement=unnamed
>> > Describe (statement=unnamed)
>> > Bind (statement=unnamed, portal=MQ1)
>> > Execute (portal=MQ0)
>> > Close (portal=MQ0)
>> > Execute (portal=MQ1)
>> > Close (portal=MQ1)
>> > Sync
>>
>> I never used Npgsql so I don't know if there is something missing
>there.
>> Would you need an explicit commit before closing MQ0?
>>
>
>I guess this is exactly my question to PostgreSQL... But unless I'm
>misunderstanding the transaction semantics I shouldn't need to commit
>the
>first UPDATE in order to see its effect in the second SELECT...

Try adding a sync before the second execute.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.



Re: Odd query execution behavior with extended protocol

От
Tom Lane
Дата:
Shay Rojansky <roji@roji.org> writes:
> Npgsql supports sending multiple SQL statements in a single packet via the
> extended protocol. This works fine, but when the second query SELECTs a
> value modified by the first's UPDATE, I'm getting a result as if the UPDATE
> hasn't yet occurred.

> The exact messages send by Npgsql are:

> Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed
> Describe (statement=unnamed)
> Bind (statement=unnamed, portal=MQ0)
> Parse (SELECT * FROM data WHERE id=1), statement=unnamed
> Describe (statement=unnamed)
> Bind (statement=unnamed, portal=MQ1)
> Execute (portal=MQ0)
> Close (portal=MQ0)
> Execute (portal=MQ1)
> Close (portal=MQ1)
> Sync

I'm fairly sure that the query snapshot is established at Bind time,
which means that this SELECT will run with a snapshot that indeed
does not see the effects of the UPDATE.

To my mind there is not a lot of value in performing Bind until you
are ready to do Execute.  The only reason the operations are separated
in the protocol is so that you can do multiple Executes with a row limit
on each one, to retrieve a large query result in chunks.
        regards, tom lane



Re: Odd query execution behavior with extended protocol

От
Shay Rojansky
Дата:
Try adding a sync before the second execute.

I tried inserting a Sync right before the second Execute, this caused an error with the message 'portal "MQ1" does not exist'.
This seems like problematic behavior on its own, regardless of my issues here (Sync shouldn't be causing an implicit close of the portal, should it?).
 

Re: Odd query execution behavior with extended protocol

От
Shay Rojansky
Дата:
I'm fairly sure that the query snapshot is established at Bind time,
which means that this SELECT will run with a snapshot that indeed
does not see the effects of the UPDATE.

To my mind there is not a lot of value in performing Bind until you
are ready to do Execute.  The only reason the operations are separated
in the protocol is so that you can do multiple Executes with a row limit
on each one, to retrieve a large query result in chunks.

So you would suggest changing my message chain to send Bind right after Execute, right? This would yield the following messages:

P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S)

This would mean that I would switch to using named statements and the unnamed portal, rather than the current unnamed statement
and named portals. If I recall correctly, I was under the impression that there are some PostgreSQL performance benefits to using the
unnamed statement over named statements, although I admit I can't find any documentation backing that. Can you confirm that the two
are equivalent performance-wise?

Shay 

Re: Odd query execution behavior with extended protocol

От
Tom Lane
Дата:
Shay Rojansky <roji@roji.org> writes:
>> To my mind there is not a lot of value in performing Bind until you
>> are ready to do Execute.  The only reason the operations are separated
>> in the protocol is so that you can do multiple Executes with a row limit
>> on each one, to retrieve a large query result in chunks.

> So you would suggest changing my message chain to send Bind right after
> Execute, right? This would yield the following messages:

> P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current
> P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S)

> This would mean that I would switch to using named statements and the
> unnamed portal, rather than the current unnamed statement
> and named portals. If I recall correctly, I was under the impression that
> there are some PostgreSQL performance benefits to using the
> unnamed statement over named statements, although I admit I can't find any
> documentation backing that. Can you confirm that the two
> are equivalent performance-wise?

Hmm.  I do not recall exactly what performance optimizations apply to
those two cases; they're probably not "equivalent", though I do not think
the difference is major in either case.  TBH I was a bit surprised on
reading your message to hear that the system would take that sequence at
all; it's not obvious that it should be allowed to replace a statement,
named or not, while there's an open portal that depends on it.

I think you might have more issues with lifespans, since portals go away
at commit whereas named statements don't.
        regards, tom lane



Re: Odd query execution behavior with extended protocol

От
Tom Lane
Дата:
Shay Rojansky <roji@roji.org> writes:
>> Try adding a sync before the second execute.

> I tried inserting a Sync right before the second Execute, this caused an
> error with the message 'portal "MQ1" does not exist'.
> This seems like problematic behavior on its own, regardless of my issues
> here (Sync shouldn't be causing an implicit close of the portal, should
> it?).

Sync results in closing the transaction, if you've not explicitly executed
a BEGIN.
        regards, tom lane



Re: Odd query execution behavior with extended protocol

От
Shay Rojansky
Дата:
<div dir="ltr">Thanks for the help Tom and the others, I'll modify my sequence and report if I encounter any further
issues.</div><divclass="gmail_extra"><br /><div class="gmail_quote">On Sun, Oct 4, 2015 at 7:36 PM, Tom Lane <span
dir="ltr"><<ahref="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">ShayRojansky <<a href="mailto:roji@roji.org">roji@roji.org</a>> writes:<br /> >> To my mind there
isnot a lot of value in performing Bind until you<br /> >> are ready to do Execute.  The only reason the
operationsare separated<br /> >> in the protocol is so that you can do multiple Executes with a row limit<br />
>>on each one, to retrieve a large query result in chunks.<br /><br /> > So you would suggest changing my
messagechain to send Bind right after<br /> > Execute, right? This would yield the following messages:<br /><br />
>P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current<br /> > P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S)<br /><br /> >
Thiswould mean that I would switch to using named statements and the<br /> > unnamed portal, rather than the current
unnamedstatement<br /> > and named portals. If I recall correctly, I was under the impression that<br /> > there
aresome PostgreSQL performance benefits to using the<br /> > unnamed statement over named statements, although I
admitI can't find any<br /> > documentation backing that. Can you confirm that the two<br /> > are equivalent
performance-wise?<br/><br /></span>Hmm.  I do not recall exactly what performance optimizations apply to<br /> those
twocases; they're probably not "equivalent", though I do not think<br /> the difference is major in either case.  TBH I
wasa bit surprised on<br /> reading your message to hear that the system would take that sequence at<br /> all; it's
notobvious that it should be allowed to replace a statement,<br /> named or not, while there's an open portal that
dependson it.<br /><br /> I think you might have more issues with lifespans, since portals go away<br /> at commit
whereasnamed statements don't.<br /><br />                         regards, tom lane<br /></blockquote></div><br
/></div>

Re: Odd query execution behavior with extended protocol

От
Shay Rojansky
Дата:
> So you would suggest changing my message chain to send Bind right after
> Execute, right? This would yield the following messages:

> P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current
> P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S)

> This would mean that I would switch to using named statements and the
> unnamed portal, rather than the current unnamed statement
> and named portals. If I recall correctly, I was under the impression that
> there are some PostgreSQL performance benefits to using the
> unnamed statement over named statements, although I admit I can't find any
> documentation backing that. Can you confirm that the two
> are equivalent performance-wise?

Hmm.  I do not recall exactly what performance optimizations apply to
those two cases; they're probably not "equivalent", though I do not think
the difference is major in either case.  TBH I was a bit surprised on
reading your message to hear that the system would take that sequence at
all; it's not obvious that it should be allowed to replace a statement,
named or not, while there's an open portal that depends on it.

One more important piece of information...

The reason Npgsql currently sends P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S is to avoid deadlocks, I've already discussed this with you in http://www.postgresql.org/message-id/CADT4RqB+fbtQpTE5YLZ0hKb-2K-nGZHM2YbVj0TMC8rQBGfUxA@mail.gmail.com.

Unfortunately, the alternative I proposed above, P1/P2/D1/B1/E1/D2/B2/E2/S, suffers from the same issue: any sequence in which a Bind is sent after a previous Execute is deadlock-prone - Execute causes PostgreSQL to start writing a potentially large dataset, while Bind means the client may be writing a potentially large parameter value.

In other words, unless I'm mistaken it seems there's no alternative but to implement non-blocking I/O at the client side - write until writing would block, switching to reading when that happens. This adds some substantial complexity, especially with .NET's SSL/TLS implementation layer.

Or does anyone see some sort of alternative which I've missed?