Обсуждение: CommandStatus from insert returning when using a portal.

Поиск
Список
Период
Сортировка

CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:
Greetings,

With a simple insert such as 

INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id

if a portal is used to get the results then the CommandStatus is not returned on the execute only when the portal is closed. After looking at this more it is really after all of the data is read which is consistent if you don't use a portal, however it would be much more useful if we received the CommandStatus after the insert was completed and before the data

Obviously I am biased by the JDBC API which would like to have

PreparedStatement.execute() return the number of rows inserted without having to wait to read all of the rows returned


Dave Cramer

Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Wed, Jul 12, 2023 at 1:03 PM Dave Cramer <davecramer@gmail.com> wrote:

INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id

if a portal is used to get the results then the CommandStatus

IIUC the portal is not optional if you including the RETURNING clause.

There is no CommandStatus message in the protocol, the desired information is part of the command tag returned in the CommandComplete message.  You get that at the end of the command, which has been defined as when any portal produced by the command has been fully executed.

You probably should add your desire to the Version 4 protocol ToDo on the wiki.


If that ever becomes an active project working through the details of that list for desirability and feasibility would be the first thing to happen.

David J.

Re: CommandStatus from insert returning when using a portal.

От
Gurjeet Singh
Дата:
On Wed, Jul 12, 2023 at 1:03 PM Dave Cramer <davecramer@gmail.com> wrote:
>
> With a simple insert such as
>
> INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id
>
> if a portal is used to get the results then the CommandStatus is not returned on the execute only when the portal is
closed.After looking at this more it is really after all of the data is read which is consistent if you don't use a
portal,however it would be much more useful if we received the CommandStatus after the insert was completed and before
thedata 
>
> Obviously I am biased by the JDBC API which would like to have
>
> PreparedStatement.execute() return the number of rows inserted without having to wait to read all of the rows
returned

I believe if RETURNING clause is use, the protocol-level behaviour of
INSERT is expected to match that of SELECT. If the SELECT command
behaves like that (resultset followed by CommandStatus), then I'd say
the INSERT RETURNING is behaving as expected.

Best regards,
Gurjeet
http://Gurje.et



Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:

Dave Cramer


On Wed, 12 Jul 2023 at 16:31, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wed, Jul 12, 2023 at 1:03 PM Dave Cramer <davecramer@gmail.com> wrote:

INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id

if a portal is used to get the results then the CommandStatus

IIUC the portal is not optional if you including the RETURNING clause.
From my testing it isn't required.  

There is no CommandStatus message in the protocol, the desired information is part of the command tag returned in the CommandComplete message.  You get that at the end of the command, which has been defined as when any portal produced by the command has been fully executed.

I could argue that the insert is fully completed whether I read the data or not. 

You probably should add your desire to the Version 4 protocol ToDo on the wiki.


thx, will do.

Dave

Re: CommandStatus from insert returning when using a portal.

От
Tom Lane
Дата:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Umm ... you do realize that we return the rows on-the-fly?
The server does not know how many rows got inserted/returned
until it's run the query to completion, at which point all
the data has already been sent to the client.  There isn't
any way to return the rowcount before the data, and it wouldn't
be some trivial protocol adjustment to make that work differently.
(What it *would* be is expensive, because we'd have to store
those rows somewhere.)

            regards, tom lane



Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:


On Wed, 12 Jul 2023 at 17:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Umm ... you do realize that we return the rows on-the-fly?
I do realize that. 
The server does not know how many rows got inserted/returned
Well I haven't looked at the code, but it seems unintuitive that adding the returning clause changes the semantics of insert. 

until it's run the query to completion, at which point all
the data has already been sent to the client.  There isn't
any way to return the rowcount before the data, and it wouldn't
be some trivial protocol adjustment to make that work differently.
(What it *would* be is expensive, because we'd have to store
those rows somewhere.)
I wasn't asking for that, I just want the number of rows inserted.

Thanks,

Dave

Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Wed, Jul 12, 2023 at 2:59 PM Dave Cramer <davecramer@gmail.com> wrote:
On Wed, 12 Jul 2023 at 17:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Umm ... you do realize that we return the rows on-the-fly?
I do realize that. 
The server does not know how many rows got inserted/returned
Well I haven't looked at the code, but it seems unintuitive that adding the returning clause changes the semantics of insert. 


It doesn't have to - the insertions are always "as rows are produced", it is just that in the non-returning case the final row can be sent to /dev/null instead of the client (IOW, there is always some destination).  In both cases the total number of rows inserted are only reliably known when the top executor node requests a new tuple and its immediate predecessor says "no more rows present".

David J.

Re: CommandStatus from insert returning when using a portal.

От
chap@anastigmatix.net
Дата:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Huh ... just how *is* PreparedStatement.execute() supposed
to behave when the statement is an INSERT RETURNING?

execute() -> true
getResultSet() -> the rows
getMoreResults() -> false
getUpdateCount() -> number inserted?

It seems that would fit the portal's behavior easily enough.

Or is the JDBC spec insisting on some other order?

Regards,
-Chap



Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:


On Wed, 12 Jul 2023 at 20:00, <chap@anastigmatix.net> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Huh ... just how *is* PreparedStatement.execute() supposed
to behave when the statement is an INSERT RETURNING?

It's really executeUpdate which is supposed to return the number of rows updated.
Without a cursor it returns right away as all of the results are returned by the server. However with cursor you have to wait until you fetch the rows before you can get the CommandComplete message which btw is wrong as it returns INSERT 0 0 instead of INSERT 2 0

Dave

Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Wed, Jul 12, 2023 at 5:57 PM Dave Cramer <davecramer@gmail.com> wrote:
On Wed, 12 Jul 2023 at 20:00, <chap@anastigmatix.net> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Huh ... just how *is* PreparedStatement.execute() supposed
to behave when the statement is an INSERT RETURNING?

It's really executeUpdate which is supposed to return the number of rows updated.

Right, and executeUpdate is the wrong API method to use, in the PostgreSQL world, when executing insert/update/delete with the non-SQL-standard returning clause.  executeQuery is the method to use.  And execute() should behave as if executeQuery was called, i.e., return true, which it is capable of doing since it has resultSet data that it needs to handle.

The addition of returning turns the insert/update/delete into a select in terms of effective client-seen behavior.

ISTM that you are trying to make user-error less painful.  While that is laudable it apparently isn't practical.  They can either discard the results and get a count by omitting returning or obtain the result and derive the count by counting rows alongside whatever else they needed the returned data for.

David J.

Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:


On Wed, 12 Jul 2023 at 21:31, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wed, Jul 12, 2023 at 5:57 PM Dave Cramer <davecramer@gmail.com> wrote:
On Wed, 12 Jul 2023 at 20:00, <chap@anastigmatix.net> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Huh ... just how *is* PreparedStatement.execute() supposed
to behave when the statement is an INSERT RETURNING?

It's really executeUpdate which is supposed to return the number of rows updated.

Right, and executeUpdate is the wrong API method to use, in the PostgreSQL world, when executing insert/update/delete with the non-SQL-standard returning clause.  executeQuery is the method to use.  And execute() should behave as if executeQuery was called, i.e., return true, which it is capable of doing since it has resultSet data that it needs to handle.

The addition of returning turns the insert/update/delete into a select in terms of effective client-seen behavior.

ISTM that you are trying to make user-error less painful.  While that is laudable it apparently isn't practical.  They can either discard the results and get a count by omitting returning or obtain the result and derive the count by counting rows alongside whatever else they needed the returned data for.
Any comment on why the CommandComplete is incorrect ?
It returns INSERT 0 0  if a cursor is used

Dave

Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Thursday, July 13, 2023, Dave Cramer <davecramer@gmail.com> wrote:

Any comment on why the CommandComplete is incorrect ?
It returns INSERT 0 0  if a cursor is used

 Looking at DECLARE it is surprising that what you describe is even possible.  Can you share a psql reproducer?

David J.

Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:




On Thu, 13 Jul 2023 at 10:24, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thursday, July 13, 2023, Dave Cramer <davecramer@gmail.com> wrote:

Any comment on why the CommandComplete is incorrect ?
It returns INSERT 0 0  if a cursor is used

 Looking at DECLARE it is surprising that what you describe is even possible.  Can you share a psql reproducer?

apologies, we are using a portal, not a cursor. 
Dave Cramer

Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Thu, Jul 13, 2023 at 6:07 PM Dave Cramer <davecramer@gmail.com> wrote:
On Thu, 13 Jul 2023 at 10:24, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thursday, July 13, 2023, Dave Cramer <davecramer@gmail.com> wrote:

Any comment on why the CommandComplete is incorrect ?
It returns INSERT 0 0  if a cursor is used

 Looking at DECLARE it is surprising that what you describe is even possible.  Can you share a psql reproducer?

apologies, we are using a portal, not a cursor. 


Still the same basic request of providing a reproducer - ideally in psql.

IIUC a portal has to be used for a prepared (extended query mode) result set returning query.  v16 can now handle parameter binding so:

postgres=# \bind 4
postgres=# insert into ins values ($1) returning id;
 id
----
  4
(1 row)

INSERT 0 1

Which gives the expected non-zero command tag row count result.

David J.

Re: CommandStatus from insert returning when using a portal.

От
chap@anastigmatix.net
Дата:
On 2023-07-12 20:57, Dave Cramer wrote:
> Without a cursor it returns right away as all of the results are 
> returned
> by the server. However with cursor you have to wait until you fetch the
> rows before you can get the CommandComplete message which btw is wrong 
> as
> it returns INSERT 0 0 instead of INSERT 2 0

To make sure I am following, was this describing a comparison of
two different ways in Java, using JDBC, to perform the same operation,
one of which behaves as desired while the other doesn't? If so, for
my curiosity, what do both ways look like in Java?

Or was it a comparison of two different operations, say one
an INSERT RETURNING and the other something else?

Regards,
-Chap



Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:
David, 

I will try to get a tcpdump file. Doing this in libpq seems challenging as I'm not aware of how to create a portal in psql.

Chap

The only difference is one instance uses a portal to fetch the results, the other (correct one) is a normal insert where all of the rows are returned immediately

this is a reproducer in Java

conn.prepareStatement("DROP TABLE IF EXISTS test_table").execute();
conn.prepareStatement("CREATE TABLE IF NOT EXISTS test_table (id SERIAL PRIMARY KEY, cnt INT NOT NULL)").execute();

for (var fetchSize : List.of(0, 1, 2, 3)) {    System.out.println("FetchSize=" + fetchSize);
    try (var stmt = conn.prepareStatement("INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id", RETURN_GENERATED_KEYS)) {        stmt.setFetchSize(fetchSize);
        var ret = stmt.executeUpdate();        System.out.println("executeUpdate result: " + ret);
        var rs = stmt.getGeneratedKeys();        System.out.print("ids: ");        while (rs.next()) {            System.out.print(rs.getInt(1) + " ");        }        System.out.print("\n\n");    }
}
Dave

On Fri, 14 Jul 2023 at 12:07, <chap@anastigmatix.net> wrote:
On 2023-07-12 20:57, Dave Cramer wrote:
> Without a cursor it returns right away as all of the results are
> returned
> by the server. However with cursor you have to wait until you fetch the
> rows before you can get the CommandComplete message which btw is wrong
> as
> it returns INSERT 0 0 instead of INSERT 2 0

To make sure I am following, was this describing a comparison of
two different ways in Java, using JDBC, to perform the same operation,
one of which behaves as desired while the other doesn't? If so, for
my curiosity, what do both ways look like in Java?

Or was it a comparison of two different operations, say one
an INSERT RETURNING and the other something else?

Regards,
-Chap

Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Fri, Jul 14, 2023 at 9:30 AM Dave Cramer <davecramer@gmail.com> wrote:
David, 

I will try to get a tcpdump file. Doing this in libpq seems challenging as I'm not aware of how to create a portal in psql.

Yeah, apparently psql does something special (like ignoring it...) with its FETCH_COUNT variable (set to 2 below as evidenced by the first query) for the insert returning case.  As documented since the command itself is not select or values the test in is_select_command returns false and the branch:

 else if (pset.fetch_count <= 0 || pset.gexec_flag ||
pset.crosstab_flag || !is_select_command(query))
{
/* Default fetch-it-all-and-print mode */

Is chosen.

Fixing that test in some manner and recompiling psql seems like it should be the easiest way to produce a core-only test case.

postgres=# select * from (Values (1),(2),(30000),(40000)) vals (v);
 v
---
 1
 2
 30000
 40000
(4 rows)

postgres=# \bind 5 6 70000 80000
postgres=# insert into ins values ($1),($2),($3),($4) returning id;
  id
-------
     5
     6
 70000
 80000
(4 rows)

INSERT 0 4

I was hoping to see the INSERT RETURNING query have a 4 width header instead of 7.

David J.

Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Fri, Jul 14, 2023 at 9:50 AM David G. Johnston <david.g.johnston@gmail.com> wrote:

Fixing that test in some manner and recompiling psql seems like it should be the easiest way to produce a core-only test case.


Apparently not - since it (ExecQueryUsingCursor) literally wraps the query in a DECLARE CURSOR SQL Command which prohibits INSERT...

I suppose we'd have to write a psql equivalent of ExecQueryUsingPortal that iterates over via fetch to make this work...probably more than I'm willing to try.

David J.

Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:
See attached pcap file

after the execute of the portal it returns INSERT 0 0
Dave Cramer


On Fri, 14 Jul 2023 at 12:57, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Fri, Jul 14, 2023 at 9:50 AM David G. Johnston <david.g.johnston@gmail.com> wrote:

Fixing that test in some manner and recompiling psql seems like it should be the easiest way to produce a core-only test case.


Apparently not - since it (ExecQueryUsingCursor) literally wraps the query in a DECLARE CURSOR SQL Command which prohibits INSERT...

I suppose we'd have to write a psql equivalent of ExecQueryUsingPortal that iterates over via fetch to make this work...probably more than I'm willing to try.

David J.
Вложения

Re: CommandStatus from insert returning when using a portal.

От
chap@anastigmatix.net
Дата:
On 2023-07-14 12:58, Dave Cramer wrote:
> See attached pcap file

So if the fetch count is zero and no portal is needed,
or if the fetch count exceeds the row count and the command
completion follows directly with no suspension of the portal, then
it comes with the correct count, but if the portal gets suspended,
then the later command completion reports a zero count?

Regards,
-Chap



Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:



On Fri, 14 Jul 2023 at 13:39, <chap@anastigmatix.net> wrote:
On 2023-07-14 12:58, Dave Cramer wrote:
> See attached pcap file

So if the fetch count is zero and no portal is needed,
or if the fetch count exceeds the row count and the command
completion follows directly with no suspension of the portal, then
it comes with the correct count, but if the portal gets suspended,
then the later command completion reports a zero count?


Seems so, yes.

Dave

Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Fri, Jul 14, 2023 at 10:39 AM <chap@anastigmatix.net> wrote:
On 2023-07-14 12:58, Dave Cramer wrote:
> See attached pcap file

So if the fetch count is zero and no portal is needed,
or if the fetch count exceeds the row count and the command
completion follows directly with no suspension of the portal, then
it comes with the correct count, but if the portal gets suspended,
then the later command completion reports a zero count?


I cannot really understand that output other than to confirm that all queries had returning and one of them showed INSERT 0 0

Is there some magic set of arguments I should be using besides: tcpdump -Ar filename ?

Because of the returning they all need a portal so far as the server is concerned and the server will obligingly send the contents of the portal back to the client.

I can definitely believe a bug exists in the intersection of a non-SELECT query and a less-than-complete fetch count specification.  There doesn't seem to be any place in the core testing framework to actually test out the interaction though...I even tried using plpgsql, which lets me open/execute a plpgsql cursor with insert returning (which SQL prohibits) but we can't get access to the command tag itself there.  The ROW_COUNT variable likely tracks actual rows fetched or seen (in the case of MOVE).

What I kinda suspect might be happening with a portal suspend is that the suspension loop only ends when the final fetch attempt find zero rows to return and thus the final count ends up being zero instead of the cumulative sum over all portal scans.

David J.





Re: CommandStatus from insert returning when using a portal.

От
chap@anastigmatix.net
Дата:
On 2023-07-12 21:30, David G. Johnston wrote:
> Right, and executeUpdate is the wrong API method to use, in the 
> PostgreSQL
> world, when executing insert/update/delete with the non-SQL-standard
> returning clause. ... ISTM that you are trying to make user-error less
> painful.

In Dave's Java reproducer, no user-error has been made, because the user
supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the
RETURNING clause has been added by the JDBC driver. So the user expects
executeUpdate to be the right method, and return the row count, and
getGeneratedKeys() to then return the rows.

I've seen a possibly even more interesting result using pgjdbc-ng with
protocol.trace=true:

FetchSize=0
<P<D<S
> 1.>t.>T$>Z*
<B<E<S
> 2.>D.>D.>C.>Z*
executeUpdate result: 2
ids: 1 2

FetchSize=1
<B<E<H
> 2.>D.>s*
executeUpdate result: -1
ids: 3 <E<H
> D.>s*
4 <E<H
> C*
<C<S
> 3.>Z*

FetchSize=2
<B<E<H
> 2.>D.>D.>s*
executeUpdate result: -1
ids: 5 6 <E<H
> C*
<C<S
> 3.>Z*

FetchSize=3
<B<E<H
> 2.>D.>D.>C*
<C<S
> 3.>Z*
executeUpdate result: 2
ids: 7 8


Unless there's some interleaving of trace and stdout messages happening
here, I think pgjdbc-ng is not even collecting all the returned rows
in the suspended-cursor case before executeUpdate returns, but keeping
the cursor around for getGeneratedKeys() to use, so executeUpdate
returns -1 before even having seen the later command complete, and would
still do that even if the command complete message had the right count.

Regards,
-Chap



Re: CommandStatus from insert returning when using a portal.

От
chap@anastigmatix.net
Дата:
On 2023-07-14 14:19, David G. Johnston wrote:
> Is there some magic set of arguments I should be using besides: tcpdump 
> -Ar
> filename ?

I opened it with Wireshark, which has a pgsql protocol decoder.

Regards,
-Chap



Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:



On Fri, 14 Jul 2023 at 14:34, <chap@anastigmatix.net> wrote:
On 2023-07-12 21:30, David G. Johnston wrote:
> Right, and executeUpdate is the wrong API method to use, in the
> PostgreSQL
> world, when executing insert/update/delete with the non-SQL-standard
> returning clause. ... ISTM that you are trying to make user-error less
> painful.

In Dave's Java reproducer, no user-error has been made, because the user
supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the
RETURNING clause has been added by the JDBC driver. So the user expects
executeUpdate to be the right method, and return the row count, and
getGeneratedKeys() to then return the rows.

I've seen a possibly even more interesting result using pgjdbc-ng with
protocol.trace=true:

FetchSize=0
<P<D<S
> 1.>t.>T$>Z*
<B<E<S
> 2.>D.>D.>C.>Z*
executeUpdate result: 2
ids: 1 2

FetchSize=1
<B<E<H
> 2.>D.>s*
executeUpdate result: -1
ids: 3 <E<H
> D.>s*
4 <E<H
> C*
<C<S
> 3.>Z*

FetchSize=2
<B<E<H
> 2.>D.>D.>s*
executeUpdate result: -1
ids: 5 6 <E<H
> C*
<C<S
> 3.>Z*

FetchSize=3
<B<E<H
> 2.>D.>D.>C*
<C<S
> 3.>Z*
executeUpdate result: 2
ids: 7 8


Unless there's some interleaving of trace and stdout messages happening
here, I think pgjdbc-ng is not even collecting all the returned rows
in the suspended-cursor case before executeUpdate returns, but keeping
the cursor around for getGeneratedKeys() to use, so executeUpdate
returns -1 before even having seen the later command complete, and would
still do that even if the command complete message had the right count.

My guess is that pgjdbc-ng sees the -1 and doesn't bother looking any further

Either way pgjdbc-ng is a dead project so I'm not so concerned about it.

Dave 

Regards,
-Chap

Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Fri, Jul 14, 2023 at 11:34 AM <chap@anastigmatix.net> wrote:
On 2023-07-12 21:30, David G. Johnston wrote:
> Right, and executeUpdate is the wrong API method to use, in the
> PostgreSQL
> world, when executing insert/update/delete with the non-SQL-standard
> returning clause. ... ISTM that you are trying to make user-error less
> painful.

In Dave's Java reproducer, no user-error has been made, because the user
supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the
RETURNING clause has been added by the JDBC driver. So the user expects
executeUpdate to be the right method, and return the row count, and
getGeneratedKeys() to then return the rows.


That makes more sense, though I don't understand how the original desire of having the count appear before the actual rows would materially benefit that feature.

I agree that the documented contract of the insert command tag says it reports the size of the entire tuple store maintained by the server during the transaction instead of just the most recent count on subsequent fetches.

David J.

Re: CommandStatus from insert returning when using a portal.

От
chap@anastigmatix.net
Дата:
On 2023-07-14 14:19, David G. Johnston wrote:
> Because of the returning they all need a portal so far as the server is
> concerned and the server will obligingly send the contents of the 
> portal
> back to the client.

Dave's pcap file, for the fetch count 0 case, does not show any
portal name used in the bind, describe, or execute messages, or
any portal close message issued by the client afterward. The server
may be using a portal in that case, but it seems more transparent
to the client when fetch count is zero.

Perhaps an easy rule would be, if the driver itself adds RETURNING
because of a RETURN_GENERATED_KEYS option, it should also force the
fetch count to zero and collect all the returned rows before
executeUpdate returns, and then it will have the right count
to return?

It seems that any approach leaving any of the rows unfetched at
the time executeUpdate returns might violate a caller's expectation
that the whole outcome is known by that point.

Regards,
-Chap



Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:



On Fri, 14 Jul 2023 at 15:40, <chap@anastigmatix.net> wrote:
On 2023-07-14 14:19, David G. Johnston wrote:
> Because of the returning they all need a portal so far as the server is
> concerned and the server will obligingly send the contents of the
> portal
> back to the client.

Dave's pcap file, for the fetch count 0 case, does not show any
portal name used in the bind, describe, or execute messages, or
any portal close message issued by the client afterward. The server
may be using a portal in that case, but it seems more transparent
to the client when fetch count is zero.


There is no portal for fetch count 0. 
 
Perhaps an easy rule would be, if the driver itself adds RETURNING
because of a RETURN_GENERATED_KEYS option, it should also force the
fetch count to zero and collect all the returned rows before
executeUpdate returns, and then it will have the right count
to return?

Well that kind of negates the whole point of using a cursor in the case where you have a large result set.

The rows are subsequently fetched in rs.next()

Solves one problem, but creates another.

Dave

Re: CommandStatus from insert returning when using a portal.

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> I agree that the documented contract of the insert command tag says it
> reports the size of the entire tuple store maintained by the server during
> the transaction instead of just the most recent count on subsequent fetches.

Where do you see that documented, exactly?  I looked in the protocol
chapter and didn't find anything definite either way.

I'm quite prepared to believe there are bugs here, since this whole
set of behaviors is unreachable via libpq: you can't get it to send
an Execute with a count other than zero (ie, fetch all), nor is it
prepared to deal with the PortalSuspended messages it'd get if it did.

I think that the behavior is arising from this bit in PortalRun:

        switch (portal->strategy)
        {
            ...
            case PORTAL_ONE_RETURNING:
            ...

                /*
                 * If we have not yet run the command, do so, storing its
                 * results in the portal's tuplestore.  But we don't do that
                 * for the PORTAL_ONE_SELECT case.
                 */
                if (portal->strategy != PORTAL_ONE_SELECT && !portal->holdStore)
                    FillPortalStore(portal, isTopLevel);

                /*
                 * Now fetch desired portion of results.
                 */
                nprocessed = PortalRunSelect(portal, true, count, dest);

                /*
                 * If the portal result contains a command tag and the caller
                 * gave us a pointer to store it, copy it and update the
                 * rowcount.
                 */
                if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN)
                {
                    CopyQueryCompletion(qc, &portal->qc);
------>>>           qc->nprocessed = nprocessed;
                }

                /* Mark portal not active */
                portal->status = PORTAL_READY;

                /*
                 * Since it's a forward fetch, say DONE iff atEnd is now true.
                 */
                result = portal->atEnd;

The marked line is, seemingly intentionally, overriding the portal's
rowcount (which ought to count the whole query result) with the number
of rows processed in the current fetch.  Chap's theory that that's
always zero when this is being driven by JDBC seems plausible,
since the query status won't be returned to the client unless we
detect end-of-portal (otherwise we just send PortalSuspended).

It seems plausible to me that we should just remove that marked line.
Not sure about the compatibility implications, though.

            regards, tom lane



Re: CommandStatus from insert returning when using a portal.

От
chap@anastigmatix.net
Дата:
On 2023-07-14 15:49, Dave Cramer wrote:
> On Fri, 14 Jul 2023 at 15:40, <chap@anastigmatix.net> wrote:
>> Perhaps an easy rule would be, if the driver itself adds RETURNING
>> because of a RETURN_GENERATED_KEYS option, it should also force the
>> fetch count to zero and collect all the returned rows before
>> executeUpdate returns, and then it will have the right count
>> to return?
> 
> Well that kind of negates the whole point of using a cursor in the case
> where you have a large result set.
> 
> The rows are subsequently fetched in rs.next()

I guess it comes down, again, to the question of what kind of thing
the API client thinks it is doing, when it issues an INSERT with
the RETURN_GENERATED_KEYS option.

An API client issuing a plain INSERT knows it is the kind of thing
for which executeUpdate() is appropriate, and the complete success
or failure outcome is known when that returns.

An API client issuing its own explicit INSERT RETURNING knows it
is the kind of thing for which executeQuery() is appropriate, and
that some processing (and possibly ereporting) may be left to
occur while working through the ResultSet.

But now how about this odd hybrid, where the API client wrote
a plain INSERT, but added the RETURN_GENERATED_KEYS option?
The rewritten query is the kind of thing the server and the
driver need to treat as a query, but to the API client it still
appears the kind of thing for which executeUpdate() is suited.
The generated keys can then be examined--in the form of a
ResultSet--but one obtained with the special method
getGeneratedKeys(), rather than the usual way of getting the
ResultSet from a query.

Should the client then assume the semantics of executeUpdate
have changed for this case, the outcome isn't fully known yet,
and errors could come to light while examining the returned
keys? Or should it still assume that executeUpdate has its
usual meaning, all the work is done by that point, and the
ResultSet from getGeneratedKeys() is simply a convenient,
familiar interface for examining what came back?

I'm not sure if there's a firm answer to that one way or the
other in the formal JDBC spec, but the latter seems perhaps
safer to me.

Regards,
-Chap



Re: CommandStatus from insert returning when using a portal.

От
Dave Cramer
Дата:


On Fri, 14 Jul 2023 at 16:32, <chap@anastigmatix.net> wrote:
On 2023-07-14 15:49, Dave Cramer wrote:
> On Fri, 14 Jul 2023 at 15:40, <chap@anastigmatix.net> wrote:
>> Perhaps an easy rule would be, if the driver itself adds RETURNING
>> because of a RETURN_GENERATED_KEYS option, it should also force the
>> fetch count to zero and collect all the returned rows before
>> executeUpdate returns, and then it will have the right count
>> to return?
>
> Well that kind of negates the whole point of using a cursor in the case
> where you have a large result set.
>
> The rows are subsequently fetched in rs.next()

I guess it comes down, again, to the question of what kind of thing
the API client thinks it is doing, when it issues an INSERT with
the RETURN_GENERATED_KEYS option.

An API client issuing a plain INSERT knows it is the kind of thing
for which executeUpdate() is appropriate, and the complete success
or failure outcome is known when that returns.

An API client issuing its own explicit INSERT RETURNING knows it
is the kind of thing for which executeQuery() is appropriate, and
that some processing (and possibly ereporting) may be left to
occur while working through the ResultSet.

But now how about this odd hybrid, where the API client wrote
a plain INSERT, but added the RETURN_GENERATED_KEYS option?
The rewritten query is the kind of thing the server and the
driver need to treat as a query, but to the API client it still
appears the kind of thing for which executeUpdate() is suited.
The generated keys can then be examined--in the form of a
ResultSet--but one obtained with the special method
getGeneratedKeys(), rather than the usual way of getting the
ResultSet from a query.

Should the client then assume the semantics of executeUpdate
have changed for this case, the outcome isn't fully known yet,
and errors could come to light while examining the returned
keys? Or should it still assume that executeUpdate has its
usual meaning, all the work is done by that point, and the
ResultSet from getGeneratedKeys() is simply a convenient,
familiar interface for examining what came back?

The fly in the ointment here is when they setFetchSize and we decide to use a Portal under the covers.

I"m willing to document around this since it looks pretty unlikely that changing the behaviour in the server is a non-starter.
 

I'm not sure if there's a firm answer to that one way or the
other in the formal JDBC spec, but the latter seems perhaps
safer to me.

I'll leave the user to decide their own fate.

Dave 

Regards,
-Chap

Re: CommandStatus from insert returning when using a portal.

От
Chapman Flack
Дата:
On 2023-07-14 17:02, Dave Cramer wrote:
> The fly in the ointment here is when they setFetchSize and we decide to
> use a Portal under the covers.

A person might language-lawyer about whether setFetchSize even applies
to the kind of thing done with executeUpdate.

Hmm ... the apidoc for setFetchSize says "Gives the JDBC driver a hint
as to the number of rows that should be fetched from the database when
more rows are needed for ResultSet objects generated by this Statement."

So ... it appears to apply to any "ResultSet objects generated by this
Statement", and getGeneratedKeys returns a ResultSet, so maybe
setFetchSize should apply to it.

OTOH, setFetchSize has @since 1.2, and getGeneratedKeys @since 1.4.
At the time setFetchSize was born, the only way you could get a 
ResultSet
was from the kind of thing you'd use executeQuery for.

So when getGeneratedKeys was later added, a way of getting a ResultSet
after an executeUpdate, did they consciously intend it to come under
the jurisdiction of existing apidoc that concerned the fetch size of
a ResultSet you wanted from executeQuery?

Full employment for language lawyers.

Moreover, the apidoc does say the fetch size is "a hint", and also that
it applies "when more rows are needed" from the ResultSet.

So it's technically not a misbehavior to disregard the hint, and you're
not even disregarding the hint if you fetch all the rows at once, 
because
then more rows can't be needed. :)

Regards,
-Chap



Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Fri, Jul 14, 2023 at 12:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> I agree that the documented contract of the insert command tag says it
> reports the size of the entire tuple store maintained by the server during
> the transaction instead of just the most recent count on subsequent fetches.

Where do you see that documented, exactly?  I looked in the protocol
chapter and didn't find anything definite either way.

On successful completion, an INSERT command returns a command tag of the form

INSERT oid count
The count is the number of rows inserted or updated.


It doesn't, nor should, have any qualifications about not applying to the returning case and definitely shouldn't change based upon use of FETCH on the unnamed portal.

I'm quite prepared to believe there are bugs here, since this whole
set of behaviors is unreachable via libpq: you can't get it to send
an Execute with a count other than zero (ie, fetch all), nor is it
prepared to deal with the PortalSuspended messages it'd get if it did.

I think that the behavior is arising from this bit in PortalRun:

                if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN)
                {
                    CopyQueryCompletion(qc, &portal->qc);
------>>>           qc->nprocessed = nprocessed;
                }

I came to the same conclusion.  The original introduction of that line replaced string munging "SELECT " + nprocessed; so this code never even considered INSERT as being in scope and indeed wanted to return the per-run value in a fetch-list context.


I think I see why simple removal of that line is sufficient as the copied-from &portal->qc already has the result of executing the underlying insert query.  That said, the paranoid approach would seem to be to keep the assignment but only use it when we aren't dealing with the returning case.

diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..5e75141f0b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -775,7 +775,8 @@ PortalRun(Portal portal, long count, bool isTopLevel, bool run_once,
                                if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN)
                                {
                                        CopyQueryCompletion(qc, &portal->qc);
-                                       qc->nprocessed = nprocessed;
+                                       if (portal->strategy != PORTAL_ONE_RETURNING)
+                                               qc->nprocessed = nprocessed;
                                }

                                /* Mark portal not active */
 
It seems plausible to me that we should just remove that marked line.
Not sure about the compatibility implications, though.


I believe it is a bug worth fixing, save driver writers processing time getting a count when the command tag is supposed to be providing it to them using compute time already spent anyways.

David J.


Re: CommandStatus from insert returning when using a portal.

От
Chapman Flack
Дата:
On 2023-07-14 17:31, Chapman Flack wrote:
> So when getGeneratedKeys was later added, a way of getting a ResultSet
> after an executeUpdate, did they consciously intend it to come under
> the jurisdiction of existing apidoc that concerned the fetch size of
> a ResultSet you wanted from executeQuery?
> ...
> Moreover, the apidoc does say the fetch size is "a hint", and also that
> it applies "when more rows are needed" from the ResultSet.
> 
> So it's technically not a misbehavior to disregard the hint, and you're
> not even disregarding the hint if you fetch all the rows at once, 
> because
> then more rows can't be needed. :)

... and just to complete the thought, the apidoc for executeUpdate 
leaves
no wiggle room for what that method returns: for DML, it has to be the
row count.

So if the only way to get the accurate row count is to fetch all the
RETURN_GENERATED_KEYS rows at once, either to count them locally or
to find the count in the completion message that follows them, that
mandate seems stronger than any hint from setFetchSize.

If someone really does want to do a huge INSERT and get the generated
values back in increments, it might be clearer to write an explicit
INSERT RETURNING and issue it with executeQuery, where everything will
work as expected.

I am also thinking someone might possibly allocate one Statement to
use for some number of executeQuery and executeUpdate calls, and might
call setFetchSize as a hint for the queries, but not expect it to have
effects spilling over to executeUpdate.

Regards,
-Chap



Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Fri, Jul 14, 2023 at 3:12 PM Chapman Flack <chap@anastigmatix.net> wrote:
If someone really does want to do a huge INSERT and get the generated
values back in increments, it might be clearer to write an explicit
INSERT RETURNING and issue it with executeQuery, where everything will
work as expected.


For PostgreSQL this is even moreso (i.e, huge means count > 1) since the order of rows in the returning clause is not promised to be related to the order of the rows as seen in the supplied insert command.  A manual insert returning should ask for not only any auto-generated column but also the set of columns that provide the unique natural key.

David J.

Re: CommandStatus from insert returning when using a portal.

От
Chapman Flack
Дата:
On 2023-07-14 18:22, David G. Johnston wrote:
> For PostgreSQL this is even moreso (i.e, huge means count > 1) since 
> the
> order of rows in the returning clause is not promised to be related to 
> the
> order of the rows as seen in the supplied insert command.  A manual 
> insert
> returning should ask for not only any auto-generated column but also 
> the
> set of columns that provide the unique natural key.

Yikes!

That sounds like something that (if it's feasible) the driver's
rewriting for RETURN_GENERATED_KEYS should try to do ... the
driver is already expected to be smart enough to know which
columns the generated keys are ... should it also try to rewrite
the query in some way to get a meaningful order of the results?

But then ... the apidoc for getGeneratedKeys is completely
silent on the ordering anyway. I get the feeling this whole
corner of the JDBC API may have been thought out only as far
as issuing a single-row INSERT at a time and getting its
assigned keys back.

Regards,
-Chap



Re: CommandStatus from insert returning when using a portal.

От
Alvaro Herrera
Дата:
On 2023-Jul-14, Dave Cramer wrote:

> David,
> 
> I will try to get a tcpdump file. Doing this in libpq seems challenging as
> I'm not aware of how to create a portal in psql.

You can probably have a look at src/test/modules/libpq_pipeline/libpq_pipeline.c
as a basis to write some test code for this.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you want to have good ideas, you must have many ideas.  Most of them
will be wrong, and what you have to learn is which ones to throw away."
                                                         (Linus Pauling)



Re: CommandStatus from insert returning when using a portal.

От
"David G. Johnston"
Дата:
On Wed, Jul 12, 2023 at 2:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Umm ... you do realize that we return the rows on-the-fly?
The server does not know how many rows got inserted/returned
until it's run the query to completion, at which point all
the data has already been sent to the client

Doesn't this code contradict that statement?

src/backend/tcop/pquery.c
/*
* If we have not yet run the command, do so, storing its
* results in the portal's tuplestore.  But we don't do that
* for the PORTAL_ONE_SELECT case.
*/
if (portal->strategy != PORTAL_ONE_SELECT && !portal->holdStore)
    FillPortalStore(portal, isTopLevel);
/*
* Now fetch desired portion of results.
*/
nprocessed = PortalRunSelect(portal, true, count, dest);


Not sure we'd want to lock ourselves into this implementation but at least as it stands now we could send a message with the portal size after calling FillPortalStore and prior to calling PortalRunSelect.

David J.