Thread: Multiple queries in transit

Multiple queries in transit

From: Mark Hills
Date:
We have a user interface which fetches and displays many small pieces of 
distinct information from a PostgreSQL database.

* fetches are simple lookups across a diverse set of tables, in response to events on another data source

* uses PQsendQuery() on a non-blocking socket

But data fetches visibly take some time -- libpq doesn't allow a second 
query to be sent until the first has been fully processed. The 
back-and-forth seems to give a bottleneck on the round-trip.

Instead, it would be preferable to send multiple requests (down the TCP 
socket), and then receive multiple responses (in order).

This would allow the sending, processing and receiving response to be 
interleaved much more reasonably, and reduce the delay.

Could libpq be reasonably modified to allow this?

Looking at the libpq code (fe-exec.c), it seems almost no state needs to 
be stored until results are received, and so perhaps this limitation is 
unnecessary. The result-accumulation state is reset on sending the query; 
it could perhaps be done on receipt. Are there problems with this?

Below is a simple illustration.

Also, whilst tracing code through to pqsecure_write(), I wondered whether 
Nagle's algorithm on the socket is introducing an additional delay. I 
can't see any special consideration in the code for this (e.g. 
TCP_NODELAY).

Thoughts and suggestions appreciated, many thanks.

-- 
Mark


#include <stdio.h>
#include <libpq-fe.h>

#define QUEUE 10

void qerror(const char *label, PGconn *db)
{
    fprintf(stderr, "%s: %s", label, PQerrorMessage(db));
}

int main(int argc, char *argv[])
{
    unsigned int n;
    PGconn *db;

    db = PQconnectdb("");
    if (PQstatus(db) != CONNECTION_OK) {
        qerror("PQconnectdb", db);
        return -1;
    }

    /* Send queries. Important: this simple example does not cover
     * the case of a full transmit buffer */
    for (n = 0; n < QUEUE; n++) {
        fprintf(stderr, "Sending query %u...\n", n);
        if (PQsendQuery(db, "SELECT random()") != 1) {
            qerror("PQsendQuery", db);
            return -1;
        }
    }

    /* Receive responses */
    for (n = 0; n < QUEUE; n++) {
        PGresult *r;

        fprintf(stderr, "Receiving response %u...\n", n);
        r = PQgetResult(db);
        if (r == NULL) {
            qerror("PQgetResult", db);
            return -1;
        }
        fprintf(stderr, "  Result is %s\n", PQgetvalue(r, 0, 0));
        PQclear(r);
    }
    PQfinish(db);
    return 0;
}


Re: Multiple queries in transit

From: Heikki Linnakangas
Date:
On 31.10.2011 17:44, Mark Hills wrote:
> We have a user interface which fetches and displays many small pieces of
> distinct information from a PostgreSQL database.
>
> * fetches are simple lookups across a diverse set of tables,
>    in response to events on another data source
>
> * uses PQsendQuery() on a non-blocking socket
>
> But data fetches visibly take some time -- libpq doesn't allow a second
> query to be sent until the first has been fully processed. The
> back-and-forth seems to give a bottleneck on the round-trip.
>
> Instead, it would be preferable to send multiple requests (down the TCP
> socket), and then receive multiple responses (in order).
>
> This would allow the sending, processing and receiving response to be
> interleaved much more reasonably, and reduce the delay.
>
> Could libpq be reasonably modified to allow this?

I believe it's doable in theory, no-one has just gotten around to it. 
Patches are welcome.

> Also, whilst tracing code through to pqsecure_write(), I wondered whether
> Nagle's algorithm on the socket is introducing an additional delay. I
> can't see any special consideration in the code for this (e.g.
> TCP_NODELAY).

We do set TCP_NODELAY, see connectNoDelay() in fe-connect.c 

(http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/interfaces/libpq/fe-connect.c;h=ed9dce941e1d57cce51f2c21bf29769dfe2ee542;hb=HEAD#l960)

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
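
For readers unfamiliar with the option Heikki mentions: disabling Nagle's
algorithm is a single setsockopt() call on the socket. The following is a
minimal sketch using plain POSIX sockets, not a copy of libpq's
connectNoDelay(); the helper names set_nodelay() and get_nodelay() are made
up for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Disable Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_nodelay(int fd)
{
    int on = 1;

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Report whether TCP_NODELAY is set on the socket:
 * 1 if set, 0 if not set, -1 on error. */
int get_nodelay(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return -1;
    return value != 0;
}
```

With the option set, small writes are sent immediately instead of being
held back while earlier data is unacknowledged, so Nagle's algorithm is
unlikely to explain the delay described above if libpq already sets it.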


Re: Multiple queries in transit

From: Marti Raudsepp
Date:
I have nothing of substance to add, but

On Mon, Oct 31, 2011 at 17:44, Mark Hills <Mark.Hills@framestore.com> wrote:
> Instead, it would be preferable to send multiple requests (down the TCP
> socket), and then receive multiple responses (in order).

HTTP calls this "pipelining". I think it's helpful to adopt this term
since the concept is already familiar to many developers.

Regards,
Marti


Re: Multiple queries in transit

From: Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 31.10.2011 17:44, Mark Hills wrote:
>> Could libpq be reasonably modified to allow this?

> I believe it's doable in theory, no-one has just gotten around to it. 
> Patches are welcome.

Can't you do that today with a multi-command string submitted to
PQsendQuery, followed by multiple calls to PQgetResult?

I'm hesitant to think about supporting the case more thoroughly than
that, or with any different semantics than that, because I think that
the error-case behavior will be entirely unintelligible/unmaintainable
unless you abandon all queries-in-flight in toto when an error happens.
Furthermore, in most apps it'd be a serious PITA to keep track of which
reply is for which query, so I doubt that such a feature is of general
usefulness.
        regards, tom lane
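
A minimal sketch of the first half of what Tom describes, under stated
assumptions: it only builds the multi-command string. The helper name
join_queries() is hypothetical. The resulting string would be passed to
PQsendQuery(), after which PQgetResult() is called repeatedly until it
returns NULL, yielding one PGresult per command.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join n SQL commands into one string, separated by "; ".
 * Returns a malloc'd string that the caller must free(), or NULL if
 * allocation fails. The commands are copied as-is; nothing is escaped.
 */
char *join_queries(const char *const *queries, size_t n)
{
    size_t total = 1;               /* room for the terminating NUL */
    size_t i;
    char *out, *p;

    assert(queries != NULL || n == 0);

    for (i = 0; i < n; i++)
        total += strlen(queries[i]) + 2;    /* command plus "; " */

    out = malloc(total);
    if (out == NULL)
        return NULL;

    p = out;
    for (i = 0; i < n; i++) {
        size_t len = strlen(queries[i]);

        memcpy(p, queries[i], len);
        p += len;
        if (i + 1 < n) {            /* separator between commands only */
            memcpy(p, "; ", 2);
            p += 2;
        }
    }
    *p = '\0';
    return out;
}
```

Because results come back in the order the commands appear in the string,
the caller can match the i-th result to the i-th command.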


Re: Multiple queries in transit

From: Heikki Linnakangas
Date:
On 31.10.2011 19:09, Tom Lane wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>> On 31.10.2011 17:44, Mark Hills wrote:
>>> Could libpq be reasonably modified to allow this?
>
>> I believe it's doable in theory, no-one has just gotten around to it.
>> Patches are welcome.
>
> Can't you do that today with a multi-command string submitted to
> PQsendQuery, followed by multiple calls to PQgetResult?

Yes, true, although that only works with the simple query protocol. The 
extended protocol doesn't allow multi-command queries.

> I'm hesitant to think about supporting the case more thoroughly than
> that, or with any different semantics than that, because I think that
> the error-case behavior will be entirely unintelligible/unmaintainable
> unless you abandon all queries-in-flight in toto when an error happens.

Abandoning all in-flight queries seems quite reasonable to me. You could 
send a Sync message between each query to make it easier to track which 
query errored.

> Furthermore, in most apps it'd be a serious PITA to keep track of which
> reply is for which query, so I doubt that such a feature is of general
> usefulness.

I think a common use for this would be doing multiple inserts or updates 
in one go. Like, insert into a parent table, then more details into 
child tables. You don't care about getting the results back in that 
case, as long as you get an error on failure.

Another typical use case would be something like an ORM that wants to 
fetch a row from one table, and details of the same object from other 
tables. If it's just 2-3 queries, it's not that difficult to remember in 
which order they were issued.

Both of those use cases would be happy with just sending a multi-command 
string with PQsendQuery(), because you know all the queries in advance, 
but it would be nice to not be limited to the simple query protocol...

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
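
The "abandon all in-flight queries" policy discussed above can be sketched
as client-side bookkeeping. This is an illustration only, not libpq
behaviour: the enum and record_outcome() are hypothetical names, and the
sketch assumes replies arrive in the order the queries were sent.

```c
#include <assert.h>
#include <stddef.h>

enum query_state { Q_PENDING, Q_OK, Q_FAILED, Q_ABANDONED };

/*
 * Apply the outcome of the next reply to the oldest pending query.
 * If that query failed, every query still pending behind it is marked
 * abandoned. Returns the index the outcome was applied to, or -1 if no
 * query was pending.
 */
long record_outcome(enum query_state *states, size_t n, int succeeded)
{
    size_t i, j;

    assert(states != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (states[i] != Q_PENDING)
            continue;               /* already answered or abandoned */
        if (succeeded) {
            states[i] = Q_OK;
        } else {
            states[i] = Q_FAILED;
            for (j = i + 1; j < n; j++)
                states[j] = Q_ABANDONED;
        }
        return (long) i;
    }
    return -1;
}
```

The index of the Q_FAILED entry tells the application which query errored,
which is the information the Sync-per-query suggestion is meant to make
easy to recover.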


Re: Multiple queries in transit

From: Merlin Moncure
Date:
On Mon, Oct 31, 2011 at 12:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> On 31.10.2011 17:44, Mark Hills wrote:
>>> Could libpq be reasonably modified to allow this?
>
>> I believe it's doable in theory, no-one has just gotten around to it.
>> Patches are welcome.
>
> Can't you do that today with a multi-command string submitted to
> PQsendQuery, followed by multiple calls to PQgetResult?

Multi command string queries don't support parameterization.  The way
I do it is to keep an application managed stack of data (as an array
of record types) to send that is accumulated when the last stack is in
transit.  Then when the last response comes in you repeat.

Of course, if you could parameterize a multi command string statement,
that might be a better way to go.

merlin
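
The accumulate-while-in-transit scheme Merlin describes amounts to double
buffering. Below is a rough sketch with plain ints standing in for the
application's records; all names (struct batcher, batcher_take() and so
on) are hypothetical, and actually sending the batch is left to the
caller.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 64

struct batch {
    int items[BATCH_MAX];   /* stand-in for application records */
    size_t count;
};

struct batcher {
    struct batch bufs[2];
    int filling;            /* index of the batch currently accumulating */
    int in_transit;         /* nonzero while the other batch awaits a reply */
};

void batcher_init(struct batcher *b)
{
    assert(b != NULL);
    b->bufs[0].count = 0;
    b->bufs[1].count = 0;
    b->filling = 0;
    b->in_transit = 0;
}

/* Queue one record. Returns 0, or -1 if the accumulating batch is full. */
int batcher_add(struct batcher *b, int item)
{
    struct batch *cur;

    assert(b != NULL);
    cur = &b->bufs[b->filling];
    if (cur->count == BATCH_MAX)
        return -1;
    cur->items[cur->count++] = item;
    return 0;
}

/*
 * If nothing is in transit and records have accumulated, return the batch
 * to send and start filling the other buffer. Otherwise return NULL.
 */
const struct batch *batcher_take(struct batcher *b)
{
    struct batch *ready;

    assert(b != NULL);
    ready = &b->bufs[b->filling];
    if (b->in_transit || ready->count == 0)
        return NULL;
    b->filling = 1 - b->filling;
    b->bufs[b->filling].count = 0;
    b->in_transit = 1;
    return ready;
}

/* Call when the server's reply for the in-transit batch has arrived. */
void batcher_reply_received(struct batcher *b)
{
    assert(b != NULL);
    b->in_transit = 0;
}
```

The application calls batcher_add() whenever new data arrives and
batcher_take() after each reply, so at most one batch is ever on the wire.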


Re: Multiple queries in transit

From: Mark Hills
Date:
On Mon, 31 Oct 2011, Tom Lane wrote:

> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > On 31.10.2011 17:44, Mark Hills wrote:
> >> Could libpq be reasonably modified to allow this?
> 
> > I believe it's doable in theory, no-one has just gotten around to it. 
> > Patches are welcome.
> 
> Can't you do that today with a multi-command string submitted to
> PQsendQuery, followed by multiple calls to PQgetResult?

I remember something about this; I think I concluded that it validated 
that receiving multiple results could be done this way.

But this kind of batching can't be used with prepared queries.

> I'm hesitant to think about supporting the case more thoroughly than 
> that, or with any different semantics than that, because I think that 
> the error-case behavior will be entirely unintelligible/unmaintainable 
> unless you abandon all queries-in-flight in toto when an error happens.

Can you explain in a bit more detail which errors are of most concern? Do 
you mean full buffers on the client send?

Because the content of the stream going to/from the server does not 
change, I wouldn't really expect the semantics to change. For example, the 
server cannot even see that the client is behaving in this way. Are there 
any 'send' functions that are heavily reliant on some kind of 
result/receive state?

I don't disagree with the comments above though, any shift towards 
unintelligible behaviour would be very bad.

> Furthermore, in most apps it'd be a serious PITA to keep track of which 
> reply is for which query, so I doubt that such a feature is of general 
> usefulness.

In our UI case, we already have a queue. Because libpq can't pipeline 
multiple queries, we have to make our own queue of them anyway.

-- 
Mark
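
The kind of application-side queue Mark refers to can be as small as a
ring buffer of tags: push a tag when a query is sent, pop one when a reply
arrives. This is a sketch under the assumption that replies arrive in send
order; struct pending_queue and its functions are hypothetical names, not
part of libpq.

```c
#include <assert.h>
#include <stddef.h>

#define PENDING_MAX 16

struct pending_queue {
    int tags[PENDING_MAX];  /* application-chosen id per query in flight */
    size_t head;            /* index of the oldest entry */
    size_t count;           /* number of queries awaiting a reply */
};

/* Remember a query that has just been sent.
 * Returns 0, or -1 if too many queries are already in flight. */
int pending_push(struct pending_queue *q, int tag)
{
    assert(q != NULL);
    if (q->count == PENDING_MAX)
        return -1;
    q->tags[(q->head + q->count) % PENDING_MAX] = tag;
    q->count++;
    return 0;
}

/* Match an arriving reply to the oldest query in flight.
 * Returns 0 and stores its tag, or -1 if nothing was pending. */
int pending_pop(struct pending_queue *q, int *tag)
{
    assert(q != NULL && tag != NULL);
    if (q->count == 0)
        return -1;
    *tag = q->tags[q->head];
    q->head = (q->head + 1) % PENDING_MAX;
    q->count--;
    return 0;
}
```

A zero-initialised struct pending_queue is an empty queue. In a UI the tag
would typically identify the widget or event that asked for the data.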


Re: Multiple queries in transit

From: Merlin Moncure
Date:
On Mon, Oct 31, 2011 at 12:49 PM, Mark Hills <Mark.Hills@framestore.com> wrote:
> On Mon, 31 Oct 2011, Tom Lane wrote:
>
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> > On 31.10.2011 17:44, Mark Hills wrote:
>> >> Could libpq be reasonably modified to allow this?
>>
>> > I believe it's doable in theory, no-one has just gotten around to it.
>> > Patches are welcome.
>>
>> Can't you do that today with a multi-command string submitted to
>> PQsendQuery, followed by multiple calls to PQgetResult?
>
> I remember something about this; I think I concluded that it validated
> that receiving multiple results could be done this way.
>
> But this kind of batching can't be used with prepared queries.
>
>> I'm hesitant to think about supporting the case more thoroughly than
>> that, or with any different semantics than that, because I think that
>> the error-case behavior will be entirely unintelligible/unmaintainable
>> unless you abandon all queries-in-flight in toto when an error happens.
>
> Can you explain in a bit more detail which errors are of most concern? Do
> you mean full buffers on the client send?
>
> Because the content of the stream going to/from the server does not
> change, I wouldn't really expect the semantics to change. For example, the
> server cannot even see that the client is behaving in this way. Are there
> any 'send' functions that are heavily reliant on some kind of
> result/receive state?
>
> I don't disagree with the comments above though, any shift towards
> unintelligible behaviour would be very bad.
>
>> Furthermore, in most apps it'd be a serious PITA to keep track of which
>> reply is for which query, so I doubt that such a feature is of general
>> usefulness.
>
> In our UI case, we already have a queue. Because libpq can't pipeline
> multiple queries, we have to make our own queue of them anyway.

Note, nothing is keeping you from opening up a second connection and
interleaving in that fashion, so 'libpq' is not the bottleneck, the
connection object is :-).

merlin


Re: Multiple queries in transit

From: Merlin Moncure
Date:
On Mon, Oct 31, 2011 at 12:49 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Oct 31, 2011 at 12:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> On 31.10.2011 17:44, Mark Hills wrote:
>>>> Could libpq be reasonably modified to allow this?
>>
>>> I believe it's doable in theory, no-one has just gotten around to it.
>>> Patches are welcome.
>>
>> Can't you do that today with a multi-command string submitted to
>> PQsendQuery, followed by multiple calls to PQgetResult?
>
> Multi command string queries don't support parameterization.  The way
> I do it is to keep an application managed stack of data (as an array
> of record types) to send that is accumulated when the last stack is in
> transit.  Then when the last response comes in you repeat.

(offlist) in more detail, what I do here is to place action data into
a composite type and parameterize it into an array.  That array is
passed directly to a receiving query or a function if what's happening
in the server is complex.  We wrote a library for that purpose: see
here:

http://libpqtypes.esilo.com/
and especially here:
http://libpqtypes.esilo.com/man3/pqt-composites.html

so that while the connection is busy, and data is coming in from the
app, you continually PQputf() more records into the array that is
going to be shipped off to the server when the connection becomes
available.

On the query that gets to the server, it can be as simple as:
"insert into foo select unnest(%foo[])"

"select work_on_data(%foo[])"

libpqtypes sends all the data in native binary formats so is very fast.

merlin


Re: Multiple queries in transit

From: Merlin Moncure
Date:
On Mon, Oct 31, 2011 at 1:08 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Oct 31, 2011 at 12:49 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Mon, Oct 31, 2011 at 12:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>>> On 31.10.2011 17:44, Mark Hills wrote:
>>>>> Could libpq be reasonably modified to allow this?
>>>
>>>> I believe it's doable in theory, no-one has just gotten around to it.
>>>> Patches are welcome.
>>>
>>> Can't you do that today with a multi-command string submitted to
>>> PQsendQuery, followed by multiple calls to PQgetResult?
>>
>> Multi command string queries don't support parameterization.  The way
>> I do it is to keep an application managed stack of data (as an array
>> of record types) to send that is accumulated when the last stack is in
>> transit.  Then when the last response comes in you repeat.
>
> (offlist) in more detail, what I do here is to place action data into
> a composite type and parameterize it into an array.  That array is
> passed directly to a receiving query or a function if what's happening
> in the server is complex.  We wrote a library for that purpose: see
> here:
>
> http://libpqtypes.esilo.com/
> and especially here:
> http://libpqtypes.esilo.com/man3/pqt-composites.html
>
> so that while the connection is busy, and data is coming in from the
> app, you continually PQputf() more records into the array that is
> going to be shipped off to the server when the connection becomes
> available.
>
> On the query that gets to the server, it can be as simple as:
> "insert into foo select unnest(%foo[])"
>
> "select work_on_data(%foo[])"
>
> libpqtypes sends all the data in native binary formats so is very fast.

heh, sorry for the noise here :-).

merlin


Re: Multiple queries in transit

From: Dimitri Fontaine
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I think a common use for this would be doing multiple inserts or updates in
> one go. Like, insert into a parent table, then more details into child
> tables. You don't care about getting the results back in that case, as long
> as you get an error on failure.

As of 9.1 you can use WITH to achieve that in many cases.
wCTE and INSERT|UPDATE|DELETE … RETURNING are pretty cool combined :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
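
As an illustration of the writable-CTE approach Dimitri mentions, the
parent/child insert from Heikki's example can be written as one statement
and therefore sent in one round trip. The table and column names below are
hypothetical, and the helper merely returns the statement text; it could
be passed to PQexecParams() or PQsendQueryParams() with two text
parameters.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return one statement that inserts a parent row and a dependent child
 * row, and report how many $n parameters it expects. Requires PostgreSQL
 * 9.1 or later. Table and column names are hypothetical.
 */
const char *parent_child_insert_sql(int *nparams)
{
    static const char sql[] =
        "WITH new_parent AS ("
        " INSERT INTO parent (name) VALUES ($1) RETURNING id"
        ") "
        "INSERT INTO child (parent_id, detail) "
        "SELECT id, $2::text FROM new_parent";

    assert(nparams != NULL);
    *nparams = 2;
    return sql;
}
```

Unlike a multi-command string, this is a single command, so it can be
parameterised and works over the extended query protocol.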


Re: Multiple queries in transit

From: Jeroen Vermeulen
Date:
On 2011-11-01 00:53, Merlin Moncure wrote:
> On Mon, Oct 31, 2011 at 12:49 PM, Mark Hills<Mark.Hills@framestore.com>  wrote:

>>> Furthermore, in most apps it'd be a serious PITA to keep track of which
>>> reply is for which query, so I doubt that such a feature is of general
>>> usefulness.
>>
>> In our UI case, we already have a queue. Because libpq can't pipeline
>> multiple queries, we have to make our own queue of them anyway.

In libpqxx (the C++ API) you do get support for this kind of pipelining.
Look for the "pipeline" class.  It uses the "concatenate queries,
retrieve multiple results" trick.

The pipeline also serves as an easy-to-manage interface for asynchronous 
querying: fire off your query, go do other things while the server is 
working, then ask for the result (at which point you'll block if necessary).

Front page: http://pqxx.org/development/libpqxx/

Pipeline class: 
http://pqxx.org/devprojects/libpqxx/doc/stable/html/Reference/a00062.html

Jeroen


Re: Multiple queries in transit

From: Marko Kreen
Date:
On Mon, Oct 31, 2011 at 7:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> On 31.10.2011 17:44, Mark Hills wrote:
>>> Could libpq be reasonably modified to allow this?
>
>> I believe it's doable in theory, no-one has just gotten around to it.
>> Patches are welcome.
>
> Can't you do that today with a multi-command string submitted to
> PQsendQuery, followed by multiple calls to PQgetResult?

It's more annoying to do error handling on that, plus it still keeps the
blocking behaviour, just with larger blocks.

> I'm hesitant to think about supporting the case more thoroughly than
> that, or with any different semantics than that, because I think that
> the error-case behavior will be entirely unintelligible/unmaintainable
> unless you abandon all queries-in-flight in toto when an error happens.
> Furthermore, in most apps it'd be a serious PITA to keep track of which
> reply is for which query, so I doubt that such a feature is of general
> usefulness.

That's why the query queue and error handling must happen in the protocol
library, not the app.  And it seems doable, unless the server eats
queries or errors in some situation, breaking the simple sequential
query-response mapping.  Do you know of such behaviour?

(And several queries in one Simple Query are a known exception;
we can ignore them here.)


Also, I would ask for the opposite feature: "multiple rows in flight".
That means that when the server is sending a big result set,
the app can process it row-by-row (or 10 rows at a time)
without stopping the stream and re-requesting.

-- 
marko

PS. I think "full-duplex" is better than "pipeline" here; the latter
seems to hint at something unidirectional, except that, yeah,
it is used in HTTP 1.1 for a similar feature.


Re: Multiple queries in transit

From: Jeroen Vermeulen
Date:
On 2011-11-03 17:26, Marko Kreen wrote:
> On Mon, Oct 31, 2011 at 7:09 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Can't you do that today with a multi-command string submitted to
>> PQsendQuery, followed by multiple calls to PQgetResult?
>
> It's more annoying to do error handling on that, plus it still keeps the
> blocking behaviour, just with larger blocks.

You can combine multi-command query strings with nonblocking mode, 
without any change in libpq itself.

In fact that's exactly what the libpqxx "pipeline" class does.  So if 
you're working in C++, you already have this feature at your disposal.


> Also, I would ask for the opposite feature: "multiple rows in flight".
> That means that when the server is sending a big result set,
> the app can process it row-by-row (or 10 rows at a time)
> without stopping the stream and re-requesting.

Cursors.


Jeroen
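
To spell out the cursor suggestion: inside a transaction the client issues
DECLARE <name> CURSOR FOR <query>, then repeats FETCH <n> FROM <name>
until fewer than n rows come back, then CLOSE <name>. The sketch below
only formats the FETCH command; format_fetch() is a hypothetical helper,
and the cursor name is assumed to be a trusted identifier since no quoting
is applied.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "FETCH <batch> FROM <cursor>" into buf.
 * Returns 0 on success, or -1 if batch is not positive, the cursor name
 * is empty, or buf is too small to hold the command.
 */
int format_fetch(char *buf, size_t buflen, const char *cursor, int batch)
{
    int n;

    assert(buf != NULL && cursor != NULL);
    if (batch <= 0 || strlen(cursor) == 0)
        return -1;
    n = snprintf(buf, buflen, "FETCH %d FROM %s", batch, cursor);
    if (n < 0 || (size_t) n >= buflen)
        return -1;              /* output error or truncated */
    return 0;
}
```

Note that each FETCH is still a separate request and response, so this
bounds client memory per batch but does re-request the stream each time.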