Re: Pipeline mode and PQpipelineSync()

From: Alvaro Herrera
Subject: Re: Pipeline mode and PQpipelineSync()
Msg-id: 202106222214.ptjfmstb23mu@alvherre.pgsql
In response to: Re: Pipeline mode and PQpipelineSync()  (Boris Kolpackov <boris@codesynthesis.com>)
Responses: Re: Pipeline mode and PQpipelineSync()  (Alvaro Herrera <alvaro.herrera@2ndquadrant.com>)
           Re: Pipeline mode and PQpipelineSync()  (Boris Kolpackov <boris@codesynthesis.com>)
List: pgsql-hackers

On 2021-Jun-21, Boris Kolpackov wrote:

> Alvaro Herrera <alvaro.herrera@2ndquadrant.com> writes:
> 
> > I think I should rephrase this to say that PQpipelineSync() is needed
> > where the user needs the server to start executing commands; and
> > separately indicate that it is possible (but not promised) that the
> > server would start executing commands ahead of time because $reasons.
> 
> I think always requiring PQpipelineSync() is fine since it also serves
> as an error recovery boundary. But the fact that the server waits until
> the sync message to start executing the pipeline is surprising. To me
> this seems to go contrary to the idea of a "pipeline".

But does that actually happen?  There's a very easy test we can do by
sending queries that sleep.  If my libpq program sends "SELECT
pg_sleep(2)", calls PQflush(), sleeps for two more seconds in the client
without sending the sync, and only *then* sends the sync, I find that
the program takes two seconds, not four.  This shows that client and
server slept in parallel, even though I didn't send the Sync until after
the client was done sleeping.

In order to see this, I patched libpq_pipeline.c with the attached, and
ran it under time:

time ./libpq_pipeline simple_pipeline -t simple.trace
simple pipeline... sent and flushed the sleep. Sleeping 2s here:
client sleep done
ok

real    0m2,008s
user    0m0,000s
sys    0m0,003s


So I see things happening as you describe in (1):

> In fact, I see the following ways the server could behave:
> 
> 1. The server starts executing queries and sending their results before
>    receiving the sync message.

I am completely at a loss on how to explain a server that behaves in any
other way, given how the protocol is designed.  There is no buffering on
the server side.

> While it can be tempting to say that this is an implementation detail,
> this affects the way one writes a client. For example, I currently have
> the following comment in my code:
> 
>   // Send queries until we get blocked. This feels like a better
>   // overall strategy to keep the server busy compared to sending one
>   // query at a time and then re-checking if there is anything to read
>   // because the results of INSERT/UPDATE/DELETE are presumably small
>   // and quite a few of them can get buffered before the server gets
>   // blocked.
> 
> This would be a good strategy for behavior (1) but not (3) (where it
> would make more sense to queue the queries on the client side).

Agreed, that's the kind of strategy I would have thought was the most
reasonable, given my understanding of how the protocol works.

I wonder if your program is being affected by something else.  Maybe the
socket is nonblocking (though I don't quite understand how that would
affect the client behavior in just this way), or your program is
buffering elsewhere.  I don't do C++ much so I can't help you with that.

> So I think it would be useful to clarify the server behavior and
> specify it in the documentation.

I'll see about improving the docs on these points.

> > Do I have it right that other than this documentation problem, you've
> > been able to use pipeline mode successfully?
> 
> So far I've only tried it in a simple prototype (single INSERT statement).
> But I am busy plugging it into ODB's bulk operation support (that we
> already have for Oracle and MSSQL) and once that's done I should be
> able to exercise things in more meaningful ways.

Fair enough.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W


