Обсуждение: Sending messages inside a function

Поиск
Список
Период
Сортировка

Sending messages inside a function

От
Chris Campbell
Дата:
I'm trying to write a set of C functions that mimic the COPY TO and  
COPY FROM SQL commands so I can use them to stream binary data to/ 
from the client.

This seems more efficient to me than passing BYTEAs back and forth.  
Maybe that's the best way to go, but the COPY mechanism seems perfect  
for this. Just look how straightforward the send_to_client() function  
was to write (below). I've been using large objects to accomplish  
this for the past few years, but as customers' databases get larger  
and larger (the bulk being entirely large objects), backing up and  
restoring and recovering from corruption is taking a really long  
time. So I'm going to store the data elsewhere on the server (outside  
the database) and piggy-back the PostgreSQL connection to send the  
data back and forth. This way, the database stays much smaller and  
makes backup and restore much easier.

I'm envisioning something like this:


(1) Data transfer from Server -> Client
---------------------------------------
Client: SELECT send_to_client()
Server:   Sends an 'H' message to begin a COPY TO
Server:   Sends a bunch of 'd' messages with the data
Server:   Sends a 'c' message to end the COPY operation
Client: Gets PGRES_COPY_OUT from PQresultStatus()
Client: Calls PQgetCopyData() to retrieve data until it returns -1
Client: Calls PQgetResult() to retrieve the results of the SELECT query

(2) Data transfer from Client -> Server
---------------------------------------
Client: SELECT receive_from_client()
Server:   Sends a 'G' message to begin a COPY FROM
Client: Gets PGRES_COPY_IN from PQresultStatus()
Client: Calls PQputCopyData() to send the data
Server:   Receives a bunch of 'd' messages
Server:   Receives a 'c' message to end the COPY operation
Client: Calls PQgetResult() to retrieve the results of the SELECT query


I've gotten #1 working. But the client isn't able to retrieve the  
SELECT results from calling the send_to_client() function (which  
really doesn't bother me too much -- the send_to_client() function  
returns NULL). Instead, I just call PQgetResult() until it returns  
NULL and ignore all results. However, my receive_from_client()  
function will need to return a status, so I'd like to get this  
figured out for send_to_client() before moving on to  
receive_from_client().

I'm thinking that the server sends a tuple description message ('T')  
before my function is invoked (and sends all the 'H', 'd', and 'c'  
messages) and then sends the tuple data message ('D') with the  
function's return value afterwards. So, by spitting out messages  
inside my function, I'm disconnecting the tuple data from the  
descriptions.

I get the following error message:
    server sent data ("D" message) without prior row description  
("T" message)

I could remedy this by "faking" my own 'T' message after I've  
performed the COPY operation (to match the 'D' message that's sent  
later). But that seems ugly.

If the 'T' message wasn't sent until after my function was invoked,  
that would solve all my problems (if my assumption about the 'T'  
message being sent before the function's invoked is correct). I'm  
trying to find where in the code the 'T' message is sent and try to  
delay it until after the function is actually invoked, but I'm not  
having much success. Any comments or pointers?

Here's what my send_to_client() function looks like:

Datum
send_to_client(PG_FUNCTION_ARGS)
{    // Begin a COPY TO STDOUT operation
    StringInfoData buf;
    pq_beginmessage(&buf, 'H');    pq_sendbyte(&buf, 1); // binary format (1)    pq_sendint(&buf, 0, 2); // 0
attributes   pq_endmessage(&buf);
 
    // TODO: Send the data to the client    // For now, just send a simple binary message
    char *buffer = "Hello, world!";
    pq_putmessage('d', buffer, strlen(buffer));
    // End the COPY operation
    pq_putemptymessage('c');
    PG_RETURN_NULL();
}

I've called the function from my libpq-based program ("SELECT  
send_to_client()") and then retrieved the data using PQgetCopyData()  
and it works, as long as I don't care about the results of the SELECT  
query and just throw away any PGresults I get after the COPY  
operation. But that's going to cause me problems when I implement  
receive_from_client().

Thanks!

- Chris



Re: Sending messages inside a function

От
Tom Lane
Дата:
Chris Campbell <chris@bignerdranch.com> writes:
> I'm trying to write a set of C functions that mimic the COPY TO and  
> COPY FROM SQL commands so I can use them to stream binary data to/ 
> from the client.

You cannot control the FE/BE protocol from inside a user function.

> This seems more efficient to me than passing BYTEAs back and forth.  

Why?  A byte is a byte.  There is no possible way that you will ever
save enough microseconds from this to repay the development time you
will waste trying to kluge the system to do it.

If it's the overhead of parsing the SELECT you are unhappy with,
you could look at using the "fast path" function call protocol
to invoke your functions.
        regards, tom lane


Re: Sending messages inside a function

От
Chris Campbell
Дата:
On Nov 29, 2005, at 13:28, Tom Lane wrote:

> You cannot control the FE/BE protocol from inside a user function.

"cannot" is a bit strong -- the function in my original email did it,  
and it works (with caveats). It's good to know that I "should" not,  
though. I'm still curious why I got the error message:
    server sent data ("D" message) without prior row description  
("T" message)

Is the 'T' message sent back to the client before my function was  
invoked? Can the 'T' message be suppressed until immediately before  
the first piece of tuple data is returned to the client? That way, I  
could do my COPY operation before the 'T' message is sent and no one  
would be the wiser.

>> This seems more efficient to me than passing BYTEAs back and forth.
>
> Why?  A byte is a byte.  There is no possible way that you will ever
> save enough microseconds from this to repay the development time you
> will waste trying to kluge the system to do it.

Does that still apply to BYTEAs that are tens (hundreds?) of  
megabytes? I was hoping to stream the data so that I wouldn't have to  
read it all into memory and pass it to the function (or return it  
from the function) as one big chunk. That way, the first part could  
be going out over the network while the function was reading in the  
rest of the data. It wouldn't have to wait to read the entire piece  
of data into memory before transmitting it. I could also put up a  
nice little progress bar for the transfer.

If I pass in the data as a BYTEA parameter to the function, is the  
data transmitted completely before the function is invoked? If the  
function fails before it even needs the data (due to an invalid  
argument, etc), it seems wasteful to have transmitted it across the  
wire.

I also wanted to avoid the hassle of having to massage the binary  
data (escaping values, etc) before transmitting it, but perhaps using  
the binary format in libpq is all that's needed. Can I easily turn a  
buffer of data into a BYTEA and pass it to PQexecParams()?

Thanks!

- Chris