Обсуждение: Extending COPY TO

Поиск
Список
Период
Сортировка

Extending COPY TO

От
Andrea Riciputi
Дата:
Hi all,
it’s my first time here, so please let me know if I’m doing something wrong. I’m a developer, heavy PG user, but I’ve
neverhacked it before. Last week at work we had to produce a quite big CSV data file which should be used as input by
anotherpiece of software. 

Since the file must be produced on a daily basis, is big, and it contains data stored in our PG database, letting PG
producethe file itself seemed the right approach. Unfortunately the target software is, let say, “legacy” software and
canonly accept CRLF as EOL character. Since our PG is installed on a Linux server COPY TO results in a CSV file with LF
asEOL, forcing us to pass the file a second time to convert EOL, which is inconvenient. 

My idea was to extend the COPY TO command to accept an EOL option as it already does with the DELIMITER option. To keep
itsimple we can limit the EOL choice to CR, LF or CRLF to avoid unusual output, and also keep the current behaviour
whenno EOL option is given. I was also wondering if an EOL option could be useful also for the text output format or
not.

I spent the weekend reading the COPY command source code and its grammar definition and I think I can patch it by
myself,submit the patch here and wait for your review. However before starting this in my spare time I wanted to know
ifyou, as the PG hackers community, would be against a similar proposal for any reason, and if so why. 

Thanks in advance,
Andrea


Re: Extending COPY TO

От
Heikki Linnakangas
Дата:
On 09/23/2014 09:49 AM, Andrea Riciputi wrote:
> My idea was to extend the COPY TO command to accept an EOL option as
> it already does with the DELIMITER option. To keep it simple we can
> limit the EOL choice to CR, LF or CRLF to avoid unusual output, and
> also keep the current behaviour when no EOL option is given. I was
> also wondering if an EOL option could be useful also for the text
> output format or not.

I don't think we want to go down that path. There are plenty of options 
in COPY already, and the more you add, the more complicated it gets. And 
we're never going to be able to satisfy everyone's needs.

I'd suggest doing:

COPY table TO PROGRAM 'unix2dos > /tmp/file'

- Heikki



Re: Extending COPY TO

От
Stephen Frost
Дата:
Andrea,

* Andrea Riciputi (andrea.riciputi@gmail.com) wrote:
> My idea was to extend the COPY TO command to accept an EOL option as it already does with the DELIMITER option. To
keepit simple we can limit the EOL choice to CR, LF or CRLF to avoid unusual output, and also keep the current
behaviourwhen no EOL option is given. I was also wondering if an EOL option could be useful also for the text output
formator not.
 

Have you considered using COPY TO's 'PROGRAM' option to simply pipe the
output through unix2dos..?

> I spent the weekend reading the COPY command source code and its grammar definition and I think I can patch it by
myself,submit the patch here and wait for your review. However before starting this in my spare time I wanted to know
ifyou, as the PG hackers community, would be against a similar proposal for any reason, and if so why.
 

I'm not particularly against it, though if it can be solved with the
existing 'PROGRAM' capability then it may not make sense to complicate
the COPY code further.
Thanks!
    Stephen

Re: Extending COPY TO

От
Andrea Riciputi
Дата:
Hi all,
thanks for all your answers, I see your point. And I also understand the argument according to which there always be
someother use case to satisfy. However your suggestion to use COPY TO sql TO PROGRAM doesn’t seem to me to fit well the
usecase I have in mind.  

Imagine you access PG from an application written in the language X using a driver library, both your application and
yourPG instance run on two different hosts. Now using COPY TO sql PROGRAM the output file ends up on the PG host
filesystem,while using COPY TO sql STDOUT and passing a file descriptor to PG via the driver library the output file
endsup on the application hosts, which is much more convenient from the application point of view. 

Sure you can always fix this setting up some kind of shared filesystem, but this is just the first of the issues I
couldthink of. What about the potential I/O errors that could happen while opening/writing the output file? I should
replicatethem back from the PG host to the application layer, and this is something I’m pretty sure no one wants to go
down.

So adding such a feature to PG itself seems to me still the best trade off between complexity and convenience. However,
ifyou are strongly against it, or see a better way to get around this problem, please let me know. As I wrote before,
despitebeing an heavy PG user, it’s my first time on the hackers ML and I don’t want to seem disrespectful of the
community.

Thanks,
Andrea


On 23 Sep 2014, at 08:56, Stephen Frost <sfrost@snowman.net> wrote:

> Andrea,
>
> * Andrea Riciputi (andrea.riciputi@gmail.com) wrote:
>> My idea was to extend the COPY TO command to accept an EOL option as it already does with the DELIMITER option. To
keepit simple we can limit the EOL choice to CR, LF or CRLF to avoid unusual output, and also keep the current
behaviourwhen no EOL option is given. I was also wondering if an EOL option could be useful also for the text output
formator not. 
>
> Have you considered using COPY TO's 'PROGRAM' option to simply pipe the
> output through unix2dos..?
>
>> I spent the weekend reading the COPY command source code and its grammar definition and I think I can patch it by
myself,submit the patch here and wait for your review. However before starting this in my spare time I wanted to know
ifyou, as the PG hackers community, would be against a similar proposal for any reason, and if so why. 
>
> I'm not particularly against it, though if it can be solved with the
> existing 'PROGRAM' capability then it may not make sense to complicate
> the COPY code further.
>
>     Thanks!
>
>         Stephen




Re: Extending COPY TO

От
Heikki Linnakangas
Дата:
On 09/24/2014 09:23 AM, Andrea Riciputi wrote:
> Imagine you access PG from an application written in the language X
> using a driver library, both your application and your PG instance
> run on two different hosts.

In that scenario, you'll be using the PQgetCopyData function to get the 
data. PQgetCopyData returns one row at a time; the application can 
trivially change the line-ending to whatever it wants, when writing the 
output to a file or wherever it goes.

> As I wrote before, despite being an heavy PG user, it’s my first time
> on the hackers ML and I don’t want to seem disrespectful of the
> community.

No worries; thanks for effort, even if this idea doesn't pan out.

- Heikki