Re: Let file_fdw access COPY FROM PROGRAM

Поиск
Список
Период
Сортировка
От Corey Huinker
Тема Re: Let file_fdw access COPY FROM PROGRAM
Дата
Msg-id CADkLM=eW701r3JMDf3XeznY44XK8aa+hvaVCV74zpK3BGKn7Gg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Let file_fdw access COPY FROM PROGRAM  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
On Fri, Jun 3, 2016 at 1:06 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 3 June 2016 at 04:48, Corey Huinker <corey.huinker@gmail.com> wrote:
A while back, there was a push to make COPY gzip-aware. That didn't happen, but COPY FROM PROGRAM did, and it scratches the same itch.
 
- writing unwanted columns to a temp/work table via COPY, and then immediately re-reading them

Without wanting to take away from the value of letting file FDW access FROM PROGRAM, I think this really merits a solution that applies to COPY as well. Variants on "how do I COPY just some columns from a CSV" is a real FAQ, and it doesn't seem like it'd be excessively hard to support. We'd just need some way to pass a list of column-ordinals or header-names.
 
Not asking you to do that work, just pointing out that this particular issue applies to COPY its self as well.

I agree, they are two different but slightly overlapping issues. file_fdw needs a way to handle compressed/filtered files, and COPY needs the ability to skip columns. But right now COPY is all about getting the shape of the input from the shape of the destination table.

I had the bright idea of creating a custom datatype, call it "skip" / "filler" / "nada" / "devnull" or something like that, that would map any and all values to NULL, thus allowing COPY to naively "store" the unwanted values into nothingness. 

That idea falls apart when it hits InputFunctionCall() in fmgr.c, which prevents any text string from coercing into NULL. I didn't think I was up to the task of making InputFunctionCall respect a special case type (void?) or somehow promoting the pseudo type void into a legit column type (unknown side-effects there). I did create a type that coerced all input into the empty string, but the disk usage of that was still pretty significant.

So maybe if there was a subspecies of COPY that returned SETOF RECORD:

SELECT important_column1, important_column2
FROM copy_srf( program := 'zcat filename', format := 'csv', header := true ) AS t(bogus_col1 text, important_column1 integer, important_column2 date, bogus_col2 text, ... );

That would allow COPY to keep it's current efficiency and simplicity, while (hopefully) avoiding the unwanted data from ever hitting disk. It would also be vaguely reminiscent of SQL*Loader.
 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Albe Laurenz
Дата:
Сообщение: Re: Prepared statements and generic plans
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Parallel pg_dump's error reporting doesn't work worth squat