Re: Extend COPY FROM with HEADER to skip multiple lines
From | Andrew Dunstan |
---|---|
Subject | Re: Extend COPY FROM with HEADER |
Date | |
Msg-id | daf30bdc-0735-4a17-a3f2-7e41de2667c3@dunslane.net |
In response to | Re: Extend COPY FROM with HEADER |
Responses | Re: Extend COPY FROM with HEADER Re: Extend COPY FROM with HEADER |
List | pgsql-hackers |
On 2025-06-09 Mo 4:27 AM, Fujii Masao wrote:
>
>
> On 2025/06/09 16:10, Shinya Kato wrote:
>> Hi hackers,
>>
>> I'd like to propose a new feature for the COPY FROM command to allow
>> skipping multiple header lines when loading data. This enhancement
>> would enable files with multi-line headers to be loaded without any
>> preprocessing, which would significantly improve usability.
>>
>> In real-world scenarios, it's common for data files to contain
>> multiple header lines, such as file descriptions or column
>> explanations. Currently, the COPY command cannot load these files
>> directly, which requires users to preprocess them with tools like sed
>> or tail.
>>
>> Although you can use "COPY t FROM PROGRAM 'tail -n +3 /path/to/file'",
>> some environments do not have the tail command available.
>> Additionally, this approach requires superuser privileges or
>> membership in the pg_execute_server_program role.
>>
>> This feature also has precedent in other major RDBMS:
>> - MySQL: LOAD DATA ... IGNORE N LINES [1]
>> - SQL Server: BULK INSERT … WITH (FIRST ROW=N) [2]
>> - Oracle SQL*Loader: sqlldr … SKIP=N [3]
>>
>> I have not yet created a patch, but I am willing to implement an
>> extension for the HEADER option. I would like to discuss the
>> specification first.
>>
>> The specification I have in mind is as follows:
>> - Command: COPY FROM
>> - Formats: text and csv
>> - Option syntax: HEADER [ boolean | integer | MATCH] (Extend the
>> HEADER option to accept an integer value in addition to the existing
>> boolean and MATCH keywords.)
>> - Behavior: Let N be the specified integer.
>> - If N < 0, raise an error.
>> - If N = 0 or 1, same behavior when boolean is specified.
>> - If N > 1, skip the first N rows.
>>
>> Thoughts?
>
> I generally like the idea.
>
> However, a similar proposal was made earlier [1], and seemingly
> some hackers weren't in favor of it. It's probably worth reading
> that thread to understand the previous concerns.
>
> Regards,
>
>
> [1]
> https://postgr.es/m/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com

I think the earlier proposal went rather further than this one, which I suspect can be implemented fairly cheaply.

I don't have terribly strong feelings about it, but matching a feature implemented elsewhere has some attraction if it can be done easily.

OTOH I'm a bit curious to know what software produces multi-line CSV headers.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
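
Editorial illustration, not part of the message above: the behavior proposed in the quoted specification (error for N < 0, N = 0 or 1 equivalent to the boolean forms, N > 1 skips the first N rows) can be sketched in Python roughly as follows. This is only a model of the semantics under stated assumptions, not PostgreSQL code: the function name `skip_header_lines` is hypothetical, the MATCH keyword is not modeled, and the sketch counts physical lines, whereas how a quoted CSV field containing newlines should be counted is left open by the thread.

```python
from itertools import islice
from typing import Iterable, Iterator, Union


def skip_header_lines(lines: Iterable[str],
                      header: Union[bool, int] = False) -> Iterator[str]:
    """Return the data lines left after applying a HEADER-style skip count.

    Hypothetical helper that mirrors the proposed rules:
      - False / 0 : skip nothing
      - True  / 1 : skip one line
      - N > 1     : skip the first N lines
      - N < 0     : error
    """
    # bool is a subclass of int in Python, so int(True) == 1 and
    # int(False) == 0, which matches "N = 0 or 1 behaves like the boolean".
    n = int(header)
    if n < 0:
        raise ValueError("HEADER line count must not be negative")
    # islice drops the first n items lazily; a file shorter than n lines
    # simply yields nothing.
    return islice(iter(lines), n, None)


if __name__ == "__main__":
    sample = ["# exported by some tool", "id,name", "1,alice", "2,bob"]
    for row in skip_header_lines(sample, 2):
        print(row)
```

In this model, passing `2` for the sample above leaves only the two data rows, which is the effect the `tail -n +3` workaround mentioned in the proposal achieves outside the server.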