Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'
| From | Andrew Dunstan |
|---|---|
| Subject | Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n' |
| Date | |
| Msg-id | edcb24cf-e38f-bd08-2807-2a98242f95a0@dunslane.net |
| In response to | Re: Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n' (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n' |
| List | pgsql-hackers |
On 5/5/21 2:45 PM, Tom Lane wrote:
> "Joel Jacobson" <joel@compiler.org> writes:
>> I think you misunderstood the problem.
>> I don't want the entire file to be considered a single value.
>> I want each line to become its own row, just a row with a single column.
>> So I actually think COPY seems like a perfect match for the job,
>> since it does precisely that, except there is no delimiter in this case.
> Well, there's more to it than just the column delimiter.
>
> * What about \N being converted to NULL?
> * What about \. being treated as EOF?
> * Do you want to turn off the special behavior of backslash (ESCAPE)
>   altogether?
> * What about newline conversions (\r\n being seen as just \n, etc)?
>
> I'm inclined to think that "use pg_read_file and then split at newlines"
> might be a saner answer than delving into all these fine points.
> Not least because people yell when you add cycles to the COPY
> inner loops.

+1

Also we have generally been resistant to supporting odd formats. FDWs
can help here (e.g. file_text_array), but they can't use STDIN IIRC.

>
>> I'm currently using the pg_read_file()-hack in a project,
>> and even though it can read files up to 1GB,
>> using e.g. regexp_split_to_table() to split on E'\n'
>> seems to need 4x as much memory, so it only
>> works with files less than ~256MB.
> Yeah, that's because of the conversion to "chr". But a regexp
> is overkill for that anyway. Don't we have something that will
> split on simple substring matches?
>
>

Not that I know of. There is split_part but I don't think that's fit for
purpose here. Do we need one, or have I missed something?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
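
For readers following the thread: the fine points listed in the quoted message (\N, \., backslash handling, newline conversion) can be illustrated with a small client-side sketch. This is not something proposed in the thread. It assumes the rows are later sent through COPY ... FROM STDIN in the default text format, and that the file is split on bare \n only. The helper names `escape_copy_text` and `lines_for_copy` are hypothetical.

```python
def escape_copy_text(line: str) -> str:
    # Double backslashes first, so a raw "\N" or "\." arrives as literal
    # text instead of being read as NULL or end-of-data by COPY.
    line = line.replace("\\", "\\\\")
    # Tab is the default column delimiter in COPY's text format.
    line = line.replace("\t", "\\t")
    # Send a raw carriage return as the two-character escape so it is
    # kept as data rather than folded into a line ending.
    line = line.replace("\r", "\\r")
    return line


def lines_for_copy(raw: bytes, encoding: str = "iso-8859-1"):
    # Assumption: one row per b"\n"-terminated line, with no \r\n folding.
    if not raw:
        return
    # Drop the final terminator so it does not produce an extra empty row.
    if raw.endswith(b"\n"):
        raw = raw[:-1]
    for chunk in raw.split(b"\n"):
        yield escape_copy_text(chunk.decode(encoding))
```

Joining the yielded rows with "\n" gives a stream that the text format should load one line per row into a single-column table. The cost is that the escaping has to happen on the client, which is roughly the work a dedicated "raw lines" mode inside COPY would otherwise take on.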