Re: Should CSV parsing be stricter about mid-field quotes?

Поиск
Список
Период
Сортировка
От Joel Jacobson
Тема Re: Should CSV parsing be stricter about mid-field quotes?
Дата
Msg-id 92a9ae94-6884-4860-ab5f-3995c6f2b137@app.fastmail.com
обсуждение исходный текст
Ответ на Re: Should CSV parsing be stricter about mid-field quotes?  ("Joel Jacobson" <joel@compiler.org>)
Ответы Re: Should CSV parsing be stricter about mid-field quotes?  (Andrew Dunstan <andrew@dunslane.net>)
Список pgsql-hackers
On Tue, May 16, 2023, at 13:43, Joel Jacobson wrote:
>If we made midfield quoting a CSV error, those users who are currently mistaken
>about their TSV/TEXT files being CSV while also having balanced quotes in their
>data, would encounter an error rather than a silent failure, which I believe
>would be an enhancement.

Furthermore, I think it could be beneficial to add a HINT message for all type
of CSV/TEXT parsing errors, since the precise ERROR messages might just cause
the user to tinker with the options until it works, instead of carefully reading
through the documentation on the various formats.

Perhaps something like this:

HINT: Are you sure the FORMAT matches your input?

Also, the COPY documentation says nothing about TSV, and I know TEXT isn't
exactly TSV, but it's at least much more TSV than CSV, so maybe we should
describe the differences, such as \N. I think the best advise to users would be
to avoid exporting to .TSV and use .CSV instead, since I've noticed e.g.
Google Sheets to replace newlines in fields with blank space when
exporting .TSV, which effectively destroys data.

The first search results for "postgresql tsv" on Google link to postgresql.org
pages, but the COPY docs are not one of them unfortunately.

The first relevant hit is this one:

"Importing a TSV File into Postgres | by Riley Wong" [1]

Sadly, this author has also misunderstood how to properly import a .TSV file,
he got it all wrong, and doesn't understand or at least doesn't mention there
are more differences than just the delimiter:

COPY listings 
FROM '/home/ec2-user/list.tsv'
DELIMITER E'\t'
CSV HEADER;

I must confess I have used PostgreSQL for over two decades without having really
understood the detailed differences between TEXT and CSV, until recently.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: Order changes in PG16 since ICU introduction
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: Schema variables - new implementation for Postgres 15