Should CSV parsing be stricter about mid-field quotes?

Поиск
Список
Период
Сортировка
От Joel Jacobson
Тема Should CSV parsing be stricter about mid-field quotes?
Дата
Msg-id 8aeab305-5e94-4fa5-82bf-6da6baee6e05@app.fastmail.com
обсуждение исходный текст
Ответы Re: Should CSV parsing be stricter about mid-field quotes?  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Should CSV parsing be stricter about mid-field quotes?  (Greg Stark <stark@mit.edu>)
Re: Should CSV parsing be stricter about mid-field quotes?  (Andrew Dunstan <andrew@dunslane.net>)
Список pgsql-hackers
Hi hackers,

I've come across an unexpected behavior in our CSV parser that I'd like to
bring up for discussion.

% cat example.csv
id,rating,review
1,5,"Great product, will buy again."
2,3,"I bought this for my 6" laptop but it didn't fit my 8" tablet"

% psql
CREATE TABLE reviews (id int, rating int, review text);
\COPY reviews FROM example.csv WITH CSV HEADER;
SELECT * FROM reviews;

This gives:

id | rating |                           review
----+--------+-------------------------------------------------------------
  1 |      5 | Great product, will buy again.
  2 |      3 | I bought this for my 6 laptop but it didn't fit my 8 tablet
(2 rows)

The parser currently accepts quoting within an unquoted field. This can lead to
data misinterpretation when the quote is part of the field data (e.g.,
for inches, like in the example).

Our CSV output rules quote an entire field or not at all. But the import of
fields with mid-field quotes might lead to surprising and undetected outcomes.

I think we should throw a parsing error for unescaped mid-field quotes,
and add a COPY option like ALLOW_MIDFIELD_QUOTES for cases where mid-field
quotes are necessary. The error message could suggest this option when it
encounters an unescaped mid-field quote.

I think the convenience of not having to use an extra option doesn't outweigh
the risk of undetected data integrity issues.

Thoughts?

/Joel

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: v16 regression - wrong query results with LEFT JOINs + join removal
Следующее
От: Robert Haas
Дата:
Сообщение: Re: createuser --memeber and PG 16