Re: Should CSV parsing be stricter about mid-field quotes?
From | Pavel Stehule |
---|---|
Subject | Re: Should CSV parsing be stricter about mid-field quotes? |
Date | |
Msg-id | CAFj8pRCxAxFMbg+hgKcZ0+_+K2dELcgL2gfz8TbHzim6PfMrzQ@mail.gmail.com |
In response to | Should CSV parsing be stricter about mid-field quotes? ("Joel Jacobson" <joel@compiler.org>) |
List | pgsql-hackers |
On Thu, 11 May 2023 at 16:04, Joel Jacobson <joel@compiler.org> wrote:
Hi hackers,

I've come across an unexpected behavior in our CSV parser that I'd like to
bring up for discussion.

    % cat example.csv
    id,rating,review
    1,5,"Great product, will buy again."
    2,3,"I bought this for my 6" laptop but it didn't fit my 8" tablet"

    % psql
    CREATE TABLE reviews (id int, rating int, review text);
    \COPY reviews FROM example.csv WITH CSV HEADER;
    SELECT * FROM reviews;

This gives:

     id | rating | review
    ----+--------+-------------------------------------------------------------
      1 |      5 | Great product, will buy again.
      2 |      3 | I bought this for my 6 laptop but it didn't fit my 8 tablet
    (2 rows)

The parser currently accepts quoting within an unquoted field. This can lead to
data misinterpretation when the quote is part of the field data (e.g.,
for inches, like in the example).

Our CSV output rules quote an entire field or not at all. But the import of
fields with mid-field quotes might lead to surprising and undetected outcomes.

I think we should throw a parsing error for unescaped mid-field quotes,
and add a COPY option like ALLOW_MIDFIELD_QUOTES for cases where mid-field
quotes are necessary. The error message could suggest this option when it
encounters an unescaped mid-field quote.

I think the convenience of not having to use an extra option doesn't outweigh
the risk of undetected data integrity issues.

Thoughts?
+1
Pavel
/Joel
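
To make the difference between the current lenient behavior and the proposed strict behavior concrete, here is a minimal Python sketch of a single-line CSV field splitter. It is only an illustration under stated assumptions, not PostgreSQL's actual COPY code: the names `parse_csv_line` and `MidFieldQuoteError` are hypothetical, and the `allow_midfield_quotes` parameter merely mirrors the ALLOW_MIDFIELD_QUOTES option suggested above. It assumes a comma delimiter, a double-quote quote character, and doubled quotes as the escape inside a quoted field.

```python
class MidFieldQuoteError(ValueError):
    """Hypothetical error for an unescaped quote in the middle of a field."""


def parse_csv_line(line, allow_midfield_quotes=False, delim=",", quote='"'):
    """Split one CSV line into fields (illustrative sketch only).

    With allow_midfield_quotes=True, a quote toggles quoting anywhere in a
    field, which is the lenient behavior described in the message.
    With allow_midfield_quotes=False, a quote may only open a field at its
    start and close it right before a delimiter or the end of the line.
    """
    fields, buf = [], []
    in_quotes = False
    at_field_start = True
    i, n = 0, len(line)
    while i < n:
        c = line[i]
        if in_quotes:
            if c == quote:
                if i + 1 < n and line[i + 1] == quote:
                    buf.append(quote)  # doubled quote is a literal quote
                    i += 2
                    continue
                in_quotes = False
                # Strict mode: data must not follow the closing quote.
                if not allow_midfield_quotes and i + 1 < n and line[i + 1] != delim:
                    raise MidFieldQuoteError(
                        f"unescaped mid-field quote at position {i}; "
                        "consider ALLOW_MIDFIELD_QUOTES"
                    )
            else:
                buf.append(c)
        elif c == quote:
            # Strict mode: a quote may not open in the middle of a field.
            if not at_field_start and not allow_midfield_quotes:
                raise MidFieldQuoteError(
                    f"unescaped mid-field quote at position {i}; "
                    "consider ALLOW_MIDFIELD_QUOTES"
                )
            in_quotes = True
        elif c == delim:
            fields.append("".join(buf))
            buf = []
            at_field_start = True
            i += 1
            continue
        else:
            buf.append(c)
        at_field_start = False
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    fields.append("".join(buf))
    return fields
```

With `allow_midfield_quotes=True`, the second data row of example.csv parses to the same review text shown in the query output above, with both inch marks silently dropped. With the default strict setting, the same row raises `MidFieldQuoteError`, while the first row and any row that escapes the quotes by doubling them (`6"" laptop`) still parse normally.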