Re: Should CSV parsing be stricter about mid-field quotes?
From | Joel Jacobson |
---|---|
Subject | Re: Should CSV parsing be stricter about mid-field quotes? |
Date | |
Msg-id | 777be2db-f201-49d2-961b-0779f0f0d5ac@app.fastmail.com |
In response to | Re: Should CSV parsing be stricter about mid-field quotes? (Kirk Wolak <wolakk@gmail.com>) |
Responses |
Re: Should CSV parsing be stricter about mid-field quotes?
Re: Should CSV parsing be stricter about mid-field quotes? |
List | pgsql-hackers |
On Thu, May 18, 2023, at 00:18, Kirk Wolak wrote:
> Here you go. Not horrible handling. (I use DataGrip so I saved it from there
> directly as TSV, just for an extra datapoint).
>
> FWIW, if you copy/paste in windows, the data, the field with the tab gets
> split into another column in Excel. But saving it as a file, and opening it.
> Saving it as XLSX, and then having Excel save it as a TSV (versus opening a
> text file, and saving it back)
Very useful, thanks.
Interesting: unlike Excel, DataGrip doesn't quote fields containing commas in TSV.
All the DataGrip/Excel TSV variants use quoting when necessary,
unlike Google Sheets's TSV format, which doesn't quote fields at all.
DataGrip/Excel also terminate the last record with a newline,
while Google Sheets omits the newline for the last record
(which is bad, since a streaming reader then can't know
whether the last record is complete or not).
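To illustrate the streaming concern, here is a minimal Python sketch. It is not how COPY reads input; `split_complete_records` is a hypothetical helper, and it assumes for simplicity that records never contain embedded newlines. It only shows that without a terminating newline the last record is indistinguishable from a partially received one.

```python
def split_complete_records(buffer):
    """Split the text received so far into complete records and a remainder.

    Hypothetical helper: assumes '\n' terminates every record and that
    fields never contain embedded newlines.
    """
    # Everything before the last '\n' is complete; whatever follows it
    # may be a partial record that is still arriving.
    *records, remainder = buffer.split('\n')
    return records, remainder


# Last record terminated with a newline (DataGrip/Excel style).
terminated = split_complete_records('a\tb\nc\td\n')

# Last record not terminated (Google Sheets style).
unterminated = split_complete_records('a\tb\nc\td')
```

In the terminated case the remainder is empty, so the reader knows both records are complete. In the unterminated case `c\td` stays in the remainder, and the reader has to wait for end-of-input before it can treat it as a record.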
This makes me think we probably shouldn't add a new TSV format,
since there is no consistency between vendors.
It's impossible to deduce with certainty whether a TSV field that
begins with a double quotation mark is quoted or unquoted.
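A small sketch of that ambiguity, using Python's standard `csv` module rather than anything in PostgreSQL. The sample record and the `parse` helper are made up for illustration. The same bytes yield different fields depending on whether the reader assumes quoting is in effect.

```python
import csv
import io

# One TSV record whose first field begins with a double quotation mark.
raw = '"a\tb"\tc\n'


def parse(text, quoting):
    """Parse tab-separated text with the given csv quoting mode."""
    reader = csv.reader(io.StringIO(text, newline=''),
                        delimiter='\t', quoting=quoting)
    return list(reader)


# Reader assumes the producer quotes fields when necessary.
quoted_reading = parse(raw, csv.QUOTE_MINIMAL)

# Reader assumes the producer never quotes fields.
unquoted_reading = parse(raw, csv.QUOTE_NONE)
```

Under the first assumption the record has two fields, the first containing a tab. Under the second it has three fields, with the quotation marks kept as literal data. Nothing in the file itself says which reading the producer intended.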
Two alternative ideas:
1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction
with `COPY ... WITH CSV`?
Internally, it would just set
`quotec = '\0';`
so it wouldn't affect performance at all.
2. How about adding a note on the complexities of dealing with TSV files in the
COPY documentation?
/Joel