Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
From: Andrew Dunstan
Subject: Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
Date:
Msg-id: 54AD8CEF.3080904@dunslane.net
In reply to: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes (Aaron Botsis <aaron@bt-r.com>)
Responses: Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
List: pgsql-hackers
On 01/07/2015 08:25 AM, Aaron Botsis wrote:
> Hi folks, I was having a problem importing json data with COPY. Lots
> of things export data nicely as one json blob per line. This is
> excellent for directly importing into a JSON/JSONB column for analysis.
>
> ...Except when there's an embedded doublequote. Or anything that's
> escaped. COPY handles this, but by the time the escaped char hits the
> JSON parser, it's not escaped anymore. This breaks the JSON parsing.
> This means I need to manipulate the input data to double-escape it.
> See bug #12320 for an example. Yuck.
>
> I propose this small patch that simply allows specifying COPY ... ESCAPE
> without requiring the CSV parser. It will make it much easier to
> directly use json formatted export data for folks going forward. This
> seemed like the simplest route.
>
> Usage is simply:
>
> postgres=# copy t1 from '/Users/nok/Desktop/queries.json';
> ERROR: invalid input syntax for type json
> DETAIL: Token "root" is invalid.
> CONTEXT: JSON data, line 1: ...1418066241619 AND <=1418671041621) AND user:"root...
> COPY t1, line 3, column bleh: "{"timestamp":"2014-12-15T19:17:32.505Z","duration":7.947,"query":{"query":{"filtered":{"filter":{"qu..."
> postgres=# copy t1 from '/Users/nok/Desktop/queries.json' escape '';
> COPY 1966

This isn't a bug. Neither CSV format nor TEXT format is particularly suitable for json. I'm quite certain I could compose legal json that would break your proposal (for example, with an embedded newline in the whitespace).

It's also unnecessary. CSV format, while not designed for this, is nevertheless sufficiently flexible to allow successful import of json data meeting certain criteria (essentially no newlines), like this:

    copy the_table(jsonfield) from '/path/to/jsondata' csv quote e'\x01' delimiter e'\x02';

You aren't the first person to encounter this problem.
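[Editor's note: the two behaviors discussed here can be sketched outside the database. This Python simulation, using a hypothetical one-object-per-line sample, shows why TEXT-format backslash processing mangles embedded JSON escapes before the JSON parser sees them, and why picking quote and delimiter characters that cannot occur in the data lets the whole line through as a single CSV field.]

```python
import csv
import io
import json

# Hypothetical one-object-per-line export; the string value contains
# an escaped double quote, as in the bug report.
line = '{"query": "user:\\"root\\""}'
assert json.loads(line)["query"] == 'user:"root"'

# COPY's TEXT format interprets backslash sequences before the value
# reaches the JSON parser, so \" collapses to a bare quote (a minimal
# sketch of that unescaping, handling only this one case):
mangled = line.replace('\\"', '"')
try:
    json.loads(mangled)
    parses = True
except json.JSONDecodeError:
    parses = False
print(parses)  # False: the unescaped line is no longer valid JSON

# The CSV trick: choose quote and delimiter characters that can never
# appear in the data, so each line arrives intact as one field.
reader = csv.reader(io.StringIO(line), quotechar='\x01', delimiter='\x02')
row = next(reader)
print(row == [line])                # True: the line survives unchanged
print(json.loads(row[0])["query"])  # user:"root"
```

The choice of `\x01` and `\x02` mirrors the `quote e'\x01' delimiter e'\x02'` options in the COPY command above; any characters guaranteed absent from the data would do.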
See <http://adpgtech.blogspot.com/2014/09/importing-json-data.html>

Maybe we need to add something like this to the docs, or to the wiki. Note too my comment in that blog post:

    Now this solution is a bit of a hack. I wonder if there's a case for
    a COPY mode that simply treats each line as a single datum. I also
    wonder if we need some more specialized tools for importing JSON,
    possibly one or more Foreign Data Wrappers. Such things could
    handle, say, embedded newline punctuation.

cheers

andrew