Re: 8.0.0beta4: "copy" and "client_encoding"
From | Oliver Jowett |
---|---|
Subject | Re: 8.0.0beta4: "copy" and "client_encoding" |
Date | |
Msg-id | 418BF397.4070000@opencloud.com |
In response to | Re: 8.0.0beta4: "copy" and "client_encoding" (mbch67@yahoo.com) |
List | pgsql-jdbc |
mbch67@yahoo.com wrote:

> 1. I set LATIN1 as the database (postgresql.conf) default client
> encoding. Why does COPY, executed via JDBC not use the right encoding?
> => To me it seems to be a backend problem. Should this be address in
> another posting list?

The postgresql.conf setting is a default that can be overridden on a per-client basis. JDBC overrides the default when it connects. This is normal.

> 2. Was the decision to disable the "SET CLIENT_ENCODING" command
> really a good idea? What about if I am running a server using UNICODE
> to store text, my default client encoding is LATIN1 and I want to
> import a Korean encoded text file using COPY via JDBC? There is no way
> to tell COPY what encoding the input file based on.
> In order to be compliant with PSQL I suggest to reactivate the
> disabled "SET CLIENT ENCODING" for JDBC.

It's a good idea in the sense that if you SET CLIENT_ENCODING, you will break the JDBC driver in nonobvious ways. The check is there as an extra safety net.

I'd be OK with a URL parameter to disable the check so that expert users can SET CLIENT_ENCODING at their own risk, but I don't want the check disabled by default.

It would be theoretically possible for the JDBC driver to track client_encoding and adjust the encoding it uses accordingly, but:

1) someone needs to actually implement that
2) it is not clear exactly when the encoding changes with respect to receiving the ParameterStatus message (this is only an issue if there are encodings where the contents of the ParameterStatus message would change in the new encoding)
3) it results in an extra round of transcoding (i.e. db encoding -> client encoding -> unicode, rather than just db encoding -> unicode)

Given that the only thing that we've seen that depends on client_encoding so far is COPY (and even that has problems), I think the right solution is to fix COPY, not go to a lot of extra work to support arbitrary client_encoding values.
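To illustrate the kind of breakage the safety check guards against, here is a minimal sketch in plain Java. It is not driver code: the class and method names are made up for illustration, and it only assumes that the client keeps decoding bytes as UTF-8 while the server has been told to send LATIN1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDBC driver.
public class EncodingMismatchDemo {

    // Decode wire bytes the way a client would if it assumed
    // `assumed` was the encoding the server is sending.
    static String decode(byte[] wireBytes, Charset assumed) {
        return new String(wireBytes, assumed);
    }

    public static void main(String[] args) {
        // "café" as the server would send it with client_encoding = LATIN1.
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Client and server agree on the encoding: the text survives.
        String ok = decode(latin1Bytes, StandardCharsets.ISO_8859_1);

        // Client still assumes UTF-8: the lone 0xE9 byte is not valid UTF-8,
        // so Java substitutes the replacement character U+FFFD.
        String broken = decode(latin1Bytes, StandardCharsets.UTF_8);

        System.out.println("round trip intact: " + ok.equals("caf\u00e9"));
        System.out.println("mismatch contains U+FFFD: " + broken.contains("\uFFFD"));
    }
}
```

No exception is raised in the mismatched case; the data is silently corrupted, which is why this sort of failure is hard to spot.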
Are there any other cases where client_encoding needs to be modified by a JDBC user? It really seems to me that client_encoding is an implementation detail that JDBC users should not need to worry about, given that Java already has standard mechanisms for dealing with encodings (namely "turn everything into unicode strings internally").

Also, a couple of workarounds for your case that don't need driver modifications:

- force use of protocol version 2 by adding "?protocolVersion=2" to your connection URL; you will lose the benefits of version 3 but it should also defeat the client_encoding checks.
- transcode the file from LATIN1 to UNICODE (UTF8) on the server side before issuing the COPY.

-O
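The transcoding workaround could be sketched roughly as follows. This is only one way to do it, assuming a Java program can read and write the file on the server host; the class name, method name, and file paths are hypothetical. The source charset is a parameter, so the same approach would apply to a Korean-encoded file (for example `Charset.forName("EUC-KR")`) as well as to LATIN1.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the JDBC driver.
public class TranscodeForCopy {

    // Rewrite `source` (encoded in `sourceCharset`) as UTF-8 into `target`.
    // Files.newBufferedReader throws on malformed input, so a wrong
    // sourceCharset is more likely to fail loudly than to corrupt data.
    static void transcodeToUtf8(Path source, Charset sourceCharset, Path target)
            throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            // Copy decoded characters; the writer re-encodes them as UTF-8.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```

A call such as `TranscodeForCopy.transcodeToUtf8(Path.of("input-latin1.txt"), StandardCharsets.ISO_8859_1, Path.of("input-utf8.txt"))` would produce a file that COPY can then read while the driver leaves client_encoding at its UNICODE setting.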