Re: UTF8 national character data type support WIP patch and list of open issues.
From | Tatsuo Ishii |
---|---|
Subject | Re: UTF8 national character data type support WIP patch and list of open issues. |
Date | |
Msg-id | 20130922.072952.1977066018971837040.t-ishii@sraoss.co.jp |
In response to | Re: UTF8 national character data type support WIP patch and list of open issues. (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: UTF8 national character data type support WIP patch and list of open issues. |
List | pgsql-hackers |
> I think the point here is that, at least as I understand it, encoding
> conversion and sanitization happens at a very early stage right now,
> when we first receive the input from the client. If the user sends a
> string of bytes as part of a query or bind placeholder that's not
> valid in the database encoding, it's going to error out before any
> type-specific code has an opportunity to get control. Look at
> textin(), for example. There's no encoding check there. That means
> it's already been done at that point. To make this work, someone's
> going to have to figure out what to do about *that*. Until we have a
> sketch of what the design for that looks like, I don't see how we can
> credibly entertain more specific proposals.

I don't think the bind placeholder is a problem. It is processed by exec_bind_message() in postgres.c, which has enough information about the type of the placeholder, so I think we can easily deal with NCHAR there. The same can be said of the COPY case.

The problem is an ordinary query (simple protocol "Q" message), as you pointed out. Encoding conversion happens at a very early stage (note that the fast-path case has the same issue). If a query message contains, say, both SHIFT-JIS and EUC-JP, we get into trouble because the encoding conversion routine (pg_client_to_server) assumes that a message from the client contains only one encoding.

However, my question is: does this really happen? There isn't any text editor that can create mixed SHIFT-JIS and EUC-JP text. So my guess is that when a user wants to use NCHAR as SHIFT-JIS text, the rest of the query consists of either SHIFT-JIS or plain ASCII. If so, all the user needs to do is set the client encoding to SHIFT-JIS, and everything should be fine.

Maumau, is my guess correct?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
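
The single-encoding assumption discussed in the message above can be illustrated with a small sketch. This is not PostgreSQL code: Python's `shift_jis` and `euc_jp` codecs stand in for the server's conversion routines, and `convert_client_to_server` and `try_convert` are hypothetical helpers invented for this illustration. The sketch assumes only that the converter decodes the whole message with one declared client encoding.

```python
# Illustration only: Python codecs stand in for PostgreSQL's conversion
# routines. The function names below are hypothetical, not PostgreSQL APIs.

def convert_client_to_server(message: bytes, client_encoding: str) -> str:
    """Decode the whole message with ONE encoding, as a converter that
    assumes a single client encoding per message would."""
    return message.decode(client_encoding)


def try_convert(message: bytes, client_encoding: str):
    """Return the decoded text, or None if the bytes are invalid
    in the declared client encoding."""
    try:
        return convert_client_to_server(message, client_encoding)
    except UnicodeDecodeError:
        return None


literal = "日本語"

# Case 1: plain ASCII query text plus a SHIFT-JIS literal.
sjis_query = b"SELECT N'" + literal.encode("shift_jis") + b"'"

# Case 2: one literal in EUC-JP and another in SHIFT-JIS in the same message.
mixed_query = (
    b"SELECT '" + literal.encode("euc_jp") + b"', N'"
    + literal.encode("shift_jis") + b"'"
)

# With a single declared encoding, case 1 converts cleanly.
sjis_result = try_convert(sjis_query, "shift_jis")

# Case 2 cannot be converted under either declared encoding, because
# part of the message is always invalid in the chosen one.
mixed_as_sjis = try_convert(mixed_query, "shift_jis")
mixed_as_eucjp = try_convert(mixed_query, "euc_jp")
```

In this sketch only the ASCII plus SHIFT-JIS message survives conversion, which matches the guess in the message: if the rest of the query is SHIFT-JIS or plain ASCII, declaring SHIFT-JIS as the client encoding is enough.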