Re: [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding
From | Tom Lane |
---|---|
Subject | Re: [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding |
Date | |
Msg-id | 12131.1392760137@sss.pgh.pa.us |
In reply to | Re: [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding (Tom Lane <tgl@sss.pgh.pa.us>) |
Replies | Re: [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding; Re: [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding |
List | pgsql-hackers |
I wrote:
> digoal@126.com writes:
>> select t, t::bytea from convert_from('\xeec1', 'sql_ascii') as g(t);
>> [ fails to check that string is valid in database encoding ]
>
> Hm, yeah. Normal input to the database goes through pg_any_to_server(), which will apply a validation step if the source encoding is SQL_ASCII and the destination encoding is something else. However, pg_convert and some other places call pg_do_encoding_conversion() directly, and that function will just quietly do nothing if either encoding is SQL_ASCII.
>
> The minimum-refactoring solution to this would be to tweak pg_do_encoding_conversion() so that if the src_encoding is SQL_ASCII but the dest_encoding isn't, it does pg_verify_mbstr() rather than nothing.
>
> I'm not sure if this would break anything we need to have work, though. Thoughts? Do we want to back-patch such a change?

I looked through all the callers of pg_do_encoding_conversion(), and AFAICS this change is a good idea. There are a whole bunch of places that use pg_do_encoding_conversion() to convert from the database encoding to encoding X (most usually UTF8), and right now if you do that in a SQL_ASCII database you have no assurance whatever that what is produced is actually valid in encoding X. I think we need to close that loophole.

I found one place --- utf_u2e() in plperl_helpers.h --- that is aware of the lack of checking and forces a pg_verify_mbstr call for itself; but it apparently is concerned about whether the source data is actually utf8 in the first place, which I think is not really pg_do_encoding_conversion's bailiwick. I'm okay with pg_do_encoding_conversion being a no-op if src_encoding == dest_encoding.

Barring objections, I will fix and back-patch this.

regards, tom lane