Re: The "char" type versus non-ASCII characters
From | Tom Lane |
---|---|
Subject | Re: The "char" type versus non-ASCII characters |
Date | |
Msg-id | 2320640.1638560531@sss.pgh.pa.us |
In reply to | Re: The "char" type versus non-ASCII characters (Andrew Dunstan <andrew@dunslane.net>) |
Responses | Re: The "char" type versus non-ASCII characters |
List | pgsql-hackers |
Andrew Dunstan <andrew@dunslane.net> writes:
> On 12/3/21 14:12, Tom Lane wrote:
>> I can think of at least three ways we might address this:
>>
>> * Forbid all non-ASCII values for type "char".  This results in
>> simple and portable semantics, but it might break usages that
>> work okay today.
>>
>> * Allow such values only in single-byte server encodings.  This
>> is a bit messy, but it wouldn't break any cases that are not
>> problematic already.
>>
>> * Continue to allow non-ASCII values, but change charin/charout,
>> char_text, etc so that the external representation is encoding-safe
>> (perhaps make it an octal or decimal number).

> I don't like #2.

Yeah, it's definitely messy --- for example, maybe é works
in a latin1 database but is rejected when you try to restore
into a DB with utf8 encoding.

> Is #3 going to change the external representation only
> for non-ASCII values? If so, that seems OK.

Right, I envisioned that ASCII behaves the same but we'd use
a numeric representation for high-bit-set values.  These
cases could be told apart fairly easily by charin(), since
the numeric representation would always be three digits.

> #1 is the simplest to implement and to understand,
> and I suspect it would break very little in practice, but others might
> disagree with that assessment.

We'd still have to decide what to do with pg_upgrade'd
non-ASCII values, so there's messiness there too.  Having
charout() throw an error seems not very nice.

			regards, tom lane
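
To make option #3 concrete, here is a minimal standalone sketch of how the input and output functions could tell the two cases apart. It is not PostgreSQL source code: the names sketch_charin, sketch_charout and is_high_octal are hypothetical, and the choice of three octal digits (rather than decimal, or some prefixed form) is only an assumption for illustration. ASCII bytes pass through unchanged; bytes with the high bit set are written as exactly three octal digits in the range 200..377.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: true if s is exactly three octal digits whose value
 * lies in 128..255 (octal 200..377), i.e. a high-bit-set byte.
 */
static int is_high_octal(const char *s)
{
    return strlen(s) == 3 &&
           (s[0] == '2' || s[0] == '3') &&
           s[1] >= '0' && s[1] <= '7' &&
           s[2] >= '0' && s[2] <= '7';
}

/*
 * Sketch of an input function: external text -> one byte.
 * Assumption: a three-digit octal string in 200..377 is read as a number;
 * anything else keeps the "take the first byte" behavior.
 */
unsigned char sketch_charin(const char *s)
{
    if (is_high_octal(s))
        return (unsigned char) (((s[0] - '0') << 6) |
                                ((s[1] - '0') << 3) |
                                (s[2] - '0'));

    /* ASCII path: first byte of the string (an empty string yields 0). */
    return (unsigned char) s[0];
}

/*
 * Sketch of an output function: one byte -> external text.
 * buf should have room for at least 4 bytes (three digits plus terminator).
 */
void sketch_charout(unsigned char c, char *buf, size_t buflen)
{
    if (c & 0x80)
        snprintf(buf, buflen, "%03o", (unsigned int) c);  /* e.g. 0xE9 -> "351" */
    else
        snprintf(buf, buflen, "%c", c);                   /* ASCII unchanged */
}
```

Under these assumptions the output is pure ASCII regardless of server encoding, so a dump taken from a latin1 database restores into a utf8 one. The cost is that an input such as "200", which would otherwise yield the byte '2', is now read as the number 128; the sketch accepts that ambiguity, and a real design would have to decide whether it is tolerable.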