Обсуждение: BUG #15373: null / utf-8
The following bug has been logged on the website: Bug reference: 15373 Logged by: Eric Hansen Email address: orchideric@yahoo.com PostgreSQL version: 10.5 Operating system: ubu 16 Description: doesn’t support converting NULL bytes to UTF-8 and therefore returns an error
On Sat, Sep 8, 2018 at 05:24:17PM +0000, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 15373 > Logged by: Eric Hansen > Email address: orchideric@yahoo.com > PostgreSQL version: 10.5 > Operating system: ubu 16 > Description: > > doesn’t support converting NULL bytes to UTF-8 and therefore returns an > error You need to encode binary values containing nulls as bytea strings. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce Momjian wrote: > On Sat, Sep 8, 2018 at 05:24:17PM +0000, PG Bug reporting form wrote: >> doesn’t support converting NULL bytes to UTF-8 and therefore returns an >> error > You need to encode binary values containing nulls as bytea strings. Actually this reminds me to log a documentation bug, or rather improvement. It is already slightly confusing that Postgres doesn't support null bytes in text columns at all and in bytea columns cannot take them in as a normal string literal. I'm assuming this is traditional and difficult to change. Now, there are two formats in which bytea column content can be specified, the "hex" and the deprecated "escape" format. This "escape" format is called that way because it allows escape sequences which are un-escaped before the data is written to the column. This step is comparable to the parsing of JSON for jsonb columns or the parsing of array syntax for any array type. However, during the parsing of an SQL there is a second, completely different, un-escaping step going on, the un-escaping of C-style escapes in string literals starting with an "E", *if* you start the string literal with an "E". For some reason the example given in the documentation for the "hex" format uses such an "escape string literal": SELECT E'\\xDEADBEEF'; I found this very confusing. Can this example be changed to a normal string literal, like this? SELECT '\xDEADBEEF'; Regards, Andre
On Sat, Sep 8, 2018 at 09:43:18PM +0200, André Hänsel wrote: > Bruce Momjian wrote: > > > On Sat, Sep 8, 2018 at 05:24:17PM +0000, PG Bug reporting form wrote: > >> doesn’t support converting NULL bytes to UTF-8 and therefore returns an > >> error > > > You need to encode binary values containing nulls as bytea strings. > > Actually this reminds me to log a documentation bug, or rather improvement. > > It is already slightly confusing that Postgres doesn't support null bytes > in text columns at all and in bytea columns cannot take them in as a normal > string literal. I'm assuming this is traditional and difficult to change. > > Now, there are two formats in which bytea column content can be specified, > the "hex" and the deprecated "escape" format. > > This "escape" format is called that way because it allows escape sequences > which are un-escaped before the data is written to the column. This step is > comparable to the parsing of JSON for jsonb columns or the parsing of array > syntax for any array type. > > However, during the parsing of an SQL there is a second, completely > different, un-escaping step going on, the un-escaping of C-style escapes in > string literals starting with an "E", *if* you start the string literal > with an "E". > > For some reason the example given in the documentation for the "hex" > format uses such an "escape string literal": > > SELECT E'\\xDEADBEEF'; > > I found this very confusing. Can this example be changed to a normal > string literal, like this? > > SELECT '\xDEADBEEF'; You know, I 100% agree with you. We used the E'' syntax so we would produce the same results whether standard_conforming_strings was true or false. However, we changed the standard_conforming_strings default to true in Postgres 9.1 on 2011-09-12, and that release has been end-of-life for a year. I think it is time to clarify our documentation examples by assuming that standard_conforming_strings is true. I will work on a patch. Thanks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sat, Sep 8, 2018 at 04:36:19PM -0400, 'Bruce Momjian' wrote: > On Sat, Sep 8, 2018 at 09:43:18PM +0200, André Hänsel wrote: > > For some reason the example given in the documentation for the "hex" > > format uses such an "escape string literal": > > > > SELECT E'\\xDEADBEEF'; > > > > I found this very confusing. Can this example be changed to a normal > > string literal, like this? > > > > SELECT '\xDEADBEEF'; > > You know, I 100% agree with you. We used the E'' syntax so we would > produce the same results whether standard_conforming_strings was true or > false. However, we changed the standard_conforming_strings default to > true in Postgres 9.1 on 2011-09-12, and that release has been > end-of-life for a year. > > I think it is time to clarify our documentation examples by assuming > that standard_conforming_strings is true. I will work on a patch. > Thanks. I have developed the attached documentation patch to remove the E'' syntax. standard_conforming_strings defaulted to 'on' in PG 9.1. I also found that our bytea data type section wasn't properly adjusted when we changed the the default bytea_output to 'hex' in PG 9.0. I think the only question is how far back to apply this patch. I am thinking through 9.3. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Вложения
Bruce Momjian wrote: > I have developed the attached documentation patch to remove the E'' > syntax. standard_conforming_strings defaulted to 'on' in PG 9.1. On https://www.postgresql.org/docs/current/static/arrays.html, there is a note at the bottom with an example currently using the C-style escape syntax. Your patch changes that, as agreed, but then the text of the note, as it is at the moment, doesn't make sense anymore because the note is about the doubling of backslashes that you have with C-style escape string literals.
On Fri, Sep 14, 2018 at 10:15:48AM +0200, André Hänsel wrote: > Bruce Momjian wrote: > > I have developed the attached documentation patch to remove the E'' > > syntax. standard_conforming_strings defaulted to 'on' in PG 9.1. > > On https://www.postgresql.org/docs/current/static/arrays.html, there > is a note at the bottom with an example currently using the C-style > escape syntax. Your patch changes that, as agreed, but then the text > of the note, as it is at the moment, doesn't make sense anymore > because the note is about the doubling of backslashes that you have > with C-style escape string literals. Wow, good you saw that. I removed that entire paragraph since there is no longer double-escape evaluation by default. Updated patch attached. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Вложения
On Fri, Sep 14, 2018 at 03:49:17PM -0400, 'Bruce Momjian' wrote: > On Fri, Sep 14, 2018 at 10:15:48AM +0200, André Hänsel wrote: > > Bruce Momjian wrote: > > > I have developed the attached documentation patch to remove the E'' > > > syntax. standard_conforming_strings defaulted to 'on' in PG 9.1. > > > > On https://www.postgresql.org/docs/current/static/arrays.html, there > > is a note at the bottom with an example currently using the C-style > > escape syntax. Your patch changes that, as agreed, but then the text > > of the note, as it is at the moment, doesn't make sense anymore > > because the note is about the doubling of backslashes that you have > > with C-style escape string literals. > > Wow, good you saw that. I removed that entire paragraph since there is > no longer double-escape evaluation by default. Updated patch attached. Patch applied through 9.3. Thanks for the report. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +