Discussion: 9.4. String Functions and Operators page does not document that encode adds line breaks

9.4. String Functions and Operators page does not document that encode adds line breaks

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/functions-string.html
Description:

It took me a long time to discover why a base 64 encoded SHA-512 hash was 89
characters long instead of the expected 88. The documentation does not
mention that the encode function inserts newlines after 76 characters.
Please make a note of this behavior.

By the way, this is a very poor design decision. The function has no
knowledge of how the string is going to be used. If it is going to be
displayed on an 80-character terminal, then the newline makes sense. If it
is going to be written to a PEM-encoded file, then the newline is to be
expected. But I'm inserting the result into a VARCHAR(88) column and
comparing with base-64 encoded strings from Node.js. There is no reason for
the results to be terminal or file friendly. Instead, they should be machine
friendly. The decision to add newlines should have been made on display or
on creation of the PEM file, where that information becomes available. The
workaround of trimming whitespace characters from the encoded string is ugly
and unacceptable.
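
The length mismatch described above can be reproduced outside the database. The following is a minimal Python sketch, not PostgreSQL's implementation: `wrap_at_76` is a hypothetical helper that mimics the reported behavior by inserting a newline after every 76 encoded characters, and the hashed input is arbitrary.

```python
import base64
import hashlib

LINE_LENGTH = 76  # line length reported in the post above


def wrap_at_76(encoded: str) -> str:
    """Hypothetical helper: split an encoded string into 76-character lines
    joined by a single newline (no trailing newline is added)."""
    return "\n".join(
        encoded[i:i + LINE_LENGTH] for i in range(0, len(encoded), LINE_LENGTH)
    )


digest = hashlib.sha512(b"arbitrary input").digest()   # SHA-512 yields 64 bytes
unwrapped = base64.b64encode(digest).decode("ascii")   # 64 bytes -> 88 characters
wrapped = wrap_at_76(unwrapped)                        # 88 characters + 1 newline

print(len(digest), len(unwrapped), len(wrapped))       # prints: 64 88 89
```

The single extra character is the newline at index 76, which is why the value no longer fits a VARCHAR(88) column.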

Re: 9.4. String Functions and Operators page does not document that encode adds line breaks

From
"David G. Johnston"
Date:
On Sat, Feb 8, 2020 at 12:10 PM PG Doc comments form <noreply@postgresql.org> wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/12/functions-string.html
> Description:
>
> It took me a long time to discover why a base 64 encoded SHA-512 hash was 89
> characters long instead of the expected 88. The documentation does not
> mention that the encode function inserts newlines after 76 characters.
> Please make a note of this behavior.

Patch submissions are welcome.  There is, however, an argument that this is an implementation detail one shouldn't rely upon, and that it therefore should not be described in user-facing documentation.
 
> By the way, this is a very poor design decision.

It seems to be something we inherited some 20 years ago and are not likely to change, even though I suspect you will find general agreement with your position.  Though since it isn't documented, maybe changing it would be OK.

> The workaround of trimming whitespace characters from the encoded string is ugly
> and unacceptable.

It may be a bit ugly, but when dealing with base64, specifically when decoding, whitespace of this nature is expressly allowed.  Its historical presence here, based upon MIME requirements prevalent at the time the code was written, doesn't alter its meaning, and so it is somewhat rightfully considered an implementation detail that is not necessary to document.

David J.

Re: 9.4. String Functions and Operators page does not document that encode adds line breaks

From
Tom Lane
Date:
PG Doc comments form <noreply@postgresql.org> writes:
> It took me a long time to discover why a base 64 encoded SHA-512 hash was 89
> characters long instead of the expected 88. The documentation does not
> mention that the encode function inserts newlines after 76 characters.
> Please make a note of this behavior.

That was done a few weeks ago in HEAD:

https://www.postgresql.org/docs/devel/functions-binarystring.html

    The base64 format is that of RFC 2045 Section 6.8. As per the RFC,
    encoded lines are broken at 76 characters. However instead of the MIME
    CRLF end-of-line marker, only a newline is used for end-of-line. The
    decode function ignores carriage-return, newline, space, and tab
    characters. Otherwise, an error is raised when decode is supplied
    invalid base64 data — including when trailing padding is incorrect.

> By the way, this is a very poor design decision.

So far as I can tell, that RFC's requirement for line breaks has not
been removed by any later RFC.  So you're complaining to the wrong
people.

            regards, tom lane
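
The decoding rule quoted from the development documentation can be sketched in Python. This is an illustration under assumptions, not the server's code: `lenient_decode` is a hypothetical name, and Python's `base64.b64decode(..., validate=True)` stands in for the strict check on the remaining characters.

```python
import base64
import binascii

IGNORED = "\r\n \t"  # carriage return, newline, space, tab


def lenient_decode(text: str) -> bytes:
    """Hypothetical helper: drop the four ignorable characters, then decode strictly."""
    cleaned = text.translate({ord(ch): None for ch in IGNORED})
    # validate=True rejects characters outside the base64 alphabet;
    # incorrect padding raises binascii.Error as well.
    return base64.b64decode(cleaned, validate=True)


payload = bytes(range(64))                            # 64 arbitrary bytes
encoded = base64.b64encode(payload).decode("ascii")   # 88 characters
wrapped = encoded[:76] + "\n" + encoded[76:]          # line break after 76 characters

# Wrapped and unwrapped forms decode to the same bytes.
assert lenient_decode(wrapped) == payload
assert lenient_decode(encoded) == payload

try:
    lenient_decode("abc")  # truncated input, so the padding is incorrect
except binascii.Error:
    print("invalid base64 rejected")
```

Under this rule the inserted newline is harmless for round trips through decode; it only matters when the encoded text itself is stored or compared as a string.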



Re: 9.4. String Functions and Operators page does not document that encode adds line breaks

From
"David G. Johnston"
Date:
On Sun, Feb 9, 2020 at 9:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Doc comments form <noreply@postgresql.org> writes:
>
> The base64 format is that of RFC 2045 Section 6.8. As per the RFC,
> encoded lines are broken at 76 characters
>
> > By the way, this is a very poor design decision.
>
> So far as I can tell, that RFC's requirement for line breaks has not
> been removed by any later RFC.  So you're complaining to the wrong
> people.

Stating direct RFC4648 compliance would require us to drop the line breaks, which are only being added because we use MIME rules; ideally our general encoding function would not do that.  In a greenfield design we would probably want base64 to be general RFC4648, and add something like base64-mime, which performs the line breaking for the user per RFC 2045, and base64-pem, which would use that specific environment's RFC rules.  Now, maybe we can add "base64-4648" or "base64-general" while leaving "base64" alone, still using the MIME version of the rules?

David J.
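
To make the proposal concrete, here is a Python sketch of how such named variants might differ. Everything in it is hypothetical: the format names are taken from the message above as suggestions only, nothing in this thread says that any of them other than "base64" exists in PostgreSQL, and `encode_variant` is an illustrative function rather than a real API. The CRLF line ending for "base64-mime" is an assumption based on the RFC 2045 end-of-line marker mentioned earlier in the thread.

```python
import base64


def _wrap(encoded: str, width: int, eol: str) -> str:
    """Break an encoded string into lines of at most `width` characters."""
    return eol.join(encoded[i:i + width] for i in range(0, len(encoded), width))


def encode_variant(data: bytes, fmt: str) -> str:
    """Hypothetical dispatcher over the format names floated in the thread."""
    encoded = base64.b64encode(data).decode("ascii")
    if fmt == "base64-general":   # proposed: plain RFC 4648 output, no line breaks
        return encoded
    if fmt == "base64":           # behavior described in the thread: 76-char lines, newline only
        return _wrap(encoded, 76, "\n")
    if fmt == "base64-mime":      # proposed: 76-char lines with CRLF (assumption, see above)
        return _wrap(encoded, 76, "\r\n")
    raise ValueError(f"unrecognized encoding: {fmt!r}")


data = bytes(100)  # 100 zero bytes encode to 136 base64 characters
for name in ("base64-general", "base64", "base64-mime"):
    print(name, len(encode_variant(data, name)))
```

The three variants differ only in the separator inserted between lines, so a caller comparing against output from another system would pick the unwrapped form.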

Re: 9.4. String Functions and Operators page does not document that encode adds line breaks

From
Alvaro Herrera
Date:
On 2020-Feb-09, David G. Johnston wrote:

> Stating direct RFC4648 compliance would require us to drop the line breaks
> that are only being added due to using MIME rules which ideally our general
> encoding function would not do.  Greenfield we probably would want base64
> to be general RFC4648 and add something like base64-mime which performs the
> line breaking for the user per RFC 2045, base64-pem which would use that
> specific environments RFC rules.  Now, maybe we can add "base64-4648" or
> "base64-general" while leaving "base64" alone and using the MIME version of
> the rules?

Patches welcome.

I'm not sure that we *need* to preserve the historical behavior.  Many
people would probably be okay with encode('base64') returning no
newlines (since they are useless most of the time anyway), and the
minority that needs them can use encode('base64-rfc2045').

Another idea might be to add an optional 'flags' argument to encode(),
which is passed to the encoder/decoder functions.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 9.4. String Functions and Operators page does not document that encode adds line breaks

From
Bruce Momjian
Date:
On Thu, Feb 27, 2020 at 03:32:56PM -0300, Alvaro Herrera wrote:
> On 2020-Feb-09, David G. Johnston wrote:
> 
> > Stating direct RFC4648 compliance would require us to drop the line breaks
> > that are only being added due to using MIME rules which ideally our general
> > encoding function would not do.  Greenfield we probably would want base64
> > to be general RFC4648 and add something like base64-mime which performs the
> > line breaking for the user per RFC 2045, base64-pem which would use that
> > specific environments RFC rules.  Now, maybe we can add "base64-4648" or
> > "base64-general" while leaving "base64" alone and using the MIME version of
> > the rules?
> 
> Patches welcome.
> 
> I'm not sure that we *need* to preserve the historical behavior.  Many
> people would probably be okay with encode('base64') returning no
> newlines (since they are useless most of the time anyway), and the
> minority that does can use encode('base64-rfc2045').
> 
> Another idea might be to add an optional 'flags' option to encode(),
> which are given to the encoder/decoder functions.

I have had this force-wrap problem using Linux command-line tools.  You
can see it when using xxd here on page 54:

    https://momjian.us/main/writings/tls.pdf#page=54

xxd allows you to specify a maximum length, so I used -cols 999 to avoid
the wrap.  Other times I used a tool to remove the newlines from the
output.  I think you should just use the existing Postgres SQL string
functions to remove the newlines.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +