Обсуждение: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

Поиск
Список
Период
Сортировка

[PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Sugamoto Shinya
Дата:
Hi,

When users pass an invalid encoding name to the built-in functions
`encode()` or `decode()`, PostgreSQL currently reports only:

    ERROR:  unrecognized encoding: "xxx"

This patch adds an error hint listing the valid encoding names,
so users can immediately see the supported options.

Example:

    SELECT encode('\x01'::bytea, 'invalid');
    ERROR:  unrecognized encoding: "invalid"
    HINT:  Valid binary encodings are: "hex", "base64", "base64url", "escape".

This change applies to both `binary_encode()` and `binary_decode()` in `encode.c`,
and adds regression tests under `src/test/regress/sql/strings.sql`.

Although this is a small change, let me share a few considerations behind it:

- I extracted the valid encodings from the hint messages and used a format specifier like  
  `Valid binary encodings are: %s`, so that we avoid scattering those fixed strings
  across translation files.
- I briefly considered adding a small helper function to build the hint string,
  but decided against it since keeping the codebase simple felt preferable, and
  new binary encodings are not added very often.

I’d be happy to hear any opinions or suggestions.

Patch attached.

Best regards,  
Shinya Sugamoto
Вложения

Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Fujii Masao
Дата:
On Sat, Nov 8, 2025 at 3:26 PM Sugamoto Shinya <shinya34892@gmail.com> wrote:
>
> Hi,
>
> When users pass an invalid encoding name to the built-in functions
> `encode()` or `decode()`, PostgreSQL currently reports only:
>
>     ERROR:  unrecognized encoding: "xxx"
>
> This patch adds an error hint listing the valid encoding names,
> so users can immediately see the supported options.
>
> Example:
>
>     SELECT encode('\x01'::bytea, 'invalid');
>     ERROR:  unrecognized encoding: "invalid"
>     HINT:  Valid binary encodings are: "hex", "base64", "base64url", "escape".
>
> This change applies to both `binary_encode()` and `binary_decode()` in `encode.c`,
> and adds regression tests under `src/test/regress/sql/strings.sql`.

+1


- errmsg("unrecognized encoding: \"%s\"", namebuf)));
+ errmsg("unrecognized encoding: \"%s\"", namebuf),
+ errhint("Valid binary encodings are: %s",
+ "\"hex\", \"base64\", \"base64url\", \"escape\".")));

Since neither encode.c nor the documentation for encode() use
the term "binary encoding", I think just "encodings" is sufficient
in the hint message.

Also, the colon after "encodings”"doesn't seem necessary.


+-- invalid encoding names should report the valid options
+SELECT encode('\x01'::bytea, 'invalid');  -- error

I'd like to improve the comment like:
"report an error with a hint listing valid encodings when an invalid
encoding is specified".


> Although this is a small change, let me share a few considerations behind it:
>
> - I extracted the valid encodings from the hint messages and used a format specifier like
>   `Valid binary encodings are: %s`, so that we avoid scattering those fixed strings
>   across translation files.

I understand your point. But since new encodings are rarely added as you told,
I'm fine to list them directly in the hint instead of using %s,
similar to other hints like:

src/backend/commands/dbcommands.c:
errhint("Valid strategies are \"wal_log\" and \"file_copy\".")));

Regards,

--
Fujii Masao




> On Nov 8, 2025, at 14:25, Sugamoto Shinya <shinya34892@gmail.com> wrote:
>
> Hi,
>
> When users pass an invalid encoding name to the built-in functions
> `encode()` or `decode()`, PostgreSQL currently reports only:
>
>     ERROR:  unrecognized encoding: "xxx"
>
> This patch adds an error hint listing the valid encoding names,
> so users can immediately see the supported options.
>
> Example:
>
>     SELECT encode('\x01'::bytea, 'invalid');
>     ERROR:  unrecognized encoding: "invalid"
>     HINT:  Valid binary encodings are: "hex", "base64", "base64url", "escape".
>
> This change applies to both `binary_encode()` and `binary_decode()` in `encode.c`,
> and adds regression tests under `src/test/regress/sql/strings.sql`.
>
> Although this is a small change, let me share a few considerations behind it:
>
> - I extracted the valid encodings from the hint messages and used a format specifier like
>   `Valid binary encodings are: %s`, so that we avoid scattering those fixed strings
>   across translation files.
> - I briefly considered adding a small helper function to build the hint string,
>   but decided against it since keeping the codebase simple felt preferable, and
>   new binary encodings are not added very often.
>
> I’d be happy to hear any opinions or suggestions.
>
> Patch attached.
>
> Best regards,
> Shinya Sugamoto
> <0001-Added-error-hints-for-invalid-binary-encoding-names-.patch>

1.
```
-                 errmsg("unrecognized encoding: \"%s\"", namebuf)));
+                 errmsg("unrecognized encoding: \"%s\"", namebuf),
+                 errhint("Valid binary encodings are: %s",
+                         "\"hex\", \"base64\", \"base64url\", \"escape\".")));
```

I think hardcoding the encoding list is fragile. AFAIK, “base64url” was newly added just a couple of months ago.

The list is defined in encode.c (search for enclist in the file), I guess we can add a function to return a string with
aencoding names. 

2
```
+                 errhint("Valid binary encodings are: %s",
+                         "\"hex\", \"base64\", \"base64url\", \"escape\".")));
```

Looks like putting punctuation inside %s is not normal. By looking at other hint messages, they usually do something
like:

```
errhint("Perhaps you meant the option \"%s\".”, value)
```

But I think if you address comment 1, then this one will be addressed as well.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Daniel Gustafsson
Дата:
> On 10 Nov 2025, at 10:06, Chao Li <li.evan.chao@gmail.com> wrote:
>> On Nov 8, 2025, at 14:25, Sugamoto Shinya <shinya34892@gmail.com> wrote:

>> This patch adds an error hint listing the valid encoding names,
>> so users can immediately see the supported options.

+1.

> I think hardcoding the encoding list is fragile. AFAIK, “base64url” was newly added just a couple of months ago.
>
> The list is defined in encode.c (search for enclist in the file), I guess we can add a function to return a string
witha encoding names. 

New encodings are added very infrequently, we can revisit this if that changes
but till then I think the simplicity of a hardcoded string is preferrable.

--
Daniel Gustafsson





> On Nov 10, 2025, at 22:43, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 10 Nov 2025, at 10:06, Chao Li <li.evan.chao@gmail.com> wrote:
>>> On Nov 8, 2025, at 14:25, Sugamoto Shinya <shinya34892@gmail.com> wrote:
>
>>> This patch adds an error hint listing the valid encoding names,
>>> so users can immediately see the supported options.
>
> +1.
>
>> I think hardcoding the encoding list is fragile. AFAIK, “base64url” was newly added just a couple of months ago.
>>
>> The list is defined in encode.c (search for enclist in the file), I guess we can add a function to return a string
witha encoding names. 
>
> New encodings are added very infrequently, we can revisit this if that changes
> but till then I think the simplicity of a hardcoded string is preferrable.
>

I’m not convinced that a hardcoded string is actually better. In fact, the less often something changes, the easier it
isto miss updating it when we need to. 

If we add a helper function to build the list of supported encodings:

* The function is small and straightforward — trivial to implement.
* The function doesn’t run in any performance-critical paths, so no meaningful cost there.
* The encoding names are plain ASCII identifiers and effectively the same across languages, so there’s no real benefit
tohardcoding them for i18n purposes either. 

So I still think the helper function seems cleaner and less error-prone than a hardcoded string.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Sugamoto Shinya
Дата:
Thanks everyone for checking my proposal.
Other than how to make the hint string, every comment looks good to me. I'll take those into my patch. I appreciate it.


Given the comments you all made, I looked into the source code to see how we build `errhint` with values, and found the following.

 // contrib/pg_prewarm/pg_prewarm.c:102
 errhint("Valid prewarm types are \"prefetch\", \"read\", and \"buffer\".")

 // contrib/amcheck/verify_heapam.c:303
 errhint("Valid skip options are \"all-visible\", \"all-frozen\", and \"none\".")

 // src/backend/replication/logical/launcher.c:458
  errhint("You might need to increase \"%s\".", "max_logical_replication_workers")

// src/backend/commands/opclasscmds.c:1240
  errhint("Valid signature of operator class options parsing function is %s.",
          "(internal) RETURNS void")));

// src/backend/commands/dbcommands.c:1044
  errhint("Valid strategies are \"wal_log\" and \"file_copy\".")));


Regarding the following comment, 

>In fact, the less often something changes, the easier it is to miss updating it when we need to.
>
> If we add a helper function to build the list of supported encodings:
>
> * The function is small and straightforward — trivial to implement.
> * The function doesn’t run in any performance-critical paths, so no meaningful cost there.
> * The encoding names are plain ASCII identifiers and effectively the same across languages, so there’s no real benefit to hardcoding them for i18n purposes either.

While I partly agree with the idea of adding a helper function, introducing one just for constructing a simple hint string would lead to more code of the kind that I would prefer to avoid, since this is not our main concern.

So, although it is true that having such a helper function would improve maintainability in some sense,
considering that:
- there are already several existing cases where values are simply embedded into hint texts (as Fujii pointed out), and
- this approach does not violate our coding rules or make the code harder to understand,
I tend to think that it’s acceptable to hard-code the list of possible encodings in the hint this time.

Thoughts?

Regards,

On Tue, Nov 11, 2025 at 3:14 PM Chao Li <li.evan.chao@gmail.com> wrote:


> On Nov 10, 2025, at 22:43, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 10 Nov 2025, at 10:06, Chao Li <li.evan.chao@gmail.com> wrote:
>>> On Nov 8, 2025, at 14:25, Sugamoto Shinya <shinya34892@gmail.com> wrote:
>
>>> This patch adds an error hint listing the valid encoding names,
>>> so users can immediately see the supported options.
>
> +1.
>
>> I think hardcoding the encoding list is fragile. AFAIK, “base64url” was newly added just a couple of months ago.
>>
>> The list is defined in encode.c (search for enclist in the file), I guess we can add a function to return a string with a encoding names.
>
> New encodings are added very infrequently, we can revisit this if that changes
> but till then I think the simplicity of a hardcoded string is preferrable.
>

I’m not convinced that a hardcoded string is actually better. In fact, the less often something changes, the easier it is to miss updating it when we need to.

If we add a helper function to build the list of supported encodings:

* The function is small and straightforward — trivial to implement.
* The function doesn’t run in any performance-critical paths, so no meaningful cost there.
* The encoding names are plain ASCII identifiers and effectively the same across languages, so there’s no real benefit to hardcoding them for i18n purposes either.

So I still think the helper function seems cleaner and less error-prone than a hardcoded string.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/




Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Masahiko Sawada
Дата:
On Tue, Nov 11, 2025 at 7:14 AM Sugamoto Shinya <shinya34892@gmail.com> wrote:
>
> Thanks everyone for checking my proposal.
> Other than how to make the hint string, every comment looks good to me. I'll take those into my patch. I appreciate
it.
>
>
> Given the comments you all made, I looked into the source code to see how we build `errhint` with values, and found
thefollowing. 
>
>  // contrib/pg_prewarm/pg_prewarm.c:102
>  errhint("Valid prewarm types are \"prefetch\", \"read\", and \"buffer\".")
>
>  // contrib/amcheck/verify_heapam.c:303
>  errhint("Valid skip options are \"all-visible\", \"all-frozen\", and \"none\".")
>
>  // src/backend/replication/logical/launcher.c:458
>   errhint("You might need to increase \"%s\".", "max_logical_replication_workers")
>
> // src/backend/commands/opclasscmds.c:1240
>   errhint("Valid signature of operator class options parsing function is %s.",
>           "(internal) RETURNS void")));
>
> // src/backend/commands/dbcommands.c:1044
>   errhint("Valid strategies are \"wal_log\" and \"file_copy\".")));
>
>
> Regarding the following comment,
>
> >In fact, the less often something changes, the easier it is to miss updating it when we need to.
> >
> > If we add a helper function to build the list of supported encodings:
> >
> > * The function is small and straightforward — trivial to implement.
> > * The function doesn’t run in any performance-critical paths, so no meaningful cost there.
> > * The encoding names are plain ASCII identifiers and effectively the same across languages, so there’s no real
benefitto hardcoding them for i18n purposes either. 
>
> While I partly agree with the idea of adding a helper function, introducing one just for constructing a simple hint
stringwould lead to more code of the kind that I would prefer to avoid, since this is not our main concern. 
>
> So, although it is true that having such a helper function would improve maintainability in some sense,
> considering that:
> - there are already several existing cases where values are simply embedded into hint texts (as Fujii pointed out),
and
> - this approach does not violate our coding rules or make the code harder to understand,
> I tend to think that it’s acceptable to hard-code the list of possible encodings in the hint this time.
>
> Thoughts?

+1 for hard-coding the supported encoding list. I think it's a good
start point. We can revisit the idea of dynamically constructing the
encoding list if we're concerned about its maintenance cost.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Fujii Masao
Дата:
On Wed, Nov 12, 2025 at 10:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> +1 for hard-coding the supported encoding list. I think it's a good
> start point. We can revisit the idea of dynamically constructing the
> encoding list if we're concerned about its maintenance cost.

+1

Regards,

--
Fujii Masao



Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Sugamoto Shinya
Дата:
On Wed, Nov 12, 2025 at 10:56 AM Fujii Masao <masao.fujii@gmail.com> wrote:
On Wed, Nov 12, 2025 at 10:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> +1 for hard-coding the supported encoding list. I think it's a good
> start point. We can revisit the idea of dynamically constructing the
> encoding list if we're concerned about its maintenance cost.

+1

Regards,

--
Fujii Masao

Thanks everyone for reviewing my proposal.
I've hard-coded the valid list of encoding names. 

Also, I modified the hint string into like "Valid encodings are \"hex\", \"base64\", \"base64url\", and \"escape\"."
by adding "and" before "escape".

I attached my v2 patch to this message. Please let me know freely if you have any additional questions.

Regards,
Вложения

Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Peter Eisentraut
Дата:
On 10.11.25 10:06, Chao Li wrote:
> The list is defined in encode.c (search for enclist in the file), I guess we can add a function to return a string
witha encoding names.
 
> 
> 2
> ```
> +                 errhint("Valid binary encodings are: %s",
> +                         "\"hex\", \"base64\", \"base64url\", \"escape\".")));
> ```
> 
> Looks like putting punctuation inside %s is not normal. By looking at other hint messages, they usually do something
like:
> 
> ```
> errhint("Perhaps you meant the option \"%s\".”, value)
> ```

Yes, this is because the punctuation is itself subject to translation. 
(Most obviously, different languages use different quotation marks.)  So 
the preferred style is something like

errhint("Valid encodings are: \"%s\", \"%s\", and \"%s\", "hex", ...)

or however many you need.

See also 
<https://www.postgresql.org/message-id/202511070936.jj4o4ktd4b6l%40alvherre.pgsql> 
for a related discussion.




Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Fujii Masao
Дата:
On Wed, Nov 12, 2025 at 10:23 PM Sugamoto Shinya <shinya34892@gmail.com> wrote:
> Thanks everyone for reviewing my proposal.
> I've hard-coded the valid list of encoding names.
>
> Also, I modified the hint string into like "Valid encodings are \"hex\", \"base64\", \"base64url\", and \"escape\"."
> by adding "and" before "escape".
>
> I attached my v2 patch to this message. Please let me know freely if you have any additional questions.

Thanks for the updated patch! LGTM.

One minor comment: in v2 patch, it seems the encodings in the HINT message are
listed in the order they appear in the enclist struct. Wouldn't it be
clearer to list them
alphabetically, matching the order shown in the docs [1]: base64,
base64url, escape, and hex?

Regards,

[1] https://www.postgresql.org/docs/devel/functions-binarystring.html#ENCODE-FORMAT-BASE64

--
Fujii Masao



Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Sugamoto Shinya
Дата:
On Thu, Nov 13, 2025 at 9:58 AM Fujii Masao <masao.fujii@gmail.com> wrote:
On Wed, Nov 12, 2025 at 10:23 PM Sugamoto Shinya <shinya34892@gmail.com> wrote:
> Thanks everyone for reviewing my proposal.
> I've hard-coded the valid list of encoding names.
>
> Also, I modified the hint string into like "Valid encodings are \"hex\", \"base64\", \"base64url\", and \"escape\"."
> by adding "and" before "escape".
>
> I attached my v2 patch to this message. Please let me know freely if you have any additional questions.

Thanks for the updated patch! LGTM.

One minor comment: in v2 patch, it seems the encodings in the HINT message are
listed in the order they appear in the enclist struct. Wouldn't it be
clearer to list them
alphabetically, matching the order shown in the docs [1]: base64,
base64url, escape, and hex?

Regards,

[1] https://www.postgresql.org/docs/devel/functions-binarystring.html#ENCODE-FORMAT-BASE64

--
Fujii Masao

Thanks, Fujii and Peter.

I ordered the valid encodings alphabetically and extracted those into separate parameters.

Here is my new patch.

Regards,
Вложения

Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Daniel Gustafsson
Дата:
> On 17 Nov 2025, at 15:08, Sugamoto Shinya <shinya34892@gmail.com> wrote:

> I ordered the valid encodings alphabetically and extracted those into separate parameters.

LGTM.  Per project style I don't think we should quote the names of encodings
since they are visibly not English words (see commit a243569bf65c5 which deals
with variable names, but I think it applies here as well), but no need to send
another version of the patch just for that small adjustment.

--
Daniel Gustafsson




Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Fujii Masao
Дата:
On Mon, Nov 17, 2025 at 11:30 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 17 Nov 2025, at 15:08, Sugamoto Shinya <shinya34892@gmail.com> wrote:
>
> > I ordered the valid encodings alphabetically and extracted those into separate parameters.
>
> LGTM.  Per project style I don't think we should quote the names of encodings
> since they are visibly not English words (see commit a243569bf65c5 which deals
> with variable names, but I think it applies here as well)

It looks like the message-style rule added in commit a243569bf65c5
applies only to
configuration parameter names, right? And it seems that rule was later revised
by commit 17974ec2594. Is that correct?

Regards,

--
Fujii Masao



Re: [PATCH] Add hints for invalid binary encoding names in encode/decode functions

От
Daniel Gustafsson
Дата:
> On 17 Nov 2025, at 15:59, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Mon, Nov 17, 2025 at 11:30 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>>> On 17 Nov 2025, at 15:08, Sugamoto Shinya <shinya34892@gmail.com> wrote:
>>
>>> I ordered the valid encodings alphabetically and extracted those into separate parameters.
>>
>> LGTM.  Per project style I don't think we should quote the names of encodings
>> since they are visibly not English words (see commit a243569bf65c5 which deals
>> with variable names, but I think it applies here as well)
>
> It looks like the message-style rule added in commit a243569bf65c5
> applies only to
> configuration parameter names, right? And it seems that rule was later revised
> by commit 17974ec2594. Is that correct?

Oh right, I had forgotten about that, thanks for the reminder.  The patch as
proposed is correct then.

--
Daniel Gustafsson