Обсуждение: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

Поиск

Список

Период

Сортировка

Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Sergey Prokhorenko

Дата:

23 октября 2025 г., 15:34:43

Hi pgsql-hackers,

I'm writing to propose adding two new built-in functions to PostgreSQL that provide compact UUID encoding using the base32hex format.

I'm one of the contributors to RFC 9562 (UUIDs) and to the uuidv7() implementations in PostgreSQL and several libraries. I'm writing to express my strong support for a new patch by Andrey Borodin, the developer of the built-in uuidv7() function for PostgreSQL 18. This patch adds two new functions for UUID compact text representation. These functions would be long-awaited additions to PostgreSQL's UUID functionality.

I would like to request the community to review this patch and to consider it for commit.

The patch is available here: https://github.com/x4m/postgres_g/commit/aa902bbc5dfc47d4b35f05016304a1e671abb505

_______________________________

uuid_to_base32hex ( uuid ) -> text

Encodes a UUID into a 26-character base32hex string (uppercase, no hyphens, without padding), using the alphabet 0123456789ABCDEFGHIJKLMNOPQRSTUV as specified in RFC 4648 (https://datatracker.ietf.org/doc/html/rfc4648#page-10).

To accommodate base32hex encoding (5 bits per character), the 128-bit UUID requires 130 bits total (26 characters * 5 bits). The additional 2 zero bits are appended as padding.

This compact, lexicographically sortable format preserves temporal ordering for UUIDv7, making it ideal for primary keys stored as values in JSON key-value pairs, as well as for URLs, filenames, and other space-constrained contexts.

Example:

uuid_to_base32hex('019535d9-3df7-79fb-b466-fa907fa17f9e'::uuid) -> 06AJBM9TUTSVND36VA87V8BVJO

_______________________________

base32hex_to_uuid ( text ) -> uuid

Decodes a base32hex string back into its original UUID. The input is case-insensitive. Invalid inputs return NULL. The decoding is lossless and produces a bitwise-identical UUID.

Example:

base32hex_to_uuid('06AJBM9TUTSVND36VA87V8BVJO') -> 019535d9-3df7-79fb-b466-fa907fa17f9e

_______________________________

We considered base36 but rejected it due to poor performance. Crockford's Base32 was also rejected due to its lack of native support in standard libraries, making base32hex the most practical choice.

Converter: https://tomeko.net/online_tools/base32hex.php?lang=en

Best regards,

Sergey Prokhorenko

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Dagfinn Ilmari Mannsåker

Дата:

28 октября 2025 г., 14:53:51

Jelte Fennema-Nio <postgres@jeltef.nl> writes:

> First of all, I'm definitely a proponent of being able to encode UUIDs
> using base32hex in Postgres.
>
> On Mon, 27 Oct 2025 at 23:37, Sergey Prokhorenko
> <sergeyprokhorenko@yahoo.com.au> wrote:
>>
>> Regarding the proposal to couple UUID encoding with the bytea type
>> through encode()/decode() functions: I understand the appeal of
>> reusing existing infrastructure, but this creates a conceptual
>> mismatch. UUID is a distinct semantic type in PostgreSQL, not merely
>> binary data. The bytea type has existed for decades without base32hex
>> encoding, and that's worked fine, because bytea represents arbitrary
>> binary data, not universally unique identifiers with specific
>> structural properties and needs.
>
> I think by far the first step is to make the encoding of UUIDs in
> different formats possible in Postgres. The way to do so with the
> least API impact (and thus as you noticed, least pushback), would be
> to add base32hex to the list of encoding formats in the encode/decode
> functions. Then combining that with UUID <-> bytea casting (which also
> seems totally reasonable functionality to me), would give you the
> functionality (but not the defaults you want).

+1 for adding casts.

> In a follow up patch I would personally be fine making the API to
> encode UUIDs a bit more friendly. In particular, adding an overload to
> the encode function that takes a UUID instead of a bytea seems
> reasonable to me, i.e. encode(id uuid, format text) -> text

Yeah, this works for encode, but not decode, since the argment types are
the same (text, text) in both cases.

> I'm currently less convinced about a decode_uuid function though. I
> think some perf argument (including some benchmarks) would need to be
> made to convince me of its usefulness. Because purely from an API
> friendliness lens, I feel like decode('...', 'base32hex)::uuid and
> decode_uuid('...', 'base32hex') rank basically the same.

Agreed.

> Once/if an accepted RFC actually defines a default shorter encoding
> for UUIDs we could I would definitely be in favor of adding a
> decode_uuid function with the default encoding configured as a default
> argument. As well as adding the default argument to the uuid encode
> overload function.

If it's just one new form, do we need a separate decode function?  Could
we not just make uuid_in() accept both forms (they're easily
distinguishable by length), like bytea_in accepts both the old escape
format and the new hex format?

And if the new format becomes the standard and want to change the
default output format, we would need a GUC like bytea_output anyway, to
let users control when to make the change.

- ilmari

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Jelte Fennema-Nio

Дата:

28 октября 2025 г., 15:24:49

On Tue, 28 Oct 2025 at 12:53, Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
> If it's just one new form, do we need a separate decode function?  Could
> we not just make uuid_in() accept both forms (they're easily
> distinguishable by length), like bytea_in accepts both the old escape
> format and the new hex format?
>
> And if the new format becomes the standard and want to change the
> default output format, we would need a GUC like bytea_output anyway, to
> let users control when to make the change.

Agreed to both of those. This seems too far out to spend much time
discussing now.

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Dagfinn Ilmari Mannsåker

Дата:

28 октября 2025 г., 19:41:37

Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> writes:

> Jelte Fennema-Nio <postgres@jeltef.nl> writes:
>
>> Then combining that with UUID <-> bytea casting (which also
>> seems totally reasonable functionality to me), would give you the
>> functionality (but not the defaults you want).
>
> +1 for adding casts.

Here's a patch for that.  I'm not 100% confident about the error code
for invalid length, but that was the closest one I could find in
errcodes.txt.

I might do the base32hex encode/decode later, unless someone else beats
me to it.

- ilmari

From 1ef47968fd1852f7a409764db2c4b35f0eb948b6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Tue, 28 Oct 2025 16:33:17 +0000
Subject: [PATCH] Allow explicit casting between bytea and UUID

This enables using encode() and decode() to convert UUIDs to and from
alternative formats, such as base64.
---
 src/backend/utils/adt/bytea.c      | 26 ++++++++++++++++++++++++++
 src/include/catalog/pg_cast.dat    |  6 ++++++
 src/include/catalog/pg_proc.dat    |  7 +++++++
 src/test/regress/expected/uuid.out | 15 +++++++++++++++
 src/test/regress/sql/uuid.sql      |  4 ++++
 5 files changed, 58 insertions(+)

diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index 6e7b914c563..e51190d2cb2 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -26,6 +26,7 @@
 #include "utils/fmgrprotos.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
+#include "utils/uuid.h"
 #include "utils/varlena.h"
 #include "varatt.h"
 
@@ -1112,3 +1113,28 @@ int8_bytea(PG_FUNCTION_ARGS)
 {
     return int8send(fcinfo);
 }
+
+/* Cast bytea -> uuid */
+Datum
+bytea_uuid(PG_FUNCTION_ARGS)
+{
+    bytea       *v = PG_GETARG_BYTEA_PP(0);
+    int            len = VARSIZE_ANY_EXHDR(v);
+    pg_uuid_t  *uuid;
+
+    if (len != UUID_LEN)
+        ereport(ERROR,
+                errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                errmsg("invalid uuid length"));
+
+    uuid = (pg_uuid_t *) palloc(UUID_LEN);
+    memcpy(uuid->data, VARDATA_ANY(v), UUID_LEN);
+    PG_RETURN_POINTER(uuid);
+}
+
+/* Cast uuid -> bytea; can just use uuid_send() */
+Datum
+uuid_bytea(PG_FUNCTION_ARGS)
+{
+    return uuid_send(fcinfo);
+}
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index fbfd669587f..913c55ef869 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -348,6 +348,12 @@
 { castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
   castcontext => 'e', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and uuid type
+{ castsource => 'bytea', casttarget => 'uuid', castfunc => 'uuid(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'uuid', casttarget => 'bytea', castfunc => 'bytea(uuid)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9121a382f76..110593eb399 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1199,6 +1199,13 @@
   proname => 'int8', prorettype => 'int8', proargtypes => 'bytea',
   prosrc => 'bytea_int8' },
 
+{ oid => '9880', descr => 'convert uuid to bytea',
+  proname => 'bytea', prorettype => 'bytea', proargtypes => 'uuid',
+  prosrc => 'uuid_bytea' },
+{ oid => '9881', descr => 'convert bytea to uuid',
+  proname => 'uuid', prorettype => 'uuid', proargtypes => 'bytea',
+  prosrc => 'bytea_uuid' },
+
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
   prosrc => 'hashint2' },
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 95392003b86..44f58454d19 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -305,5 +305,20 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
  
 (1 row)
 
+-- casts
+SELECT '11111111-1111-1111-1111-111111111111'::uuid::bytea;
+               bytea                
+------------------------------------
+ \x11111111111111111111111111111111
+(1 row)
+
+SELECT '\xffffffffffffffffffffffffffffffff'::bytea::uuid;
+                 uuid                 
+--------------------------------------
+ ffffffff-ffff-ffff-ffff-ffffffffffff
+(1 row)
+
+SELECT '\x12345678'::bytea::uuid; -- error
+ERROR:  invalid uuid length
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 465153a0341..ef81be93670 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -146,6 +146,10 @@ SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
+-- casts
+SELECT '11111111-1111-1111-1111-111111111111'::uuid::bytea;
+SELECT '\xffffffffffffffffffffffffffffffff'::bytea::uuid;
+SELECT '\x12345678'::bytea::uuid; -- error
 
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.51.1

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Masahiko Sawada

Дата:

28 октября 2025 г., 21:44:12

On Tue, Oct 28, 2025 at 9:41 AM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
>
> Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> writes:
>
> > Jelte Fennema-Nio <postgres@jeltef.nl> writes:
> >
> >> Then combining that with UUID <-> bytea casting (which also
> >> seems totally reasonable functionality to me), would give you the
> >> functionality (but not the defaults you want).
> >
> > +1 for adding casts.
>
> Here's a patch for that.  I'm not 100% confident about the error code
> for invalid length, but that was the closest one I could find in
> errcodes.txt.

Thank you for the patch. I'll review it.

> I might do the base32hex encode/decode later, unless someone else beats
> me to it.

Andrey has shared his patch for base32hex support before[1]. While it
needs to be updated, it seems to implement sufficient function.

Regards,

[1] https://www.postgresql.org/message-id/6F76FA61-E2DC-44EF-9504-889D9BDB4EBD%40yandex-team.ru

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Jelte Fennema-Nio

Дата:

29 октября 2025 г., 02:35:28

On Tue, 28 Oct 2025 at 17:41, Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
> Here's a patch for that.

Looks good to me. Maybe add a test where not every byte is the same though.

> I'm not 100% confident about the error code
> for invalid length, but that was the closest one I could find in
> errcodes.txt.

The errorcode you chose seems acceptable to me, but I think a slightly
more fitting option would be ERRCODE_INVALID_BINARY_REPRESENTATION.
Error codes in postgres are pretty arbitrary though, so either seems
fine to me.

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Dagfinn Ilmari Mannsåker

Дата:

29 октября 2025 г., 13:51:56

Jelte Fennema-Nio <postgres@jeltef.nl> writes:

> On Tue, 28 Oct 2025 at 17:41, Dagfinn Ilmari Mannsåker
> <ilmari@ilmari.org> wrote:
>> Here's a patch for that.
>
> Looks good to me. Maybe add a test where not every byte is the same though.

Good point. I've replaced them with two randomly generated ones.

>> I'm not 100% confident about the error code
>> for invalid length, but that was the closest one I could find in
>> errcodes.txt.
>
> The errorcode you chose seems acceptable to me, but I think a slightly
> more fitting option would be ERRCODE_INVALID_BINARY_REPRESENTATION.
> Error codes in postgres are pretty arbitrary though, so either seems
> fine to me.

That does seem like a better fit. It's used mainly in recv functions,
which this basically is (but user-callable).

Updated patch attaced.

- ilmari

From 7bb7bba8d1be96783ac45df3178fb613e84c39fa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Tue, 28 Oct 2025 16:33:17 +0000
Subject: [PATCH v2] Allow explicit casting between bytea and UUID

This enables using encode() and decode() to convert UUIDs to and from
alternative formats, such as base64.
---
 src/backend/utils/adt/bytea.c      | 26 ++++++++++++++++++++++++++
 src/include/catalog/pg_cast.dat    |  6 ++++++
 src/include/catalog/pg_proc.dat    |  7 +++++++
 src/test/regress/expected/uuid.out | 15 +++++++++++++++
 src/test/regress/sql/uuid.sql      |  4 ++++
 5 files changed, 58 insertions(+)

diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index 6e7b914c563..ff9e46f3015 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -26,6 +26,7 @@
 #include "utils/fmgrprotos.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
+#include "utils/uuid.h"
 #include "utils/varlena.h"
 #include "varatt.h"
 
@@ -1112,3 +1113,28 @@ int8_bytea(PG_FUNCTION_ARGS)
 {
     return int8send(fcinfo);
 }
+
+/* Cast bytea -> uuid */
+Datum
+bytea_uuid(PG_FUNCTION_ARGS)
+{
+    bytea       *v = PG_GETARG_BYTEA_PP(0);
+    int            len = VARSIZE_ANY_EXHDR(v);
+    pg_uuid_t  *uuid;
+
+    if (len != UUID_LEN)
+        ereport(ERROR,
+                errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+                errmsg("invalid uuid length"));
+
+    uuid = (pg_uuid_t *) palloc(UUID_LEN);
+    memcpy(uuid->data, VARDATA_ANY(v), UUID_LEN);
+    PG_RETURN_POINTER(uuid);
+}
+
+/* Cast uuid -> bytea; can just use uuid_send() */
+Datum
+uuid_bytea(PG_FUNCTION_ARGS)
+{
+    return uuid_send(fcinfo);
+}
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index fbfd669587f..913c55ef869 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -348,6 +348,12 @@
 { castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
   castcontext => 'e', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and uuid type
+{ castsource => 'bytea', casttarget => 'uuid', castfunc => 'uuid(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'uuid', casttarget => 'bytea', castfunc => 'bytea(uuid)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9121a382f76..110593eb399 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1199,6 +1199,13 @@
   proname => 'int8', prorettype => 'int8', proargtypes => 'bytea',
   prosrc => 'bytea_int8' },
 
+{ oid => '9880', descr => 'convert uuid to bytea',
+  proname => 'bytea', prorettype => 'bytea', proargtypes => 'uuid',
+  prosrc => 'uuid_bytea' },
+{ oid => '9881', descr => 'convert bytea to uuid',
+  proname => 'uuid', prorettype => 'uuid', proargtypes => 'bytea',
+  prosrc => 'bytea_uuid' },
+
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
   prosrc => 'hashint2' },
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 95392003b86..4b635336606 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -305,5 +305,20 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
  
 (1 row)
 
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+               bytea                
+------------------------------------
+ \x5b35380a714349129b55f322699c6770
+(1 row)
+
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+                 uuid                 
+--------------------------------------
+ 019a2f85-9ced-7225-b99d-9c55044a2563
+(1 row)
+
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error
+ERROR:  invalid uuid length
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 465153a0341..63520d0b640 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -146,6 +146,10 @@ SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.51.2

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Aleksander Alekseev

Дата:

29 октября 2025 г., 14:31:29

Hi,

Thanks for the patch.

> That does seem like a better fit. It's used mainly in recv functions,
> which this basically is (but user-callable).
>
> Updated patch attaced.

Perhaps bytea_uuid() should check the UUID format. Consider the
following example:

SELECT uuid_extract_version('\x019a2f859cedffffb99d9c55044a2563'::bytea::uuid);
 uuid_extract_version
----------------------
                   15

There is no UUID version 15 according to RFC 9562, and the
documentation for uuid_extract_version() says:

"""
Extracts the version from a UUID of the variant described by RFC 9562.
For other variants, this function returns null. For example, for a
UUID generated by gen_random_uuid, this function will return 4.
"""

If I read this correctly, either bytea_uuid() should reject this, or
uuid_extract_version() should be modified to return NULL, or the
documentation for uuid_extract_version() should be altered.

-- 
Best regards,
Aleksander Alekseev

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Aleksander Alekseev

Дата:

29 октября 2025 г., 14:57:59

Hi,

> > Updated patch attaced.
>
> Perhaps bytea_uuid() should check the UUID format. Consider the
> following example:
>
> SELECT uuid_extract_version('\x019a2f859cedffffb99d9c55044a2563'::bytea::uuid);

On the flip side we allow this:

=# select '019a2f85-9ced-ffff-b99d-9c55044a2563' :: uuid;
                 uuid
--------------------------------------
 019a2f85-9ced-ffff-b99d-9c55044a2563

So I guess the patch is fine.

-- 
Best regards,
Aleksander Alekseev

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Dagfinn Ilmari Mannsåker

Дата:

29 октября 2025 г., 15:04:24

Aleksander Alekseev <aleksander@tigerdata.com> writes:

> Hi,
>
> Thanks for the patch.
>
>> That does seem like a better fit. It's used mainly in recv functions,
>> which this basically is (but user-callable).
>>
>> Updated patch attaced.
>
> Perhaps bytea_uuid() should check the UUID format. Consider the
> following example:
>
> SELECT uuid_extract_version('\x019a2f859cedffffb99d9c55044a2563'::bytea::uuid);
>  uuid_extract_version
> ----------------------
>                    15

The UUID input function doesn't do any such validation, so I don't see
why the cast should behave any differently:

# select '019a2f859cedffffb99d9c55044a2563'::uuid;
┌──────────────────────────────────────┐
│                 uuid                 │
├──────────────────────────────────────┤
│ 019a2f85-9ced-ffff-b99d-9c55044a2563 │
└──────────────────────────────────────┘

# select uuid_extract_version('019a2f859cedffffb99d9c55044a2563'::uuid);
┌──────────────────────┐
│ uuid_extract_version │
├──────────────────────┤
│                   15 │
└──────────────────────┘
(1 row)

> There is no UUID version 15 according to RFC 9562, and the
> documentation for uuid_extract_version() says:

There's no version 15 specified _yet_.

> """
> Extracts the version from a UUID of the variant described by RFC 9562.
> For other variants, this function returns null. For example, for a
> UUID generated by gen_random_uuid, this function will return 4.
> """
>
> If I read this correctly, either bytea_uuid() should reject this, or
> uuid_extract_version() should be modified to return NULL, or the
> documentation for uuid_extract_version() should be altered.

Future RFCs could define new versions of this variant, which we should
not reject or pretend don't have a version , just because we haven't
heard of them yet.  In fact, RFC 9562 defines two new versions, v7 and
v8, which by this argument PostgreSQL versions before 18 should reject.

- ilmari

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Andrey Borodin

Дата:

29 октября 2025 г., 15:19:08

> On 28 Oct 2025, at 22:44, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Andrey has shared his patch for base32hex support before[1]. While it
> needs to be updated, it seems to implement sufficient function.

I'd propose something like attached patch. It's on top of Ilmari's v2 patch with small suggestions as a step 2.

Thanks!

Best regards, Andrey Borodin.

Вложения

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Masahiko Sawada

Дата:

30 октября 2025 г., 22:10:42

On Wed, Oct 29, 2025 at 5:19 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 28 Oct 2025, at 22:44, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Andrey has shared his patch for base32hex support before[1]. While it
> > needs to be updated, it seems to implement sufficient function.
>
> I'd propose something like attached patch. It's on top of Ilmari's v2 patch with small suggestions as a step 2.
>

I've reviewed the v3 patches, and here are some review comments.

v3-0001 and v3-0002:

-                               errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
-                               errmsg("invalid uuid length"));
+                               (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+                                errmsg("invalid length for UUID"),
+                                errdetail("Expected %d bytes, got
%d.", UUID_LEN, len)));

How about the error message like "invalid input length for type uuid"?
I think "uuid" should be lower case as it indicates PostgreSQL uuid
data type, and it's better to use %s format instead of directly
writing "uuid" (see string_to_uuid() for example).

As for the errdetail message, should we add "bytea" also after "got %d"?

---
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error

We already have tests for casting bytes to integer data types in
strings.sql. I suggest moving the casting tests from bytea to uuid
into therel. For the uuid.sql file, we could add a test to verify that
a UUID value remains unchanged when it's cast to bytea and back to
UUID. For example,

SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;

---
I think we should update the documentation in the uuid section about
casting data between bytea and uuid. For references, we have a similar
description for bytea and integer[1].

v3-0003:

base32hex_encode() doesn't seem to add '=' paddings, but is it
intentional? I don't see any description in RFC 4648 that we can omit
'=' paddings.

---
I think the patch should add tests not only for uuid data type but
also for general cases like other encodings.

---
In uuid.sql tests, how about adding some tests to check if base32hex
maintains the sortability of UUIDv7 data?

---
I would suggest registering the patches to the next commit fest if not yet.

Regards,

[1] https://www.postgresql.org/docs/devel/functions-binarystring.html

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Илья Чердаков

Дата:

20 ноября 2025 г., 08:46:43

31.10.2025 2:10, Masahiko Sawada wrote:

---
I think we should update the documentation in the uuid section about
casting data between bytea and uuid. For references, we have a similar
description for bytea and integer[1].

Greetings!
I briefly tested the patched version of v3. The implemented
functionality works correctly.

---
You can also add a case with the error from v3-0002
"invalid base32hex end sequence" to the tests :

+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base32hex end sequence"),
+ errhint("Input data has non-zero padding bits.")));

---
I agree with Masahiko Sawada; information about conversions
should be added to the documentation.

--
Best regards,
Ilya Cherdakov, PostgresPro

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Aleksander Alekseev

Дата:

09 февраля, 15:19:07

Hi,

> I briefly tested the patched version of v3. The implemented
> functionality works correctly.
>
> ---
> You can also add a case with the error from v3-0002
> "invalid base32hex end sequence" to the tests :
>
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("invalid base32hex end sequence"),
> +                 errhint("Input data has non-zero padding bits.")));
>
> ---
> I agree with Masahiko Sawada; information about conversions
> should be added to the documentation.

Here is the rebased patch.

v4-0001 implements uuid <-> bytea casting
v4-0002 implements base32 encoding/decoding

Unless I missed something, 0001 is ready to be merged.

I only rebased v3 and improved the commit messages, but I didn't
account for Masahiko Sawada's feedback for 0002. Andrey, are you still
working on this or others can pick it up?

The patch is not on the commitfest, so I'm about to add it.

-- 
Best regards,
Aleksander Alekseev

Вложения

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Aleksander Alekseev

Дата:

18 февраля, 17:57:55

Hi,

> I only rebased v3 and improved the commit messages, but I didn't
> account for Masahiko Sawada's feedback for 0002. Andrey, are you still
> working on this or others can pick it up?
>
> The patch is not on the commitfest, so I'm about to add it.

Here is patch v5 where I accounted for the previous feedback from
Masahiko Sawada and also made some other changes, see below.

> How about the error message like "invalid input length for type uuid"?
> I think "uuid" should be lower case as it indicates PostgreSQL uuid
> data type, and it's better to use %s format instead of directly
> writing "uuid" (see string_to_uuid() for example).

Makes sense. Fixed.

> As for the errdetail message, should we add "bytea" also after "got %d"?

You probably meant "got %d bytes", not "got %d bytea". I believe the
current message is fine, but maybe native speakers will correct us.

> We already have tests for casting bytes to integer data types in
> strings.sql. I suggest moving the casting tests from bytea to uuid
> into therel.

I disagree on the grounds that there are zero tests related to UUID in
strings.sql; uuid.sql is a more appropriate place for these tests IMO.
However if someone seconds the idea we can easily move the tests at
any time.

> For the uuid.sql file, we could add a test to verify that
> a UUID value remains unchanged when it's cast to bytea and back to
> UUID. For example,
>
> SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;

Good point. Added.

> base32hex_encode() doesn't seem to add '=' paddings, but is it
> intentional? I don't see any description in RFC 4648 that we can omit
> '=' paddings.

You are right, both base32 and base32hex should add paddings;
substring() can be used if necessary. Fixed.

> I think the patch should add tests not only for uuid data type but
> also for general cases like other encodings.

Yes, and the good place for these tests would be closer to other tests
for encode() and decode() i.e. strings.sql. Fixed.

While working on it I noticed some inconsistencies between base32hex
implementation and our current implementation of base64. As an
example, we don't allow `=` input:

```
=# SELECT decode('=', 'base64');
ERROR:  unexpected "=" while decoding base64 sequence
```

... while base32hex did. I fixed such inconsistencies too.

> In uuid.sql tests, how about adding some tests to check if base32hex
> maintains the sortability of UUIDv7 data?

Agree. Added.

> I think we should update the documentation in the uuid section about
> casting data between bytea and uuid. For references, we have a similar
> description for bytea and integer[1].

Fair point. Fixed.

-- 
Best regards,
Aleksander Alekseev

Вложения

Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

От

Masahiko Sawada

Дата:

14 марта, 07:10:31

On Wed, Feb 18, 2026 at 6:58 AM Aleksander Alekseev
<aleksander@tigerdata.com> wrote:
>
> Hi,
>
> > I only rebased v3 and improved the commit messages, but I didn't
> > account for Masahiko Sawada's feedback for 0002. Andrey, are you still
> > working on this or others can pick it up?
> >
> > The patch is not on the commitfest, so I'm about to add it.
>
> Here is patch v5 where I accounted for the previous feedback from
> Masahiko Sawada and also made some other changes, see below.
>
> > How about the error message like "invalid input length for type uuid"?
> > I think "uuid" should be lower case as it indicates PostgreSQL uuid
> > data type, and it's better to use %s format instead of directly
> > writing "uuid" (see string_to_uuid() for example).
>
> Makes sense. Fixed.
>
> > As for the errdetail message, should we add "bytea" also after "got %d"?
>
> You probably meant "got %d bytes", not "got %d bytea". I believe the
> current message is fine, but maybe native speakers will correct us.
>
> > We already have tests for casting bytes to integer data types in
> > strings.sql. I suggest moving the casting tests from bytea to uuid
> > into therel.
>
> I disagree on the grounds that there are zero tests related to UUID in
> strings.sql; uuid.sql is a more appropriate place for these tests IMO.
> However if someone seconds the idea we can easily move the tests at
> any time.
>
> > For the uuid.sql file, we could add a test to verify that
> > a UUID value remains unchanged when it's cast to bytea and back to
> > UUID. For example,
> >
> > SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
>
> Good point. Added.
>
> > base32hex_encode() doesn't seem to add '=' paddings, but is it
> > intentional? I don't see any description in RFC 4648 that we can omit
> > '=' paddings.
>
> You are right, both base32 and base32hex should add paddings;
> substring() can be used if necessary. Fixed.
>
> > I think the patch should add tests not only for uuid data type but
> > also for general cases like other encodings.
>
> Yes, and the good place for these tests would be closer to other tests
> for encode() and decode() i.e. strings.sql. Fixed.
>
> While working on it I noticed some inconsistencies between base32hex
> implementation and our current implementation of base64. As an
> example, we don't allow `=` input:
>
> ```
> =# SELECT decode('=', 'base64');
> ERROR:  unexpected "=" while decoding base64 sequence
> ```
>
> ... while base32hex did. I fixed such inconsistencies too.
>
> > In uuid.sql tests, how about adding some tests to check if base32hex
> > maintains the sortability of UUIDv7 data?
>
> Agree. Added.
>
> > I think we should update the documentation in the uuid section about
> > casting data between bytea and uuid. For references, we have a similar
> > description for bytea and integer[1].
>
> Fair point. Fixed.
>

Thank you for updating the patch!

I've reviewed both patches and have some comments.

* 0001 patch

+       ereport(ERROR,
+               (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+                errmsg("invalid input length for type %s", "uuid"),
+                errdetail("Expected %d bytes, got %d.", UUID_LEN, len)));

I think we need to handle plural and singular forms depending on the
value. Or we can change it to "Expected size %d, got %d".

* 0002 patch:

                errhint("Valid encodings are \"%s\", \"%s\", \"%s\",
and \"%s\".",
                        "base64", "base64url", "escape", "hex")));

We need to add 'base32hex' here.

---
+                   ereport(ERROR,
+                           (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                            errmsg("unexpected \"=\" while decoding
%s sequence",
+                                   "base32hex")));

I think we can directly write 'base32hex' in the error message.

---
+   /* Verify no extra bits remain (padding bits should be zero) */
+   if (bits_in_buffer > 0 && (bits_buffer & ((1ULL << bits_in_buffer)
- 1)) != 0)
+       ereport(ERROR,
+               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                errmsg("invalid base32hex end sequence"),
+                errhint("Input data has non-zero padding bits.")));

This code checks if the remaining bits of the input data are all zero.
IIUC we don't have a similar check for base64 and base64url. For
instance, the following input data is accepted:

=# select decode('AB', 'base64');
 decode
--------
 \x00
(1 row)

I think it's better to have consistent behavior across our encoding.

I've attached a patch for the 0002 patch part that fixes the above
points (except for the last point) and has some minor fixes as well.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

fix_0002_masahiko.patch

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

Вложения

Вложения

Вложения

Вложения