Обсуждение: [PATCH] Add get_bytes() and set_bytes() functions

Поиск

Список

Период

Сортировка

[PATCH] Add get_bytes() and set_bytes() functions

От

Aleksander Alekseev

Дата:

14 августа 2024 г., 11:01:22

Hi,

While discussing another patch [1] it was discovered that we don't
have a convenient way of casting a bytea to an integer / bigint and
vice versa, extracting integers larger than one byte from byteas, etc.

For instance, casting '\x11223344' to 0x11223344 may look like this:

```
WITH vals AS (
      SELECT '\x11223344'::bytea AS x
)
SELECT
    (get_byte(x, 0) :: bigint << 24) |
    (get_byte(x, 1) << 16) |
    (get_byte(x, 2) << 8) |
    get_byte(x, 3)
FROM vals;
```

There seems to be a demand for this functionality [2][3] and it costs
us nothing to maintain it, so I propose adding it.

The proposed patch adds get_bytes() and set_bytes() functions. The
semantics is similar to get_byte() and set_byte() we already have but
the functions operate with bigints rather than bytes and the user can
specify the size of the integer. This allows working with int2s,
int4s, int8s or even int5s if needed.

Examples:

```
SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
 ?column?
----------
 t

SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
     set_bytes
--------------------
 \x11aabb4455667788
```

Thoughts?

[1]: https://postgr.es/m/CAJ7c6TNMTGnqnG%3DyXXUQh9E88JDckmR45H2Q%2B%3DucaCLMOW1QQw%40mail.gmail.com
[2]: https://stackoverflow.com/questions/32944267/postgresql-converting-bytea-to-bigint
[3]: https://postgr.es/m/AANLkTikip9xs8iXc8e%2BMgz1T1701i8Xk6QtbVB3KJQzX%40mail.gmail.com
-- 
Best regards,
Aleksander Alekseev

Вложения

v1-0001-Add-get_bytes-and-set_bytes-functions.patch

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

"Joel Jacobson"

Дата:

14 августа 2024 г., 11:21:49

On Wed, Aug 14, 2024, at 13:01, Aleksander Alekseev wrote:
> The proposed patch adds get_bytes() and set_bytes() functions. The
> semantics is similar to get_byte() and set_byte() we already have but
> the functions operate with bigints rather than bytes and the user can
> specify the size of the integer. This allows working with int2s,
> int4s, int8s or even int5s if needed.

+1

I wanted this myself many times.

I wonder if get_bytes() and set_bytes() will behave differently
on little-endian vs big-endian systems?

If so, then I think it would be nice to enforce a consistent byte order
(e.g., big-endian), to ensure consistent behavior across platforms.

Regards,
Joel

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Aleksander Alekseev

Дата:

14 августа 2024 г., 11:31:31

Hi,

> +1
>
> I wanted this myself many times.
>
> I wonder if get_bytes() and set_bytes() will behave differently
> on little-endian vs big-endian systems?
>
> If so, then I think it would be nice to enforce a consistent byte order
> (e.g., big-endian), to ensure consistent behavior across platforms.

No, the returned value will not depend on the CPU endiness. Current
implementation uses big-endian / network order which in my humble
opinion is what most users would expect.

I believe we also need reverse(bytea) and repeat(bytea, integer)
functions e.g. for those who want little-endian. However I want to
propose them separately when we are done with this patch.

-- 
Best regards,
Aleksander Alekseev

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

"Joel Jacobson"

Дата:

14 августа 2024 г., 15:34:06

On Wed, Aug 14, 2024, at 13:31, Aleksander Alekseev wrote:
>> I wonder if get_bytes() and set_bytes() will behave differently
>> on little-endian vs big-endian systems?
> No, the returned value will not depend on the CPU endiness. Current
> implementation uses big-endian / network order which in my humble
> opinion is what most users would expect.

Nice.

I've reviewed and tested the patch.
It looks straight-forward to me.
I don't see any potential problems.
I've marked it Ready for Committer.

> I believe we also need reverse(bytea) and repeat(bytea, integer)
> functions e.g. for those who want little-endian. However I want to
> propose them separately when we are done with this patch.

I agree those functions would be nice too.

I also think it would be nice to provide these convenience functions:
to_bytes(bigint) -> bytea
from_bytes(bytea) -> bigint

Since if not having a current bytea value,
and just wanting to convert a bigint to bytea,
then one would need to construct an zeroed bytea
of the proper size first, to then use set_bytes().

And if just wanting to convert the entire bytea to a bigint,
then one would need to pass 0 as offset and the length
of the bytea as size.

Regards,

Joel

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

David Fetter

Дата:

14 августа 2024 г., 17:43:51

On Wed, Aug 14, 2024 at 02:34:06PM +0200, Joel Jacobson wrote:
> On Wed, Aug 14, 2024, at 13:31, Aleksander Alekseev wrote:
> >> I wonder if get_bytes() and set_bytes() will behave differently
> >> on little-endian vs big-endian systems?
> > No, the returned value will not depend on the CPU endiness. Current
> > implementation uses big-endian / network order which in my humble
> > opinion is what most users would expect.
> 
> Nice.

Indeed!

> I've reviewed and tested the patch.
> It looks straight-forward to me.
> I don't see any potential problems.
> I've marked it Ready for Committer.
> 
> > I believe we also need reverse(bytea) and repeat(bytea, integer)
> > functions e.g. for those who want little-endian. However I want to
> > propose them separately when we are done with this patch.
> 
> I agree those functions would be nice too.
> 
> I also think it would be nice to provide these convenience functions:
> to_bytes(bigint) -> bytea
> from_bytes(bytea) -> bigint

Along with these, would it make sense to have other forms of these
that won't choke at 63 bits, e.g. NUMERIC or TEXT?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

"Joel Jacobson"

Дата:

14 августа 2024 г., 18:39:32

On Wed, Aug 14, 2024, at 16:43, David Fetter wrote:
>> I also think it would be nice to provide these convenience functions:
>> to_bytes(bigint) -> bytea
>> from_bytes(bytea) -> bigint
>
> Along with these, would it make sense to have other forms of these
> that won't choke at 63 bits, e.g. NUMERIC or TEXT?

I wonder what would be good names for such functions though?

Since NUMERIC can have decimal digits, and since BYTEA can already be casted to
TEXT, which will just be \x followed by the hex digits, maybe such names should
include "integer" or some other word, to indicate what is being returned?

It's already quite easy to convert to NUMERIC though,
for users who are aware of tricks like this:
SELECT ('0x'||encode('\xCAFEBABEDEADBEEF'::bytea,'hex'))::numeric;
       numeric
----------------------
 14627333968688430831
(1 row)

But, I think it would be better to provide functions,
since many users probably have to google+stackoverflow or gpt
to learn such tricks, which are not in the official documentation.

Regards,
Joel

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

David Fetter

Дата:

14 августа 2024 г., 19:31:44

On Wed, Aug 14, 2024 at 05:39:32PM +0200, Joel Jacobson wrote:
> On Wed, Aug 14, 2024, at 16:43, David Fetter wrote:
> >> I also think it would be nice to provide these convenience functions:
> >> to_bytes(bigint) -> bytea
> >> from_bytes(bytea) -> bigint
> >
> > Along with these, would it make sense to have other forms of these
> > that won't choke at 63 bits, e.g. NUMERIC or TEXT?
> 
> I wonder what would be good names for such functions though?

decimal_to_bytes(numeric), decimal_to_bytes(text) on one side
decimal_from_bytes(bytea, typeoid)

Naming Things™ is one of the hard problems in computer science. Bad
joke that includes cache coherency and off-by-one included by
reference.

> Since NUMERIC can have decimal digits, and since BYTEA can already be casted to
> TEXT, which will just be \x followed by the hex digits, maybe such names should
> include "integer" or some other word, to indicate what is being returned?
> 
> It's already quite easy to convert to NUMERIC though,
> for users who are aware of tricks like this:
> SELECT ('0x'||encode('\xCAFEBABEDEADBEEF'::bytea,'hex'))::numeric;
>        numeric
> ----------------------
>  14627333968688430831
> (1 row)
> 
> But, I think it would be better to provide functions,
> since many users probably have to google+stackoverflow or gpt
> to learn such tricks, which are not in the official documentation.

As usual, I see "official documentation lacks helpful and/or
non-obvious examples" as a problem best approached by making good the
lack. I am aware that my ideas about pedagogy, documentation, etc. are
not shared universally, but they're widely shared by people whose main
interaction with documents is trying to get help from them.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

"Joel Jacobson"

Дата:

14 августа 2024 г., 20:25:51

On Wed, Aug 14, 2024, at 18:31, David Fetter wrote:
> On Wed, Aug 14, 2024 at 05:39:32PM +0200, Joel Jacobson wrote:
>> On Wed, Aug 14, 2024, at 16:43, David Fetter wrote:
>> >> I also think it would be nice to provide these convenience functions:
>> >> to_bytes(bigint) -> bytea
>> >> from_bytes(bytea) -> bigint
>> >
>> > Along with these, would it make sense to have other forms of these
>> > that won't choke at 63 bits, e.g. NUMERIC or TEXT?
>>
>> I wonder what would be good names for such functions though?
>
> decimal_to_bytes(numeric), decimal_to_bytes(text) on one side
> decimal_from_bytes(bytea, typeoid)

I assume decimal_to_bytes() will only accept integer numerics,
that is, that don't have a decimal digits part?
Hmm, it's perhaps then a bit counter intuitive that the name
contains "decimal", since some people might associate the word
"decimal" stronger with "decimal digits" rather than the radix/base 10.

What do we want to happen if passing a numeric with decimal digits,
to decimal_to_bytes()? It must be an error, right?

Example: SELECT decimal_to_bytes(1.23);

> Naming Things™ is one of the hard problems in computer science. Bad
> joke that includes cache coherency and off-by-one included by
> reference.

So true :)

> As usual, I see "official documentation lacks helpful and/or
> non-obvious examples" as a problem best approached by making good the
> lack. I am aware that my ideas about pedagogy, documentation, etc. are
> not shared universally, but they're widely shared by people whose main
> interaction with documents is trying to get help from them.

Well spoken.

Regards,
Joel

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

"Joel Jacobson"

Дата:

15 августа 2024 г., 07:19:36

On Wed, Aug 14, 2024, at 19:25, Joel Jacobson wrote:
> What do we want to happen if passing a numeric with decimal digits,
> to decimal_to_bytes()? It must be an error, right?
>
> Example: SELECT decimal_to_bytes(1.23);

Hmm, an error feels quite ugly on second thought.
Would be nicer if all numerics could be represented,
in a bytea representation that is meaningful to other systems.

I think we need a tuple somehow.

Example:

SELECT numeric_to_bytes(223195403574957);
(\xcafebabedeadbeef,0,false)

SELECT numeric_to_bytes(-223195403574957);
(\xcafebabedeadbeef,0,true)

SELECT numeric_to_bytes(2231954035749.57);
(\xcafebabedeadbeef,2,false)

SELECT numeric_from_bytes('\xcafebabedeadbeef'::bytea,0,false);
223195403574957

SELECT numeric_from_bytes('\xcafebabedeadbeef'::bytea,0,true);
-223195403574957

SELECT numeric_from_bytes('\xcafebabedeadbeef'::bytea,2,false);
2231954035749.57

But then what about Inf,-Inf,NaN?

Regards,
Joel

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Dean Rasheed

Дата:

15 августа 2024 г., 13:14:55

On Thu, 15 Aug 2024 at 05:20, Joel Jacobson <joel@compiler.org> wrote:
>
> On Wed, Aug 14, 2024, at 19:25, Joel Jacobson wrote:
> > What do we want to happen if passing a numeric with decimal digits,
> > to decimal_to_bytes()? It must be an error, right?
> >
> > Example: SELECT decimal_to_bytes(1.23);
>
> Hmm, an error feels quite ugly on second thought.
> Would be nicer if all numerics could be represented,
>
> But then what about Inf,-Inf,NaN?
>

Perhaps we should also add casts between bytea and the integer/numeric
types. That might be easier to use than functions in some
circumstances.

When casting a numeric to an integer, the result is rounded to the
nearest integer, and NaN/Inf generate errors, so we should probably do
the same here.

Regards,
Dean

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Aleksander Alekseev

Дата:

15 августа 2024 г., 13:58:03

Hi,

> Perhaps we should also add casts between bytea and the integer/numeric
> types. That might be easier to use than functions in some
> circumstances.
>
> When casting a numeric to an integer, the result is rounded to the
> nearest integer, and NaN/Inf generate errors, so we should probably do
> the same here.

Yes, I was also thinking about adding NUMERIC versions of get_bytes()
/ set_bytes(). This would allow converting more than 8 bytes to/from
an integer. I dropped this idea because I thought there would be not
much practical use for it. On the flip side you never know who uses
Postgres and for what purpose.

I will add corresponding casts unless the idea will get a push-back
from the community. IMO the existence of these casts will at least not
make things worse.

-- 
Best regards,
Aleksander Alekseev

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Yugo Nagata

Дата:

16 августа 2024 г., 10:49:36

On Thu, 15 Aug 2024 13:58:03 +0300
Aleksander Alekseev <aleksander@timescale.com> wrote:

> Hi,
> 
> > Perhaps we should also add casts between bytea and the integer/numeric
> > types. That might be easier to use than functions in some
> > circumstances.
> >
> > When casting a numeric to an integer, the result is rounded to the
> > nearest integer, and NaN/Inf generate errors, so we should probably do
> > the same here.
> 
> Yes, I was also thinking about adding NUMERIC versions of get_bytes()
> / set_bytes(). This would allow converting more than 8 bytes to/from
> an integer. I dropped this idea because I thought there would be not
> much practical use for it. On the flip side you never know who uses
> Postgres and for what purpose.
> 
> I will add corresponding casts unless the idea will get a push-back
> from the community. IMO the existence of these casts will at least not
> make things worse.

When we add such casts between bytea and the integer/numeric types,
one of the problems mentioned the first of the thread, that is, 
"we don't have a convenient way of casting a bytea to an integer / bigint
and vice versa", would seem be resolved. 

On the other hand, I suppose get_bytes() and set_bytes() are still useful
for extracting bytes from byteas, etc. If casting is no longer the main
purpose of these functions, are variations that get_bytes returns bytea
instead of bigint, and set_bytes receives bytea as the newvalue argument
useful? I wonder it would eliminate the restrict that size cannot be larger
than 8.

Here are my very trivial comments on the patch.

+ * this routine treats "bytea" as an array of bytes.

Maybe, the sentence should start with "This ... ".

+    while(size)
+    {

I wonder inserting a space after "while" is the standard style.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Peter Eisentraut

Дата:

16 августа 2024 г., 13:11:18

On 14.08.24 13:01, Aleksander Alekseev wrote:
> The proposed patch adds get_bytes() and set_bytes() functions. The
> semantics is similar to get_byte() and set_byte() we already have but
> the functions operate with bigints rather than bytes and the user can
> specify the size of the integer. This allows working with int2s,
> int4s, int8s or even int5s if needed.
> 
> Examples:
> 
> ```
> SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
>   ?column?
> ----------
>   t
> 
> SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
>       set_bytes
> --------------------
>   \x11aabb4455667788
> ```

I think these functions do about three things at once, and I don't think 
they address the originally requested purpose very well.

Converting between integers and byte arrays of matching size seems like 
reasonable functionality.  (You can already do one half of that by 
calling int2send(), int4send(), and int8send(), but the other direction 
(intXrecv()) is not user-callable).

The other things are extracting that byte array from a larger byte array 
and sticking it back into a larger byte array; those seem like separate 
operations.  There is already substr() for bytea for the first part, and 
there might be another string-like operationg for the second part, or 
maybe we could add one.

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Yugo Nagata

Дата:

16 августа 2024 г., 13:32:24

On Fri, 16 Aug 2024 11:41:37 +0300
Aleksander Alekseev <aleksander@timescale.com> wrote:

> Hi,
> 
> > When we add such casts between bytea and the integer/numeric types,
> > one of the problems mentioned the first of the thread, that is,
> > "we don't have a convenient way of casting a bytea to an integer / bigint
> > and vice versa", would seem be resolved.
> >
> > On the other hand, I suppose get_bytes() and set_bytes() are still useful
> > for extracting bytes from byteas, etc. If casting is no longer the main
> > purpose of these functions, are variations that get_bytes returns bytea
> > instead of bigint, and set_bytes receives bytea as the newvalue argument
> > useful? I wonder it would eliminate the restrict that size cannot be larger
> > than 8.
> 
> No, casting between bytea and numeric will not replace get_bytes() /
> set_bytes() for performance reasons.
> 
> Consider the case when you want to extract an int4 from a bytea.
> get_bytes() is going to be very fast while substr() -> ::numeric ->
> ::integer chain will require unnecessary copying and conversions.
> Casting between bytea and numeric is only useful when one has to deal
> with integers larger than 8 bytes. Whether this happens often is a
> debatable question.

Thank you for explanation. I understood the performance drawback.

I supposed interfaces similar to lo_get, lo_put, loread, lowrite of
large objects since they might be useful to access or modify a part of
bytea like a binary file read by pg_read_binary_file. 

> 
> > Here are my very trivial comments on the patch.
> >
> > + * this routine treats "bytea" as an array of bytes.
> >
> > Maybe, the sentence should start with "This ... ".
> >
> > +       while(size)
> > +       {
> >
> > I wonder inserting a space after "while" is the standard style.
> 
> Thanks, fixed.

Should we fix the comment on byteaGetByte in passing, too?

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Aleksander Alekseev

Дата:

19 августа 2024 г., 17:10:15

Hi Peter,

Thanks for the feedback.

> I think these functions do about three things at once, and I don't think
> they address the originally requested purpose very well.

The amount of things that the function does is a matter of
interpretation. I can claim that it does one thing ("extracts an
integer from a bytea"), or as many things as there are lines of code.
IMO the actual question is whether this is a good user interface or
not. Since we already have get_byte() / set_byte() the interface is
arguably OK.

> Converting between integers and byte arrays of matching size seems like
> reasonable functionality.  (You can already do one half of that by
> calling int2send(), int4send(), and int8send(), but the other direction
> (intXrecv()) is not user-callable).
>
> The other things are extracting that byte array from a larger byte array
> and sticking it back into a larger byte array; those seem like separate
> operations.  There is already substr() for bytea for the first part, and
> there might be another string-like operationg for the second part, or
> maybe we could add one.

If I understand correctly, you propose doing (1):

```
SELECT substr('\x1122334455667788'::bytea, 2, 2) :: int2;
```

... instead of:

```
SELECT get_bytes('\x1122334455667788'::bytea, 1, 2)
```

... and (2):

```
WITH vals AS (
    SELECT '\x1122334455667788'::bytea AS x
) SELECT substr(x, 1, 1) || int2send(1234::int2) || substr(x, 4, 5) FROM vals;
```

... instead of:

```
SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
```

There is nothing to do for query (2), it already works. It's not much
better than the query from my first email though.

To clarify, supporting bytea<->integer (and/or bytea<->numeric) casts
doesn't strike me as a terrible idea but it doesn't address the issue
I'm proposing to solve.

-- 
Best regards,
Aleksander Alekseev

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Peter Eisentraut

Дата:

21 августа 2024 г., 19:39:21

On 19.08.24 16:10, Aleksander Alekseev wrote:
> To clarify, supporting bytea<->integer (and/or bytea<->numeric) casts
> doesn't strike me as a terrible idea but it doesn't address the issue
> I'm proposing to solve.

What is the issue you are proposing to solve?

You linked to a couple of threads and stackoverflow pages, and you 
concluded from that that we should add get_bytes() and set_bytes() 
functions.  It's not obvious how you get from the former to the latter, 
and I don't think the functions you propose are well-designed in isolation.

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Dean Rasheed

Дата:

13 января, 21:24:37

On Mon, 13 Jan 2025 at 17:36, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Besides fixing opr_sanity test I corrected error messages:
>
> -ERROR:  bytea size 3 out of valid range, 0..2
> +ERROR:  bytea out of valid range, ''..'\xFFFF'
>

"smallint out of range", "integer out of range" and "bigint out of
range" would be more consistent with existing error messages.

Regards,
Dean

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Alvaro Herrera

Дата:

13 января, 22:23:33

On 2025-Jan-13, Dean Rasheed wrote:

> On Mon, 13 Jan 2025 at 17:36, Aleksander Alekseev
> <aleksander@timescale.com> wrote:
> >
> > Besides fixing opr_sanity test I corrected error messages:
> >
> > -ERROR:  bytea size 3 out of valid range, 0..2
> > +ERROR:  bytea out of valid range, ''..'\xFFFF'
> 
> "smallint out of range", "integer out of range" and "bigint out of
> range" would be more consistent with existing error messages.

But these don't show the acceptable range. We have these that do:

#: utils/adt/varbit.c:1824 utils/adt/varbit.c:1882
#, c-format
msgid "bit index %d out of valid range (0..%d)"

#: utils/adt/varlena.c:3218 utils/adt/varlena.c:3285
#, c-format
msgid "index %d out of valid range, 0..%d"

#: utils/adt/varlena.c:3249 utils/adt/varlena.c:3321
#, c-format
msgid "index %lld out of valid range, 0..%lld"

#: utils/misc/guc.c:3130
#, c-format
msgid "%d%s%s is outside the valid range for parameter \"%s\" (%d .. %d)"


The quoting in Aleksander's proposal is not great.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Update: super-fast reaction on the Postgres bugs mailing list. The report
was acknowledged [...], and a fix is under discussion.
The wonders of open-source !"
             https://twitter.com/gunnarmorling/status/1596080409259003906

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Dean Rasheed

Дата:

14 января, 00:53:22

On Mon, 13 Jan 2025 at 19:23, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> But these don't show the acceptable range. We have these that do:
>
> #: utils/adt/varbit.c:1824 utils/adt/varbit.c:1882
> #, c-format
> msgid "bit index %d out of valid range (0..%d)"
>
> #: utils/adt/varlena.c:3218 utils/adt/varlena.c:3285
> #, c-format
> msgid "index %d out of valid range, 0..%d"
>
> #: utils/adt/varlena.c:3249 utils/adt/varlena.c:3321
> #, c-format
> msgid "index %lld out of valid range, 0..%lld"
>
> #: utils/misc/guc.c:3130
> #, c-format
> msgid "%d%s%s is outside the valid range for parameter \"%s\" (%d .. %d)"
>

Those are all instances of a value that's outside a specific range
that you might not otherwise know, rather than being out of range of
the type itself. For that, we generally don't say what the range of
the type is. For example, we currently do:

select repeat('1', 50)::bit(50)::int;
ERROR:  integer out of range

Regards,
Dean

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Dean Rasheed

Дата:

17 января, 19:29:07

On Tue, 14 Jan 2025 at 13:25, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Thanks. I agree that the proposed error messages look nicer than the
> one I used in v6. Here is the corrected patch.
>

This should use ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE, rather than
ERRCODE_INVALID_PARAMETER_VALUE, for consistency with other similar
errors.

The bytea -> int[248] cast functions should not be marked as leakproof
-- see the docs on the CREATE FUNCTION page: functions that raise
errors for some input values but not others, are not leakproof. This
is why, for example, the int -> bigint cast is leakproof, but the
bigint -> int cast is not.

Functions working with int8 values should normally go in
utils/adt/int8.c, not utils/adt/int.c. However, I think that
utils/adt/varlena.c would be a better place for all these functions,
because they have more to do with bytea than integer types, and this
would allow them to be kept together, similar to how all the bit <->
integer cast functions are in utils/adt/varbit.c.

There's no documentation for these new casts. The obvious place to put
it would be in section 9.5 "Binary String Functions and Operators",
which would be consistent with the idea that these are being regarded
primarily as bytea operations, rather than integer operations (just as
the bit <-> integer casts are documented in 9.6 "Bit String Functions
and Operators").

Regards,
Dean

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Peter Eisentraut

Дата:

22 января, 21:44:52

On 20.01.25 15:01, Aleksander Alekseev wrote:
>  > This should use ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE, rather than
>  > ERRCODE_INVALID_PARAMETER_VALUE, for consistency with other similar
>  > errors.
>  >
>  > The bytea -> int[248] cast functions should not be marked as leakproof
>  > -- see the docs on the CREATE FUNCTION page: functions that raise
>  > errors for some input values but not others, are not leakproof. This
>  > is why, for example, the int -> bigint cast is leakproof, but the
>  > bigint -> int cast is not.
>  >
>  > Functions working with int8 values should normally go in
>  > utils/adt/int8.c, not utils/adt/int.c. However, I think that
>  > utils/adt/varlena.c would be a better place for all these functions,
>  > because they have more to do with bytea than integer types, and this
>  > would allow them to be kept together, similar to how all the bit <->
>  > integer cast functions are in utils/adt/varbit.c.
>  >
>  > There's no documentation for these new casts. The obvious place to put
>  > it would be in section 9.5 "Binary String Functions and Operators",
>  > which would be consistent with the idea that these are being regarded
>  > primarily as bytea operations, rather than integer operations (just as
>  > the bit <-> integer casts are documented in 9.6 "Bit String Functions
>  > and Operators").
> 
> Many thanks for your great feedback. Here is the corrected patch.

These casts appear to use a particular endianness, but they don't 
document which one, and there is no explanation anywhere why that one is 
the right one.

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Aleksander Alekseev

Дата:

23 января, 17:30:19

Hi Peter,

> These casts appear to use a particular endianness, but they don't
> document which one, and there is no explanation anywhere why that one is
> the right one.

Right, I choose network order / big-endian. I agree that it would make
sense to emphasise this fact in the documentation below the examples.
Unfortunately I have no good reason for this particular design choice
other than "well this is how it is custom to represent numbers in a
consistent manner between different platforms but other than that the
choice was rather arbitrary". Should I just rephrase it a bit and add
to the documentation?

-- 
Best regards,
Aleksander Alekseev

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Dean Rasheed

Дата:

23 января, 18:46:47

On Thu, 23 Jan 2025 at 14:30, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi Peter,
>
> > These casts appear to use a particular endianness, but they don't
> > document which one, and there is no explanation anywhere why that one is
> > the right one.
>
> Right, I choose network order / big-endian. I agree that it would make
> sense to emphasise this fact in the documentation below the examples.
> Unfortunately I have no good reason for this particular design choice
> other than "well this is how it is custom to represent numbers in a
> consistent manner between different platforms but other than that the
> choice was rather arbitrary". Should I just rephrase it a bit and add
> to the documentation?
>

IMO big-endian is the most convenient byte-ordering to use here,
because then the string representation of the bytea is consistent with
the hex representation of the integer. It's also consistent with the
integer-to-bit casts, which output the most significant bits first,
starting with the sign bit.

As far as the docs go, it's important to document precisely what
format is used, but I don't think it needs to explain why that choice
was made. It should also mention the size of the result and that it's
the two's complement representation, since there are other possible
representations of integers. So I think it would be sufficient for the
initial paragraph to say something like "Casting an integer to a bytea
produces 2, 4, or 8 bytes, depending on the width of the integer type.
The result is the two's complement representation of the integer, with
the most significant byte first.", and then list the examples to
demonstrate that.

Regards,
Dean

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Dean Rasheed

Дата:

01 марта, 14:30:47

On Fri, 24 Jan 2025 at 13:00, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Thank you. Here is the corrected patch.
>

This looks pretty good to me. I have a just a couple of minor comments:

* The loop in bytea_integer() can be written more simply as a "for"
loop. Given that it's only a few lines of code, it might as well just
be coded directly in each cast function, which avoids the need to go
via a 64-bit integer for every case. In addition, it should use the
BITS_PER_BYTE macro rather than "8". Doing that leads to code that's
consistent with bittoint4() and bittoint8().

* In pg_proc.dat, it's not necessary to write "proleakproof => 'f'",
because that's the default, and no other function does that.

* I think it's worth using slightly more non-trivial doc examples,
including positive and negative cases, to make the behaviour more
obvious.

* I also tweaked the regression tests a bit, and copied the existing
test style which displays both the expected and actual results from
each test.

With those updates, I think this is ready for commit, which I'll do in
a day or two, if there are no further comments.

Regards,
Dean

Вложения

v10-0001-Allow-casting-between-bytea-and-integer-types.patch

Re: [PATCH] Add get_bytes() and set_bytes() functions

От

Dean Rasheed

Дата:

07 марта, 13:08:41

On Sat, 1 Mar 2025 at 11:30, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> With those updates, I think this is ready for commit, which I'll do in
> a day or two, if there are no further comments.
>

Committed.

Regards,
Dean

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: [PATCH] Add get_bytes() and set_bytes() functions

Вложения

Вложения