Обсуждение: to_hex() for negative inputs

Поиск
Список
Период
Сортировка

to_hex() for negative inputs

От
Dean Rasheed
Дата:
I only recently realised that to_hex() converts its input to unsigned
before converting it to hex (something that's not mentioned in the
docs):

  to_hex(-1) -> ffffffff

I think that's something that some users might find surprising,
especially if they were expecting to be able to use it to output
values that could be read back in, now that we support non-decimal
integer input.

So I think the docs should be a little more explicit about this. I
think the following should suffice:

---
  to_hex ( integer ) -> text
  to_hex ( bigint ) -> text

    Converts the number to its equivalent two's complement hexadecimal
    representation.

    to_hex(1234) -> 4d2
    to_hex(-1234) -> fffffb2e
---

instead of the existing example with 2147483647, which doesn't add much.

I also think it might be useful for it to gain a couple of boolean options:

1). An option to output a signed value (defaulting to false, to
preserve the current two's complement output).

2). An option to output the base prefix "0x" (which comes after the
sign, making it inconvenient for the user to add themselves).

I've also been idly wondering about whether we should have a numeric
variant (for integers only, since non-integers won't necessarily
terminate in hex, aren't accepted as inputs, and don't seem
particularly useful anyway). I don't think two's complement output
makes much sense for numeric, so perhaps it should only have the
prefix option, and always output signed hex strings.

Thoughts?

Regards,
Dean



Re: to_hex() for negative inputs

От
Aleksander Alekseev
Дата:
Hi Dean,

> I only recently realised that to_hex() converts its input to unsigned
> before converting it to hex (something that's not mentioned in the
> docs):

Technically the documentation is accurate [1]:

"""
Converts the number to its equivalent hexadecimal representation.
"""

But I agree that adding an example with negative numbers will not
hurt. Would you like to submit a patch?

> I also think it might be useful for it to gain a couple of boolean options:

Adding extra arguments for something the user can implement
(him/her)self doesn't seem to be a great idea. With this approach we
may end up with hundreds of arguments one day.

> 1). An option to output a signed value (defaulting to false, to
> preserve the current two's complement output).

This in particular can be done like this:

```
=# select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
from ( values (123), (-123) ) as s(x);
 ?column?
----------
 7b
 -7b
(2 rows)
```

> 2). An option to output the base prefix "0x" (which comes after the
> sign, making it inconvenient for the user to add themselves).

Ditto:

```
=# select '0x' || to_hex(x) from ( values (123), (-123) ) as s(x);
  ?column?
------------
 0x7b
 0xffffff85
```

[1]: https://www.postgresql.org/docs/current/functions-string.html

-- 
Best regards,
Aleksander Alekseev



Re: to_hex() for negative inputs

От
Dean Rasheed
Дата:
On Tue, 24 Jan 2023 at 13:43, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Adding extra arguments for something the user can implement
> (him/her)self doesn't seem to be a great idea. With this approach we
> may end up with hundreds of arguments one day.
>

I don't see how a couple of extra arguments will expand to hundreds.

> =# select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
> from ( values (123), (-123) ) as s(x);
>

Which is broken for INT_MIN:

select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
from ( values (-2147483648) ) as s(x);

ERROR:  integer out of range

Part of the reason for implementing this in core would be to save
users from such easy-to-overlook bugs.

Regards,
Dean



Re: to_hex() for negative inputs

От
Aleksander Alekseev
Дата:
Hi Dean,

> I don't see how a couple of extra arguments will expand to hundreds.

Maybe I was exaggerating, but the point is that adding extra flags for
every possible scenario is a disadvantageous approach in general.
There is no need to increase the code base, the amount of test cases
and the amount of documentation every time someone has an idea "in
rare cases I also may want to do A or B, let's add a flag for this".

> Which is broken for INT_MIN:
>
> select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
> from ( values (-2147483648) ) as s(x);
>
> ERROR:  integer out of range

I'm afraid the behavior of something like to_hex(X, with_sign => true)
is going to be exactly the same. There is no safe and consistent way
to calculate abs(INT_MIN).

-- 
Best regards,
Aleksander Alekseev



Re: to_hex() for negative inputs

От
Dean Rasheed
Дата:
On Wed, 25 Jan 2023 at 09:02, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> > I don't see how a couple of extra arguments will expand to hundreds.
>
> Maybe I was exaggerating, but the point is that adding extra flags for
> every possible scenario is a disadvantageous approach in general.
> There is no need to increase the code base, the amount of test cases
> and the amount of documentation every time someone has an idea "in
> rare cases I also may want to do A or B, let's add a flag for this".
>

OK, but the point was that we've just added support for accepting hex
inputs in a particular format, so I think it would be useful if
to_hex() could produce outputs compatible with that.

> > Which is broken for INT_MIN:
> >
> > select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
> > from ( values (-2147483648) ) as s(x);
> >
> > ERROR:  integer out of range
>
> I'm afraid the behavior of something like to_hex(X, with_sign => true)
> is going to be exactly the same. There is no safe and consistent way
> to calculate abs(INT_MIN).
>

Of course there is. This is easy to code in C using unsigned ints,
without resorting to abs() (yes, I'm aware that abs() is undefined for
INT_MIN).

Regards,
Dean



Re: to_hex() for negative inputs

От
Aleksander Alekseev
Дата:
Hi Dean,

> Of course there is. This is easy to code in C using unsigned ints,
> without resorting to abs() (yes, I'm aware that abs() is undefined for
> INT_MIN).

So in your opinion what is the expected result of to_hex(INT_MIN,
with_sign => true)?

-- 
Best regards,
Aleksander Alekseev



Re: to_hex() for negative inputs

От
Dean Rasheed
Дата:
On Wed, 25 Jan 2023 at 10:57, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> > Of course there is. This is easy to code in C using unsigned ints,
> > without resorting to abs() (yes, I'm aware that abs() is undefined for
> > INT_MIN).
>
> So in your opinion what is the expected result of to_hex(INT_MIN,
> with_sign => true)?
>

"-80000000" or "-0x80000000", depending on whether the prefix is
requested. The latter is legal input for an integer, which is the
point (to allow round-tripping):

SELECT int '-0x80000000';
    int4
-------------
 -2147483648
(1 row)

SELECT pg_typeof(-0x80000000);
 pg_typeof
-----------
 integer
(1 row)

Regards,
Dean



Re: to_hex() for negative inputs

От
Aleksander Alekseev
Дата:
Hi Dean,

> > So in your opinion what is the expected result of to_hex(INT_MIN,
> > with_sign => true)?
> >
>
> "-80000000" or "-0x80000000", depending on whether the prefix is
> requested.

Whether this is the right result is very debatable. 0x80000000 is a
binary representation of -2147483648:

```
(gdb) ptype cur_timeout
type = int
(gdb) p cur_timeout = 0x80000000
$1 = -2147483648
(gdb) p/x cur_timeout
$2 = 0x80000000
```

So what you propose to return is -(-2147483648). For some users this
may be a wanted result, for some it may be not. Personally I would
prefer to get an ERROR in this case. And this is exactly how you end
up with even more flags.

I believe it would be better to let the user write the exact query
depending on what he/she wants.

-- 
Best regards,
Aleksander Alekseev



Re: to_hex() for negative inputs

От
Peter Eisentraut
Дата:
On 24.01.23 14:10, Dean Rasheed wrote:
> I also think it might be useful for it to gain a couple of boolean options:
> 
> 1). An option to output a signed value (defaulting to false, to
> preserve the current two's complement output).

I find the existing behavior so strange, I would rather give up and 
invent a whole new function family with correct behavior, which could 
then also support octal and binary, and perhaps any base.  This could be 
something generally named, like "convert" or "format".

> 2). An option to output the base prefix "0x" (which comes after the
> sign, making it inconvenient for the user to add themselves).

This could also be handled with a "format"-like function.




Re: to_hex() for negative inputs

От
Dean Rasheed
Дата:
On Wed, 25 Jan 2023 at 21:43, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 24.01.23 14:10, Dean Rasheed wrote:
> > I also think it might be useful for it to gain a couple of boolean options:
> >
> > 1). An option to output a signed value (defaulting to false, to
> > preserve the current two's complement output).
>
> I find the existing behavior so strange, I would rather give up and
> invent a whole new function family with correct behavior, which could
> then also support octal and binary, and perhaps any base.  This could be
> something generally named, like "convert" or "format".
>
> > 2). An option to output the base prefix "0x" (which comes after the
> > sign, making it inconvenient for the user to add themselves).
>
> This could also be handled with a "format"-like function.
>

Yeah, making a break from the existing to_hex() functions might not be
a bad idea.

My only concern with something general like convert() or format() is
that it'll end up being hard to use, with lots of different formatting
options, like to_char(). In fact Oracle's to_char() has an 'X'
formatting option to output hexadecimal, though it's pretty limited
(e.g., it doesn't support negative inputs). MySQL has conv() that'll
convert between any two bases, but it does bizarre things for negative
inputs or inputs larger than 2^64.

TBH, I'm not that interested in bases other than hex (I mean who
really uses octal anymore?), and I don't particularly care about
formats other than the format we accept as input. For that, just
adding a numeric_to_hex() function would suffice. That works fine for
int4 and int8 inputs too, and doesn't preclude adding a more general
base-conversion / formatting function later, if there's demand.

Regards,
Dean