Обсуждение: Document if width_bucket's low and high are inclusive/exclusive

Поиск
Список
Период
Сортировка

Document if width_bucket's low and high are inclusive/exclusive

От
Ben Peachey Higdon
Дата:
The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not mention if the range’s low and high are inclusive or exclusive.

Returns the number of the bucket in which operand falls in a histogram having count equal-width buckets spanning the range low to high. Returns 0 or count+1 for an input outside that range.

I had assumed that both the low and high were inclusive but actually the low is inclusive while the high is exclusive.

For example:
SELECT width_bucket(0, 0, 1, 4)

returns 1, the first of 4 bins

SELECT width_bucket(1, 0, 1, 4) 

returns 5, because the high was outside the exclusive bound of high = 1

Thank you!

Re: Document if width_bucket's low and high are inclusive/exclusive

От
Robert Treat
Дата:
On Fri, Feb 28, 2025 at 7:15 AM Ben Peachey Higdon
<bpeacheyhigdon@gmail.com> wrote:
> The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not
mentionif the range’s low and high are inclusive or exclusive. 
>
> Returns the number of the bucket in which operand falls in a histogram having count equal-width buckets spanning the
rangelow to high. Returns 0 or count+1 for an input outside that range. 
>
> I had assumed that both the low and high were inclusive but actually the low is inclusive while the high is
exclusive.
>

I'm not sure it's the most ground breaking thing, but would probably
save a bunch of future people from having to gin up an example to test
it, so I'd probably update it per the following patch.


Robert Treat
https://xzilla.net

Вложения

Re: Document if width_bucket's low and high are inclusive/exclusive

От
Tom Lane
Дата:
Robert Treat <rob@xzilla.net> writes:
> On Fri, Feb 28, 2025 at 7:15 AM Ben Peachey Higdon
> <bpeacheyhigdon@gmail.com> wrote:
>> The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not
mentionif the range’s low and high are inclusive or exclusive. 

> I'm not sure it's the most ground breaking thing, but would probably
> save a bunch of future people from having to gin up an example to test
> it, so I'd probably update it per the following patch.

Seems reasonable, but do we need to do anything with the other
version of width_bucket (the one taking an array of lower bounds)?
Perhaps this change provides enough context, but I'm unsure.

            regards, tom lane



Re: Document if width_bucket's low and high are inclusive/exclusive

От
Robert Treat
Дата:
On Wed, Jun 18, 2025 at 4:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Treat <rob@xzilla.net> writes:
> > On Fri, Feb 28, 2025 at 7:15 AM Ben Peachey Higdon
> > <bpeacheyhigdon@gmail.com> wrote:
> >> The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not
mentionif the range’s low and high are inclusive or exclusive. 
>
> > I'm not sure it's the most ground breaking thing, but would probably
> > save a bunch of future people from having to gin up an example to test
> > it, so I'd probably update it per the following patch.
>
> Seems reasonable, but do we need to do anything with the other
> version of width_bucket (the one taking an array of lower bounds)?
> Perhaps this change provides enough context, but I'm unsure.
>

Since they are all lower bounds, they all operate the same way, so it
isn't quite as clear that it needs documenting. Are you thinking
something like this?

Returns the number of the bucket in which operand falls given an array
listing the lower bounds (inclusive) of the buckets


Robert Treat
https://xzilla.net



Re: Document if width_bucket's low and high are inclusive/exclusive

От
Tom Lane
Дата:
Robert Treat <rob@xzilla.net> writes:
> Since they are all lower bounds, they all operate the same way, so it
> isn't quite as clear that it needs documenting. Are you thinking
> something like this?

> Returns the number of the bucket in which operand falls given an array
> listing the lower bounds (inclusive) of the buckets

Yeah, though I might write "inclusive lower bounds" rather than use
parens.  What's bugging me though is the lack of any mention of the
bucket upper bounds: you have to deduce that the upper bounds must
be exclusive if the lower bounds are inclusive.  If that's obvious
here, why is it non-obvious for the other case?  Maybe instead of
the parenthetical form you suggested, add a sentence like

    Buckets have inclusive lower bounds, and therefore exclusive
    upper bounds.

and then we could either rely on the reader remembering that,
or else repeat it, for the second form of width_bucket.

Another thing I just remembered (think I knew it once) is the
behavior of the first form when low > high.  It's not an error!
I think we need to document that, perhaps along the lines of

    If low > high, the behavior is mirror-reversed, with bucket 1
    now being the one just below low, and the inclusive bounds
    now being on the upper side.

plus an example.

            regards, tom lane



Re: Document if width_bucket's low and high are inclusive/exclusive

От
Tom Lane
Дата:
I wrote:
> Another thing I just remembered (think I knew it once) is the
> behavior of the first form when low > high.  It's not an error!

So concretely, how about the attached?  In addition to what we
mentioned so far, I made the sentence about out-of-range cases
more explicit.

            regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8d7d9a2f3e8..11676b63c82 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1824,13 +1824,24 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
         which <parameter>operand</parameter> falls in a histogram
         having <parameter>count</parameter> equal-width buckets spanning the
         range <parameter>low</parameter> to <parameter>high</parameter>.
-        Returns <literal>0</literal>
+        The buckets have inclusive lower bounds, and therefore exclusive
+        upper bounds.
+        Returns <literal>0</literal> for an input less
+        than <parameter>low</parameter>,
         or <literal><parameter>count</parameter>+1</literal> for an input
-        outside that range.
+        greater than or equal to <parameter>high</parameter>.
+        If <parameter>low</parameter> > <parameter>high</parameter>,
+        the behavior is mirror-reversed, with bucket <literal>1</literal>
+        now being the one just below <parameter>low</parameter>, and the
+        inclusive bounds now being on the upper side.
        </para>
        <para>
         <literal>width_bucket(5.35, 0.024, 10.06, 5)</literal>
         <returnvalue>3</returnvalue>
+       </para>
+       <para>
+        <literal>width_bucket(9, 10, 0, 10)</literal>
+        <returnvalue>2</returnvalue>
        </para></entry>
       </row>

@@ -1842,8 +1853,8 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
        <para>
         Returns the number of the bucket in
         which <parameter>operand</parameter> falls given an array listing the
-        lower bounds of the buckets.  Returns <literal>0</literal> for an
-        input less than the first lower
+        inclusive lower bounds of the buckets.
+        Returns <literal>0</literal> for an input less than the first lower
         bound.  <parameter>operand</parameter> and the array elements can be
         of any type having standard comparison operators.
         The <parameter>thresholds</parameter> array <emphasis>must be

Re: Document if width_bucket's low and high are inclusive/exclusive

От
Dean Rasheed
Дата:
On Fri, 20 Jun 2025 at 22:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> So concretely, how about the attached?
>

LGTM (though I'm not sure it really needs the word "therefore" in the
first hunk).

There are also a couple of code comments that need fixing --
width_bucket_float8() comes with the following comment:

 * 'bound1' and 'bound2' are the lower and upper bounds of the
 * histogram's range, respectively. 'count' is the number of buckets
 * in the histogram. width_bucket() returns an integer indicating the
 * bucket number that 'operand' belongs to in an equiwidth histogram
 * with the specified characteristics. An operand smaller than the
 * lower bound is assigned to bucket 0. An operand greater than the
 * upper bound is assigned to an additional bucket (with number
 * count+1). We don't allow "NaN" for any of the float8 inputs, and we
 * don't allow either of the histogram bounds to be +/- infinity.

so at the very least, that should be made to say "greater than or
equal to", instead of "greater than". Similarly for
width_bucket_numeric().

Also, since PG14, type numeric has supported infinity, so its comment
should probably include that last part about not allowing +/- infinity
in the histogram bounds.

Regards,
Dean



Re: Document if width_bucket's low and high are inclusive/exclusive

От
Tom Lane
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On Fri, 20 Jun 2025 at 22:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So concretely, how about the attached?

> LGTM (though I'm not sure it really needs the word "therefore" in the
> first hunk).

OK, done that way.

> There are also a couple of code comments that need fixing --

Good points, also done.

While looking at those comments, I also noted that there is a
strange inconsistency between width_bucket_array and
width_bucket_float8/width_bucket_numeric.  Namely, the latter
two reject an "operand" that is NaN, while width_bucket_array
goes out of its way to accept it and treat it in our usual
fashion as sorting higher than all non-NaNs.

Clearly these functions must reject NaN histogram bounds, for
the same reason they reject infinite bounds.  But I don't see
any reason why they couldn't treat a NaN operand as valid.
Should we change them?  (I imagine this'd be a HEAD-only
change, and probably v19 material at this point.)

            regards, tom lane



Re: Document if width_bucket's low and high are inclusive/exclusive

От
Dean Rasheed
Дата:
On Sat, 21 Jun 2025 at 18:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> While looking at those comments, I also noted that there is a
> strange inconsistency between width_bucket_array and
> width_bucket_float8/width_bucket_numeric.  Namely, the latter
> two reject an "operand" that is NaN, while width_bucket_array
> goes out of its way to accept it and treat it in our usual
> fashion as sorting higher than all non-NaNs.
>
> Clearly these functions must reject NaN histogram bounds, for
> the same reason they reject infinite bounds.  But I don't see
> any reason why they couldn't treat a NaN operand as valid.
> Should we change them?  (I imagine this'd be a HEAD-only
> change, and probably v19 material at this point.)
>

Yes, I think that's a good idea (for v19 I would have thought).
Allowing the operand to be NaN definitely seems preferable to throwing
an error, since the operand might well come from data in a table
containing NaNs.

Regards,
Dean



Re: Document if width_bucket's low and high are inclusive/exclusive

От
Tom Lane
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On Sat, 21 Jun 2025 at 18:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Clearly these functions must reject NaN histogram bounds, for
>> the same reason they reject infinite bounds.  But I don't see
>> any reason why they couldn't treat a NaN operand as valid.
>> Should we change them?  (I imagine this'd be a HEAD-only
>> change, and probably v19 material at this point.)

> Yes, I think that's a good idea (for v19 I would have thought).
> Allowing the operand to be NaN definitely seems preferable to throwing
> an error, since the operand might well come from data in a table
> containing NaNs.

I started a new thread for that, since it's no longer docs material:

https://www.postgresql.org/message-id/2822872.1750540911%40sss.pgh.pa.us

            regards, tom lane