Обсуждение: [DOCS] Window function docs

Поиск
Список
Период
Сортировка

[DOCS] Window function docs

От
Bruce Momjian
Дата:
As part of writing a talk about window functions, I have done some
cleanups of the window function docs, attached, that I would like to
apply to head.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Вложения

Re: [DOCS] Window function docs

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> As part of writing a talk about window functions, I have done some
> cleanups of the window function docs, attached, that I would like to
> apply to head.

I think this change is just wrong:

     aggregate function, but the <literal>OVER</> clause causes it to be
-    treated as a window function and computed across an appropriate set of
-    rows.)
+    treated as a window function and computed across the entire partition.)
    </para>

avg() is going to be evaluated across the current frame, not the whole
partition.  I don't mind trying to be more precise than "appropriate set",
but it doesn't help to be apparently precise but wrong.

Not really convinced by the s/regular/normal/g changes.  That doesn't seem
to clarify much either.  Maybe it would help to define "normal" near here.
There's a definition in func.sgml (that they are not ordered-set or
hypothetical-set aggregates) but that's too far away.

In this bit:

-   definition.  Rows that are not distinct in the <literal>ORDER BY</>
-   ordering are said to be <firstterm>peers</>; the four ranking functions
-   are defined so that they give the same answer for any two peer rows.
+   definition.  Rows that are not distinct based on <literal>ORDER BY</>
+   ordering are said to be <firstterm>peers</>.  The four ranking
+   functions (includes <function>cume_dist</>) are defined so that they
+   give the same answer for any two peer rows.

"based on" seems little better than the previous wording.  Maybe it'd be
better to say "Rows that are not distinct when considering only the
<literal>ORDER BY</> columns are said to be ..."

    sometimes also <function>nth_value</>.  You can redefine the frame by
-   adding a suitable frame specification (<literal>RANGE</> or
-   <literal>ROWS</>) to the <literal>OVER</> clause.
-   See <xref linkend="syntax-window-functions"> for more information
-   about frame specifications.
+   modifying the frame specification;  see <xref
+   linkend="syntax-window-functions"> for more information about frame
+   specifications.
   </para>

Don't really like this change; it makes it vaguer for no apparent benefit.

+  <para>
+   <function>cume_dist</> computes the ratio of partition rows that
+   are less than or equal to the current row and its peers, while
+   <function>percent_rank</> computes the ratio of lesser partition rows,
+   assuming the current row is does not exist in the partition.
+  </para>

The first one probably ought to be "computes the fraction of partition
rows".  The second definition seems nonsensical --- ratio to what?  Or
probably better, fraction of what?  (And "is does not" needs help too.)

     the user.  Also, any built-in or user-defined normal aggregate function
-    can be used as a window function.  Ordered-set aggregates presently
-    cannot be used as window functions, however.
+    can be used as a window function.  However, ordered-set aggregate
+    cannot presently be used as window functions.
    </para>

Grammar (don't lose the "s" in ordered-set aggregates).

            regards, tom lane


Re: [DOCS] Window function docs

От
"David G. Johnston"
Дата:
On Thu, Apr 6, 2017 at 2:49 PM, Bruce Momjian <bruce@momjian.us> wrote:
As part of writing a talk about window functions, I have done some
cleanups of the window function docs, attached, that I would like to
apply to head.

+1 Overall.  Some minor technical items noted below.  The only conceptual item is whether (and, if so, how) to deal with partitions vs. frames.

​-    But unlike regular aggregate functions, use of a window function does not
+    But unlike normal aggregate functions, use of a window function does not

How broadly did you look for usage of normal versus regular?  The window function doc page uses "normal" once, vis-a-vis "ordered-set", I haven't looked further.  Note on that page the intro paragraph doesn't mention hypothetical-set aggregates as being distinct from ordered-set aggregates yet the tables and the docs here do.


+    treated as a window function and computed across the entire partition.)

Would introducing the concept of frame here trade precision for comprehension?

-    <firstterm>window frame</>.  Many (but not all) window functions act only
+    <firstterm>window frame</>.  Some window functions act only

This is a semantic change as to whether the dominate behavior is to act on frames or partitions.  IIUC very few window functions, when presented with a ORDER BY'd OVER and thus a limited frame, will still act on the entire partition instead of the just the frame they are provided.  In particular don't all normal aggregates (and user-defined ones) operate on frames?  The cume_dist function noted later is one of the few exceptions since it effectively operates on both - the frame for the numerator and the partition for the denominator.

+   aggregate function (i.e. not ordered-set or hypothetical-set aggregates)

I, and others, subscribe to the believe that both i.e., and e.g., should have trailing commas...


+   functions (includes <function>cume_dist</>) are defined so that they

s/b "including <function>cume_dist</>"

+   give the same answer for any two peer rows.

for all peer rows?

+   assuming the current row is does not exist in the partition.

remove "is"




Re: [DOCS] Window function docs

От
Bruce Momjian
Дата:
On Thu, Apr  6, 2017 at 06:13:44PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > As part of writing a talk about window functions, I have done some
> > cleanups of the window function docs, attached, that I would like to
> > apply to head.
>
> I think this change is just wrong:
>
>      aggregate function, but the <literal>OVER</> clause causes it to be
> -    treated as a window function and computed across an appropriate set of
> -    rows.)
> +    treated as a window function and computed across the entire partition.)
>     </para>
>
> avg() is going to be evaluated across the current frame, not the whole
> partition.  I don't mind trying to be more precise than "appropriate set",
> but it doesn't help to be apparently precise but wrong.

Good point.  I have adjusted the text to say "window frame".

> Not really convinced by the s/regular/normal/g changes.  That doesn't seem
> to clarify much either.  Maybe it would help to define "normal" near here.
> There's a definition in func.sgml (that they are not ordered-set or
> hypothetical-set aggregates) but that's too far away.

Yeah, I went back and forth on that, and it seems like a mine-field.
The problem is that SUM() is both a non-window aggregate, and a window
function, and there isn't a clear terminology for non-window aggregates,
so changed the patch to just call them that.  Hopefully the dual nature
of this is now clearer.

> In this bit:
>
> -   definition.  Rows that are not distinct in the <literal>ORDER BY</>
> -   ordering are said to be <firstterm>peers</>; the four ranking functions
> -   are defined so that they give the same answer for any two peer rows.
> +   definition.  Rows that are not distinct based on <literal>ORDER BY</>
> +   ordering are said to be <firstterm>peers</>.  The four ranking
> +   functions (includes <function>cume_dist</>) are defined so that they
> +   give the same answer for any two peer rows.
>
> "based on" seems little better than the previous wording.  Maybe it'd be
> better to say "Rows that are not distinct when considering only the
> <literal>ORDER BY</> columns are said to be ..."

OK, I used your wording.

>     sometimes also <function>nth_value</>.  You can redefine the frame by
> -   adding a suitable frame specification (<literal>RANGE</> or
> -   <literal>ROWS</>) to the <literal>OVER</> clause.
> -   See <xref linkend="syntax-window-functions"> for more information
> -   about frame specifications.
> +   modifying the frame specification;  see <xref
> +   linkend="syntax-window-functions"> for more information about frame
> +   specifications.
>    </para>
>
> Don't really like this change; it makes it vaguer for no apparent benefit.

Reverted.  I had changed it because it is more the frame definition that
we want to highlight here, not ROW vs. RANGE, but there is no clearer
way to specify that clause.

> +  <para>
> +   <function>cume_dist</> computes the ratio of partition rows that
> +   are less than or equal to the current row and its peers, while
> +   <function>percent_rank</> computes the ratio of lesser partition rows,
> +   assuming the current row is does not exist in the partition.
> +  </para>
>
> The first one probably ought to be "computes the fraction of partition
> rows".  The second definition seems nonsensical --- ratio to what?  Or
> probably better, fraction of what?  (And "is does not" needs help too.)

Updated.

>      the user.  Also, any built-in or user-defined normal aggregate function
> -    can be used as a window function.  Ordered-set aggregates presently
> -    cannot be used as window functions, however.
> +    can be used as a window function.  However, ordered-set aggregate
> +    cannot presently be used as window functions.
>     </para>
>
> Grammar (don't lose the "s" in ordered-set aggregates).

Thanks for the review.  Updated patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Вложения

Re: [DOCS] Window function docs

От
Bruce Momjian
Дата:
On Thu, Apr  6, 2017 at 03:53:24PM -0700, David G. Johnston wrote:
> On Thu, Apr 6, 2017 at 2:49 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
>     As part of writing a talk about window functions, I have done some
>     cleanups of the window function docs, attached, that I would like to
>     apply to head.
>
>
> +1 Overall.  Some minor technical items noted below.  The only conceptual item
> is whether (and, if so, how) to deal with partitions vs. frames.
>
> ​-    But unlike regular aggregate functions, use of a window function does not
> +    But unlike normal aggregate functions, use of a window function does not
>
> How broadly did you look for usage of normal versus regular?  The window
> function doc page uses "normal" once, vis-a-vis "ordered-set", I haven't looked
> further.  Note on that page the intro paragraph doesn't mention

I did a grep looking for 'regular.*agg'.

> hypothetical-set aggregates as being distinct from ordered-set aggregates yet
> the tables and the docs here do.

Do you have any suggested text for that?

> https://www.postgresql.org/docs/current/static/functions-aggregate.html
>
> +    treated as a window function and computed across the entire partition.)
>
> Would introducing the concept of frame here trade precision for comprehension?

I modified this to say "window frame" as part of the patch for Tom.

> -    <firstterm>window frame</>.  Many (but not all) window functions act only
> +    <firstterm>window frame</>.  Some window functions act only
>
> This is a semantic change as to whether the dominate behavior is to act on
> frames or partitions.  IIUC very few window functions, when presented with a
> ORDER BY'd OVER and thus a limited frame, will still act on the entire
> partition instead of the just the frame they are provided.  In particular don't

If you look at slide 79 here you can get a good overview of what
operates on frames vs. partitions:

    http://momjian.us/main/writings/pgsql/window.pdf

> all normal aggregates (and user-defined ones) operate on frames?  The cume_dist
> function noted later is one of the few exceptions since it effectively operates
> on both - the frame for the numerator and the partition for the denominator.

Actually, cume_dist does not operate on frames because if you specify
ORDER BY x ROWS CURRENT ROW, cume_dist still operates on the peer and
lesser rows.  So, the frame clause is ignored by cume_dist --- in a way
it has RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW wired into the
window function.

> +   aggregate function (i.e. not ordered-set or hypothetical-set aggregates)
>
> I, and others, subscribe to the believe that both i.e., and e.g., should have
> trailing commas...
>
> http://english.stackexchange.com/questions/6894/
> should-there-be-a-comma-after-i-e

Done.

> +   functions (includes <function>cume_dist</>) are defined so that they
>
> s/b "including <function>cume_dist</>"

Done.

> +   give the same answer for any two peer rows.
>
> for all peer rows?

Much better, done.

> +   assuming the current row is does not exist in the partition.
>
> remove "is"

Fixed in Tom's patch.

Updated patch attached.  Thanks for the review.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Вложения

Re: [DOCS] Window function docs

От
"David G. Johnston"
Дата:
On Thu, Apr 6, 2017 at 6:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Apr  6, 2017 at 03:53:24PM -0700, David G. Johnston wrote:

> hypothetical-set aggregates as being distinct from ordered-set aggregates yet
> the tables and the docs here do.

Do you have any suggested text for that?

- ​The built-in ordered-set aggregate functions are listed in Table 9-53 and Table 9-54.
The built-in ordered-set aggregate functions are listed in Table 9-53 while the built-in hypothetical-set ones are in Table 9-54.

That's a direct translation.  Reading some more, though, I think that:

+ The built-in within-group aggregate functions are listed in Table 9-53 (ordered-set) and Table 9-54 (hypothetical-set).

So now we'd have:

grouping-aggregates:
normal aggregates (no suffix modifiers, though can embed ORDER BY in the aggregate function argument area)

within-group aggregates (requires WITHIN GROUP suffix modifier)
ordered-set aggregates -> given parameters determine a record value, possibly imputed (within the defined group)
hypothetical-set aggregates -> given a (virtual) record determine a parameter (within the defined group)

non-grouping aggregates:
window aggregates (requires OVER suffix modifier)

The CREATE AGGREGATE only names those three specific types in the notes section.

David J.

Re: [DOCS] Window function docs

От
Bruce Momjian
Дата:
On Thu, Apr  6, 2017 at 06:43:38PM -0700, David G. Johnston wrote:
> On Thu, Apr 6, 2017 at 6:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
>     On Thu, Apr  6, 2017 at 03:53:24PM -0700, David G. Johnston wrote:
>
>     > hypothetical-set aggregates as being distinct from ordered-set aggregates
>     yet
>     > the tables and the docs here do.
>
>     Do you have any suggested text for that?
>
>
> - ​The built-in ordered-set aggregate functions are listed in Table 9-53 and
> Table 9-54.
> + The built-in ordered-set aggregate functions are listed in Table 9-53 while
> the built-in hypothetical-set ones are in Table 9-54.
>
> That's a direct translation.  Reading some more, though, I think that:
>
> + The built-in within-group aggregate functions are listed in Table 9-53
> (ordered-set) and Table 9-54 (hypothetical-set).

I went with this, for simplicity:

   The built-in within-group ordered-set aggregate functions
   are listed in <xref linkend="functions-orderedset-table">
   while the built-in within-group hypothetical-set ones are in <xref
   linkend="functions-hypothetical-table">.

> So now we'd have:
>
> grouping-aggregates:
> normal aggregates (no suffix modifiers, though can embed ORDER BY in the
> aggregate function argument area)
>
> within-group aggregates (requires WITHIN GROUP suffix modifier)
> ordered-set aggregates -> given parameters determine a record value, possibly
> imputed (within the defined group)
> hypothetical-set aggregates -> given a (virtual) record determine a parameter
> (within the defined group)
>
> non-grouping aggregates:
> window aggregates (requires OVER suffix modifier)
>
> The CREATE AGGREGATE only names those three specific types in the notes
> section.

Yes, very complicated.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: [DOCS] Window function docs

От
Bruce Momjian
Дата:
On Thu, Apr  6, 2017 at 06:13:44PM -0400, Tom Lane wrote:
> avg() is going to be evaluated across the current frame, not the whole
> partition.  I don't mind trying to be more precise than "appropriate set",
> but it doesn't help to be apparently precise but wrong.
>
> Not really convinced by the s/regular/normal/g changes.  That doesn't seem
> to clarify much either.  Maybe it would help to define "normal" near here.
> There's a definition in func.sgml (that they are not ordered-set or
> hypothetical-set aggregates) but that's too far away.

This change is not on my patch yes but what do you think about changing
"regular" and "normal" aggregate designations to "generic"?  That seems
clearer because the aggregates work in non-window and window contexts.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: [DOCS] Window function docs

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> On Thu, Apr  6, 2017 at 06:13:44PM -0400, Tom Lane wrote:
>> Not really convinced by the s/regular/normal/g changes.  That doesn't seem
>> to clarify much either.  Maybe it would help to define "normal" near here.
>> There's a definition in func.sgml (that they are not ordered-set or
>> hypothetical-set aggregates) but that's too far away.

> This change is not on my patch yes but what do you think about changing
> "regular" and "normal" aggregate designations to "generic"?  That seems
> clearer because the aggregates work in non-window and window contexts.

I don't think "generic" is le mot juste here.  What's more generic about
avg() than about mode(), for instance?

My dictionary says "generic" means "characteristic of a class or group
of things", which doesn't seem to have anything to do with this.

I actually think "regular" might be a better choice than "normal" for this
purpose, mainly because somebody looking for statistical aggregates might
think that "normal" has something to do with normal distributions.  So
maybe we should leave that one group of uses alone and s/normal/regular/
in section 9.20.  But in any case it would help to define the term
explicitly in both sections.

            regards, tom lane


Re: [DOCS] Window function docs

От
Bruce Momjian
Дата:
On Fri, Apr  7, 2017 at 01:06:53PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > This change is not on my patch yes but what do you think about changing
> > "regular" and "normal" aggregate designations to "generic"?  That seems
> > clearer because the aggregates work in non-window and window contexts.
>
> I don't think "generic" is le mot juste here.  What's more generic about
> avg() than about mode(), for instance?
>
> My dictionary says "generic" means "characteristic of a class or group
> of things", which doesn't seem to have anything to do with this.
>
> I actually think "regular" might be a better choice than "normal" for this
> purpose, mainly because somebody looking for statistical aggregates might
> think that "normal" has something to do with normal distributions.  So
> maybe we should leave that one group of uses alone and s/normal/regular/
> in section 9.20.  But in any case it would help to define the term
> explicitly in both sections.

I don't think adding another defined term in this dense text is going to
help so I just specified "general-purpose and statistical aggregates"
where needed;  patch attached.

I didn't touch any mention of "normal" in reference to
pg_aggregate.aggfnoid where normal/"n" has a specific meaning.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Вложения

Re: [DOCS] Window function docs

От
Bruce Momjian
Дата:
On Mon, Apr 10, 2017 at 03:15:58PM -0400, Bruce Momjian wrote:
> On Fri, Apr  7, 2017 at 01:06:53PM -0400, Tom Lane wrote:
> > I actually think "regular" might be a better choice than "normal" for this
> > purpose, mainly because somebody looking for statistical aggregates might
> > think that "normal" has something to do with normal distributions.  So
> > maybe we should leave that one group of uses alone and s/normal/regular/
> > in section 9.20.  But in any case it would help to define the term
> > explicitly in both sections.
>
> I don't think adding another defined term in this dense text is going to
> help so I just specified "general-purpose and statistical aggregates"
> where needed;  patch attached.
>
> I didn't touch any mention of "normal" in reference to
> pg_aggregate.aggfnoid where normal/"n" has a specific meaning.

Patch applied.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +