Обсуждение: Frames vs partitions: is SQL2008 completely insane?

Поиск
Список
Период
Сортировка

Frames vs partitions: is SQL2008 completely insane?

От
Tom Lane
Дата:
According to SQL2008 section 7.11 <window clause>, general rule 5, the
default definition of window framing in a window that has an ordering
clause but no framing (RANGE/ROWS) clause is that the window frame for
a given row R runs from the first row of its partition through the last
peer of R.

Section 6.10's general rules define the results of LEAD, LAG,
FIRST_VALUE, LAST_VALUE, NTH_VALUE in terms of the rows available in the
window frame of the current window, not its partition.

Meanwhile, section 6.10 <window function> syntax rule 6 says that
LEAD/LAG must use a window that has an ordering clause and no
framing clause.

This means that without an explicit framing clause, none of these
functions can "look beyond" the last peer of the current row; and
what's worse, LEAD/LAG seem to be explicitly forbidden from looking
further than that even if we had an implementation of framing clauses.

This seems to be less than sane.  I would certainly expect that LEAD(x)
gives you the next value of x regardless of peer-row status, since
LAG(x) gives you the prior value of x regardless of peer row status.
It is also simply bizarre for FIRST_VALUE to give you the partition's
first row when LAST_VALUE doesn't give you the partition's last row.

Are there any errata for SQL2008 yet?  Can anyone check the actual
behavior of DB2 or other DBMS's that claim to implement these functions?

I notice that the current patch code seems to implement
first/last/nth_value using the frame, but lead/lag using the partition,
which doesn't conform to spec AFAICS ... but lead/lag on the frame
doesn't actually appear to be a useful definition so I'd rather go
with that than with what the letter of the spec seems to say.

Lastly, for a simple aggregate used with an OVER clause, the current
patch seems to define the aggregate as being taken over the frame
rather than the partition, but I cannot find anything in SQL2008 that
lends any support to *either* definition.

Comments?  This all seems rather badly broken.
        regards, tom lane


Re: Frames vs partitions: is SQL2008 completely insane?

От
Tom Lane
Дата:
I wrote:
> Lastly, for a simple aggregate used with an OVER clause, the current
> patch seems to define the aggregate as being taken over the frame
> rather than the partition, but I cannot find anything in SQL2008 that
> lends any support to *either* definition.

Never mind that --- I found it in 10.9 syntax rule 4.b.

But what this seems to boil down to is that LEAD() and LAST_VALUE()
are completely useless unless you're allowed to specify a nondefault
framing clause ... and don't mind ignoring the clearly-insane
restriction of 6.10 syntax rule 6.b.

The minimum extra functionality needed to make these functions useful
would seem to be to allow UNBOUNDED FOLLOWING ...
        regards, tom lane


Re: Frames vs partitions: is SQL2008 completely insane?

От
"Hitoshi Harada"
Дата:
2008/12/27 Tom Lane <tgl@sss.pgh.pa.us>:
> I notice that the current patch code seems to implement
> first/last/nth_value using the frame, but lead/lag using the partition,
> which doesn't conform to spec AFAICS ... but lead/lag on the frame
> doesn't actually appear to be a useful definition so I'd rather go
> with that than with what the letter of the spec seems to say.

In 4.15, it says:

The lead and lag functions each take three arguments, a <value
expression> VE, an <exact numeric literal>
OFFSET, and a <value expression> DEFAULT. For each row R within the
*window partition P of R* defined by
a window structure descriptor, the lag function returns the value of
VE evaluated on a row that is OFFSET
number of rows before R within P,

for lead/lag, and

returns the value of VE evaluated on the n-th row from the first (if
FROM FIRST is specified or implied) or the last (if FROM LAST is
specified) row of the *window frame* of R
defined by a window structure descriptor

for nth_value, added * by me.

I understand lead/lag can affect all rows in the partition whereas
first/last/nth_value does only rows in the frame.

I guess that's why 6.10 rule 6.b forbids frame caluse in lead/lag but
actually we can ignore frame though the frame is specified in the
window attached with lead/lag, and it is better if you call aggregate
with frame and lead/lag on the same window spec, which allows us to
optimize it by calling them on the same node. It violates the spec but
we'd better to extend the standard like the offset argument of
lead/lag.


Regards,

-- 
Hitoshi Harada


Re: Frames vs partitions: is SQL2008 completely insane?

От
Tom Lane
Дата:
"Hitoshi Harada" <umi.tanuki@gmail.com> writes:
> 2008/12/27 Tom Lane <tgl@sss.pgh.pa.us>:
>> I notice that the current patch code seems to implement
>> first/last/nth_value using the frame, but lead/lag using the partition,
>> which doesn't conform to spec AFAICS ...

> In 4.15, it says:

> The lead and lag functions each take three arguments, a <value
> expression> VE, an <exact numeric literal>
> OFFSET, and a <value expression> DEFAULT. For each row R within the
> *window partition P of R* defined by
> a window structure descriptor, the lag function returns the value of
> VE evaluated on a row that is OFFSET
> number of rows before R within P,

Well, that's interesting, but I think the controlling definition is in
6.10 general rule 1b, which very clearly states that the frame is to be
used for lead/lag (and the adjacent rules say the same for all the other
standard window functions).

The wording in 4.15 does seem like evidence that the spec authors may
have misspoke in 6.10, but we're never going to settle it from the text
of the spec.  Can anyone check what DB2 and Oracle do here?

In any case, both sections agree that last_value works on the frame,
which makes it effectively useless with the default frame definition.
So I'm still thinking that we need at least a subset of frame support.
I'm tempted to propose that we just handle the CURRENT ROW and UNBOUNDED
options for each end of the frame.
        regards, tom lane


Re: Frames vs partitions: is SQL2008 completely insane?

От
"Hitoshi Harada"
Дата:
2008/12/28 Tom Lane <tgl@sss.pgh.pa.us>:
> "Hitoshi Harada" <umi.tanuki@gmail.com> writes:
>> 2008/12/27 Tom Lane <tgl@sss.pgh.pa.us>:
>>> I notice that the current patch code seems to implement
>>> first/last/nth_value using the frame, but lead/lag using the partition,
>>> which doesn't conform to spec AFAICS ...
>
>> In 4.15, it says:
>
>> The lead and lag functions each take three arguments, a <value
>> expression> VE, an <exact numeric literal>
>> OFFSET, and a <value expression> DEFAULT. For each row R within the
>> *window partition P of R* defined by
>> a window structure descriptor, the lag function returns the value of
>> VE evaluated on a row that is OFFSET
>> number of rows before R within P,
>
> Well, that's interesting, but I think the controlling definition is in
> 6.10 general rule 1b, which very clearly states that the frame is to be
> used for lead/lag (and the adjacent rules say the same for all the other
> standard window functions).
>
> The wording in 4.15 does seem like evidence that the spec authors may
> have misspoke in 6.10, but we're never going to settle it from the text
> of the spec.  Can anyone check what DB2 and Oracle do here?

I tested on Oracle 10.2.0, and the results are:

select depname, empno, salary,
lead(salary, 1) over (order by salary),
lag(salary, 1) over (order by salary),
first_value(salary) over (order by salary),
last_value(salary) over (order by salary)
from empsalary;

DEPNAME     EMPNO     SALARY     LEAD(SALARY,1)OVER(ORDERBYSALARY)    
LAG(SALARY,1)OVER(ORDERBYSALARY)    
FIRST_VALUE(SALARY)OVER(ORDERBYSALARY)    
LAST_VALUE(SALARY)OVER(ORDERBYSALARY)
personnel     5     3500     3900              3500     3500
personnel     2     3900     4200     3500     3500     3900
develop     7     4200     4500     3900     3500     4200
develop     9     4500     4800     4200     3500     4500
sales     4     4800     4800     4500     3500     4800
sales     3     4800     5000     4800     3500     4800
sales     1     5000     5200     4800     3500     5000
develop     10     5200     5200     5000     3500     5200
develop     11     5200     6000     5200     3500     5200
develop     8     6000              5200     3500     6000

which means the section 4.15 is true. Could anyone try DB2?

> In any case, both sections agree that last_value works on the frame,
> which makes it effectively useless with the default frame definition.
> So I'm still thinking that we need at least a subset of frame support.
> I'm tempted to propose that we just handle the CURRENT ROW and UNBOUNDED
> options for each end of the frame.

If we can afford it, I and many users are so glad with it. In my
opinion it is not so painful if we had done concrete fundamentals of
window functions, but isn't it up to our time for the release?


Regards,

-- 
Hitoshi Harada


Re: Frames vs partitions: is SQL2008 completely insane?

От
Ron Mayer
Дата:
Hitoshi Harada wrote:
> 2008/12/28 Tom Lane <tgl@sss.pgh.pa.us>:
>> "Hitoshi Harada" <umi.tanuki@gmail.com> writes:
>>> 2008/12/27 Tom Lane <tgl@sss.pgh.pa.us>:
>>>> which doesn't conform to spec AFAICS ...
>>> ....4.15...says:
>> interesting...6.10 general rule 1b, which very clearly states ...
>> ... 4.15 does seem like evidence that the spec authors may
>> have misspoke in 6.10
> Oracle... results are: ... which means the section 4.15 is true

ISTM ISO should hire you guys (or the postgres project as a whole)
to proof-read their specs before they publish them.



Re: Frames vs partitions: is SQL2008 completely insane?

От
"David Rowley"
Дата:
Hitoshi Harada wrote:
> I tested on Oracle 10.2.0, and the results are:
> 
> select depname, empno, salary,
> lead(salary, 1) over (order by salary),
> lag(salary, 1) over (order by salary),
> first_value(salary) over (order by salary),
> last_value(salary) over (order by salary)
> from empsalary;
> 
> DEPNAME     EMPNO     SALARY     LEAD(SALARY,1)OVER(ORDERBYSALARY)
> LAG(SALARY,1)OVER(ORDERBYSALARY)
> FIRST_VALUE(SALARY)OVER(ORDERBYSALARY)
> LAST_VALUE(SALARY)OVER(ORDERBYSALARY)
> personnel     5     3500     3900              3500     3500
> personnel     2     3900     4200     3500     3500     3900
> develop     7     4200     4500     3900     3500     4200
> develop     9     4500     4800     4200     3500     4500
> sales     4     4800     4800     4500     3500     4800
> sales     3     4800     5000     4800     3500     4800
> sales     1     5000     5200     4800     3500     5000
> develop     10     5200     5200     5000     3500     5200
> develop     11     5200     6000     5200     3500     5200
> develop     8     6000              5200     3500     6000
> 
> which means the section 4.15 is true. Could anyone try DB2?

DB2 9.5 results:

Using the empsalary table from the regression test in the patch:

select depname, empno, salary,
lead(salary, 1) over (order by salary),
lag(salary, 1) over (order by salary),
first_value(salary) over (order by salary),
last_value(salary) over (order by salary)
from empsalary;

personnel    5    3500    3900        3500    3500
personnel    2    3900    4200    3500    3500    3900
develop    7    4200    4500    3900    3500    4200
develop    9    4500    4800    4200    3500    4500
sales    4    4800    4800    4500    3500    4800
sales    3    4800    5000    4800    3500    4800
sales    1    5000    5200    4800    3500    5000
develop    10    5200    5200    5000    3500    5200
develop    11    5200    6000    5200    3500    5200
develop    8    6000        5200    3500    6000


Which matches with your Oracle results. So either they both got it wrong by
one copying the other... <sarcasm> Of course we all know it couldn't be
Oracle copying IBM, that would never happen... </sarcasm>

David




Re: Frames vs partitions: is SQL2008 completely insane?

От
"David Rowley"
Дата:
Hitoshi Harada wrote:
> I tested on Oracle 10.2.0, and the results are:
> 
> select depname, empno, salary,
> lead(salary, 1) over (order by salary),
> lag(salary, 1) over (order by salary),
> first_value(salary) over (order by salary),
> last_value(salary) over (order by salary)
> from empsalary;
> 
> DEPNAME     EMPNO     SALARY     LEAD(SALARY,1)OVER(ORDERBYSALARY)
> LAG(SALARY,1)OVER(ORDERBYSALARY)
> FIRST_VALUE(SALARY)OVER(ORDERBYSALARY)
> LAST_VALUE(SALARY)OVER(ORDERBYSALARY)
> personnel     5     3500     3900              3500     3500
> personnel     2     3900     4200     3500     3500     3900
> develop     7     4200     4500     3900     3500     4200
> develop     9     4500     4800     4200     3500     4500
> sales     4     4800     4800     4500     3500     4800
> sales     3     4800     5000     4800     3500     4800
> sales     1     5000     5200     4800     3500     5000
> develop     10     5200     5200     5000     3500     5200
> develop     11     5200     6000     5200     3500     5200
> develop     8     6000              5200     3500     6000
> 
> which means the section 4.15 is true. Could anyone try DB2?

DB2 9.5 results:

Using the empsalary table from the regression test in the patch:

select depname, empno, salary,
lead(salary, 1) over (order by salary),
lag(salary, 1) over (order by salary),
first_value(salary) over (order by salary),
last_value(salary) over (order by salary)
from empsalary;

personnel    5    3500    3900        3500    3500
personnel    2    3900    4200    3500    3500    3900
develop    7    4200    4500    3900    3500    4200
develop    9    4500    4800    4200    3500    4500
sales    4    4800    4800    4500    3500    4800
sales    3    4800    5000    4800    3500    4800
sales    1    5000    5200    4800    3500    5000
develop    10    5200    5200    5000    3500    5200
develop    11    5200    6000    5200    3500    5200
develop    8    6000        5200    3500    6000


Which matches with your Oracle results. So either they both got it wrong by
one copying the other... <sarcasm> Of course we all know it couldn't be
Oracle copying IBM, that would never happen... </sarcasm>

David




Re: Frames vs partitions: is SQL2008 completely insane?

От
Tom Lane
Дата:
"David Rowley" <dgrowley@gmail.com> writes:
> Hitoshi Harada wrote:
>> I tested on Oracle 10.2.0, and the results are:
>> ...
>> which means the section 4.15 is true. Could anyone try DB2?

> DB2 9.5 results [ are the same ]

OK, good, that means the reference to the frame in 6.10 rule 1b is just
a copy-and-pasteo.  (I wonder if it got fixed in the final spec?)
I guess this makes sense because we also see that 6.10 syntax rule 6b
forbids a framing spec on lead/lag, which makes sense if these functions
ignore the frame and no sense otherwise.

I also realized after more thought that the way to get frame = partition
with a default frame spec is to omit any ORDER BY in the window spec.
So my concern about LAST_VALUE being useless without framing ability is
unfounded, and I withdraw the complaint that we need to put in some
minimal framing features.  But we'll have to be careful to document all
this properly.  Also, it does seem that it might be worthwhile to try to
ensure that combinations of windows that have the same PARTITION list
and empty vs nonempty ORDER BY get optimized well.  Right now it's
dependent on ordering of the WindowClause whether you pay an extra sort
for that case or not.
        regards, tom lane


Re: Frames vs partitions: is SQL2008 completely insane?

От
Peter Eisentraut
Дата:
On Saturday 27 December 2008 20:32:10 Ron Mayer wrote:
> ISTM ISO should hire you guys (or the postgres project as a whole)
> to proof-read their specs before they publish them.

The way it really works though, effectively, is that vendors hire ISO to 
publish their specs.

Having a few inconsistencies in 2000 pages of language specification with 20 
years of legacy around it isn't so bad IMO, considering that there are really 
only a handful of guys working on this with any intensity.

If we cared enough, we could submit these sorts of issues to the committee for 
clarification or correction.  If anyone is convinced enough about this 
particular case, I can try to relay it and see what happens.


Re: Frames vs partitions: is SQL2008 completely insane?

От
Bruce Momjian
Дата:
Tom Lane wrote:
> According to SQL2008 section 7.11 <window clause>, general rule 5, the
> default definition of window framing in a window that has an ordering
> clause but no framing (RANGE/ROWS) clause is that the window frame for
> a given row R runs from the first row of its partition through the last
> peer of R.
> 
> Section 6.10's general rules define the results of LEAD, LAG,
> FIRST_VALUE, LAST_VALUE, NTH_VALUE in terms of the rows available in the
> window frame of the current window, not its partition.
> 
> Meanwhile, section 6.10 <window function> syntax rule 6 says that
> LEAD/LAG must use a window that has an ordering clause and no
> framing clause.
> 
> This means that without an explicit framing clause, none of these
> functions can "look beyond" the last peer of the current row; and
> what's worse, LEAD/LAG seem to be explicitly forbidden from looking
> further than that even if we had an implementation of framing clauses.
> 
> This seems to be less than sane.  I would certainly expect that LEAD(x)
> gives you the next value of x regardless of peer-row status, since
> LAG(x) gives you the prior value of x regardless of peer row status.
> It is also simply bizarre for FIRST_VALUE to give you the partition's
> first row when LAST_VALUE doesn't give you the partition's last row.
> 
> Are there any errata for SQL2008 yet?  Can anyone check the actual
> behavior of DB2 or other DBMS's that claim to implement these functions?
> 
> I notice that the current patch code seems to implement
> first/last/nth_value using the frame, but lead/lag using the partition,
> which doesn't conform to spec AFAICS ... but lead/lag on the frame
> doesn't actually appear to be a useful definition so I'd rather go
> with that than with what the letter of the spec seems to say.
> 
> Lastly, for a simple aggregate used with an OVER clause, the current
> patch seems to define the aggregate as being taken over the frame
> rather than the partition, but I cannot find anything in SQL2008 that
> lends any support to *either* definition.
> 
> Comments?  This all seems rather badly broken.

Was this dealt with?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Frames vs partitions: is SQL2008 completely insane?

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> Was this dealt with?

Yes.
        regards, tom lane